Essential Linux Device Drivers [1 ed.] 0-13-239655-6, 978-0-13-239655-4

“Probably the most wide ranging and complete Linux device driver book I’ve read.” --Alan Cox, Linux Guru and Key Kernel

303 39 5MB

English Pages 744 [850] Year 2008

Report DMCA / Copyright

DOWNLOAD PDF FILE

Recommend Papers

Essential Linux Device Drivers [1 ed.]
 0-13-239655-6, 978-0-13-239655-4

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

Esse n t ia l Lin u x D e vice D r ive r s by Sreekrishnan Venkat eswaran Publisher : Pr e n t ice H a ll Pub Dat e: M a r ch 2 7 , 2 0 0 8 Print I SBN- 10: 0 - 1 3 - 2 3 9 6 5 5 - 6 Print I SBN- 13: 9 7 8 - 0 - 1 3 - 2 3 9 6 5 5 - 4 Pages: 7 4 4 Table of Cont ent s | I ndex

Overview " Probably t he m ost wide ranging and com plet e Linux device driver book I 've read." - - Alan Cox, Linux Guru and Key Kernel Developer " Very com prehensive and det ailed, covering alm ost every single Linux device driver t ype." - - Theodore Ts'o, First Linux Kernel Developer in Nort h Am erica and Chief Plat form St rat egist of t he Linux Foundat ion Th e M ost Pr a ct ica l Gu ide t o W r it in g Lin u x D e vice D r ive r s Linux now offers an except ionally robust environm ent for driver developm ent : wit h t oday's kernels, what once required years of developm ent t im e can be accom plished in days. I n t his pract ical, exam ple- driven book, one of t he world's m ost experienced Linux driver developers syst em at ically dem onst rat es how t o develop reliable Linux drivers for virt ually any device. Esse n t ia l Lin u x D e vice D r ive r s is for any program m er wit h a working knowledge of operat ing syst em s and C, including program m ers who have never writ t en drivers before. Sreekrishnan Venkat eswaran focuses on t he essent ials, bringing t oget her all t he concept s and t echniques you need, while avoiding t opics t hat only m at t er in highly specialized sit uat ions. Venkat eswaran begins by reviewing t he Linux 2.6 kernel capabilit ies t hat are m ost relevant t o driver developers. He int roduces sim ple device classes; t hen t urns t o serial buses such as I 2C and SPI ; ext ernal buses such as PCMCI A, PCI , and USB; video, audio, block, net work, and wireless device drivers; user- space drivers; and drivers for em bedded Linux–one of t oday's fast est growing areas of Linux developm ent . For each, Venkat eswaran explains t he t echnology, inspect s relevant kernel source files, and walks t hrough developing a com plet e exam ple. • Addresses drivers discussed in no ot her book, including drivers for I 2C, video, sound, PCMCI A, and different t ypes of flash m em ory • Dem yst ifies essent ial kernel services and faciliti es, including kernel t hreads and helper int erfaces • Teaches polling, asynchronous not ificat ion, and I/ O cont rol • I nt roduces t he I nt er- I nt egrat ed Circuit Prot ocol for em bedded Linux drivers • Covers m ult im edia device drivers using t he Linux-Video subsyst em and Linux- Audio fram ework • Shows how Linux im plem ent s support for wireless te chnologies such as Bluet oot h, I nfrared, WiFi, and cellular net working • Describes t he ent ire driver developm ent lifecycle, t hrough debugging and m aint enance • I ncludes reference appendixes covering Linux assem bly, BI OS calls, and Seq files

Esse n t ia l Lin u x D e vice D r ive r s by Sreekrishnan Venkat eswaran Publisher : Pr e n t ice H a ll Pub Dat e: M a r ch 2 7 , 2 0 0 8 Print I SBN- 10: 0 - 1 3 - 2 3 9 6 5 5 - 6 Print I SBN- 13: 9 7 8 - 0 - 1 3 - 2 3 9 6 5 5 - 4 Pages: 7 4 4 Table of Cont ent s | I ndex

Copyright Prent ice Hall Open Source Soft ware Developm ent Series For ew or d Preface Acknowledgm ent s About t he Aut hor Chapt er 1. I nt roduct ion Ev olut ion The GNU Copyleft Kernel.org Mailing List s and Forum s Linux Dist ribut ions Looking at t he Sources Building t he Kernel Loadable Modules Before St art ing Chapt er 2. A Peek I nside t he Kernel Boot ing Up Kernel Mode and User Mode Process Cont ext and I nt errupt Cont ext Kernel Tim ers Concurrency in t he Kernel Process Filesyst em Allocat ing Mem ory Looking at t he Sources Chapt er 3. Kernel Facilit ies Kernel Threads Helper I nt erfaces Looking at t he Sources Chapt er 4. Laying t he Groundwork I nt roducing Devices and Drivers I nt errupt Handling The Linux Device Model Mem ory Barriers Power Managem ent Looking at t he Sources Chapt er 5. Charact er Drivers Char Driver Basics Device Exam ple: Syst em CMOS Sensing Dat a Availabilit y Talking t o t he Parallel Port RTC Subsyst em Pseudo Char Drivers Misc Drivers Charact er Caveat s Looking at t he Sources Chapt er 6. Serial Drivers Layered Archit ect ure UART Drivers TTY Drivers Line Disciplines

Looking at t he Sources Chapt er 7. I nput Drivers I nput Event Drivers I nput Device Drivers Debugging Looking at t he Sources Chapt er 8. The I nt er- I nt egrat ed Circuit Prot ocol What 's I 2C/ SMBus? I 2C Core Bus Transact ions Device Exam ple: EEPROM Device Exam ple: Real Tim e Clock I 2C- dev Hardware Monit oring Using LM- Sensors The Serial Peripheral I nt erface Bus The 1- Wire Bus Debugging Looking at t he Sources Chapt er 9. PCMCI A and Com pact Flash What 's PCMCI A/ CF? Linux- PCMCI A Subsyst em Host Cont roller Drivers PCMCI A Core Driver Services Client Drivers Tying t he Pieces Toget her PCMCI A St orage Serial PCMCI A Debugging Looking at t he Sources Chapt er 10. Peripheral Com ponent I nt erconnect The PCI Fam ily Addressing and I dent ificat ion Accessing PCI Regions Direct Mem ory Access Device Exam ple: Et hernet - Modem Card Debugging Looking at t he Sources Chapt er 11. Universal Serial Bus USB Archit ect ure Linux- USB Subsyst em Driver Dat a St ruct ures Enum er at ion Device Exam ple: Telem et ry Card Class Drivers Gadget Drivers Debugging Looking at t he Sources Chapt er 12. Video Drivers Display Archit ect ure Linux- Video Subsyst em Display Param et ers The Fram e Buffer API Fram e Buffer Drivers Console Drivers Debugging Looking at t he Sources Chapt er 13. Audio Drivers Audio Archit ect ure Linux- Sound Subsyst em Device Exam ple: MP3 Player Debugging Looking at t he Sources Chapt er 14. Block Drivers

St orage Technologies Linux Block I / O Layer I / O Schedulers Block Driver Dat a St ruct ures and Met hods Device Exam ple: Sim ple St orage Cont roller Advanced Topics Debugging Looking at t he Sources Chapt er 15. Net work I nt erface Cards Driver Dat a St ruct ures Talking wit h Prot ocol Layers Buffer Managem ent and Concurrency Cont rol Device Exam ple: Et hernet NI C I SA Net work Drivers Asynchronous Transfer Mode Net work Throughput Looking at t he Sources Chapt er 16. Linux Wit hout Wires Bluet oot h I nfrared WiFi Cellular Net working Current Trends Chapt er 17. Mem ory Technology Devices What 's Flash Mem ory? Linux- MTD Subsyst em Map Drivers NOR Chip Drivers NAND Chip Drivers User Modules MTD- Ut ils Configuring MTD eXecut e I n Place The Firm ware Hub Debugging Looking at t he Sources Chapt er 18. Em bedding Linux Challenges Com ponent Select ion Tool Chains Em bedded Boot loaders Mem ory Layout Kernel Port ing Em bedded Drivers The Root Filesyst em Test I nfrast ruct ure Debugging Chapt er 19. Drivers in User Space Process Scheduling and Response Tim es Accessing I / O Regions Accessing Mem ory Regions User Mode SCSI User Mode USB User Mode I 2C UI O Looking at t he Sources Chapt er 20. More Devices and Drivers ECC Report ing Frequency Scaling Em bedded Cont rollers ACPI I SA and MCA FireWire I nt elligent I nput / Out put

Am at eur Radio Voice over I P High- Speed I nt erconnect s Chapt er 21. Debugging Device Drivers Kernel Debuggers Kernel Probes Kexec and Kdum p Profiling Tracing Linux Test Proj ect User Mode Linux Diagnost ic Tools Kernel Hacking Config Opt ions Test Equipm ent Chapt er 22. Maint enance and Delivery Coding St yle Change Markers Version Cont rol Consist ent Checksum s Build Script s Port able Code Chapt er 23. Shut t ing Down Checklist What Next ? Appendix A. Linux Assem bly Debugging Appendix B. Linux and t he BI OS Real Mode Calls Prot ect ed Mode Calls BI OS and Legacy Drivers Appendix C. Seq Files The Seq File Advant age Updat ing t he NVRAM Driver Looking at t he Sources I ndex

Copyr igh t Many of t he designat ions used by m anufact urers and sellers t o dist inguish t heir product s are claim ed as t radem arks. Where t hose designat ions appear in t his book, and t he publisher was aware of a t radem ark claim , t he designat ions have been print ed wit h init ial capit al let t ers or in all capit als. The aut hor and publisher have t aken care in t he preparat ion of t his book, but m ake no expressed or im plied warrant y of any kind and assum e no responsibilit y for errors or om issions. No liabilit y is assum ed for incident al or consequent ial dam ages in connect ion wit h or arising out of t he use of t he inform at ion or program s cont ained herein. The publisher offers excellent discount s on t his book when ordered in quant it y for bulk purchases or special sales, which m ay include elect ronic versions and/ or cust om covers and cont ent part icular t o your business, t raining goals, m arket ing focus, and branding int erest s. For m ore inform at ion, please cont act : U.S. Corporat e and Governm ent Sales ( 800) 382- 3419 corpsales@pearsont echgroup.com For sales out side t he Unit ed St at es please cont act : I nt ernat ional Sales int ernat [email protected] Visit us on t he Web: www.inform it .com / ph Library of Congress Cat aloging- in- Publicat ion Dat a: Venkat eswaran, Sreekrishnan, 1972Essent ial Linux device drivers / Sreekrishnan Venkat eswaran.- - 1st ed. p. cm . I SBN 0- 13- 239655- 6 ( hardback : alk. paper) 1. Linux device drivers ( Com put er program s) I . Tit le. QA76.76.D49V35 2008 005.4'32- - dc22 2008000249 Copyright © 2008 Pearson Educat ion, I nc. All right s reserved. Print ed in t he Unit ed St at es of Am erica. This publicat ion is prot ect ed by copyright , and perm ission m ust be obt ained from t he publisher prior t o any prohibit ed reproduct ion, st orage in a ret rieval syst em , or t ransm ission in any form or by any m eans, elect ronic, m echanical, phot ocopying, recording, or likewise. For inform at ion regarding perm issions, writ e t o: Pearson Educat ion, I nc Right s and Cont ract s Depart m ent 501 Boylst on St reet , Suit e 900 Bost on, MA 02116 Fax ( 617) 671 3447 This m at erial m ay be dist ribut ed only subj ect t o t he t erm s and condit ions set fort h in t he Open Publicat ion License, v1.0 or lat er ( t he lat est version is present ly available at ht t p: / / www.opencont ent .org/ openpub/ ) . I SBN- 13: 978- 0- 132- 39655- 4

Text print ed in t he Unit ed St at es on recycled paper at RR Donnelly in Crawfordsville, I N. First print ing March 2008 Edit or - in - Ch ie f Mark Taub Ex e cut ive Edit or Debra William s Cauley M a na ging Edit or Gina Kanouse Pr oj e ct Edit o Anne Goebel Copy Edit or Keit h Cline I n de x e r Erika Millen Proofrea der San Dee Phillips Te ch n ica l Edit or s Vam si Krishna Jim Lieb Publishing Coor dina t or Heat her Fox I n t e r ior D e sign e r Laura Robbins Cove r D e sign e r Alan Clem ent s Com posit or Molly Sharp

D e dica t ion This book is dedicat ed t o t he t en m illion visually challenged cit izens of I ndia. All aut hor proceeds will go t o t heir cause.

Pr e n t ice H a ll Ope n Sou r ce Soft w a r e D e ve lopm e n t Se r ie s Ar n old Robbin s, Se r ie s Edit or " Re a l w or ld code fr om r e a l w or ld a pplica t ion s" Open Source t echnology has revolut ionized t he com put ing world. Many large- scale proj ect s are in product ion use worldwide, such as Apache, MySQL, and Post gres, wit h program m ers writ ing applicat ions in a variet y of languages including Perl, Pyt hon, and PHP. These t echnologies are in use on m any different syst em s, ranging from propriet ary syst em s, t o Linux syst em s, t o t radit ional UNI X syst em s, t o m ainfram es. The Pr e n t ice H a ll Ope n Sou r ce Soft w a r e D e ve lopm e n t Se r ie s is designed t o bring you t he best of t hese Open Source t echnologies. Not only will you learn how t o use t hem for your proj ect s, but you will learn from t hem . By seeing real code from real applicat ions, you will learn t he best pract ices of Open Source developers t he world over. Tit le s cu r r e n t ly in t h e se r ie s in clu de : Linux ® Debugging and Perform ance Tuning St eve Best 0131492470, Paper, © 2006 C+ + GUI Program m ing wit h Qt 4 Jasm in Blanchet t e, Mark Sum m erfield 0132354160, Hard, © 2008 The Definit ive Guide t o t he Xen Hypervisor David Chisnall 013234971X, Hard, © 2008 Underst anding AJAX Joshua Eichorn 0132216353, Paper, © 2007 The Linux Program m er's Toolbox John Fusco 0132198576, Paper, © 2007 Em bedded Linux Prim er Christ opher Hallinan 0131679848, Paper, © 2007 The Apache Modules Book Nick Kew 0132409674, Paper, © 2007 SELinux by Exam ple Frank Mayer, David Caplan, Karl MacMillan 0131963694, Paper, © 2007 UNI X t o Linux ® Port ing Alfredo Mendoza, Chakarat Skawrat ananond, Art is Walker 0131871099, Paper, © 2006

Rapid Web Applicat ions wit h TurboGears Mark Ram m , Kevin Dangoor, Gigi Sayfan 0132433885, Paper, © 2007 Linux Program m ing by Exam ple Arnold Robbins 0131429647, Paper, © 2004 The Linux ® Kernel Prim er Claudia Salzberg, Gordon Fischer, St even Sm olski 0131181637, Paper, © 2006 Rapid GUI Program m ing wit h Pyt hon and Qt Mark Sum m erfield 0132354187, Hard, © 2008 Essent ial Linux Device Drivers Sreekrishnan Venkat eswaran 0132396556, Hard, © 2008 N e w t o t h e se r ie s: D igit a l Sh or t Cu t s Short Cut s are short , concise, PDF docum ent s designed specifically for busy t echnical professionals like you. Each Short Cut is t ight ly focused on a specific t echnology or t echnical problem . Writ t en by indust ry expert s and best selling aut hors, Short Cut s are published wit h you in m ind — get t ing you t he t echnical inform at ion t hat you need — now. Underst anding AJAX: Consum ing t he Sent Dat a wit h XML and JSON Joshua Eichorn 0132337932, Adobe Acrobat PDF, © 2007 Debugging Em bedded Linux Christ opher Hallinan 0131580132, Adobe Acrobat PDF, © 2007 Using BusyBox Christ opher Hallinan 0132335921, Adobe Acrobat PDF, © 2007

For e w or d I f you're holding t his book, you m ay be asking yourself: Why " yet anot her" Linux device driver book? Aren't t here already a bunch of t hem ? The answer is: This book is a quant um leap ahead of t he ot hers. First , it is up- t o- dat e, covering recent 2.6 kernels. Second, and m ore im port ant , t his book is t horough . Most device driver books j ust cover t he t opics described in st andard Unix int ernals books or operat ing syst em books, such as serial lines, disk drives, and filesyst em s, and, if you're lucky, t he net working st ack. This book goes m uch furt her; it doesn't shy away from t he hard st uff t hat you have t o deal wit h on m odern PC and em bedded hardware, such as PCMCI A, USB, I 2 C, video, audio, flash m em ory, wireless com m unicat ions, and so on. You nam e it , if t he Linux kernel t alks t o it , t hen t his book t ells you about it . No st one is left unt urned; no dark corner is left unillum inat ed. Furt herm ore, t he aut hor has earned his st ripes: I t 's a t hrill ride j ust t o read his descript ion of put t ing Linux on a wrist wat ch in t he lat e 1990s! I 'm pleased and excit ed t o have t his book as part of t he Prent ice Hall Open Source Soft ware Developm ent Series. I t is a shining exam ple of t he excit ing t hings happening in t he Open Source world. I hope t hat you will find here what you need for your work on t he kernel, and t hat you will enj oy t he process, t oo! Arnold Robbins Series Edit or

Pr e fa ce I t was t he lat e 1990s, and at I BM we were put t ing t he Linux kernel on a wrist wat ch. The t arget device was t iny, but t he t ask was t urning out t o be t ough. The Mem ory Technology Devices subsyst em didn't exist in t he kernel, which m eant t hat before a filesyst em could st art life on t he wat ch's flash m em ory, we had t o develop t he necessary st orage driver from scrat ch. I nt erfacing t he wat ch's t ouch screen wit h user applicat ions was com plicat ed because t he kernel's input event driver int erface hadn't been conceived yet . Get t ing X Windows t o run on t he wat ch's LCD wasn't easy because it didn't work well wit h fram e buffer drivers. Of what use is a wat erproof Linux wrist wat ch if you can't st ream st ock quot es from your bat ht ub? Bluet oot h int egrat ion wit h Linux was several years away, and m ont hs were spent port ing a propriet ary Bluet oot h st ack t o I nt ernet - enable t he wat ch. Power m anagem ent support was good enough only t o squeeze a few hours of j uice from t he wat ch's bat t ery; hence we had work cut out on t hat front , t oo. Linux- I nfrared was st ill unst able, so we had t o coax t he st ack before we could use an I nfrared keyboard for dat a ent ry. And we had t o com pile t he com piler and crosscom pile a com pact applicat ion- set because t here were no accept ed dist ribut ions in t he consum er elect ronics space. Fast forward t o t he present : The baby penguin has grown int o a healt hy t eenager. What t ook t housands of lines of code and a year in developm ent back t hen can be accom plished in a few days wit h t he current kernels. But t o becom e a versat ile kernel engineer who can m agically weave solut ions, you need t o underst and t he m yriad feat ures and facilit ies t hat Linux offers t oday.

Abou t t h e Book Am ong t he various subsyst em s residing in t he kernel source t ree, t he drivers/ direct ory const it ut es t he single largest chunk and is several t im es bigger t han t he ot hers. Wit h new and diverse t echnologies arriving in popular form fact ors, t he developm ent of new device drivers in t he kernel is accelerat ing st eadily. The lat est kernels support m ore t han 70 device driver fam ilies. This book is about writ ing Linux device drivers. I t covers t he design and developm ent of m aj or device classes support ed by t he kernel, including t hose I m issed during m y Linux- on- Wat ch days. The discussion of each driver fam ily st art s by looking at t he corresponding t echnology, m oves on t o develop a pract ical exam ple, and ends by looking at relevant kernel source files. Before foraying int o t he world of device drivers, however, t his book int roduces you t o t he kernel and discusses t he im port ant feat ures of 2.6 Linux, em phasizing t hose port ions t hat are of special int erest t o device driver writ ers.

Audience This book is int ended for t he int erm ediat e- level program m er eager t o t weak t he kernel t o enable new devices. You should have a working knowledge of operat ing syst em concept s. For exam ple, you should know what a syst em call is and why concurrency issues have t o be fact ored in while writ ing kernel code. The book assum es t hat you have downloaded Linux on your syst em , poked t hrough t he kernel sources, and at least skim m ed t hrough som e relat ed docum ent at ion. And you should be pret t y good in C.

Su m m a r y of Ch a pt e r s The first 4 chapt ers prepare you t o digest t he rest of t he book. The next 16 chapt ers discuss drivers for different device fam ilies. A chapt er t hat describes device driver debugging t echniques com es next . The penult im at e chapt er provides perspect ive on m aint enance and delivery. We shut down by walking t hrough a checklist t hat sum m arizes how t o set fort h on your way t o Linux- enablem ent when you get hold of a new device. Chapt er 1 , " I nt roduct ion," st art s our t ryst wit h Linux. I t hurries you t hrough downloading t he kernel sources, m aking t rivial code changes, and building a boot able kernel im age.

Chapt er 2 , " A Peek I nside t he Kernel," t akes a brisk look int o t he innards of t he Linux kernel and t eaches you som e m ust - know kernel concept s. I t first t akes you t hrough t he boot process and t hen describes kernel services part icularly relevant t o driver developm ent , such as kernel t im ers, concurrency m anagem ent , and m em ory allocat ion. Chapt er 3 , " Kernel Facilit ies," exam ines several kernel services t hat are useful com ponent s in t he t oolbox of driver developers. The chapt er st art s by looking at kernel t hreads, which is a way t o im plem ent background t asks inside t he kernel. I t t hen m oves on t o helper int erfaces such as linked list s, work queues, com plet ion funct ions, and not ifier chains. These helper facilit ies sim plify your code, weed out redundancies from t he kernel, and help long- t erm m aint enance. Chapt er 4 , " Laying t he Groundwork," builds t he foundat ion for m ast ering t he art of writ ing Linux device drivers. I t int roduces devices and drivers by giving you a bird's- eye view of t he archit ect ure of a t ypical PC- com pat ible syst em and an em bedded device. I t t hen looks at basic driver concept s such as int errupt handling and t he kernel's device m odel. Chapt er 5 , " Charact er Drivers," looks at t he archit ect ure of charact er device drivers. Several concept s int roduced in t his chapt er, such as polling, asynchronous not ificat ion, and I / O cont rol, are relevant t o subsequent chapt ers, t oo, because m any device classes discussed in t he rest of t he book are " super" charact er devices. Chapt er 6 , " Serial Drivers," explains t he kernel layer t hat handles serial devices. Chapt er 7 , " I nput Drivers," discusses t he kernel's input subsyst em t hat is responsible for servicing devices such as keyboards, m ice, and t ouch- screen cont rollers. Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol," dissect s drivers for devices such as EEPROMs t hat are connect ed t o a syst em 's I 2 C bus or SMBus. This chapt er also looks at ot her serial int erfaces such as SPI bus and 1- wire bus. Chapt er 9 , " PCMCI A and Com pact Flash," delves int o t he PCMCI A subsyst em . I t t eaches you t o writ e drivers for devices having a PCMCI A or Com pact Flash form fact or. Chapt er 10, " Peripheral Com ponent I nt erconnect ," looks at kernel support for PCI and it s derivat ives. Chapt er 11, " Universal Serial Bus," explores USB archit ect ure and explains how you can use t he services of t he Linux- USB subsyst em t o writ e drivers for USB devices. Chapt er 12, " Video Drivers," exam ines t he Linux- Video subsyst em . I t finds out t he advant ages offered by t he fram e buffer abst ract ion and t eaches you t o writ e fram e buffer drivers. Chapt er 13, " Audio Drivers," describes t he Linux- Audio fram ework and explains how t o im plem ent audio drivers. Chapt er 14, " Block Drivers," focuses on drivers for st orage devices such as hard disks. I n t his chapt er, you also learn about t he different I / O schedulers support ed by t he Linux- Block subsyst em . Chapt er 15, " Net work I nt erface Cards," is devot ed t o net work device drivers. You learn about kernel net working dat a st ruct ures and how t o int erface net work drivers wit h prot ocol layers. Chapt er 16, " Linux Wit hout Wires," looks at driving different wireless t echnologies such as Bluet oot h, I nfrared, WiFi, and cellular com m unicat ion. Chapt er 17, " Mem ory Technology Devices," discusses flash m em ory enablem ent on em bedded devices. The chapt er ends by exam ining drivers for t he Firm ware Hub found on PC syst em s. Chapt er 18, " Em bedding Linux," st eps int o t he world of em bedded Linux. I t t akes you t hrough t he m ain firm ware com ponent s of an em bedded solut ion such as boot loader, kernel, and device drivers. Given t he soaring popularit y of Linux in t he em bedded space, it 's m ore likely t hat you will use t he device driver skills t hat you

acquire from t his book t o enable em bedded syst em s. Chapt er 19, " Drivers in User Space," looks at driving different t ypes of devices from user space. Som e device drivers, especially ones t hat are heavy on policy and light on perform ance requirem ent s, are bet t er off residing in user land. This chapt er also explains how t he Linux process scheduler affect s t he response t im es of user m ode drivers. Chapt er 20, " More Devices and Drivers," t akes a t our of a pot pourri of driver fam ilies not covered t hus far, such as Error Det ect ion And Correct ion ( EDAC) , FireWire, and ACPI . Chapt er 21, " Debugging Device Drivers," t eaches about different t ypes of debuggers t hat you can use t o debug kernel code. I n t his chapt er, you also learn t o use t race t ools, kernel probes, crash- dum p, and profilers. When you develop a driver, be arm ed wit h t he driver debugging skills t hat you learn in t his chapt er. Chapt er 22, " Maint enance and Delivery," provides perspect ive on t he soft ware developm ent life cycle. Chapt er 23, " Shut t ing Down," t akes you t hrough a checklist of work it em s when you em bark on Linux- enabling a new device. The book ends by pondering What next ? Device drivers som et im es need t o im plem ent code snippet s in assem bly, so Appendix A, " Linux Assem bly," t akes a look at t he different facet s of assem bly program m ing on Linux. Som e device drivers on x86- based syst em s depend direct ly or indirect ly on t he BI OS, so Appendix B, " Linux and t he BI OS," t eaches you how Linux int eract s wit h t he BI OS. Appendix C, " Seq Files," describes seq files, a kernel helper int erface int roduced in t he 2.6 kernel t hat device drivers can use t o m onit or and t rend dat a point s. The book is generally organized according t o device and bus com plexit y, coupled wit h pract ical reasons of dependencies bet ween chapt ers. So, we st art off wit h basic device classes such as charact er, serial, and input . Next , we look at sim ple serial buses such as I 2 C and SMBus. Ext ernal I / O buses such as PCMCI A, PCI , and USB follow. Video, audio, block, and net work devices usually int erface wit h t he processor via t hese I / O buses, so we look at t hem soon aft er. The next port ions of t he book are orient ed t oward em bedded Linux and cover t echnologies such as wireless net working and flash m em ory. User- space drivers are discussed t oward t he end of t he book.

Ke r n e l Ve r sion This book is generally up t o dat e as of t he 2.6.23/ 2.6.24 kernel versions. Most code list ings in t his book have been t est ed on a 2.6.23 kernel. I f you are using a lat er version, look at Linux websit es such as lwn.net t o learn about t he kernel changes since 2.6.23/ 24.

Book W e bsit e I 've set up a websit e at elinuxdd.com t o provide updat es, errat a, and ot her inform at ion relat ed t o t his book.

Con ve n t ion s Use d Source code, funct ion nam es, and shell com m ands are writ t en like this. The shell prom pt used is bash>. Filenam e are writ t en in it alics, like t his. I t alics are also used t o int roduce new t erm s. Som e chapt ers m odify original kernel source files while im plem ent ing code exam ples. To clearly point out t he changes, newly insert ed code lines are prefixed wit h +, and any delet ed code lines wit h -. Som et im es, for sim plicit y, t he book uses generic references. So if t he t ext point s you t o t he arch/ your- arch/ direct ory, it should be t ranslat ed, for exam ple, t o arch/ x86/ if you are com piling t he kernel for t he x86 archit ect ure. Sim ilarly, any m ent ion of t he include/ asm - your- arch/ direct ory should be read as include/ asm arm / if you are, for inst ance, building t he kernel for t he ARM archit ect ure. The * sym bol and X are occasionally used as wildcard charact ers in filenam es. So, if a chapt er asks you t o look at include/ linux/ t im e* .h, look at t he

header files, t im e.h, t im er.h, t im es.h, and t im ex.h residing in t he include/ linux/ direct ory. I f a sect ion t alks about / dev/ input / event X or / sys/ devices/ plat form / i8042/ serioX/ , X is t he int erface num ber t hat t he kernel assigns t o your device in t he cont ext of your syst em configurat ion. The

sym bol is som et im es insert ed bet ween com m and or kernel out put t o at t ach explanat ions.

Sim ple regular expressions are occasionally used t o com pact ly list funct ion prot ot ypes. For exam ple, t he sect ion "Direct Mem ory Access" in Chapt er 10, " Peripheral Com ponent I nt erconnect ," refers t o pci_[map|unmap|dma_sync]_single() inst ead of explicit ly cit ing pci_map_single(), pci_umap_single(), and pci_dma_sync_single(). Several chapt ers refer you t o user- space configurat ion files. For exam ple, t he sect ion t hat describes t he boot process opens / et c/ rc.sysinit, and t he chapt er t hat discusses Bluet oot h refers t o / et c/ bluet oot h/ pin. The exact nam es and locat ions of such files m ight , however, vary according t o t he Linux dist ribut ion you use.

Ack n ow le dgm e n t s First , I raise m y hat t o m y edit ors at Prent ice Hall: Debra William s Cauley, Anne Goebel, and Keit h Cline. Wit hout t heir support ing work, t his book would not have m at erialized. I t hank Mark Taub for his int erest in t his proj ect and for init iat ing it . Several sources have cont ribut ed t o m y learning in t he past decade: t he m any t eam m at es wit h whom I worked on Linux proj ect s, t he m ight y kernel sources, m ailing list s, and t he I nt ernet . All t hese have played a part in helping m e writ e t his book. Mart in St reicher of Linux Magazine changed m e from a full- t im e coder t o a spare- t im e writ er when he offered m e t he m agazine's " Gearheads" kernel colum n. I grat efully acknowledge t he m any lessons in t echnical writ ing t hat I 've learned from him . I owe a special debt of grat it ude t o m y t echnical reviewers. Vam si Krishna pat ient ly read t hrough each chapt er of t he m anuscript . His num erous suggest ions have m ade t his a bet t er book. Jim Lieb provided valuable feedback on several chapt ers. Arnold Robbins reviewed t he first few chapt ers and provided insight ful com m ent s. Finally, I t hank m y parent s and m y wife for t heir love and support . And t hanks t o m y baby daught er for const ant ly rem inding m e t o spend cycles on t he book by her wobbly walk t hat bears an uncanny resem blance t o t hat of a penguin.

Abou t t h e Au t h or Sr e e k r ish n a n Ve n k a t e sw a r a n has a m ast er's degree in com put er science from t he I ndian I nst it ut e of Technology, Kanpur, I ndia. During t he past 12 years t hat he has been working for I BM, he has port ed Linux t o various em bedded devices such as a wrist wat ch, handheld, m usic player, VoI P phone, pacem aker program m er, and rem ot e pat ient m onit oring syst em . Sreekrishnan was a cont ribut ing edit or and kernel colum nist t o t he Linux Magazine for m ore t han 2 years. Current ly, he m anages t he em bedded solut ions group at I BM I ndia.

Ch a pt e r 1 . I n t r odu ct ion I n Th is Ch a pt e r 2 Evolut ion 3 The GNU Copyleft 4 Kernel.org 4 Mailing List s and Forum s 5 Linux Dist ribut ions 6 Looking at t he Sources 10 Building t he Kernel 12 Loadable Modules 14 Before St art ing

Linux lures. I t has t he ent icing arom a of an int ernat ionalist proj ect where people of all nat ionalit ies, creed, and gender collaborat e. Free availabilit y of source code and a well- underst ood UNI X- like applicat ion program m ing environm ent have cont ribut ed t o it s runaway success. Highqualit y support from expert s available inst ant ly over t he I nt ernet at no charge has also played a m aj or role in st it ching t oget her a huge Linux com m unit y. Developers get incredibly excit ed about working on t echnologies where t hey have access t o all t he sources because t hat let s t hem creat e innovat ive solut ions. You can, for exam ple, hack t he sources and cust om ize Linux t o boot in a few seconds on your device, a feat t hat is hard t o achieve wit h a propriet ary operat ing syst em .

Evolut ion Linux st art ed as t he hobby of a Finnish college st udent nam ed Linus Torvalds in 1991, but quickly m et am orphed int o an advanced operat ing syst em popular all over t he planet . From it s first release for t he I nt el 386 processor, t he kernel has gradually grown in com plexit y t o support num erous archit ect ures, m ult iprocessor hardware, and high- perform ance clust ers. The full list of support ed CPUs is long, but som e of t he m aj or support ed archit ect ures are x86, I A64, ARM, PowerPC, Alpha, s390, MI PS, and SPARC. Linux has been port ed t o hundreds of hardware plat form s built around t hese processors. The kernel is cont inuously get t ing bet t er, and t he evolut ion is progressing at a frant ic pace. Alt hough it st art ed life as a deskt op- operat ing syst em , Linux has penet rat ed t he em bedded and ent erprise worlds and is t ouching our daily lives. When you push t he but t ons on your handheld, flip your rem ot e t o t he weat her channel, or visit t he hospit al for a physical checkup, it 's increasingly likely t hat som e Linux code is being set int o m ot ion t o com e t o your service. Linux's free availabilit y is helping it s evolut ion as m uch as it s t echnical superiorit y. Whet her it 's an init iat ive t o develop sub- $100 com put ers t o enable t he world's poor or pricing pressure in t he consum er elect ronics space, Linux is t oday's operat ing syst em of choice, because propriet ary operat ing syst em s som et im es cost m ore t han t he desired price of t he com put ers t hem selves.

Ch a pt e r 1 . I n t r odu ct ion I n Th is Ch a pt e r 2 Evolut ion 3 The GNU Copyleft 4 Kernel.org 4 Mailing List s and Forum s 5 Linux Dist ribut ions 6 Looking at t he Sources 10 Building t he Kernel 12 Loadable Modules 14 Before St art ing

Linux lures. I t has t he ent icing arom a of an int ernat ionalist proj ect where people of all nat ionalit ies, creed, and gender collaborat e. Free availabilit y of source code and a well- underst ood UNI X- like applicat ion program m ing environm ent have cont ribut ed t o it s runaway success. Highqualit y support from expert s available inst ant ly over t he I nt ernet at no charge has also played a m aj or role in st it ching t oget her a huge Linux com m unit y. Developers get incredibly excit ed about working on t echnologies where t hey have access t o all t he sources because t hat let s t hem creat e innovat ive solut ions. You can, for exam ple, hack t he sources and cust om ize Linux t o boot in a few seconds on your device, a feat t hat is hard t o achieve wit h a propriet ary operat ing syst em .

Evolut ion Linux st art ed as t he hobby of a Finnish college st udent nam ed Linus Torvalds in 1991, but quickly m et am orphed int o an advanced operat ing syst em popular all over t he planet . From it s first release for t he I nt el 386 processor, t he kernel has gradually grown in com plexit y t o support num erous archit ect ures, m ult iprocessor hardware, and high- perform ance clust ers. The full list of support ed CPUs is long, but som e of t he m aj or support ed archit ect ures are x86, I A64, ARM, PowerPC, Alpha, s390, MI PS, and SPARC. Linux has been port ed t o hundreds of hardware plat form s built around t hese processors. The kernel is cont inuously get t ing bet t er, and t he evolut ion is progressing at a frant ic pace. Alt hough it st art ed life as a deskt op- operat ing syst em , Linux has penet rat ed t he em bedded and ent erprise worlds and is t ouching our daily lives. When you push t he but t ons on your handheld, flip your rem ot e t o t he weat her channel, or visit t he hospit al for a physical checkup, it 's increasingly likely t hat som e Linux code is being set int o m ot ion t o com e t o your service. Linux's free availabilit y is helping it s evolut ion as m uch as it s t echnical superiorit y. Whet her it 's an init iat ive t o develop sub- $100 com put ers t o enable t he world's poor or pricing pressure in t he consum er elect ronics space, Linux is t oday's operat ing syst em of choice, because propriet ary operat ing syst em s som et im es cost m ore t han t he desired price of t he com put ers t hem selves.

Th e GN U Copyle ft The GNU proj ect ( GNU is a recursive acronym for GNU's Not UNI X) predat es Linux and was launched t o develop a free UNI X- like operat ing syst em . A com plet e GNU operat ing syst em is powered by t he Linux kernel but also cont ains com ponent s such as libraries, com pilers, and ut ilit ies. A Linux- based com put er is hence m ore accurat ely a GNU/ Linux syst em . All com ponent s of a GNU/ Linux syst em are built using free soft ware. There are different flavors of free soft ware. One such flavor is called public dom ain soft ware. Soft ware released under t he public dom ain is not copyright ed, and no rest rict ions are im posed on it s usage. You can use it for free, m ake changes t o it , and even rest rict t he dist ribut ion of your m odified sources. As you can see, t he " no rest rict ions" clause int roduces t he power t o int roduce rest rict ions downst ream . The Free Soft ware Foundat ion, t he prim ary sponsor of t he GNU proj ect , creat ed t he GNU Public License ( GPL) , also called a copy left, t o prevent t he possibilit y of m iddlem en t ransform ing free soft ware int o propriet ary soft ware. Those who m odify copyleft ed soft ware are required t o also copyleft t heir derived work. The Linux kernel and m ost com ponent s of a GNU syst em such as t he GNU Com piler Collect ion ( GCC) are released under t he GPL. So, if you m ake m odificat ions t o t he kernel, you have t o ret urn your changes back t o t he com m unit y. Essent ially, you have t o pass on t he right s vest ed on you by t he copyleft .

The Linux kernel is licensed under GPL version 2. There is an ongoing debat e in t he kernel com m unit y about whet her t he kernel should m ove t o GPLv3, t he lat est version of t he GPL. The current t ide seem s t o be against relicensing t he kernel t o adopt GPLv3.

Linux applicat ions t hat invoke syst em calls t o access kernel services are not considered derived work, however, and won't be rest rict ed by t he GPL. Sim ilarly, libraries are covered by a less- st ringent license called t he GNU Lesser General Public License ( LGPL) . Propriet ary soft ware is perm it t ed t o dynam ically link wit h libraries released under t he LGPL.

Ke r n e l.or g The prim ary reposit ory of Linux kernel sources is www.kernel.org. The websit e cont ains all released kernel versions. A num ber of websit es around t he world m irror t he cont ent s of kernel.org. I n addit ion t o released kernels, kernel.org also host s a set of pat ches m aint ained by front - line developers t hat serve as a t est bed for fut ure st able releases. A pat ch is a t ext file cont aining source code differences bet ween a developm ent t ree and t he original snapshot from which t he developer st art ed work. A popular pat ch- set available at kernel.org is t he -mm pat ch periodically released by Andrew Mort on, t he lead m aint ainer of t he Linux kernel. You will find experim ent al feat ures in t he -mm pat ch t hat have not yet m ade it t o t he m ainline source t ree. Anot her pat ch- set periodically released on kernel.org is t he –rt ( real t im e) pat ch m aint ained by I ngo Molnar. Several –rt feat ures have been m erged int o t he m ainline kernel.

M a ilin g List s a n d For u m s The Linux Kernel Mailing List ( LKML) is t he forum where developers debat e on design issues and decide on fut ure feat ures. You can find a real- t im e feed of t he m ailing list at www.lkm l.org. The kernel now cont ains several m illion lines of code cont ribut ed by t housands of developers all over t he world. LKML act s as t he t hread t hat t ies all t hese developers t oget her. LKML is not for general Linux quest ions. The basic rule is t o post only quest ions pert aining t o kernel developm ent t hat have not been previously answered in t he m ailing list or in popularly available docum ent at ion. I f t he C com piler crashed while com piling your Linux applicat ion, you should post t hat quest ion elsewhere. Discussions in som e LKML t hreads are m ore int erest ing t han a New York Tim es best seller. Spend a few hours browsing LKML archives t o get an insight int o t he philosophy behind t he Linux kernel. Most subproj ect s in t he kernel have t heir own specific m ailing list s. So, subscribe t o t he linux- m t d m ailing list if you are developing a Linux flash m em ory driver or init iat e a t hread in t he linux- usb- devel m ailing list if you t hink you have found a bug in t he USB m ass st orage driver. We refer t o relevant m ailing list s at t he end of several chapt ers. I n various forum s, kernel expert s from around t he globe gat her under one roof. The Linux Sym posium held annually at Ot t awa, Canada, is one such conference. Ot hers include t he Linux Kongress t hat t akes place in Germ any and linux.conf.au organized in Aust ralia. There are also num erous com m ercial Linux forum s where indust ry leaders m eet and share t heir insight s. An exam ple is t he LinuxWorld Conference and Expo held yearly in Nort h Am erica. For t he lat est news from t he developer com m unit y, check out ht t p: / / lwn.net / . I f you want t o glean t he highlight s of t he lat est kernel release wit hout m any crypt ic references t o kernel int ernals, t his m ight be a good place t o look. You can find anot her web com m unit y t hat discusses current kernel t opics at ht t p: / / kernelt rap.org/ . Wit h every m aj or kernel release, you will see sweeping im provem ent s, be it kernel preem pt ion, lock- free readers, new services t o offload work from int errupt handlers, or support for new archit ect ures. St ay in const ant t ouch wit h t he m ailing list s, websit es, and forum s, t o keep yourself in t he t hick of t hings.

Lin u x D ist r ibu t ion s Because a GNU/ Linux syst em consist s of num erous ut ilit ies, program s, libraries, and t ools, in addit ion t o t he kernel, it 's a daunt ing t ask t o acquire and correct ly inst all all t he pieces. Linux dist ribut ions com e t o t he rescue by classifying t he com ponent s and bundling t hem int o packages in an orderly fashion. A t ypical dist ribut ion cont ains t housands of ready- m ade packages. You need not worry about downloading t he right program versions or fixing dependency issues. Because packaging is a way t o m ake a lot of m oney wit hin t he am bit of t he GNU license, t here are several Linux dist ribut ions in t he m arket t oday. Dist ribut ions such as Red Hat / Fedora, Debian, SuSE, Slackware, Gent oo, Ubunt u, and Mandriva are prim arily m eant for t he deskt op user. Mont aVist a, Tim eSys, and Wind River dist ribut ions are geared t oward em bedded developm ent . Em bedded Linux dist ribut ions also include a dynam ically configurable com pact applicat ion- set t o t ailor t he syst em 's foot print t o suit resource const raint s. I n addit ion t o packaging, dist ribut ions offer value- adds for kernel developm ent . Many proj ect s st art developm ent based on kernels supplied by a dist ribut ion rat her t han a kernel released officially at kernel.org. Reasons for t his include t he following:

Linux dist ribut ions t hat com ply wit h st andards relevant t o your device's indust ry dom ain are oft en bet t er st art ing point s for developm ent . Special I nt erest Groups ( SI Gs) have t aken shape t o prom ot e Linux in various dom ains. The Consum er Elect ronics Linux Forum ( CELF) , host ed at www.celinuxforum .org, focuses on using Linux on consum er elect ronics devices. The CELF specificat ion defines t he support level of feat ures such as scalable foot print , fast boot , execut e in place, and power m anagem ent , desirable on consum er elect ronics devices. The effort s of t he Open Source Developm ent Lab ( OSDL) , host ed at www.osdl.org, cent ers on charact erist ics dist inct t o carrier- grade devices. OSDL's Carrier Grade Linux ( CGL) specificat ion codifies value addit ions such as reliabilit y, high availabilit y, runt im e pat ching, and enhanced error recovery, im port ant in t he t elecom space.

The m ainline kernel m ight not include full support for t he em bedded cont roller of your choice even if t he cont roller is built around a kernel- support ed CPU core. A Linux dist ribut ion m ight offer device drivers for all t he peripheral m odules inside t he cont roller, however.

Debugging t ools t hat you plan t o use during kernel developm ent m ay not be part of t he m ainline kernel. For exam ple, t he kernel has no built - in debugger support . I f you want t o use a kernel debugger during developm ent , you have t o separat ely download and apply t he corresponding pat ches. You have t o endure m ore hassles if t est ed pat ches are not readily available for your kernel version. Dist ribut ions prepackage m any useful debugging feat ures, so you can st art using t hem right away.

Som e dist ribut ions provide legal indem nificat ion so t hat your com pany won't be liable for lawsuit s arising out of kernel bugs.

Dist ribut ions t end t o do a lot of t est ing on t he kernels t hey release. [ 1]

[ 1]

Because t his necessit at es freezing t he kernel t o a version t hat is not t he lat est , dist ribut ion- supplied kernels oft en cont ain backport s of som e feat ures released in lat er official kernels.

You can purchase service and support packages from dist ribut ion vendors for kernels t hat t hey supply.

Look in g a t t h e Sou r ce s Before we st art wet t ing our t oes in t he kernel, let 's download t he sources, learn t o apply a pat ch, and look at t he layout of t he code t ree. First , go t o www.kernel.org and get t he lat est st able t ree. The sources are archived as t ar files com pressed in bot h gzip ( .gz) and bzip2 ( .bz2) form at s. Obt ain t he source files by uncom pressing and unt arring t he zipped t ar ball. I n t he following com m ands, replace X.Y.Z wit h t he lat est kernel version, such as 2.6.23: bash> cd /usr/src bash> wget www.kernel.org/pub/linux/kernel/vX.Y/linux-X.Y.Z.tar.bz2 ... bash> tar xvfj linux-X.Y.Z.tar.bz2

Now t hat you have t he unpacked source t ree in / usr/ src/ linux- X.Y.Z/ on your syst em , let 's enable som e experim ent al t est feat ures int o t he t ree by get t ing a corresponding -mm ( Andrew Mort on) pat ch: Code View: bash> cd /usr/src bash> wget www.kernel.org/pub/linux/kernel/people/akpm/patches/X.Y/X.Y.Z/X.Y.Zmm2/X.Y.Z-mm2.bz2

Apply t he pat ch: bash> cd /usr/src/linux-X.Y.Z/ bash> bzip2 -dc ../X.Y.Z-mm2.bz2 | patch -p1

The -dc opt ion asks bzip2 t o uncom press t he specified files t o st andard out put . This is piped t o t he pat ch ut ilit y, which applies changes t o each m odified file in t he code t ree. I f you need t o apply m ult iple pat ches, do so in t he right sequence. To generat e a kernel enabled wit h t he X.Y.Z-aa-bb pat ch, first download t he full X.Y.Z kernel sources, apply t he X.Y.Z-aa pat ch, and t hen apply t he X.Y.Z-aa-bb pat ch.

Pat ch Subm ission To generat e a kernel pat ch out of your changes, use t he diff com m and: Code View: bash> diff –Nur /path/to/original/kernel /path/to/your/kernel > changes.patch

Not e t hat t he original kernel precedes t he changed version in t he diff- ing order. As per 2.6 kernel pat ch subm ission convent ions, you also need t o add a line at t he end of t he pat ch t hat says t his: Signed-off-by: Name

Wit h t his, you cert ify t hat you wrot e t he code yourself and t hat you have t he right t o cont ribut e it . You are now all set t o post your pat ch t o t he relevant m ailing list , such as LKML. Look at Docum ent at ion/ Subm it t ingPat ches for a guide on creat ing pat ches for subm ission and at Docum ent at ion/ applying- pat ches.t xt for a t ut orial on applying pat ches.

Now t hat your pat ched / usr/ src/ linux- X.Y.Z/ t ree is ready for use, let 's t ake a m om ent t o observe how t he source layout is organized. Go t o t he root of t he source t ree and list it s cont ent s. The direct ories branching out from t he root of t he code t ree are as follows:

1 . a r ch . This direct ory cont ains archit ect ure- specific files. You will see separat e subdirect ories under arch/ for processors such as ARM, Mot orola 68K, s390, MI PS, Alpha, SPARC, and I A64.

2 . block . This prim arily cont ains t he im plem ent at ion of I / O scheduling algorit hm s for block st orage devices.

3 . cr ypt o. This direct ory im plem ent s cipher operat ions and t he crypt ographic API , used, for exam ple, by som e WiFi device drivers for im plem ent ing encrypt ion algorit hm s.

4 . D ocu m e n t a t ion . This direct ory has brief descript ions of various kernel subsyst em s. This can be your first st op t o dig for answers t o kernel- relat ed queries.

5 . dr ive r s. Device drivers for num erous device classes and peripheral cont rollers reside in t his direct ory. The device classes include charact er, serial, I nt er- I nt egrat ed Circuit ( I 2 C) , Personal Com put er Mem ory Card I nt ernat ional Associat ion ( PCMCI A) , Peripheral Com ponent I nt erconnect ( PCI ) , Universal Serial Bus ( USB) , video, audio, block, I nt egrat ed Drive Elect ronics ( I DE) , Sm all Com put er Syst em I nt erface ( SCSI ) , CDROM, net work adapt ers, Asynchronous Transfer Mode ( ATM) , Bluet oot h, and Mem ory Technology Devices ( MTD) . Each of t hese classes live in a separat e subdirect ory under drivers/ . You will, for inst ance, find PCMCI A driver sources inside t he drivers/ pcm cia/ direct ory and MTD drivers inside t he drivers/ m t d/ direct ory. The subdirect ories present under drivers/ const it ut e t he prim ary subj ect s for t his book.

6.

6 . fs. This direct ory cont ains t he im plem ent at ion of filesyst em s such as EXT3, EXT4, reiserfs, FAT, VFAT, sysfs, procfs, isofs, JFFS2, XFS, NTFS, and NFS.

7 . in clu de . Kernel header files live here. Subdirect ories prefixed wit h asm cont ain headers specific t o t he part icular archit ect ure. So t he direct ory include/ asm - x86/ cont ains header files pert aining t o t he x86 archit ect ure, whereas include/ asm - arm / holds headers for t he ARM archit ect ure.

8 . in it . This direct ory cont ains high- level init ializat ion and st art up code.

9 . ipc. This cont ains support for I nt er- Process Com m unicat ion ( I PC) m echanism s such as m essage queues, sem aphores, and shared m em ory.

1 0 . k e r n e l. The archit ect ure- independent port ions of t he base kernel can be found here.

1 1 . lib. Library rout ines such as generic kernel obj ect ( kobj ect ) handlers and Cyclic Redundancy Code ( CRC) com put at ion funct ions st ay here.

1 2 . m m . The m em ory m anagem ent im plem ent at ion lives here.

1 3 . n e t . Net working prot ocols reside under t his direct ory. Prot ocols im plem ent ed include I nt ernet Prot ocol version 4 ( I Pv4) , I Pv6, I nt ernet work Prot ocol eXchange ( I PX) , Bluet oot h, ATM, I nfrared, Link Access Procedure Balanced ( LAPB) , and Logical Link Cont rol ( LLC) .

1 4 . script s. Script s used during kernel build reside here.

1 5 . se cu r it y. This direct ory cont ains t he fram ework for securit y.

1 6 . sou n d. The Linux audio subsyst em is based in t his direct ory.

1 7 . u sr . This current ly cont ains t he init ram fs im plem ent at ion.

Unified x86 Archit ect ure Tree St art ing wit h t he 2.6.24 kernel release, t he i386 and t he x86_64 ( t he 64- bit cousin of t he 32- bit i386) archit ect ure- specific t rees have been unified int o a com m on arch/ x86/ direct ory. I f you are using a pre- 2.6.24 kernel, replace references t o arch/ x86/ in t his book wit h arch/ i386/ . Sim ilarly, change any occurrence of include/ asm - x86/ t o include/ asm - i386/ . Som e filenam es wit hin t hese direct ories have also changed.

Wading t hrough t hese large direct ories in search of sym bols and ot her code elem ent s can be a t ough t ask. The t ools in Table 1.1 are wort hy aids as you navigat e t he kernel source t ree.

Ta ble 1 .1 . Tools Th a t Aid Sou r ce Tr e e N a viga t ion Tool

D e scr ipt ion

lxr

The Linux cross- referencer, lxr, downloadable from ht t p: / / lxr.sourceforge.net / , let s you t raverse t he kernel t ree using a web browser by providing hyperlinks t o connect kernel sym bols wit h t heir definit ions and uses.

cscope

cscope, host ed at ht t p: / / cscope.sourceforge.net / , builds a sym bolic dat abase from all files in a source t ree, so you can quickly locat e declarat ions, definit ions, regular expressions, and m ore. Cscope m ight not be as versat ile as lxr, but it gives you t he flexibilit y of using t he search feat ures of your favorit e t ext edit or rat her t han a browser. From t he root of your kernel t ree, issue t he cscope -qkRv com m and t o build t he cross- reference dat abase. The -q opt ion generat es m ore indexing inform at ion, so searches becom e not iceably fast er at t he expense of ext ra init ial st art up t im e. The –k opt ion request s cscope t o t une it s behavior t o suit kernel sources, -R asks for recursive subdirect ory t raversal, and –v appeals for verbose m essages. You can obt ain t he det ailed invocat ion synt ax from t he m an page.

ct ags/ et ags

The ct ags ut ilit y, downloadable from ht t p: / / ct ags.sourceforge.net / , generat es cross- reference t ags for m any languages, so you can locat e sym bol and funct ion definit ions in a source t ree from wit hin an edit or such as vi. Do make tags from t he root of your kernel t ree t o ct ag all source files. The et ags ut ilit y generat es sim ilar indexing inform at ion underst ood by t he em acs edit or. I ssue make TAGS t o et ag your kernel source files.

Ut ilit ies

Tools such as grep, find, sdiff, st race, od, dd, m ake, t ar, file, and obj dum p.

GCC opt ions

You m ay ask GCC t o generat e preprocessed source code using t he -E opt ion. Preprocessed code cont ains header file expansions and reduces t he need t o hop- skip t hrough nest ed include files t o expand m ult iple levels of m acros. Here is a usage exam ple t o preprocess drivers/ char/ m ydrv.c and produce expanded out put in m ydrv.i: bash> gcc -E drivers/char/mydrv.c -D__KERNEL__ -Iinclude -Iinclude/asm-x86/mach-default > mydrv.i

The include pat hs specified using t he -I opt ion depend on t he header files included by your code. GCC generat es assem bly list ings if you use t he -S opt ion. To generat e an assem bly list ing in m ydrv.s for drivers/ char/ m ydrv.c, do t his: bash> gcc -S drivers/char/mydrv.c -D__KERNEL__ -Iinclude -Ianother/include/path

Bu ildin g t h e Ke r n e l Now t hat you have an idea of t he source t ree layout , let 's m ake a t rivial code change, com pile, and get it running. Go t o t he t op- level init / direct ory and vent ure t o m ake a sm all code change t o t he init ializat ion file m ain.c. Add a print st at em ent t o t he beginning of t he funct ion, start_kernel(), declaring your love for polar bears: asmlinkage void __init start_kernel(void) { char *command_line; extern struct kernel_param __start___param[], __stop___param[]; +

printk("Penguins are cute, but so are polar bears\n"); /* ... */ rest_init();

}

You're now ready t o kick off t he build process. Go t o t he root of t he source t ree and st art wit h a clean slat e: bash> cd /usr/src/linux-X.Y.Z/ bash> make clean

Configure t he kernel. This is when you pick and choose t he pieces t hat form part of t he operat ing syst em . You m ay specify whet her each desired com ponent is t o be st at ically or dynam ically linked t o t he kernel: bash> make menuconfig

menuconfig is a t ext int erface t o t he kernel configurat ion m enu. Use make xconfig t o get a graphical int erface. The configurat ion inform at ion t hat you choose is saved in a file nam ed .config in t he root of your source t ree. I f you don't want t o weave t he configurat ion from scrat ch, use t he file arch/ your- arch/ defconfig ( or arch/ yourarch/ configs/ your- m achine_defconfig if t here are several support ed plat form s for your archit ect ure) as t he st art ing point . So, if you are com piling t he kernel for t he 32- bit x86 archit ect ure, do t his: bash> cp arch/x86/configs/i386_defconfig .config

Com pile t he kernel and generat e a com pressed boot im age: bash> make bzImage

The kernel im age is produced in arch/ x86/ boot / bzI m age. Updat e your boot part it ion: bash> cp arch/x86/boot/bzImage /boot/vmlinuz

You m ight need t o alert your boot loader about t he arrival of t he new boot im age. I f you are using t he GRUB boot loader, it figures t his out aut om at ically; but if you are using LI LO, raise a flag:

bash> /sbin/lilo Added linux *

Finally, rest art t he m achine and boot in t o your new kernel: bash> reboot

The first m essage in t he boot sequence launches your cam paign for polar bears.

Loa da ble M odu le s Because Linux runs on a variet y of archit ect ures and support s zillions of I / O devices, it 's not feasible t o perm anent ly com pile support for all possible devices int o t he base kernel. Dist ribut ions generally package a m inim al kernel im age and supply t he rest of t he funct ionalit ies in t he form of kernel m odules. During runt im e, t he necessary m odules are dynam ically loaded on dem and. To generat e m odules, go t o t he root of your kernel source t ree and build: bash> cd /usr/src/linux-X.Y.Z/ bash> make modules

To inst all t he produced m odules, do t his: bash> make modules_install

This creat es a kernel source direct ory st ruct ure under / lib/ m odules/ X.Y.Z/ kernel/ and populat es it wit h loadable m odule obj ect s. This also invokes t he depm od ut ilit y t hat generat es m odule dependencies in t he file / lib/ m odules/ X.Y.Z/ m odules.dep. The following ut ilit ies are available t o m anipulat e m odules: insm od , rm m od, lsm od, m odprobe, m odinfo, and depm od. The first t wo are ut ilit ies t o insert and rem ove m odules, whereas lsm od list s t he m odules t hat are current ly loaded. m odprobe is a cleverer version of insm od t hat also insert s dependent m odules aft er exam ining t he cont ent s of / lib/ m odules/ X.Y.Z/ m odules.dep. For exam ple, assum e t hat you need t o m ount a Virt ual File Allocat ion Table ( VFAT) part it ion present on a USB pen drive. Use m odprobe t o load t he VFAT filesyst em driver: [ 2]

[ 2] This exam ple assum es t hat t he m odule is not aut oloaded by t he kernel. I f you enable Aut om at ic Kernel Module Loading (CONFIG_KMOD) during configurat ion, t he kernel aut om at ically runs m odprobe wit h t he appropriat e argum ent s when it det ect s m issing subsyst em s. You'll learn about m odule aut oloading in Chapt er 4 , " Laying t he Groundwork."

bash> modprobe vfat bash> lsmod Module Size vfat 14208 fat 49052 nls_base 9728

Used by 0 1 vfat 2 vfat, fat

As you see in t he lsm od out put , m odprobe insert s t hree m odules rat her t han one. m odprobe first figures out t hat it has t o insert / lib/ m odules/ X.Y.Z/ kernel/ fs/ vfat / vfat .ko. But when it peeks int o t he dependency file / lib/ m odules/ X.Y.Z/ m odules.dep, it finds t he following line and realizes t hat it has t o load t wo ot her dependent m odules first : /lib/modules/X.Y.Z/kernel/fs/vfat/vfat.ko: /lib/modules/X.Y.Z/kernel/fs/fat/fat.ko /lib/modules/X.Y.Z/kernel/fs/nls/nls_base.ko

I t t hen proceeds t o load fat .ko and nls_base.ko before at t em pt ing t o insert vfat .ko, t hus aut om at ically loading all t he m odules you need t o m ount your VFAT part it ion.

Use t he m odinfo ut ilit y t o ext ract verbose inform at ion about t he m odules you j ust loaded: bash> modinfo vfat filename: /lib/modules/X.Y.Z/kernel/fs/vfat/vfat.ko license: GPL description: VFAT filesystem support ... depends: fat, nls_base

To com pile a kernel driver as a m odule, t oggle t he corresponding m enu choice but t on t o while configuring t he kernel. Most of t he device driver exam ples in t his book are im plem ent ed as kernel m odules. To build a m odule m ym odule.ko from it s source file m ym odule.c, creat e a one- line Makefile and execut e it as follows: bash> cd /path/to/module-source/ bash> echo "obj-m += mymodule.ko" > Makefile bash> make –C /path/to/kernel-sources/ M=`pwd` modules make: Entering directory '/path/to/kernel-sources' Building modules, stage 2. MODPOST CC /path/to/module-sources/mymodule.mod.o LD [M] /path/to/module-sources/mymodule.ko make: Leaving directory '/path/to/kernel-sources' bash> insmod ./mymodule.ko

Kernel m odules render t he kernel foot print sm aller and t he develop- build- t est cycle short er. You only need t o recom pile t he part icular m odule and reinsert it t o effect a change. We look at m odule debugging t echniques in Chapt er 21, " Debugging Device Drivers." There are also som e downsides if you choose t o design your driver as a kernel m odule. Unlike built - in drivers, m odules cannot reserve resources during boot t im e, when success is m ore or less guarant eed.

Be for e St a r t in g Linux has t rekked m any a t errain and is now st at e of t he art , so you can use it as a vehicle t o underst and operat ing syst em concept s, processor archit ect ures, and even indust ry dom ains. When you learn a t echnique used by a device driver subsyst em , look one level deeper and probe t he underlying reasons behind t hat design choice. Wherever not explicit ly st at ed, t he t ext assum es t he 32- bit x86 archit ect ure. The book is, however, m indful of t he fact t hat you are m ore likely t o writ e device drivers for em bedded devices t han for convent ional PCcom pat ible syst em s. The chapt er on serial drivers, for exam ple, exam ines t wo devices: a t ouch cont roller on a PC derivat ive and a UART on a cell phone. Or t he chapt er on I 2 C device drivers looks at an EEPROM on a PC syst em and a Real Tim e Clock on an em bedded device. The book also t eaches you about t he core infrast ruct ure t hat t he kernel provides for m ost driver classes, which hides archit ect ure dependencies from device drivers. Device driver debugging t echniques are discussed near t he end of t he book in Chapt er 21, so you m ight find it wort hwhile t o forward t o t hat chapt er as you develop drivers while reading t he book. This book is based on t he 2.6 kernel, which has subst ant ial changes across t he board from 2.4, t ouching all m aj or subsyst em s. Hopefully, you have inst alled a 2.6- based Linux on your syst em by now and st art ed experim ent ing wit h t he kernel sources. Each chapt er t akes t he libert y of profusely point ing you t o relevant kernel source files for t wo m ain reasons:

1 . Because each driver subsyst em in t he kernel is t ens of t housands of lines long, it 's only possible t o t ake a relat ively sim plist ic view in a book. Looking at real drivers in t he sources along wit h t he exam ple code in t his book will give you t he bigger pict ure.

2 . Before developing a driver, it 's a good idea t o zero in on an exist ing driver in t he drivers/ direct ory t hat is sim ilar t o your requirem ent and m ake t hat your st art ing point .

So, t o derive m axim um benefit from t his book, fam iliarize yourself wit h t he kernel by frequent ly browsing t he source t ree and st aring hard at t he code. And in t andem wit h your code explorat ions, follow t he goings- on in t he kernel m ailing list .

Ch a pt e r 2 . A Pe e k I n side t h e Ke r n e l I n Th is Ch a pt e r 18 Boot ing Up 30 Kernel Mode and User Mode 30 Process Cont ext and I nt errupt Cont ext 31 Kernel Tim ers 39 Concurrency in t he Kernel 49 Process Filesyst em 49 Allocat ing Mem ory 52 Looking at t he Sources

Before we st art our j ourney int o t he m yst ical world of Linux device drivers, let 's fam iliarize ourselves wit h som e basic kernel concept s by looking at several kernel regions t hrough t he lens of a driver developer. We learn about kernel t im ers, synchronizat ion m echanism s, and m em ory allocat ion. But let 's st art our expedit ion by get t ing a view from t he t op; let 's skim t hrough boot m essages em it t ed by t he kernel and hit t he breaks whenever som et hing looks int erest ing.

Boot in g Up Figure 2.1 shows t he Linux boot sequence on an x86- based com put er. Linux boot on x86- based hardware is set int o m ot ion when t he BI OS loads t he Mast er Boot Record ( MBR) from t he boot device. Code resident in t he MBR looks at t he part it ion t able and reads a Linux boot loader such as GRUB, LI LO, or SYSLI NUX from t he act ive

part it ion. The final st age of t he boot loader loads t he com pressed kernel im age and passes cont rol t o it . The kernel uncom presses it self and t urns on t he ignit ion.

Figu r e 2 .1 . Lin u x boot se qu e n ce on x 8 6 - ba se d h a r dw a r e .

x86- based processors have t wo m odes of operat ion, real m ode and pr ot ect ed m ode. I n real m ode, you can access only t he first 1MB of m em ory, t hat t oo wit hout any prot ect ion. Prot ect ed m ode is sophist icat ed and let s you t ap int o m any advanced feat ures of t he processor such as paging. The CPU has t o pass t hrough real m ode en rout e t o prot ect ed m ode. This road is a one- way st reet , however. You can't swit ch back t o real m ode from prot ect ed m ode. The first - level kernel init ializat ions are done in real m ode assem bly. Subsequent st art up is perform ed in prot ect ed m ode by t he funct ion start_kernel() defined in init / m ain.c, t he source file you m odified in t he previous chapt er. start_kernel() begins by init ializing t he CPU subsyst em . Mem ory and process m anagem ent are put in place soon aft er. Peripheral buses and I / O devices are st art ed next . As t he last st ep in t he boot sequence, t he init program , t he parent of all Linux processes, is invoked. I nit execut es user- space script s t hat st art necessary kernel services. I t finally spawns t erm inals on consoles and displays t he login prom pt . Each following sect ion header is a m essage from Figure 2.2 generat ed during boot progression on an x86- based

lapt op. The sem ant ics and t he m essages m ay change if you are boot ing t he kernel on ot her archit ect ures. I f som e explanat ions in t his sect ion sound rat her crypt ic, don't worry; t he int ent here is only t o give you a pict ure from 100 feet above and t o let you savor a first t ast e of t he kernel's flavor. Many concept s m ent ioned here in passing are covered in dept h lat er on. Figu r e 2 .2 . Ke r n e l boot m e ssa ge s. Code View: Linux version 2.6.23.1y ([email protected]) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) #7 SMP PREEMPT Thu Nov 1 11:39:30 IST 2007 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved) ... 758MB LOWMEM available. ... Kernel command line: ro root=/dev/hda1 ... Console: colour VGA+ 80x25 ... Calibrating delay using timer specific routine.. 1197.46 BogoMIPS (lpj=2394935) ... CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 1024K ... Checking 'hlt' instruction... OK. ... Setting up standard PCI resources ... NET: Registered protocol family 2 IP route cache hash table entries: 32768 (order: 5, 131072 bytes) TCP established hash table entries: 131072 (order: 9, 2097152 bytes) ... checking if image is initramfs... it is Freeing initrd memory: 387k freed ... io scheduler noop registered io scheduler anticipatory registered (default) ... 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a NS16550A ... Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ICH4: IDE controller at PCI slot 0000:00:1f.1 Probing IDE interface ide0... hda: HTS541010G9AT00, ATA DISK drive hdc: HL-DT-STCD-RW/DVD DRIVE GCC-4241N, ATAPI CD/DVD-ROM drive ... serio: i8042 KBD port at 0x60,0x64 irq 1 mice: PS/2 mouse device common for all mice ... Synaptics Touchpad, model: 1, fw: 5.9, id: 0x2c6ab1, caps: 0x884793/0x0 ... agpgart: Detected an Intel 855GM Chipset. ... Intel(R) PRO/1000 Network Driver - version 7.3.20-k2 ... ehci_hcd 0000:00:1d.7: EHCI Host Controller ...

Yenta: CardBus bridge found at 0000:02:00.0 [1014:0560] ... Non-volatile memory driver v1.2 ... kjournald starting. Commit interval 5 seconds EXT3 FS on hda2, internal journal EXT3-fs: mounted filesystem with ordered data mode. ... INIT: version 2.85 booting ...

BI OS- Pr ovide d Ph ysica l RAM M a p The kernel assem bles t he syst em m em ory m ap from t he BI OS, and t his is one of t he first boot m essages you will see: BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) ... BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)

Real m ode init ializat ion code uses t he BI OS int 0x15 service wit h funct ion num ber 0xe820( hence t he st ring BIOS-e820 in t he preceding m essage) t o obt ain t he syst em m em ory m ap. The m em ory m ap indicat es reserved and usable m em ory ranges, which is subsequent ly used by t he kernel t o creat e it s free m em ory pool. We discuss m ore on t he BI OS- supplied m em ory m ap in t he sect ion " Real Mode Calls" in Appendix B, " Linux and t he BI OS."

7 5 8 M B LOW M EM Ava ila ble The norm ally addressable kernel m em ory region ( < 896MB) is called low m em ory. The kernel m em ory allocat or, kmalloc(), ret urns m em ory from t his region. Mem ory beyond 896MB ( called high m em ory) can be accessed only using special m appings. During boot , t he kernel calculat es and displays t he t ot al pages present in t hese m em ory zones. We t ake a deeper look at m em ory zones lat er in t his chapt er.

Ke r n e l Com m a n d Lin e : r o r oot = / de v/ h da 1 Linux boot loaders usually pass a com m and line t o t he kernel. Argum ent s in t he com m and line are sim ilar t o t he argv[] list passed t o t he main() funct ion in C program s, except t hat t hey are passed t o t he kernel inst ead. You m ay add com m and- line argum ent s t o t he boot loader configurat ion file or supply t hem from t he boot loader prom pt at runt im e. [ 1] I f you are using t he GRUB boot loader, t he configurat ion file is eit her / boot / grub/ grub.conf or / boot / grub/ m enu.lst depending on your dist ribut ion. I f you are a LI LO user, t he configurat ion file is / et c/ lilo.conf. An exam ple grub.conf file ( wit h com m ent s added) is list ed here. You can figure out t he genesis of t he preceding boot m essage if you look at t he line following title kernel 2.6.23:

[ 1] Boot loaders on em bedded devices are usually " slim " and do not support configurat ion files or equivalent m echanism s. Because of t his, m any non- x86 archit ect ures support a kernel configurat ion opt ion called CONFIG_CMDLINE t hat you can use t o supply t he kernel com m and line at build t im e.

default 0

#Boot the 2.6.23 kernel by default

timeout 5

#5 second to alter boot order or parameters

title kernel 2.6.23 #Boot Option 1 #The boot image resides in the first partition of the first disk #under the /boot/ directory and is named vmlinuz-2.6.23. 'ro' #indicates that the root partition should be mounted read-only. kernel (hd0,0)/boot/vmlinuz-2.6.23 ro root=/dev/hda1 #Look under section "Freeing initrd memory:387k freed" initrd (hd0,0)/boot/initrd #...

Com m and- line argum ent s affect t he code pat h t raversed during boot . As a sim ple exam ple, assum e t hat t he com m and- line argum ent of int erest is called bootmode. I f t his param et er is set t o 1, you would like t o print som e debug m essages during boot and swit ch t o a runlevel of 3 at t he end of t he boot . ( Wait unt il t he boot m essages are print ed out by t he init process t o learn t he sem ant ics of runlevels.) I f bootmode is inst ead set t o 0, you would prefer t he boot t o be relat ively laconic, and t he runlevel set t o 2. Because you are already fam iliar wit h init / m ain.c, let 's add t he following m odificat ion t o it : Code View: static unsigned int bootmode = 1; static int __init is_bootmode_setup(char *str) { get_option(&str, &bootmode); return 1; } /* Handle parameter "bootmode=" */ __setup("bootmode=", is_bootmode_setup); if (bootmode) { /* Print verbose output */ /* ... */ } /* ... */ /* If bootmode is 1, choose an init runlevel of 3, else switch to a run level of 2 */ if (bootmode) { argv_init[++args] = "3"; } else { argv_init[++args] = "2"; } /* ... */

Rebuild t he kernel as you did earlier and t ry out t he change. We discuss m ore about kernel com m and- line argum ent s in t he sect ion "Mem ory Layout" in Chapt er 18, " Em bedding Linux."

Ca libr a t in g D e la y...1 1 9 7 .4 6 BogoM I PS ( lpj = 2 3 9 4 9 3 5 ) During boot , t he kernel calculat es t he num ber of t im es t he processor can execut e an int ernal delay loop in one j iffy, which is t he t im e int erval bet ween t wo consecut ive t icks of t he syst em t im er. As you would expect , t he calculat ion has t o be calibrat ed t o t he processing speed of your CPU. The result of t his calibrat ion is st ored in a kernel variable called loops_per_jiffy. One place where t he kernel m akes use of loops_per_jiffy is when a device driver desires t o delay execut ion for sm all durat ions in t he order of m icroseconds. To underst and t he delay- loop calibrat ion code, let 's t ake a peek inside calibrate_delay(), defined in init / calibrat e.c. This funct ion cleverly derives float ing- point precision using t he int eger kernel. The following snippet ( wit h som e com m ent s added) shows t he init ial port ion of t he funct ion t hat carves out a coarse value for loops_per_jiffy: loops_per_jiffy = (1 busy) { /* ... */ if (time_after(jiffies, timeout)) { return -EBUSY; } /* ... */ } return SUCCESS;

The preceding code ret urns SUCCESS if t he busy condit ion get s cleared in less t han 3 seconds, and -EBUSY ot herwise. 3*HZ is t he num ber of jiffies present in 3 seconds. The calculat ed t im eout , (jiffies + 3*HZ) , is t hus t he new value of jiffies aft er t he 3- second t im eout elapses. The time_after() m acro com pares t he current value of jiffies wit h t he request ed t im eout , t aking care t o account for wraparound due t o overflows. Relat ed funct ions available for doing sim ilar com parisons are time_before(), time_before_eq(), and time_after_eq(). jiffies is defined as v olat ile, which asks t he com piler not t o opt im ize access t o t he variable. This ensures t hat jiffies, which is updat ed by t he t im er int errupt handler during each t ick, is reread during each pass t hrough t he loop. For t he reverse conversion from jiffies t o seconds, t ake a look at t his snippet from t he USB host cont roller driver, drivers/ usb/ host / ehci- sched.c: if (stream->rescheduled) { ehci_info(ehci, "ep%ds-iso rescheduled " "%lu times in %lu seconds\n", stream->bEndpointAddress, is_in? "in": "out", stream->rescheduled, ((jiffies – stream->start)/HZ)); }

The preceding debug st at em ent calculat es t he am ount of t im e in seconds wit hin which t his USB endpoint st ream ( we discuss USB in Chapt er 11, " Universal Serial Bus" ) was rescheduled stream->rescheduled t im es. (jiffies-stream->start) is t he num ber of jiffies t hat elapsed since t he rescheduling st art ed. The division by HZ convert s t hat value int o seconds. The 32- bit jiffies variable overflows in approxim at ely 50 days, assum ing a HZ value of 1000. Because syst em upt im es can be m any t im es t hat durat ion, t he kernel provides a variable called jiffies_64 t o hold 64- bit ( u64) jiffies. The linker posit ions jiffies_64 such t hat it s bot t om 32 bit s collocat e wit h jiffies. On 32- bit m achines, t he com piler needs t wo inst ruct ions t o assign one u64 variable t o anot her, so reading jiffies_64 is not at om ic. To get around t his problem , t he kernel provides a funct ion, get_jiffies_64(). Look at cpufreq_stats_update() defined in drivers/ cpufreq/ cpufreq_st at s.c for a usage exam ple.

Lon g D e la ys I n kernel t erm s, delays in t he order of jiffies are considered long durat ions. A possible, but nonopt im al, way t o accom plish long delays is by busy- looping. A funct ion t hat busy- wait s has a dog- in- t he- m anger at t it ude. I t neit her uses t he processor for doing useful work nor let s ot hers use it . The following code hogs t he processor for 1 second: unsigned long timeout = jiffies + HZ; while (time_before(jiffies, timeout)) continue;

A bet t er approach is t o sleep- wait , inst ead of busy- wait . Your code yields t he processor t o ot hers, while wait ing for t he t im e delay t o elapse. This is done using schedule_timeout(): unsigned long timeout = jiffies + HZ; schedule_timeout(timeout); /* Allow other parts of the kernel to run */

The delay guarant ee is only on t he lower bound of t he t im eout . Whet her from kernel space or from user space, it 's difficult t o get m ore precise cont rol over t im eout s t han t he granularit y of HZ because process t im e slices are updat ed by t he kernel scheduler only during t im er t icks. Also, even if your process is scheduled t o run aft er t he

specified t im eout , t he scheduler can decide t o pick anot her process from t he run queue based on priorit ies. [ 3]

[ 3] These scheduler propert ies have changed wit h t he advent of t he CFS scheduler in t he 2.6.23 kernel. Linux process schedulers are discussed in Chapt er 19, " Drivers in User Space."

Two ot her funct ions t hat facilit at e sleep- wait ing are wait_event_timeout() and msleep(). Bot h of t hem are im plem ent ed wit h t he help of schedule_timeout(). wait_event_timeout() is used when your code desires t o resum e execut ion if a specified condit ion becom es t rue or if a t im eout occurs. msleep() sleeps for t he specified num ber of m illiseconds. Such long- delay t echniques are suit able for use only from process cont ext . Sleep- wait ing cannot be done from int errupt cont ext because int errupt handlers are not allowed t o schedule() or sleep. ( See "I nt errupt Handling" in Chapt er 4 for a list of do's and don't s for code execut ing in int errupt cont ext .) Busy- wait ing for a short durat ion is possible from int errupt cont ext , but long busy- wait ing in t hat cont ext is considered a m ort al sin. Equally t aboo is long busy- wait ing wit h int errupt s disabled. The kernel also provides t im er API s t o execut e a funct ion at a point of t im e in t he fut ure. You can dynam ically define a t im er using init_timer() or st at ically creat e one wit h DEFINE_TIMER(). Aft er t his is done, populat e a timer_list wit h t he address and param et ers of your handler funct ion, and regist er it using add_timer(): #include struct timer_list my_timer; init_timer(&my_timer); /* Also see setup_timer() */ my_timer.expire = jiffies + n*HZ; /* n is the timeout in number of seconds */ my_timer.function = timer_func; /* Function to execute after n seconds */ my_timer.data = func_parameter; /* Parameter to be passed to timer_func */ add_timer(&my_timer); /* Start the timer */

Not e t hat t his is a one- shot t im er. I f you want t o run timer_func() periodically, you also need t o add t he preceding code inside timer_func() t o schedule it self aft er t he next t im eout : static void timer_func(unsigned long func_parameter) { /* Do work to be done periodically */ /* ... */ init_timer(&my_timer); my_timer.expire = jiffies + n*HZ; my_timer.data = func_parameter; my_timer.function = timer_func; add_timer(&my_timer); }

You m ay use mod_timer() t o change t he expirat ion of my_timer, del_timer() t o cancel my_timer, and timer_pending() t o see whet her my_timer is pending at t he m om ent . I f you look at ker nel/ t im er .c, you will find t hat schedule_timeout() int ernally uses t hese sam e API s.

User- space funct ions such as clock_settime() and clock_gettime() are used t o access kernel t im er services from user space. A user applicat ion m ay use setitimer() and getitimer() t o cont rol t he delivery of an alarm signal when a specified t im eout expires.

Sh or t D e la ys I n kernel t erm s, sub- j iffy delays qualify as short durat ions. Such delays are com m only request ed from bot h process and int errupt cont ext s. Because it is not possible t o use j iffy- based m et hods t o im plem ent sub- j iffy delays, t he m et hods discussed in t he previous sect ion t o sleep- wait cannot be used for sm all t im eout s. The only solut ion is t o busy- wait . Kernel API s t hat im plem ent short delays are mdelay(), udelay(), and ndelay(), which support m illisecond, m icrosecond, and nanosecond delays, respect ively. The act ual im plem ent at ions of t hese funct ions are archit ect ure- specific and m ay not be available on all plat form s. Busy- wait ing for short durat ions is accom plished by m easuring t he t im e t he processor t akes t o execut e an inst ruct ion and looping for t he necessary num ber of it erat ions. As discussed earlier in t his chapt er, t he kernel perform s t his m easurem ent during boot and st ores t he value in a variable called loops_per_jiffy. The short delay API s use loops_per_jiffy t o decide t he num ber of t im es t hey need t o busy- loop. To achieve a 1m icrosecond delay during a handshake process, t he USB host cont roller driver, drivers/ usb/ host / ehci- hcd.c, calls udelay(), which int ernally uses loops_per_jiffy: do { result = ehci_readl(ehci, ptr); /* ... */ if (result == done) return 0; udelay(1); /* Internally uses loops_per_jiffy */ usec--; } while (usec > 0);

Pe n t iu m Tim e St a m p Cou n t e r The Tim e St am p Count er ( TSC) is a 64- bit regist er present in Pent ium - com pat ible processors t hat count s t he num ber of clock cycles consum ed by t he processor since st art up. Because t he TSC get s increm ent ed at t he rat e of t he processor cycle speed, it provides a high- resolut ion t im er. The TSC is com m only used for profiling and inst rum ent ing code. I t is accessed using t he rdtsc inst ruct ion t o m easure execut ion t im e of int ervening code wit h m icrosecond precision. TSC t icks can be convert ed t o seconds by dividing by t he CPU clock speed, which can be read from t he kernel variable, cpu_khz. I n t he following snippet , low_tsc_ticks cont ains t he lower 32 bit s of t he TSC, while high_tsc_ticks cont ains t he higher 32 bit s. The lower 32 bit s overflow in a few seconds depending on your processor speed but is sufficient for m any code inst rum ent at ion purposes as shown here: unsigned long low_tsc_ticks0, high_tsc_ticks0; unsigned long low_tsc_ticks1, high_tsc_ticks1; unsigned long exec_time; rdtsc(low_tsc_ticks0, high_tsc_ticks0); /* Timestamp before */ printk("Hello World\n"); /* Code to be profiled */ rdtsc(low_tsc_ticks1, high_tsc_ticks1); /* Timestamp after */ exec_time = low_tsc_ticks1 - low_tsc_ticks0;

exec_time m easured 871 ( or half a m icrosecond) on a 1.8GHz Pent ium box.

Support for high- resolut ion t im ers ( CONFIG_HIGH_RES_TIMERS) has been m erged wit h t he 2.6.21 kernel. I t m akes use of hardware- specific high- speed t im ers t o provide high- precision capabilit ies t o API s such as nanosleep(). On Pent ium - class m achines, t he kernel leverages t he TSC t o offer t his capabilit y.

Re a l Tim e Clock The RTC t racks absolut e t im e in nonvolat ile m em ory. On x86 PCs, RTC regist ers const it ut e t he t op few locat ions of a sm all chunk of bat t ery- powered[ 4] com plem ent ary m et al oxide sem iconduct or ( CMOS) m em ory. Look at Figure 5.1 in Chapt er 5 , " Charact er Drivers," for t he locat ion of t he CMOS in t he legacy PC archit ect ure. On em bedded syst em s, t he RTC m ight be int ernal t o t he processor, or ext ernally connect ed via t he I 2 C or SPI buses discussed in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol."

[ 4]

RTC bat t eries last for m any years and usually out live t he life span of com put ers, so you should never have t o replace t hem .

You m ay use t he RTC t o do t he following:

Read and set t he absolut e clock, and generat e int errupt s during clock updat es. Generat e periodic int errupt s wit h frequencies ranging from 2HZ t o 8192HZ. Set alarm s

Many applicat ions need t he concept of absolut e t im e or wall t im e. Because jiffies is relat ive t o t he t im e when t he syst em boot ed, it does not cont ain wall t im e. The kernel m aint ains wall t im e in a variable called xtime. During boot , xtime is init ialized t o t he current wall t im e by reading t he RTC. When t he syst em halt s, t he wall t im e is writ t en back t o t he RTC. You can use do_gettimeofday() t o read wall t im e wit h t he highest resolut ion support ed by t he hardware: #include static struct timeval curr_time; do_gettimeofday(&curr_time); my_timestamp = cpu_to_le32(curr_time.tv_sec); /* Record timestamp */

There are also a bunch of funct ions available t o user- space code t o access wall t im e. They include t he following:

time(), which ret urns t he calendar t im e, or t he num ber of seconds since Epoch ( 00: 00: 00 on January 1, 1970)

localtime(), which ret urns t he calendar t im e in broken- down form at

mktime(), which does t he reverse of localtime()

gettimeofday(), which ret urns t he calendar t im e wit h m icrosecond precision if your plat form support s it

Anot her way t o use t he RTC from user space is via t he charact er device, / dev/ rt c. Only one process is allowed t o access t his device at a t im e. We discuss m ore about RTC drivers in Chapt er 5 and Chapt er 8 . I n Chapt er 19, we develop an exam ple user applicat ion t hat uses / dev/ rt c t o perform periodic work wit h m icrosecond precision.

Con cu r r e n cy in t h e Ke r n e l Wit h t he arrival of m ult icore lapt ops, Sym m et ric Mult i Processing ( SMP) is no longer confined t o t he realm of hit ech users. SMP and kernel preem pt ion are scenarios t hat generat e m ult iple t hreads of execut ion. These t hreads can sim ult aneously operat e on shared kernel dat a st ruct ures. Because of t his, accesses t o such dat a st ruct ures have t o be serialized. Let 's discuss t he basics of prot ect ing shared kernel resources from concurrent access. We st art wit h a sim ple exam ple and gradually int roduce com plexit ies such as int errupt s, kernel preem pt ion, and SMP.

Spin lock s a n d M u t e x e s A code area t hat accesses shared resources is called a cr it ical sect ion. Spinlocks and m ut exes ( short for m ut ual exclusion) are t he t wo basic m echanism s used t o prot ect crit ical sect ions in t he kernel. Let 's look at each in t urn. A spinlock ensures t hat only a single t hread ent ers a crit ical sect ion at a t im e. Any ot her t hread t hat desires t o ent er t he crit ical sect ion has t o rem ain spinning at t he door unt il t he first t hread exit s. Not e t hat we use t he t erm t hread t o refer t o a t hread of execut ion, rat her t han a kernel t hread. The basic usage of spinlocks is as follows: #include spinlock_t mylock = SPIN_LOCK_UNLOCKED; /* Initialize */ /* Acquire the spinlock. This is inexpensive if there * is no one inside the critical section. In the face of * contention, spinlock() has to busy-wait. */ spin_lock(&mylock); /* ... Critical Section code ... */ spin_unlock(&mylock); /* Release the lock */

I n cont rast t o spinlocks t hat put t hreads int o a spin if t hey at t em pt t o ent er a busy crit ical sect ion, m ut exes put cont ending t hreads t o sleep unt il it 's t heir t urn t o occupy t he crit ical sect ion. Because it 's a bad t hing t o consum e processor cycles t o spin, m ut exes are m ore suit able t han spinlocks t o prot ect crit ical sect ions when t he est im at ed wait t im e is long. I n m ut ex t erm s, anyt hing m ore t han t wo cont ext swit ches is considered long, because a m ut ex has t o swit ch out t he cont ending t hread t o sleep, and swit ch it back in when it 's t im e t o wake it up. I n m any cases, t herefore, it 's easy t o decide whet her t o use a spinlock or a m ut ex:

I f t he crit ical sect ion needs t o sleep, you have no choice but t o use a m ut ex. I t 's illegal t o schedule, preem pt , or sleep on a wait queue aft er acquiring a spinlock.

Because m ut exes put t he calling t hread t o sleep in t he face of cont ent ion, you have no choice but t o use spinlocks inside int errupt handlers. ( You will learn m ore about t he const raint s of t he int errupt cont ext in Chapt er 4 .)

Basic m ut ex usage is as follows: #include /* Statically declare a mutex. To dynamically create a mutex, use mutex_init() */ static DEFINE_MUTEX(mymutex); /* Acquire the mutex. This is inexpensive if there * is no one inside the critical section. In the face of * contention, mutex_lock() puts the calling thread to sleep. */ mutex_lock(&mymutex); /* ... Critical Section code ... */ mutex_unlock(&mymutex);

/* Release the mutex */

To illust rat e t he use of concurrency prot ect ion, let 's st art wit h a crit ical sect ion t hat is present only in process cont ext and gradually int roduce com plexit ies in t he following order:

1 . Crit ical sect ion present only in process cont ext on a Uniprocessor ( UP) box running a nonpreem pt ible kernel.

2 . Crit ical sect ion present in process and int errupt cont ext s on a UP m achine running a nonpreem pt ible kernel.

3 . Crit ical sect ion present in process and int errupt cont ext s on a UP m achine running a preem pt ible kernel.

4 . Crit ical sect ion present in process and int errupt cont ext s on an SMP m achine running a preem pt ible kernel.

The Old Sem aphore I nt erface The m ut ex int erface, which replaces t he older sem aphore int erface, originat ed in t he –rt t ree and was m erged int o t he m ainline wit h t he 2.6.16 kernel release. The sem aphore int erface is st ill around, however. Basic usage of t he sem aphore int erface is as follows: #include

/* Architecture dependent header */

/* Statically declare a semaphore. To dynamically create a semaphore, use init_MUTEX() */ static DECLARE_MUTEX(mysem); down(&mysem);

/* Acquire the semaphore */

/* ... Critical Section code ... */ up(&mysem);

/* Release the semaphore */

Sem aphores can be configured t o allow a predet erm ined num ber of t hreads int o t he crit ical sect ion sim ult aneously. However, sem aphores t hat perm it m ore t han a single holder at a t im e are rarely used.

Ca se 1 : Pr oce ss Con t e x t , UP M a ch in e , N o Pr e e m pt ion This is t he sim plest case and needs no locking, so we won't discuss t his furt her.

Ca se 2 : Pr oce ss a n d I n t e r r u pt Con t e x t s, UP M a ch in e , N o Pr e e m pt ion I n t his case, you need t o disable only int errupt s t o prot ect t he crit ical region. To see why, assum e t hat A and B are process cont ext t hreads, and C is an int errupt cont ext t hread, all vying t o ent er t he sam e crit ical sect ion, as shown in Figure 2.4.

Figu r e 2 .4 . Pr oce ss a n d in t e r r u pt con t e x t t h r e a ds in side a cr it ica l se ct ion .

Because Thread C is execut ing in int errupt cont ext and always runs t o com plet ion before yielding t o Thread A or Thread B, it need not worry about prot ect ion. Thread A, for it s part , need not be concerned about Thread B ( and vice versa) because t he kernel is not preem pt ible. Thus, Thread A and Thread B need t o guard against only t he possibilit y of Thread C st om ping t hrough t he crit ical sect ion while t hey are inside t he sam e sect ion. They achieve t his by disabling int errupt s prior t o ent ering t he crit ical sect ion: Point A: local_irq_disable(); /* Disable Interrupts in local CPU */ /* ... Critical Section ... */ local_irq_enable(); /* Enable Interrupts in local CPU */

However, if int errupt s were already disabled when execut ion reached Point A, local_irq_enable() creat es t he unpleasant side effect of reenabling int errupt s, rat her t han rest oring int errupt st at e. This can be fixed as follows: unsigned long flags; Point A: local_irq_save(flags); /* Disable Interrupts */ /* ... Critical Section ... */ local_irq_restore(flags); /* Restore state to what it was at Point A */

This works correct ly irrespect ive of t he int errupt st at e at Point A.

Ca se 3 : Pr oce ss a n d I n t e r r u pt Con t e x t s, UP M a ch in e , Pr e e m pt ion I f preem pt ion is enabled, m ere disabling of int errupt s won't prot ect your crit ical region from being t ram pled over. There is t he possibilit y of m ult iple t hreads sim ult aneously ent ering t he crit ical sect ion in process cont ext . Referring back t o Figure 2.4 in t his scenario, Thread A and Thread B now need t o prot ect t hem selves against each ot her in addit ion t o guarding against Thread C. The solut ion apparent ly, is t o disable kernel preem pt ion before t he st art of t he crit ical sect ion and reenable it at t he end, in addit ion t o disabling/ reenabling int errupt s.

For t his, Thread A and Thread B use t he irq variant of spinlocks: unsigned long flags; Point A: /* Save interrupt state. * Disable interrupts - this implicitly disables preemption */ spin_lock_irqsave(&mylock, flags); /* ... Critical Section ... */ /* Restore interrupt state to what it was at Point A */ spin_unlock_irqrestore(&mylock, flags);

Preem pt ion st at e need not be explicit ly rest ored t o what it was at Point A because t he kernel int ernally does t hat for you via a variable called t he preem pt ion count er. The count er get s increm ent ed whenever preem pt ion is disabled ( using preempt_disable()) and get s decrem ent ed whenever preem pt ion is enabled ( using preempt_enable()) . Preem pt ion kicks in only if t he count er value is zero.

Ca se 4 : Pr oce ss a n d I n t e r r u pt Con t e x t s, SM P M a ch in e , Pr e e m pt ion Let 's now assum e t hat t he crit ical sect ion execut es on an SMP m achine. Your kernel has been configured wit h CONFIG_SMP and CONFIG_PREEMPT t urned on. I n t he scenarios discussed t his far, spinlock prim it ives have done lit t le m ore t han enable/ disable preem pt ion and int errupt s. The act ual locking funct ionalit y has been com piled away. I n t he presence of SMP, t he locking logic get s com piled in, and t he spinlock prim it ives are rendered SMP- safe. The SMP- enabled sem ant ics is as follows: unsigned long flags; Point A: /* - Save interrupt state on the local CPU - Disable interrupts on the local CPU. This implicitly disables preemption. - Lock the section to regulate access by other CPUs */ spin_lock_irqsave(&mylock, flags); /* ... Critical Section ... */ /* - Restore interrupt state and preemption to what it was at Point A for the local CPU - Release the lock */ spin_unlock_irqrestore(&mylock, flags);

On SMP syst em s, only int errupt s on t he local CPU are disabled when a spinlock is acquired. So, a process cont ext t hread ( say Thread A in Figure 2.4) m ight be running on one CPU, while an int errupt handler ( say Thread C in Figure 2.4) is execut ing on anot her CPU. An int errupt handler on a nonlocal processor t hus needs t o spin- wait unt il t he process cont ext code on t he local processor exit s t he crit ical sect ion. The int errupt cont ext code calls spin_lock()/ spin_unlock() t o do t his:

spin_lock(&mylock); /* ... Critical Section ... */ spin_unlock(&mylock);

Sim ilar t o t he irq variant s, spinlocks also have bot t om half ( BH) flavors. spin_lock_bh() disables bot t om halves when t he lock is acquired, whereas spin_unlock_bh() reenables bot t om halves when t he lock is released. We discuss bot t om halves in Chapt er 4 .

The –rt t ree The real t im e (-rt) t ree, also called t he CONFIG_PREEMPT_RT pat ch- set , im plem ent s low- lat ency m odificat ions t o t he kernel. The pat ch- set , downloadable from www.kernel.org/ pub/ linux/ kernel/ proj ect s/ rt , allows m ost of t he kernel t o be preem pt ed, part ly by replacing m any spinlocks wit h m ut exes. I t also incorporat es high- resolut ion t im ers. Several -rt feat ures have been int egrat ed int o t he m ainline kernel. You will find det ailed docum ent at ion at t he proj ect 's wiki host ed at ht t p: / / rt .wiki.kernel.org/ .

The kernel has specialized locking prim it ives in it s repert oire t hat help im prove perform ance under specific condit ions. Using a m ut ual- exclusion schem e t ailored t o your needs m akes your code m ore powerful. Let 's t ake a look at som e of t he specialized exclusion m echanism s.

At om ic Ope r a t or s At om ic operat ors are used t o perform light weight one- shot operat ions such as bum ping count ers, condit ional increm ent s, and set t ing bit posit ions. At om ic operat ions are guarant eed t o be serialized and do not need locks for prot ect ion against concurrent access. The im plem ent at ion of at om ic operat ors is archit ect ure- dependent . To check whet her t here are any rem aining dat a references before freeing a kernel net work buffer ( called an skbuff) , t he skb_release_data() rout ine defined in net / core/ skbuff.c does t he following: if (!skb->cloned || /* Atomically decrement and check if the returned value is zero */ !atomic_sub_return(skb->nohdr ? (1 dataref)) { /* ... */ kfree(skb->head); }

While skb_release_data() is t hus execut ing, anot her t hread using skbuff_clone() ( defined in t he sam e file) m ight be sim ult aneously increm ent ing t he dat a reference count er: /* ... */ /* Atomically bump up the data reference count */ atomic_inc(&(skb_shinfo(skb)->dataref)); /* ... */

The use of at om ic operat ors prot ect s t he dat a reference count er from being t ram pled by t hese t wo t hreads. I t

also elim inat es t he hassle of using locks t o prot ect a single int eger variable from concurrent access. The kernel also support s operat ors such as set_bit(), clear_bit(), and test_and_set_bit() t o at om ically engage in bit m anipulat ions. Look at include/ asm - your- arch/ at om ic.h for t he at om ic operat ors support ed on your archit ect ure.

Re a de r - W r it e r Lock s Anot her specialized concurrency regulat ion m echanism is a reader- writ er variant of spinlocks. I f t he usage of a crit ical sect ion is such t hat separat e t hreads eit her read from or writ e t o a shared dat a st ruct ure, but don't do bot h, t hese locks are a nat ural fit . Mult iple reader t hreads are allowed inside a crit ical region sim ult aneously. Reader spinlocks are defined as follows: rwlock_t myrwlock = RW_LOCK_UNLOCKED; read_lock(&myrwlock); /* Acquire reader lock */ /* ... Critical Region ... */ read_unlock(&myrwlock); /* Release lock */

However, if a writ er t hread ent ers a crit ical sect ion, ot her reader or writ er t hreads are not allowed inside. To use writ er spinlocks, you would writ e t his: rwlock_t myrwlock = RW_LOCK_UNLOCKED; write_lock(&myrwlock); /* Acquire writer lock */ /* ... Critical Region ... */ write_unlock(&myrwlock); /* Release lock */

Look at t he I PX rout ing code present in net / ipx/ ipx_rout e.c for a real- life exam ple of a reader- writ er spinlock. A reader- writ er lock called ipx_routes_lock prot ect s t he I PX rout ing t able from sim ult aneous access. Threads t hat need t o look up t he rout ing t able t o forward packet s request reader locks. Threads t hat need t o add or delet e ent ries from t he rout ing t able acquire writ er locks. This im proves perform ance because t here are usually far m ore inst ances of rout ing t able lookups t han rout ing t able updat es. Like regular spinlocks, reader- writ er locks also have corresponding irq variant s—nam ely, read_lock_irqsave(), read_lock_irqrestore(), write_lock_irqsave(), and write_lock_irqrestore(). The sem ant ics of t hese funct ions are sim ilar t o t hose of regular spinlocks. Sequence locks or seqlocks, int roduced in t he 2.6 kernel, are reader- writ er locks where writ ers are favored over readers. This is useful if writ e operat ions on a variable far out num ber read accesses. An exam ple is t he jiffies_64 variable discussed earlier in t his chapt er. Writ er t hreads do not wait for readers who m ay be inside a crit ical sect ion. Because of t his, reader t hreads m ay discover t hat t heir ent ry inside a crit ical sect ion has failed and m ay need t o ret ry: u64 get_jiffies_64(void) /* Defined in kernel/time.c */ { unsigned long seq; u64 ret; do { seq = read_seqbegin(&xtime_lock); ret = jiffies_64; } while (read_seqretry(&xtime_lock, seq)); return ret; }

Writ ers prot ect crit ical regions using write_seqlock() and write_sequnlock(). The 2.6 kernel int roduced anot her m echanism called Read- Copy Updat e ( RCU) , which yields im proved perform ance when readers far out num ber writ ers. The basic idea is t hat reader t hreads can execut e wit hout locking. Writ er t hreads are m ore com plex. They perform updat e operat ions on a copy of t he dat a st ruct ure and replace t he point er t hat readers see. The original copy is m aint ained unt il t he next cont ext swit ch on all CPUs t o ensure com plet ion of all ongoing read operat ions. Be aware t hat using RCU is m ore involved t han using t he prim it ives discussed t hus far and should be used only if you are sure t hat it 's t he right t ool for t he j ob. RCU dat a st ruct ures and int erface funct ions are defined in include/ linux/ rcupdat e.h. There is am ple docum ent at ion in Docum ent at ion/ RCU/ * . For an RCU usage exam ple, look at fs/ dcache.c. On Linux, each file is associat ed wit h direct ory ent ry inform at ion ( st ored in a st ruct ure called dent ry) , m et adat a inform at ion ( st ored in an inode) , and act ual dat a ( st ored in dat a blocks) . Each t im e you operat e on a file, t he com ponent s in t he file pat h are parsed, and t he corresponding dent ries are obt ained. The dent ries are kept cached in a dat a st ruct ure called t he dcache, t o speed up fut ure operat ions. At any t im e, t he num ber of dcache lookups is m uch m ore t han dcache updat es, so references t o t he dcache are prot ect ed using RCU prim it ives.

D e bu ggin g Concurrency- relat ed problem s are generally hard t o debug because t hey are usually difficult t o reproduce. I t 's a good idea t o enable SMP ( CONFIG_SMP) and preem pt ion ( CONFIG_PREEMPT) while com piling and t est ing your code, even if your product ion kernel is going t o run on a UP m achine wit h preem pt ion disabled. There is a kernel configurat ion opt ion under Kernel hacking called Spinlock and rw- lock debugging ( CONFIG_DEBUG_SPINLOCK) t hat can help you cat ch som e com m on spinlock errors. Also available are t ools such as lockm et er (ht t p: / / oss.sgi.com / proj ect s/ lockm et er/ ) t hat collect lock- relat ed st at ist ics. A com m on concurrency problem occurs when you forget t o lock an access t o a shared resource. This result s in different t hreads " racing" t hrough t hat access in an unregulat ed m anner. The problem , called a race condit ion, m ight m anifest in t he form of occasional st range code behavior. Anot her pot ent ial problem arises when you m iss releasing held locks in cert ain code pat hs, result ing in deadlocks. To underst and t his, consider t he following exam ple: spin_lock(&mylock);

/* Acquire lock */

/* ... Critical Section ... */ if (error) { /* This error condition occurs rarely */ return -EIO; /* Forgot to release the lock! */ } spin_unlock(&mylock);

/* Release lock */

Aft er t he occurrence of t he error condit ion, any t hread t rying t o acquire mylock get s deadlocked, and t he kernel m ight freeze. I f t he problem first m anifest s m ont hs or years aft er you writ e t he code, it 'll be all t he m ore t ough t o go back and debug it . ( There is a relat ed debugging exam ple in t he sect ion " Kdum p " in Chapt er 21, " Debugging Device Drivers." ) To avoid such unpleasant encount ers, concurrency logic should be designed when you archit ect your soft ware.

Pr oce ss File syst e m The process filesyst em ( procfs) is a virt ual filesyst em t hat creat es windows int o t he innards of t he kernel. The dat a you see when you browse procfs is generat ed by t he kernel on- t he- fly. Files in procfs are used t o configure kernel param et ers, look at kernel st ruct ures, glean st at ist ics from device drivers, and get general syst em inform at ion. Procfs is a pseudo filesyst em . This m eans t hat files resident in procfs are not associat ed wit h physical st orage devices such as hard disks. I nst ead, dat a in t hose files is dynam ically creat ed on dem and by t he corresponding ent ry point s in t he kernel. Because of t his, file sizes in procfs get shown as zero. Procfs is usually m ount ed under t he / proc direct ory during kernel boot ; you can see t his by invoking t he mount com m and. To get a first feel of t he capabilit ies of procfs, exam ine t he cont ent s of / proc/ cpuinfo, / proc/ m em info, / proc/ int errupt s, / proc/ t t y/ driver/ serial, / proc/ bus/ usb/ devices, and / proc/ st at . Cert ain kernel param et ers can be changed at runt im e by writ ing t o files under / proc/ sys/ . For exam ple, you can change kernel printk log levels by echoing a new set of values t o / proc/ sys/ kernel/ print k. Many ut ilit ies ( such as ps) and syst em perform ance m onit oring t ools ( such as sysst at ) int ernally derive inform at ion from files residing under / proc. Seq files, int roduced in t he 2.6 kernel, sim plify large procfs operat ions. They are described in Appendix C, " Seq Files."

Alloca t in g M e m or y Som e device drivers have t o be aware of t he exist ence of m em ory zones. I n addit ion, m any drivers need t he services of m em ory- allocat ion funct ions. I n t his sect ion, let 's briefly discuss bot h. The kernel organizes physical m em ory int o pages. The page size depends on t he archit ect ure. On x86- based m achines, it 's 4096 byt es. Each page in physical m em ory has a struct page ( defined in include/ linux/ m m _t ypes.h) associat ed wit h it : struct page { unsigned long flags; /* Page status */ atomic_t _count; /* Reference count */ /* ... */ void * virtual; /* Explained later on */ };

On 32- bit x86 syst em s, t he default kernel configurat ion split s t he available 4GB address space int o a 3GB virt ual m em ory space for user processes and a 1GB space for t he kernel, as shown in Figure 2.5. This im poses a 1GB lim it on t he am ount of physical m em ory t hat t he kernel can handle. I n realit y, t he lim it is 896MB because 128MB of t he address space is occupied by kernel dat a st ruct ures. You m ay increase t his lim it by changing t he 3GB/ 1GB split during kernel configurat ion, but you will incur t he displeasure of m em ory- int ensive applicat ions if you reduce t he virt ual address space of user processes.

Figu r e 2 .5 . D e fa u lt a ddr e ss spa ce split on a 3 2 - bit PC syst e m .

Kernel addresses t hat m ap t he low 896MB differ from physical addresses by a const ant offset and are called logical addresses. Wit h " high m em ory" support , t he kernel can access m em ory beyond 896MB by generat ing virt ual addresses for t hose regions using special m appings. All logical addresses are kernel virt ual addresses, but not vice versa.

This leads us t o t he following kernel m em ory zones:

1 . ZONE_DMA ( < 16MB) , t he zone used for Direct Mem ory Access ( DMA) . Because legacy I SA devices have 24 address lines and can access only t he first 16MB, t he kernel t ries t o dedicat e t his area for such devices.

2 . ZONE_NORMAL ( 16MB t o 896MB) , t he norm ally addressable region, also called low m em ory. The " virt ual" field in struct page for low m em ory pages cont ains t he corresponding logical addresses.

3 . ZONE_HIGH ( > 896MB) , t he space t hat t he kernel can access only aft er m apping resident pages t o regions in ZONE_NORMAL ( using kmap() and kunmap()) . The corresponding kernel addresses are virt ual and not logical. The " virt ual" field in struct page for high m em ory pages point s t o NULL if t he page is not km apped.

kmalloc() is a m em ory- allocat ion funct ion t hat ret urns cont iguous m em ory from ZONE_NORMAL. The prot ot ype is as follows: void *kmalloc(int count, int flags);

Where count is t he num ber of byt es t o allocat e, and flags is a m ode specifier. All support ed flags are list ed in include/ linux./ gfp.h ( gfp st ands for get free pages) , but t hese are t he com m only used ones:

1 . GFP_KERNEL Used by process cont ext code t o allocat e m em ory. I f t his flag is specified, kmalloc() is allowed t o go t o sleep and wait for pages t o get freed up.

2 . GFP_ATOMIC Used by int errupt cont ext code t o get hold of m em ory. I n t his m ode, kmalloc() is not allowed t o sleep- wait for free pages, so t he probabilit y of successful allocat ion wit h GFP_ATOMIC is lower t han wit h GFP_KERNEL.

Because m em ory ret urned by kmalloc() ret ains t he cont ent s from it s previous incarnat ion, t here could be a securit y risk if it 's exposed t o user space. To get zeroed km alloced m em ory, use kzalloc(). I f you need t o allocat e large m em ory buffers, and you don't require t he m em ory t o be physically cont iguous, use vmalloc() rat her t han kmalloc(): void *vmalloc(unsigned long count);

Here count is t he request ed allocat ion size. The funct ion ret urns kernel virt ual addresses. vmalloc() enj oys bigger allocat ion size lim it s t han kmalloc() but is slower and can't be called from int errupt cont ext . Moreover, you cannot use t he physically discont iguous m em ory ret urned by vmalloc() t o perform Direct Mem ory Access ( DMA) . High- perform ance net work drivers com m only use vmalloc() t o allocat e large descript or rings when t he device is opened. The kernel offers m ore sophist icat ed m em ory allocat ion t echniques. These include look aside buffers, slabs, and m em pools, which are beyond t he scope of t his chapt er.

Look in g a t t h e Sou r ce s Kernel boot st art s wit h t he execut ion of real m ode assem bly code living in t he arch/ x86/ boot / direct ory. Look at arch/ x86/ kernel/ set up_32.c t o see how t he prot ect ed m ode kernel obt ains inform at ion gleaned by t he real m ode kernel. The first boot m essage is print ed by code residing in init / m ain.c. Dig inside init / calibrat e.c t o learn m ore about BogoMI PS calibrat ion and include/ asm - your- arch/ bugs.h for an insight int o archit ect ure- dependent checks. Tim er services in t he kernel consist of archit ect ure- dependent port ions t hat live in arch/ your- arch/ kernel/ and generic port ions im plem ent ed in kernel/ t im er.c. For relat ed definit ions, look at t he header files, include/ linux/ t im e* .h. jiffies is defined in linux/ j iffies.h. The value for HZ is processor- dependent and can be found in include/ asm your- arch/ param .h. Mem ory m anagem ent sources reside in t he t op- level m m / direct ory. Table 2.1 cont ains a sum m ary of t he m ain dat a st ruct ures used in t his chapt er and t he locat ion of t heir definit ions in t he source t ree. Table 2.2 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 2 .1 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

HZ

include/ asm - your- arch/ param .h

Num ber of t im es t he syst em t im er t icks in 1 second

loops_per_jiffy init / m ain.c

Num ber of t im es t he processor execut es an int ernal delay- loop in 1 j iffy

timer_list

include/ linux/ t im er.h

Used t o hold t he address of a rout ine t hat you want t o execut e at som e point in t he fut ure

timeval

include/ linux/ t im e.h

Tim est am p

spinlock_t

include/ linux/ spinlock_t ypes.h

A busy- locking m echanism t o ensure t hat only a single t hread ent ers a crit ical sect ion

semaphore

include/ asm - yourarch/ sem aphore.h

A sleep- locking m echanism t hat allows a predet erm ined num ber of users t o ent er a crit ical sect ion

mutex

include/ linux/ m ut ex.h

The new int erface t hat replaces semaphore

rwlock_t

include/ linux/ spinlock_t ypes.h

Reader- writ er spinlock

page

include/ linux/ m m _t ypes.h

Kernel's represent at ion of a physical m em ory page

Ta ble 2 .2 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

time_after() time_after_eq() time_before() ime_before_eq()

include/ linux/ j iffies.h

Com pares t he current value of jiffies wit h a specified fut ure value

schedule_timeout()

ker nel/ t im er .c

Schedules a process t o run aft er a specified t im eout has elapsed

wait_event_timeout()

include/ linux/ wait .h

Resum es execut ion if a specified condit ion becom es t rue or if a t im eout occurs

DEFINE_TIMER()

include/ linux/ t im er.h

St at ically defines a t im er

init_timer()

ker nel/ t im er .c

Dynam ically defines a t im er

add_timer()

include/ linux/ t im er.h

Schedules t he t im er for execut ion aft er t he t im eout has elapsed

mod_timer()

ker nel/ t im er .c

Changes t im er expirat ion

timer_pending()

include/ linux/ t im er.h

Checks if a t im er is pending at t he m om ent

udelay()

include/ asm - yourarch/ delay.h arch/ yourarch/ lib/ delay.c

Busy- wait s for t he specified num ber of m icroseconds

rdtsc()

include/ asm - x86/ m sr.h

Get s t he value of t he TSC on Pent ium - com pat ible pr ocessor s

do_gettimeofday()

kernel/ t im e.c

Obt ains wall t im e

local_irq_disable()

include/ asm - yourarch/ syst em .h

Disables int errupt s on t he local CPU

local_irq_enable()

include/ asm - yourarch/ syst em .h

Enables int errupt s on t he local CPU

local_irq_save()

include/ asm - yourarch/ syst em .h

Saves int errupt st at e and disables int errupt s

local_irq_restore()

include/ asm - yourarch/ syst em .h

Rest ores int errupt st at e t o what it was when t he m at ching local_irq_save() was called

spin_lock()

include/ linux/ spinlock.h kernel/ spinlock.c

Acquires a spinlock.

spin_unlock()

include/ linux/ spinlock.h

Releases a spinlock

spin_lock_irqsave()

include/ linux/ spinlock.h kernel/ spinlock.c

Saves int errupt st at e, disables int errupt s and preem pt ion on local CPU, and locks t heir crit ical sect ion t o regulat e access by ot her CPUs

spin_unlock_irqrestore() include/ linux/ spinlock.h kernel/ spinlock.c

Rest ores int errupt st at e and preem pt ion and releases t he lock

DEFINE_MUTEX()

include/ linux/ m ut ex.h

St at ically declares a m ut ex

mutex_init()

include/ linux/ m ut ex.h

Dynam ically declares a m ut ex

mutex_lock()

kernel/ m ut ex.c

Acquires a m ut ex

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

mutex_unlock()

kernel/ m ut ex.c

Releases a m ut ex

DECLARE_MUTEX()

include/ asm - yourarch/ sem aphore.h

St at ically declares a sem aphore

init_MUTEX()

include/ asm - yourarch/ sem aphore.h

Dynam ically declares a sem aphore

up()

arch/ yourAcquires a sem aphore arch/ kernel/ sem aphore.c

down()

arch/ yourReleases a sem aphore arch/ kernel/ sem aphore.c

atomic_inc() atomic_inc_and_test() atomic_dec() atomic_dec_and_test() clear_bit() set_bit() test_bit() test_and_set_bit()

include/ asm - yourarch/ at om ic.h

At om ic operat ors t o perform light weight oper at ions

read_lock() read_unlock() read_lock_irqsave() read_lock_irqrestore() write_lock() write_unlock() write_lock_irqsave() write_lock_irqrestore()

include/ linux/ spinlock.h kernel/ spinlock.c

Reader- writ er variant of spinlocks

down_read() up_read() down_write() up_write()

kernel/ rwsem .c

Reader- writ er variant of sem aphores

read_seqbegin() read_seqretry() write_seqlock() write_sequnlock()

include/ linux/ seqlock.h

Seqlock operat ions

kmalloc()

include/ linux/ slab.h m m / slab.c

Allocat es physically cont iguous m em ory from ZONE_NORMAL

kzalloc()

include/ linux/ slab.h m m / ut il.c

Obt ains zeroed km alloced m em ory

kfree()

m m / slab.c

Releases km alloced m em ory

vmalloc()

m m / vm alloc.c

Allocat es virt ually cont iguous m em ory t hat is not guarant eed t o be physically cont iguous.

Ch a pt e r 3 . Ke r n e l Fa cilit ie s I n Th is Ch a pt e r 56 Kernel Threads 65 Helper I nt erfaces 85 Looking at t he Sources

I n t his chapt er, let 's look at som e kernel facilit ies t hat are useful com ponent s in a driver developer's t oolbox. We st art t his chapt er by looking at a kernel facilit y t hat is sim ilar t o user processes; kernel t hreads are program m ing abst ract ions orient ed t oward concurrent processing. The kernel offers several helper int erfaces t hat sim plify your code, elim inat e redundancies, increase code readabilit y, and help in long- t erm m aint enance. We will look at linked list s, hash list s, work queues, not ifier chains, com plet ion funct ions, and error- handling aids. These helpers are bug free and opt im ized, so your driver also inherit s t hose benefit s for free.

Ke r n e l Th r e a ds A kernel t hread is a way t o im plem ent background t asks inside t he kernel. The t ask can be busy handling asynchronous event s or sleep- wait ing for an event t o occur. Kernel t hreads are sim ilar t o user processes, except t hat t hey live in kernel space and have access t o kernel funct ions and dat a st ruct ures. Like user processes, kernel t hreads have t he illusion of m onopolizing t he processor because of preem pt ive scheduling. Many device drivers ut ilize t he services of kernel t hreads t o im plem ent assist ant or helper t asks. For exam ple, t he khubd kernel t hread, which is part of t he Linux USB driver core ( covered in Chapt er 11, " Universal Serial Bus" ) m onit ors USB hubs and configures USB devices when t hey are hot - plugged int o t he syst em .

Cr e a t in g a Ke r n e l Th r e a d Let 's learn about kernel t hreads wit h t he help of an exam ple. While developing t he exam ple t hread, you will also learn about kernel concept s such as process st at es, wait queues, and user m ode helpers. When you are com fort able wit h kernel t hreads, you can use t hem as a t est vehicle for carrying out various experim ent s wit hin t he kernel. Assum e t hat you would like t he kernel t o asynchronously invoke a user m ode program t o send you an em ail or pager alert , whenever it senses t hat t he healt h of cert ain key kernel dat a st ruct ures is det eriorat ing. ( For inst ance, free space in net work receive buffers has dipped below a low wat erm ark.)

This is a candidat e for being im plem ent ed as a kernel t hread for t he following reasons:

I t 's a background t ask because it has t o wait for asynchronous event s.

I t needs access t o kernel dat a st ruct ures because t he act ual det ect ion of event s is done by ot her part s of t he kernel.

I t has t o invoke a user m ode helper program , which is a t im e- consum ing operat ion.

Built - I n Kernel Threads To see t he kernel t hreads ( also called kernel processes) running on your syst em , run t he ps com m and. Nam es of kernel t hreads are surrounded by square bracket s: bash> ps -ef UID PID root 1 root 2 root 3 root 4 root 38 root 39 root 29 root 695 ... root 3914 root 3915 ... root 4015 root 4066

PPID 0 0 2 2 2 2 2 2 2 2 3364 4015

C 0 0 0 0 0 0 0 0

STIME 22:36 22:36 22:36 22:36 22:36 22:36 22:36 22:36

TTY ? ? ? ? ? ? ? ?

TIME 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00

CMD init [3] [kthreadd] [ksoftirqd/0] [events/0] [pdflush] [pdflush] [khubd] [kjournald]

0 22:37 ? 0 22:37 ?

00:00:00 [nfsd] 00:00:00 [nfsd]

0 22:55 tty3 0 22:59 tty3

00:00:00 -bash 00:00:00 ps -ef

The [ ksoft irqd/ 0] kernel t hread is an aid t o im plem ent soft irqs. Soft irqs are raised by int errupt handlers t o request " bot t om half" processing of port ions of t he handler whose execut ion can be deferred. We t ake a det ailed look at bot t om halves and soft irqs in Chapt er 4 , " Laying t he Groundwork," but t he basic idea here is t o allow as lit t le code as possible t o be present inside int errupt handlers. Having sm all int errupt handlers reduces int errupt - off t im es in t he syst em , result ing in lower lat encies. Ksoft irqd's j ob is t o ensure t hat a high load of soft irqs neit her st arves t he soft irqs nor overwhelm s t he syst em . On Sym m et ric Mult i Processing ( SMP) m achines where m ult iple t hread inst ances can run on different processors in parallel, one inst ance of ksoft irqd is creat ed per CPU t o im prove t hroughput (ksoft irqd/ n, where n is t he CPU num ber) . The event s/ n t hreads ( where n is t he CPU num ber) help im plem ent work queues, which are anot her way of deferring work in t he kernel. Part s of t he kernel t hat desire deferred execut ion of work can eit her creat e t heir own work queue or m ake use of t he default event s/ n worker t hread. Work queues are also dissect ed in Chapt er 4 . The t ask of t he pdflush kernel t hread is t o flush out dirt y pages from t he page cache. The page cache buffers accesses t o t he disk. To im prove perform ance, act ual writ es t o t he disk are delayed unt il t he pdflush daem on writ es out dirt ied dat a t o disk. This is done if t he available free m em ory dips below a t hreshold, or if t he page has rem ained dirt y for a sufficient ly long t im e. I n 2.4 kernels, t hese t wo t asks were respect ively perform ed by separat e kernel t hreads, bdflush and

kupdat ed. You m ight have not iced t wo inst ances of pdflush in t he ps out put . A new inst ance is creat ed if t he kernel senses t hat exist ing inst ances have t heir hands full, servicing disk queues. This im proves t hroughput , especially if your syst em has m ult iple disks and m any of t hem are busy. As you saw in t he preceding chapt er, kj ournald is t he generic kernel j ournaling t hread, which is used by filesyst em s such as EXT3. The Linux Net work File Syst em ( NFS) server is im plem ent ed using a set of kernel t hreads nam ed nfsd .

Our exam ple kernel t hread relinquishes t he processor unt il it get s woken up by part s of t he kernel responsible for m onit oring t he dat a st ruct ures of int erest . When awake, it invokes a user m ode helper program and passes appropriat e ident it y codes in it s environm ent . To creat e a kernel t hread, use kernel_thread(): ret = kernel_thread(mykthread, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

The flags specify t he resources t o be shared bet ween t he parent and child t hreads. CLONE_FILES specifies t hat open files are t o be shared, and CLONE_SIGHAND request s t hat signal handlers be shared. List ing 3.1 shows t he exam ple im plem ent at ion. Because kernel t hreads usually act as helpers t o device drivers, t hey are creat ed when t he driver is init ialized. I n t his case, however, t he exam ple t hread can be creat ed from any suit able place, for inst ance, init / m ain.c. The t hread st art s by invoking daemonize(), which perform s init ial housekeeping and changes t he parent of t he calling t hread t o a kernel t hread called kt hreadd. Each Linux t hread has a single parent . I f a parent process dies wit hout wait ing for it s child t o exit , t he child becom es a zom bie process and wast es resources. Reparent ing t he child t o kt hreadd, avoids t his and ensures proper cleanup when t he t hread exit s. [ 1]

[ 1]

I n 2.6.21 and earlier kernels, daemonize() reparent s t he calling t hread t o t he init t ask by calling reparent_to_init().

Because daemonize() blocks all signals by default , use allow_signal() t o enable delivery if your t hread desires t o handle a part icular signal. There are no signal handlers inside t he kernel, so use signal_pending() t o check for signals and t ake appropriat e act ion. For debugging purposes, t he code in List ing 3.1 request s delivery of SIGKILL and dies if it 's received. kernel_thread() is depreciat ed in favor of t he higher- level kt hread API , which is built over t he form er. We will look at kt hreads lat er on. List in g 3 .1 . I m ple m e n t in g a Ke r n e l Th r e a d

Code View: static DECLARE_WAIT_QUEUE_HEAD(myevent_waitqueue); rwlock_t myevent_lock; extern unsigned int myevent_id; /* Holds the identity of the troubled data structure. Populated later on */ static int mykthread(void *unused) { unsigned int event_id = 0; DECLARE_WAITQUEUE(wait, current); /* Become a kernel thread without attached user resources */ daemonize("mykthread"); /* Request delivery of SIGKILL */ allow_signal(SIGKILL); /* The thread sleeps on this wait queue until it's woken up by parts of the kernel in charge of sensing the health of data structures of interest */ add_wait_queue(&myevent_waitqueue, &wait); for (;;) { /* Relinquish the processor until the event occurs */ set_current_state(TASK_INTERRUPTIBLE); schedule(); /* Allow other parts of the kernel to run */ /* Die if I receive SIGKILL */ if (signal_pending(current)) break; /* Control gets here when the thread is woken up */ read_lock(&myevent_lock); /* Critical section starts */ if (myevent_id) { /* Guard against spurious wakeups */ event_id = myevent_id; read_unlock(&myevent_lock); /* Critical section ends */ /* Invoke the registered user mode helper and pass the identity code in its environment */ run_umode_handler(event_id); /* Expanded later on */ } else { read_unlock(&myevent_lock); } } set_current_state(TASK_RUNNING); remove_wait_queue(&myevent_waitqueue, &wait); return 0; }

I f you com pile and run t his as part of t he kernel, you can see t he newly creat ed t hread, m ykt hread, in t he ps out put : bash> ps -ef UID root root ... root ...

PID 1 2

PPID C STIME TTY 0 0 21:56 ? 1 0 22:36 ?

111

1 0 21:56 ?

TIME CMD 00:00:00 init [3] 00:00:00 [ksoftirqd/0] 00:00:00 [mykthread]

Before we delve furt her int o t he t hread im plem ent at ion, let 's writ e a code snippet t hat m onit ors t he healt h of a dat a st ruct ure of int erest and awakens m ykt hread if a problem condit ion is det ect ed: /* Executed by parts of the kernel that own the data structures whose health you want to monitor */ /* ... */ if (my_key_datastructure looks troubled) { write_lock(&myevent_lock); /* Serialize */ /* Fill in the identity of the data structure */ myevent_id = datastructure_id; write_unlock(&myevent_lock); /* Wake up mykthread */ wake_up_interruptible(&myevent_waitqueue); } /* ... */

List ing 3.1 execut es in process cont ext , whereas t he preceding snippet runs from eit her process or int errupt cont ext . Process and int errupt cont ext s com m unicat e via kernel dat a st ruct ures. Our exam ple uses myevent_id and myevent_waitqueue for t his com m unicat ion. myevent_id cont ains t he ident it y of t he dat a st ruct ure in t rouble. Access t o myevent_id is serialized using locks. Not e t hat kernel t hreads are preem pt ible only if CONFIG_PREEMPT is t urned on at com pile t im e. I f CONFIG_PREEMPT is off, or if you are st ill running a 2.4 kernel wit hout t he preem pt ion pat ch, your t hread will freeze t he syst em if it does not go t o sleep. I f you com m ent out schedule() in List ing 3.1 and disable CONFIG_PREEMPT in your kernel configurat ion, your syst em will lock up. You will learn how t o obt ain soft real- t im e responses from kernel t hreads when we discuss scheduling policies in Chapt er 19, " Drivers in User Space."

Pr oce ss St a t e s a n d W a it Qu e u e s Here's t he code region from List ing 3.1 t hat put s m ykt hread t o sleep while wait ing for event s: add_wait_queue(&myevent_waitqueue, &wait); for (;;) { /* ... */ set_current_state(TASK_INTERRUPTIBLE); schedule(); /* Relinquish the processor */ /* Point A */ /* ... */ } set_current_state(TASK_RUNNING); remove_wait_queue(&myevent_waitqueue, &wait);

The operat ion of t he preceding snippet is based on t wo concept s: wait queues and process st at es.

Wait queues hold t hreads t hat need t o wait for an event or a syst em resource. Threads in a wait queue go t o sleep unt il t hey are woken up by anot her t hread or an int errupt handler t hat is responsible for det ect ing t he event . Queuing and dequeuing are respect ively done using add_wait_queue() and remove_wait_queue(), and waking up queued t asks is accom plished via wake_up_interruptible(). A kernel t hread ( or a norm al process) can be in any of t he following process st at es: running, int errupt ible, unint errupt ible, zom bie, st opped, t raced, or dead. These st at es are defined in include/ linux / sched.h :

A process in t he running st at e ( TASK_RUNNING) is in t he scheduler run queue and is a candidat e for get t ing CPU t im e allot t ed by t he scheduler.

A t ask in t he in t e r r u pt ible st at e ( TASK_INTERRUPTIBLE) is wait ing for an event t o occur and is not in t he scheduler run queue. When t he t ask get s woken up, or if a signal is delivered t o it , it re- ent ers t he run queue.

The u n in t e r r u pt ible st at e ( TASK_UNINTERRUPTIBLE) is sim ilar t o t he in t e r r u pt ible st at e except t hat receipt of a signal will not put t he t ask back int o t he run queue.

A st oppe d t ask ( TASK_STOPPED) has st opped execut ion due t o receipt of cert ain signals.

I f an applicat ion such as st race is using t he pt race support in t he kernel t o int ercept a t ask, it 'll be in t he t r a ce d st at e ( TASK_TRACED) .

A t ask in t he zom bie st at e ( EXIT_ZOMBIE) has t erm inat ed, but it s parent did not wait for t he t ask t o com plet e. An exit ing t ask is eit her in t he EXIT_ZOMBIE st at e or t he dead ( EXIT_DEAD) st at e.

You can use set_current_state() t o set t he run st at e of your kernel t hread. Let 's now t urn back t o t he preceding code snippet . m ykt hread sleeps on a wait queue (myevent_waitqueue) and changes it s st at e t o TASK_INTERRUPTIBLE, signaling it s desire t o opt out of t he scheduler run queue. The call t o schedule() asks t he scheduler t o choose and run a new t ask from it s run queue. When code responsible for healt h m onit oring wakes up m ykt hread using wake_up_interruptible(&myevent_waitqueue), t he t hread is put back int o t he scheduler run queue. The process st at e also get s sim ult aneously changed t o TASK_RUNNING, so t here is no race condit ion even if t he wake up occurs bet ween t he t im e t he t ask st at e is set t o TASK_INTERRUPTIBLE and t he t im e schedule() is called. The t hread also get s back int o t he run queue if a SIGKILL signal is delivered t o it . When t he scheduler subsequent ly picks m ykt hread from t he run queue, execut ion resum es from Point A.

Use r M ode H e lpe r s Mykt hread invokes run_umode_handler() in List ing 3.1 t o not ify user space about det ect ed event s: Code View: /* Called from Listing 3.1 */ static void run_umode_handler(int event_id) { int i = 0; char *argv[2], *envp[4], *buffer = NULL;

int value; argv[i++] = myevent_handler; /* Defined in kernel/sysctl.c */ /* Fill in the id corresponding to the data structure in trouble */ if (!(buffer = kmalloc(32, GFP_KERNEL))) return; sprintf(buffer, "TROUBLED_DS=%d", event_id); /* If no user mode handlers are found, return */ if (!argv[0]) return; argv[i] = 0; /* Prepare the environment for /path/to/helper */ i = 0; envp[i++] = "HOME=/"; envp[i++] = "PATH=/sbin:/usr/sbin:/bin:/usr/bin"; envp[i++] = buffer; envp[i] = 0; /* Execute the user mode program, /path/to/helper */ value = call_usermodehelper(argv[0], argv, envp, 0); /* Check return values */ kfree(buffer); }

The kernel support s a m echanism for request ing user m ode program s t o help perform cert ain funct ions. run_umode_handler() uses t his facilit y by invoking call_usermodehelper(). You have t o regist er t he user m ode program invoked by run_umode_handler() via a node in t he / proc/ sys/ direct ory. To do so, m ake sure t hat CONFIG_SYSCTL ( files t hat are part of t he / proc/ sys/ direct ory are collect ively known as t he sysct l int erface) is enabled in your kernel configurat ion and add an ent ry t o t he kern_table array in kernel/ sysct l.c: { .ctl_name

= KERN_MYEVENT_HANDLER, /* Define in include/linux/sysctl.h */ .procname = "myevent_handler", .data = &myevent_handler, .maxlen = 256, .mode = 0644, .proc_handler = &proc_dostring, .strategy = &sysctl_string, },

This creat es t he node / proc/ sys/ kernel/ m yevent _handler in t he process filesyst em . To regist er your user m ode helper, do t he following: bash> echo /path/to/helper > /proc/sys/kernel/myevent_handler

This result s in / pat h/ t o/ helper get t ing execut ed when m ykt hread invokes run_umode_handler().

Mykt hread passes t he ident it y of t he t roubled kernel dat a st ruct ure t o t he user m ode helper t hrough t he environm ent variable TROUBLED_DS. The helper can be a sim ple script like t he following t hat sends you an em ail alert cont aining t he inform at ion it gleaned from it s environm ent : bash> cat /path/to/helper #!/bin/bash echo Kernel datastructure $TROUBLED_DS is in trouble | mail -s Alert root

call_usermodehelper() has t o be execut ed from process cont ext and runs wit h root privileges. I t 's im plem ent ed using a work queue, which we will soon discuss.

Ch a pt e r 3 . Ke r n e l Fa cilit ie s I n Th is Ch a pt e r 56 Kernel Threads 65 Helper I nt erfaces 85 Looking at t he Sources

I n t his chapt er, let 's look at som e kernel facilit ies t hat are useful com ponent s in a driver developer's t oolbox. We st art t his chapt er by looking at a kernel facilit y t hat is sim ilar t o user processes; kernel t hreads are program m ing abst ract ions orient ed t oward concurrent processing. The kernel offers several helper int erfaces t hat sim plify your code, elim inat e redundancies, increase code readabilit y, and help in long- t erm m aint enance. We will look at linked list s, hash list s, work queues, not ifier chains, com plet ion funct ions, and error- handling aids. These helpers are bug free and opt im ized, so your driver also inherit s t hose benefit s for free.

Ke r n e l Th r e a ds A kernel t hread is a way t o im plem ent background t asks inside t he kernel. The t ask can be busy handling asynchronous event s or sleep- wait ing for an event t o occur. Kernel t hreads are sim ilar t o user processes, except t hat t hey live in kernel space and have access t o kernel funct ions and dat a st ruct ures. Like user processes, kernel t hreads have t he illusion of m onopolizing t he processor because of preem pt ive scheduling. Many device drivers ut ilize t he services of kernel t hreads t o im plem ent assist ant or helper t asks. For exam ple, t he khubd kernel t hread, which is part of t he Linux USB driver core ( covered in Chapt er 11, " Universal Serial Bus" ) m onit ors USB hubs and configures USB devices when t hey are hot - plugged int o t he syst em .

Cr e a t in g a Ke r n e l Th r e a d Let 's learn about kernel t hreads wit h t he help of an exam ple. While developing t he exam ple t hread, you will also learn about kernel concept s such as process st at es, wait queues, and user m ode helpers. When you are com fort able wit h kernel t hreads, you can use t hem as a t est vehicle for carrying out various experim ent s wit hin t he kernel. Assum e t hat you would like t he kernel t o asynchronously invoke a user m ode program t o send you an em ail or pager alert , whenever it senses t hat t he healt h of cert ain key kernel dat a st ruct ures is det eriorat ing. ( For inst ance, free space in net work receive buffers has dipped below a low wat erm ark.)

This is a candidat e for being im plem ent ed as a kernel t hread for t he following reasons:

I t 's a background t ask because it has t o wait for asynchronous event s.

I t needs access t o kernel dat a st ruct ures because t he act ual det ect ion of event s is done by ot her part s of t he kernel.

I t has t o invoke a user m ode helper program , which is a t im e- consum ing operat ion.

Built - I n Kernel Threads To see t he kernel t hreads ( also called kernel processes) running on your syst em , run t he ps com m and. Nam es of kernel t hreads are surrounded by square bracket s: bash> ps -ef UID PID root 1 root 2 root 3 root 4 root 38 root 39 root 29 root 695 ... root 3914 root 3915 ... root 4015 root 4066

PPID 0 0 2 2 2 2 2 2 2 2 3364 4015

C 0 0 0 0 0 0 0 0

STIME 22:36 22:36 22:36 22:36 22:36 22:36 22:36 22:36

TTY ? ? ? ? ? ? ? ?

TIME 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00 00:00:00

CMD init [3] [kthreadd] [ksoftirqd/0] [events/0] [pdflush] [pdflush] [khubd] [kjournald]

0 22:37 ? 0 22:37 ?

00:00:00 [nfsd] 00:00:00 [nfsd]

0 22:55 tty3 0 22:59 tty3

00:00:00 -bash 00:00:00 ps -ef

The [ ksoft irqd/ 0] kernel t hread is an aid t o im plem ent soft irqs. Soft irqs are raised by int errupt handlers t o request " bot t om half" processing of port ions of t he handler whose execut ion can be deferred. We t ake a det ailed look at bot t om halves and soft irqs in Chapt er 4 , " Laying t he Groundwork," but t he basic idea here is t o allow as lit t le code as possible t o be present inside int errupt handlers. Having sm all int errupt handlers reduces int errupt - off t im es in t he syst em , result ing in lower lat encies. Ksoft irqd's j ob is t o ensure t hat a high load of soft irqs neit her st arves t he soft irqs nor overwhelm s t he syst em . On Sym m et ric Mult i Processing ( SMP) m achines where m ult iple t hread inst ances can run on different processors in parallel, one inst ance of ksoft irqd is creat ed per CPU t o im prove t hroughput (ksoft irqd/ n, where n is t he CPU num ber) . The event s/ n t hreads ( where n is t he CPU num ber) help im plem ent work queues, which are anot her way of deferring work in t he kernel. Part s of t he kernel t hat desire deferred execut ion of work can eit her creat e t heir own work queue or m ake use of t he default event s/ n worker t hread. Work queues are also dissect ed in Chapt er 4 . The t ask of t he pdflush kernel t hread is t o flush out dirt y pages from t he page cache. The page cache buffers accesses t o t he disk. To im prove perform ance, act ual writ es t o t he disk are delayed unt il t he pdflush daem on writ es out dirt ied dat a t o disk. This is done if t he available free m em ory dips below a t hreshold, or if t he page has rem ained dirt y for a sufficient ly long t im e. I n 2.4 kernels, t hese t wo t asks were respect ively perform ed by separat e kernel t hreads, bdflush and

kupdat ed. You m ight have not iced t wo inst ances of pdflush in t he ps out put . A new inst ance is creat ed if t he kernel senses t hat exist ing inst ances have t heir hands full, servicing disk queues. This im proves t hroughput , especially if your syst em has m ult iple disks and m any of t hem are busy. As you saw in t he preceding chapt er, kj ournald is t he generic kernel j ournaling t hread, which is used by filesyst em s such as EXT3. The Linux Net work File Syst em ( NFS) server is im plem ent ed using a set of kernel t hreads nam ed nfsd .

Our exam ple kernel t hread relinquishes t he processor unt il it get s woken up by part s of t he kernel responsible for m onit oring t he dat a st ruct ures of int erest . When awake, it invokes a user m ode helper program and passes appropriat e ident it y codes in it s environm ent . To creat e a kernel t hread, use kernel_thread(): ret = kernel_thread(mykthread, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);

The flags specify t he resources t o be shared bet ween t he parent and child t hreads. CLONE_FILES specifies t hat open files are t o be shared, and CLONE_SIGHAND request s t hat signal handlers be shared. List ing 3.1 shows t he exam ple im plem ent at ion. Because kernel t hreads usually act as helpers t o device drivers, t hey are creat ed when t he driver is init ialized. I n t his case, however, t he exam ple t hread can be creat ed from any suit able place, for inst ance, init / m ain.c. The t hread st art s by invoking daemonize(), which perform s init ial housekeeping and changes t he parent of t he calling t hread t o a kernel t hread called kt hreadd. Each Linux t hread has a single parent . I f a parent process dies wit hout wait ing for it s child t o exit , t he child becom es a zom bie process and wast es resources. Reparent ing t he child t o kt hreadd, avoids t his and ensures proper cleanup when t he t hread exit s. [ 1]

[ 1]

I n 2.6.21 and earlier kernels, daemonize() reparent s t he calling t hread t o t he init t ask by calling reparent_to_init().

Because daemonize() blocks all signals by default , use allow_signal() t o enable delivery if your t hread desires t o handle a part icular signal. There are no signal handlers inside t he kernel, so use signal_pending() t o check for signals and t ake appropriat e act ion. For debugging purposes, t he code in List ing 3.1 request s delivery of SIGKILL and dies if it 's received. kernel_thread() is depreciat ed in favor of t he higher- level kt hread API , which is built over t he form er. We will look at kt hreads lat er on. List in g 3 .1 . I m ple m e n t in g a Ke r n e l Th r e a d

Code View: static DECLARE_WAIT_QUEUE_HEAD(myevent_waitqueue); rwlock_t myevent_lock; extern unsigned int myevent_id; /* Holds the identity of the troubled data structure. Populated later on */ static int mykthread(void *unused) { unsigned int event_id = 0; DECLARE_WAITQUEUE(wait, current); /* Become a kernel thread without attached user resources */ daemonize("mykthread"); /* Request delivery of SIGKILL */ allow_signal(SIGKILL); /* The thread sleeps on this wait queue until it's woken up by parts of the kernel in charge of sensing the health of data structures of interest */ add_wait_queue(&myevent_waitqueue, &wait); for (;;) { /* Relinquish the processor until the event occurs */ set_current_state(TASK_INTERRUPTIBLE); schedule(); /* Allow other parts of the kernel to run */ /* Die if I receive SIGKILL */ if (signal_pending(current)) break; /* Control gets here when the thread is woken up */ read_lock(&myevent_lock); /* Critical section starts */ if (myevent_id) { /* Guard against spurious wakeups */ event_id = myevent_id; read_unlock(&myevent_lock); /* Critical section ends */ /* Invoke the registered user mode helper and pass the identity code in its environment */ run_umode_handler(event_id); /* Expanded later on */ } else { read_unlock(&myevent_lock); } } set_current_state(TASK_RUNNING); remove_wait_queue(&myevent_waitqueue, &wait); return 0; }

I f you com pile and run t his as part of t he kernel, you can see t he newly creat ed t hread, m ykt hread, in t he ps out put : bash> ps -ef UID root root ... root ...

PID 1 2

PPID C STIME TTY 0 0 21:56 ? 1 0 22:36 ?

111

1 0 21:56 ?

TIME CMD 00:00:00 init [3] 00:00:00 [ksoftirqd/0] 00:00:00 [mykthread]

Before we delve furt her int o t he t hread im plem ent at ion, let 's writ e a code snippet t hat m onit ors t he healt h of a dat a st ruct ure of int erest and awakens m ykt hread if a problem condit ion is det ect ed: /* Executed by parts of the kernel that own the data structures whose health you want to monitor */ /* ... */ if (my_key_datastructure looks troubled) { write_lock(&myevent_lock); /* Serialize */ /* Fill in the identity of the data structure */ myevent_id = datastructure_id; write_unlock(&myevent_lock); /* Wake up mykthread */ wake_up_interruptible(&myevent_waitqueue); } /* ... */

List ing 3.1 execut es in process cont ext , whereas t he preceding snippet runs from eit her process or int errupt cont ext . Process and int errupt cont ext s com m unicat e via kernel dat a st ruct ures. Our exam ple uses myevent_id and myevent_waitqueue for t his com m unicat ion. myevent_id cont ains t he ident it y of t he dat a st ruct ure in t rouble. Access t o myevent_id is serialized using locks. Not e t hat kernel t hreads are preem pt ible only if CONFIG_PREEMPT is t urned on at com pile t im e. I f CONFIG_PREEMPT is off, or if you are st ill running a 2.4 kernel wit hout t he preem pt ion pat ch, your t hread will freeze t he syst em if it does not go t o sleep. I f you com m ent out schedule() in List ing 3.1 and disable CONFIG_PREEMPT in your kernel configurat ion, your syst em will lock up. You will learn how t o obt ain soft real- t im e responses from kernel t hreads when we discuss scheduling policies in Chapt er 19, " Drivers in User Space."

Pr oce ss St a t e s a n d W a it Qu e u e s Here's t he code region from List ing 3.1 t hat put s m ykt hread t o sleep while wait ing for event s: add_wait_queue(&myevent_waitqueue, &wait); for (;;) { /* ... */ set_current_state(TASK_INTERRUPTIBLE); schedule(); /* Relinquish the processor */ /* Point A */ /* ... */ } set_current_state(TASK_RUNNING); remove_wait_queue(&myevent_waitqueue, &wait);

The operat ion of t he preceding snippet is based on t wo concept s: wait queues and process st at es.

Wait queues hold t hreads t hat need t o wait for an event or a syst em resource. Threads in a wait queue go t o sleep unt il t hey are woken up by anot her t hread or an int errupt handler t hat is responsible for det ect ing t he event . Queuing and dequeuing are respect ively done using add_wait_queue() and remove_wait_queue(), and waking up queued t asks is accom plished via wake_up_interruptible(). A kernel t hread ( or a norm al process) can be in any of t he following process st at es: running, int errupt ible, unint errupt ible, zom bie, st opped, t raced, or dead. These st at es are defined in include/ linux / sched.h :

A process in t he running st at e ( TASK_RUNNING) is in t he scheduler run queue and is a candidat e for get t ing CPU t im e allot t ed by t he scheduler.

A t ask in t he in t e r r u pt ible st at e ( TASK_INTERRUPTIBLE) is wait ing for an event t o occur and is not in t he scheduler run queue. When t he t ask get s woken up, or if a signal is delivered t o it , it re- ent ers t he run queue.

The u n in t e r r u pt ible st at e ( TASK_UNINTERRUPTIBLE) is sim ilar t o t he in t e r r u pt ible st at e except t hat receipt of a signal will not put t he t ask back int o t he run queue.

A st oppe d t ask ( TASK_STOPPED) has st opped execut ion due t o receipt of cert ain signals.

I f an applicat ion such as st race is using t he pt race support in t he kernel t o int ercept a t ask, it 'll be in t he t r a ce d st at e ( TASK_TRACED) .

A t ask in t he zom bie st at e ( EXIT_ZOMBIE) has t erm inat ed, but it s parent did not wait for t he t ask t o com plet e. An exit ing t ask is eit her in t he EXIT_ZOMBIE st at e or t he dead ( EXIT_DEAD) st at e.

You can use set_current_state() t o set t he run st at e of your kernel t hread. Let 's now t urn back t o t he preceding code snippet . m ykt hread sleeps on a wait queue (myevent_waitqueue) and changes it s st at e t o TASK_INTERRUPTIBLE, signaling it s desire t o opt out of t he scheduler run queue. The call t o schedule() asks t he scheduler t o choose and run a new t ask from it s run queue. When code responsible for healt h m onit oring wakes up m ykt hread using wake_up_interruptible(&myevent_waitqueue), t he t hread is put back int o t he scheduler run queue. The process st at e also get s sim ult aneously changed t o TASK_RUNNING, so t here is no race condit ion even if t he wake up occurs bet ween t he t im e t he t ask st at e is set t o TASK_INTERRUPTIBLE and t he t im e schedule() is called. The t hread also get s back int o t he run queue if a SIGKILL signal is delivered t o it . When t he scheduler subsequent ly picks m ykt hread from t he run queue, execut ion resum es from Point A.

Use r M ode H e lpe r s Mykt hread invokes run_umode_handler() in List ing 3.1 t o not ify user space about det ect ed event s: Code View: /* Called from Listing 3.1 */ static void run_umode_handler(int event_id) { int i = 0; char *argv[2], *envp[4], *buffer = NULL;

int value; argv[i++] = myevent_handler; /* Defined in kernel/sysctl.c */ /* Fill in the id corresponding to the data structure in trouble */ if (!(buffer = kmalloc(32, GFP_KERNEL))) return; sprintf(buffer, "TROUBLED_DS=%d", event_id); /* If no user mode handlers are found, return */ if (!argv[0]) return; argv[i] = 0; /* Prepare the environment for /path/to/helper */ i = 0; envp[i++] = "HOME=/"; envp[i++] = "PATH=/sbin:/usr/sbin:/bin:/usr/bin"; envp[i++] = buffer; envp[i] = 0; /* Execute the user mode program, /path/to/helper */ value = call_usermodehelper(argv[0], argv, envp, 0); /* Check return values */ kfree(buffer); }

The kernel support s a m echanism for request ing user m ode program s t o help perform cert ain funct ions. run_umode_handler() uses t his facilit y by invoking call_usermodehelper(). You have t o regist er t he user m ode program invoked by run_umode_handler() via a node in t he / proc/ sys/ direct ory. To do so, m ake sure t hat CONFIG_SYSCTL ( files t hat are part of t he / proc/ sys/ direct ory are collect ively known as t he sysct l int erface) is enabled in your kernel configurat ion and add an ent ry t o t he kern_table array in kernel/ sysct l.c: { .ctl_name

= KERN_MYEVENT_HANDLER, /* Define in include/linux/sysctl.h */ .procname = "myevent_handler", .data = &myevent_handler, .maxlen = 256, .mode = 0644, .proc_handler = &proc_dostring, .strategy = &sysctl_string, },

This creat es t he node / proc/ sys/ kernel/ m yevent _handler in t he process filesyst em . To regist er your user m ode helper, do t he following: bash> echo /path/to/helper > /proc/sys/kernel/myevent_handler

This result s in / pat h/ t o/ helper get t ing execut ed when m ykt hread invokes run_umode_handler().

Mykt hread passes t he ident it y of t he t roubled kernel dat a st ruct ure t o t he user m ode helper t hrough t he environm ent variable TROUBLED_DS. The helper can be a sim ple script like t he following t hat sends you an em ail alert cont aining t he inform at ion it gleaned from it s environm ent : bash> cat /path/to/helper #!/bin/bash echo Kernel datastructure $TROUBLED_DS is in trouble | mail -s Alert root

call_usermodehelper() has t o be execut ed from process cont ext and runs wit h root privileges. I t 's im plem ent ed using a work queue, which we will soon discuss.

H e lpe r I n t e r fa ce s Several useful helper int erfaces exist in t he kernel t o m ake life easier for device driver developers. One exam ple is t he im plem ent at ion of t he doubly linked list library. Many drivers need t o m aint ain and m anipulat e linked list s of dat a st ruct ures. The kernel's list int erface rout ines elim inat e t he need for chasing list point ers and debugging m essy problem s relat ed t o list m aint enance. Let 's learn t o use helper int erfaces such as list s, hlist s, work queues, com plet ion funct ions, not ifier blocks, and kt hreads. There are equivalent ways t o do what t he helper facilit ies offer. You can, for exam ple, im plem ent your own list m anipulat ion rout ines inst ead of using t he list library, or use kernel t hreads t o defer work inst ead of subm it t ing it t o work queues. Using st andard kernel helper int erfaces, however, sim plifies your code, weeds out redundancies from t he kernel, increases code readabilit y, and helps long- t erm m aint enance.

Because t he kernel is vast , you can always find part s t hat do not yet t ake advant age of t hese helper m echanism s, so updat ing t hose code regions m ight be a good way t o st art cont ribut ing t o kernel developm ent .

Lin k e d List s To weave doubly linked list s of dat a st ruct ures, use t he funct ions provided in include/ linux/ list .h. Essent ially, you em bed a struct list_head inside your dat a st ruct ure: #include struct list_head { struct list_head *next, *prev; }; struct mydatastructure { struct list_head mylist; /* Embed */ /* ... */ /* Actual Fields */ };

mylist is t he link t hat chains different inst ances of mydatastructure. I f you have m ult iple list_heads em bedded inside mydatastructure, each of t hem const it ut es a link t hat renders mydatastructure a m em ber of a new list . You can use t he list library t o add or delet e m em bership from individual list s. To get t he lay of t he land before t he det ail, let 's sum m arize t he linked list program m ing int erface offered by t he list library. This is done in Table 3.1.

Ta ble 3 .1 . Lin k e d List M a n ipu la t ion Fu n ct ion s Fu n ct ion

Pu r pose

INIT_LIST_HEAD()

I nit ializes t he list head

list_add()

Adds an elem ent aft er t he list head

Fu n ct ion

Pu r pose

list_add_tail()

Adds an elem ent t o t he t ail of t he list

list_del()

Delet es an elem ent from t he list

list_replace()

Replaces an elem ent in t he list wit h anot her

list_entry()

Loops t hrough all nodes in t he list

list_for_each_entry()/ list_for_each_entry_safe()

Sim pler list it erat ion int erfaces

list_empty()

Checks whet her t here are any elem ent s in t he list

list_splice()

Joins one list wit h anot her

To illust rat e list usage, let 's im plem ent an exam ple. The exam ple also serves as a foundat ion t o underst and t he concept of work queues, which is discussed in t he next sect ion. Assum e t hat your kernel driver needs t o perform a heavy- dut y t ask from an ent ry point . An exam ple is a t ask t hat forces t he calling t hread t o sleep- wait . Nat urally, your driver doesn't like t o block unt il t he t ask finishes, because t hat slows down t he responsiveness of applicat ions relying on it . So, whenever t he driver needs t o perform t his expensive work, it defers execut ion by adding t he corresponding rout ine t o a linked list of work funct ions. The act ual work is perform ed by a kernel t hread, which t raverses t he list and execut es t he work funct ions in t he background. The driver subm it s work funct ions t o t he t ail of t he list , while t he kernel t hread ploughs it s way from t he head of t he list , t hus ensuring t hat work queued first get s done first . Of course, t he rest of t he driver needs t o be designed t o suit t his schem e of deferred execut ion. Before underst anding t his exam ple, however, be aware t hat we will use t he work queue int erface in List ing 3.5 t o im plem ent t he sam e t ask in a sim pler m anner. Let 's first int roduce t he key driver dat a st ruct ures used by our exam ple: static struct _mydrv_wq { struct list_head mydrv_worklist; /* Work List */ spinlock_t lock; /* Protect the list */ wait_queue_head_t todo; /* Synchronize submitter and worker */ } mydrv_wq;

struct _mydrv_work { struct list_head mydrv_workitem; void (*worker_func)(void *); void *worker_data; /* ... */ } mydrv_work;

/* /* /* /*

The work chain */ Work to perform */ Argument to worker_func */ Other fields */

mydrv_wq is global t o all work subm issions. I t s m em bers include a point er t o t he head of t he work list , and a wait queue t o com m unicat e bet ween driver funct ions t hat subm it work and t he kernel t hread t hat perform s t he work. The list helper funct ions do not prot ect accesses t o list m em bers, so you need t o use concurrency m echanism s t o serialize sim ult aneous point er references. This is done using a spinlock t hat is also a part of mydrv_wq. The driver init ializat ion rout ine mydrv_init() in List ing 3.2 init ializes t he spinlock, t he list head, and t he wait queue, and kick st art s t he worker t hread. List in g 3 .2 . I n it ia lize D a t a St r u ct u r e s

static int __init mydrv_init(void) { /* Initialize the lock to protect against concurrent list access */ spin_lock_init(&mydrv_wq.lock); /* Initialize the wait queue for communication between the submitter and the worker */ init_waitqueue_head(&mydrv_wq.todo); /* Initialize the list head */ INIT_LIST_HEAD(&mydrv_wq.mydrv_worklist); /* Start the worker thread. See Listing 3.4 */ kernel_thread(mydrv_worker, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD); return 0; }

Before exam ining t he worker t hread t hat execut es subm it t ed work, let 's look at work subm ission it self. List ing 3.3 im plem ent s a funct ion t hat ot her part s of t he kernel can use t o subm it work. I t uses list_add_tail() t o add a work funct ion t o t he t ail of t he list . Look at Figure 3.1 t o see t he physical st ruct ure of t he work list .

Figu r e 3 .1 . Lin k e d list of w or k fu n ct ion s.

[ View full size im age]

List in g 3 .3 . Su bm it t in g W or k t o Be Ex e cu t e d La t e r

int submit_work(void (*func)(void *data), void *data) { struct _mydrv_work *mydrv_work; /* Allocate the work structure */ mydrv_work = kmalloc(sizeof(struct _mydrv_work), GFP_ATOMIC); if (!mydrv_work) return -1; /* Populate the work structure */ mydrv_work->worker_func = func; /* Work function */ mydrv_work->worker_data = data; /* Argument to pass */ spin_lock(&mydrv_wq.lock); /* Protect the list */ /* Add your work to the tail of the list */ list_add_tail(&mydrv_work->mydrv_workitem, &mydrv_wq.mydrv_worklist); /* Wake up the worker thread */ wake_up(&mydrv_wq.todo); spin_unlock(&mydrv_wq.lock); return 0; }

To subm it a work funct ion void job(void *) from a driver ent ry point , do t his: submit_work(job, NULL);

Aft er subm it t ing t he work funct ion, List ing 3.3 wakes up t he worker t hread. The general st ruct ure of t he worker t hread shown in List ing 3.4 is sim ilar t o st andard kernel t hreads discussed in t he previous sect ion. The t hread uses list_entry() t o work it s way t hrough all nodes in t he list . list_entry() ret urns t he cont ainer dat a st ruct ure inside which t he list node is em bedded. Take a closer look at t he relevant line in List ing 3.4: mydrv_work = list_entry(mydrv_wq.mydrv_worklist.next, struct _mydrv_work, mydrv_workitem);

mydrv_workitem is em bedded inside mydrv_work, so list_entry() ret urns a point er t o t he corresponding mydrv_work st ruct ure. The param et ers passed t o list_entry() are t he address of t he em bedded list node, t he t ype of t he cont ainer st ruct ure, and t he field nam e of t he em bedded list node. Aft er execut ing a subm it t ed work funct ion, t he worker t hread rem oves t he corresponding node from t he list using list_del(). Not e t hat mydrv_wq.lock is released and reacquired in t he t im e window when t he subm it t ed work funct ion is execut ed. This is because work funct ions can go t o sleep result ing in pot ent ial deadlocks if newly scheduled code t ries t o acquire t he sam e spinlock.

List in g 3 .4 . Th e W or k e r Th r e a d

Code View: static int mydrv_worker(void *unused) { DECLARE_WAITQUEUE(wait, current); void (*worker_func)(void *); void *worker_data; struct _mydrv_work *mydrv_work; set_current_state(TASK_INTERRUPTIBLE); /* Spin until asked to die */ while (!asked_to_die()) { add_wait_queue(&mydrv_wq.todo, &wait); if (list_empty(&mydrv_wq.mydrv_worklist)) { schedule(); /* Woken up by the submitter */ } else { set_current_state(TASK_RUNNING); } remove_wait_queue(&mydrv_wq.todo, &wait); /* Protect concurrent access to the list */ spin_lock(&mydrv_wq.lock); /* Traverse the list and plough through the work functions present in each node */ while (!list_empty(&mydrv_wq.mydrv_worklist)) { /* Get the first entry in the list */ mydrv_work = list_entry(mydrv_wq.mydrv_worklist.next, struct _mydrv_work, mydrv_workitem); worker_func = mydrv_work->worker_func; worker_data = mydrv_work->worker_data; /* This node has been processed. Throw it out of the list */ list_del(mydrv_wq.mydrv_worklist.next); kfree(mydrv_work); /* Free the node */ /* Execute the work function in this node */ spin_unlock(&mydrv_wq.lock); /* Release lock */ worker_func(worker_data); spin_lock(&mydrv_wq.lock); /* Re-acquire lock */ } spin_unlock(&mydrv_wq.lock); set_current_state(TASK_INTERRUPTIBLE); } set_current_state(TASK_RUNNING); return 0; }

For sim plicit y, t he exam ple code does not perform error handling. For exam ple, if t he call t o kernel_thread() in List ing 3.2 fails, you need t o free m em ory allocat ed for t he corresponding work st ruct ure. Also, asked_to_die() in List ing 3.4 is left unwrit t en. I t essent ially breaks out of t he loop if it eit her det ect s a delivered signal or receives a com m unicat ion from t he release() ent ry point t hat t he m odule is about t o be unloaded from t he kernel. Before ending t his sect ion, let 's t ake a look at anot her useful list library rout ine, list_for_each_entry(). Wit h t his m acro, it erat ion becom es sim pler and m ore readable because you don't have t o use list_entry() inside t he loop. Use t he list_for_each_entry_safe() variant if you will delet e list elem ent s inside t he loop. You can replace t he following snippet in List ing 3.4: while (!list_empty(&mydrv_wq.mydrv_worklist)) { mydrv_work = list_entry(mydrv_wq.mydrv_worklist.next, struct _mydrv_work, mydrv_workitem); /* ... */ }

wit h: struct _mydrv_work *temp; list_for_each_entry_safe(mydrv_work, temp, &mydrv_wq.mydrv_worklist, mydrv_workitem) { /* ... */ }

You can't use list_for_each_entry() in t his case because you are rem oving t he ent ry point ed t o by mydrv_work inside t he loop in List ing 3.4. list_for_each_entry_safe() solves t his problem using t he t em porary variable passed as t he second argum ent ( temp) t o save t he address of t he next ent ry in t he list .

H a sh List s The doubly linked list im plem ent at ion discussed previously is not opt im al for cases where you want t o im plem ent linked dat a st ruct ures such as hash t ables. This is because hash t ables need only a list head cont aining a single point er. To reduce m em ory overhead for such applicat ions, t he kernel provides hash list s ( or hlist s) , a variat ion of list s. Unlike list s, which use t he sam e st ruct ure for t he list head and list nodes, hlist s have separat e definit ions: struct hlist_head { struct hlist_node *first; }; struct hlist_node { struct hlist_node *next, **pprev; };

To suit t he schem e of a single- point er hlist head, t he nodes m aint ain t he address of t he point er t o t he previous node, rat her t han t he point er it self. Hash t ables are im plem ent ed using an array of hlist_heads. Each hlist_head sources a doubly linked list of hlist_nodes. A hash funct ion is used t o locat e t he index ( or bucket ) in t he hlist_head array. When t hat is done, you m ay use hlist helper rout ines ( also defined in include/ linux/ list .h) t o operat e on t he list of hlist_nodes linked t o t he chosen bucket . Look at t he im plem ent at ion of t he direct ory cache ( dcache) in

fs/ dcache.c for an exam ple.

W or k Qu e u e s Work queues are a way t o defer work inside t he kernel. [ 2] Deferring work is useful in innum erable sit uat ions. Exam ples include t he following:

[ 2] Soft irqs and t asklet s are t wo ot her m echanism s available for deferring work inside t he kernel. Table 4.1 of Chapt er 4 com pares soft irqs, t asklet s, and work queues.

Triggering rest art of a net work adapt er in response t o an error int errupt

Filesyst em t asks such as syncing disk buffers

Sending a com m and t o a disk and following t hrough wit h t he st orage prot ocol st at e m achine

The funct ionalit y of work queues is sim ilar t o t he exam ple described in List ings 3.2 t o 3.4 . However, work queues can help you accom plish t he sam e t ask in a sim pler m anner. The work queue helper library exposes t wo int erface st ruct ures t o users: a workqueue_struct and a work_struct. Follow t hese st eps t o use work queues: 1.

Creat e a work queue ( or a workqueue_struct) wit h one or m ore associat ed kernel t hreads. To creat e a kernel t hread t o service a workqueue_struct, use create_singlethread_workqueue(). To creat e one worker t hread per CPU in t he syst em , use t he create_workqueue() variant . The kernel also has default per- CPU worker t hreads ( event s/ n, where n is t he CPU num ber) t hat you can t im eshare inst ead of request ing a separat e worker t hread. Depending on your applicat ion, you m ight incur a perform ance hit if you don't have a dedicat ed worker t hread.

2.

Creat e a work elem ent ( or a work_struct) . A work_struct is init ialized using INIT_WORK(), which populat es it wit h t he address and argum ent of your work funct ion.

3.

Subm it t he work elem ent t o t he work queue. A work_struct can be subm it t ed t o a dedicat ed queue using queue_work(), or t o t he default kernel worker t hread using schedule_work().

Let 's rewrit e List ings 3.2 t o 3.4 t o t ake advant age of t he work queue int erface. This is done in List ing 3.5. The ent ire kernel t hread, as well as t he spinlock and t he wait queue, vanish inside t he work queue int erface. Even t he call t o create_singlethread_workqueue() goes away if you are using t he default kernel worker t hread. List in g 3 .5 . Usin g W or k Qu e u e s t o D e fe r W or k

Code View: #include struct workqueue_struct *wq; /* Driver Initialization */ static int __init mydrv_init(void) { /* ... */ wq = create_singlethread_workqueue("mydrv"); return 0; } /* Work Submission. The first argument is the work function, and the second argument is the argument to the work function */ int submit_work(void (*func)(void *data), void *data) { struct work_struct *hardwork; hardwork = kmalloc(sizeof(struct work_struct), GFP_KERNEL); /* Init the work structure */ INIT_WORK(hardwork, func, data); /* Enqueue Work */ queue_work(wq, hardwork); return 0; }

I f you are using work queues, you will get linker errors unless you declare your m odule as licensed under GPL. This is because t he kernel export s t hese funct ions only t o GPLed code. I f you look at t he kernel work queue im plem ent at ion, you will see t his rest rict ion expressed in st at em ent s such as t his: EXPORT_SYMBOL_GPL(queue_work);

To announce t hat your m odule is copy left- ed under GPL, declare t he following: MODULE_LICENSE("GPL");

N ot ifie r Ch a in s Not ifier chains are used t o send st at us change m essages t o code regions t hat request t hem . Unlike hard- coded m echanism s, not ifiers offer a versat ile t echnique for get t ing alert ed when event s of int erest are generat ed. Not ifiers were originally added for passing net work event s t o concerned sect ions of t he kernel but are now used for m any ot her purposes. The kernel im plem ent s predefined not ifiers for significant event s. Exam ples of such not ificat ions include t he following:

D ie n ot ifica t ion, which is sent when a kernel funct ion t riggers a t rap or a fault , caused by an " oops," page fault , or a breakpoint hit . I f you are, for exam ple, writ ing a device driver for a m edical grade card, you m ight want t o regist er yourself wit h t he die not ifier so t hat you can at t em pt t o t urn off t he m edical elect ronics if a kernel panic occurs.

N e t de vice n ot ifica t ion, which is generat ed when a net work int erface goes up or down.

CPU fr e qu e n cy n ot ifica t ion , which is dispat ched when t here is a t ransit ion in t he processor frequency.

I n t e r n e t a ddr e ss n ot ifica t ion , which is sent when a change is det ect ed in t he I P address of a net work int erface.

An exam ple user of not ifiers is t he High- level Dat a Link Cont rol ( HDLC) prot ocol driver drivers/ net / wan/ hdlc.c, which regist ers it self wit h t he net device not ifier chain t o sense carrier changes. To at t ach your code t o a not ifier chain, you have t o regist er an event handler wit h t he associat ed chain. An event ident ifier and a not ifier- specific argum ent are passed as argum ent s t o t he handler rout ine when t he concerned event is generat ed. To define a cust om not ifier chain, you have t o addit ionally im plem ent t he infrast ruct ure t o ignit e t he chain when t he event is det ect ed. List ing 3.6 cont ains exam ples of using predefined and user- defined not ifiers. Table 3.2 cont ains a brief descript ion of t he not ifier chains used by List ing 3.6 and t he event s t hey propagat e, so look at t he list ing and t he t able in t andem .

Ta ble 3 .2 . N ot ifie r Ch a in s a n d t h e Eve n t s Th e y Pr opa ga t e N ot ifie r Ch a in

D e scr ipt ion

D ie N ot ifie r Ch a in (die_chain)

my_die_event_handler() at t aches t o t he die not ifier chain, die_chain, using register_die_notifier(). To t rigger invocat ion of my_die_event_handler(), int roduce an invalid dereference som ewhere in your code, such as t he following: int *q = 0; *q = 1;

When t his code snippet is execut ed, my_die_event_handler() get s called, and you will see a m essage like t his: my_die_event_handler: OOPs! at EIP=f00350e7 The die event not ifier passes t he die_args st ruct ure t o t he regist ered event handler. This argum ent cont ains a point er t o t he regs st ruct ure, which carries a snapshot of processor regist er cont ent s when t he fault occurred. my_die_event_handler() print s t he cont ent s of t he inst ruct ion point er regist er. N e t de vice N ot ifie r Ch a in (netdev_chain)

my_dev_event_handler() at t aches t o t he net device not ifier chain, netdev_chain, using register_netdevice_notifier(). You can generat e t his event by changing t he st at e of a net work int erface such as Et hernet ( ethX) or loopback (lo) : bash> ifconfig eth0 up

N ot ifie r Ch a in

D e scr ipt ion This result s in t he execut ion of my_dev_event_handler(). The handler is passed a point er t o struct net_device as argum ent , which cont ains t he nam e of t he net work int erface. my_dev_event_handler() uses t his inform at ion t o produce t he following m essage: my_dev_event_handler: Val=1, Interface=eth0 Val=1 corresponds t o t he NETDEV_UP event as defined in include/ linux/ not ifier.h.

Use r - D e fin e d N ot ifie r Ch a in

List ing 3.6 also im plem ent s a user- defined not ifier chain, my_noti_chain. Assum e t hat you want an event t o be generat ed whenever a user reads from a part icular file in t he process filesyst em . Add t he following in t he associat ed procfs read rout ine: blocking_notifier_call_chain(&my_noti_chain, 100, NULL);

This result s in t he invocat ion of my_event_handler() whenever you read from t he corresponding / proc file and result s in t he following m essage: my_event_handler: Val=100 Val cont ains t he ident it y of t he generat ed event , which is 100 for t his exam ple. The funct ion argum ent is left unused.

You have t o unregist er event handlers from not ifier chains when your m odule is released from t he kernel. For exam ple, if you up or down a net work int erface aft er unloading t he code in List ing 3.6, you will be rankled by an " oops," unless you perform an unregister_netdevice_notifier(&my_dev_notifier) from t he m odule's release() m et hod. This is because t he not ifier chain cont inues t o t hink t hat t he handler code is valid, even t hough it has been pulled out of t he kernel. List in g 3 .6 . N ot ifie r Eve n t H a n dle r s Code View: #include #include #include #include /* Die Notifier Definition */ static struct notifier_block my_die_notifier = { .notifier_call = my_die_event_handler, }; /* Die notification event handler */ int my_die_event_handler(struct notifier_block *self, unsigned long val, void *data) { struct die_args *args = (struct die_args *)data; if (val == 1) { /* '1' corresponds to an "oops" */ printk("my_die_event: OOPs! at EIP=%lx\n", args->regs->eip);

} /* else ignore */ return 0; }

/* Net Device notifier definition */ static struct notifier_block my_dev_notifier = { .notifier_call = my_dev_event_handler, };

/* Net Device notification event handler */ int my_dev_event_handler(struct notifier_block *self, unsigned long val, void *data) { printk("my_dev_event: Val=%ld, Interface=%s\n", val, ((struct net_device *) data)->name); return 0; }

/* User-defined notifier chain implementation */ static BLOCKING_NOTIFIER_HEAD(my_noti_chain); static struct notifier_block my_notifier = { .notifier_call = my_event_handler, }; /* User-defined notification event handler */ int my_event_handler(struct notifier_block *self, unsigned long val, void *data) { printk("my_event: Val=%ld\n", val); return 0; } /* Driver Initialization */ static int __init my_init(void) { /* ... */ /* Register Die Notifier */ register_die_notifier(&my_die_notifier); /* Register Net Device Notifier */ register_netdevice_notifier(&my_dev_notifier); /* Register a user-defined Notifier */ blocking_notifier_chain_register(&my_noti_chain, &my_notifier); /* ... */ }

my_noti_chain in List ing 3.6 is declared as a blocking not ifier using BLOCKING_NOTIFIER_HEAD() and is regist ered via blocking_notifier_chain_register(). This m eans t hat t he not ifier handler is always invoked from process cont ext . So, t he handler funct ion, my_event_handler(), is allowed t o go t o sleep. I f your not ifier

handler can be called from int errupt cont ext , declare t he not ifier chain using ATOMIC_NOTIFIER_HEAD(), and regist er it via atomic_notifier_chain_register().

The Old Not ifier I nt erface Kernel releases earlier t han 2.6.17 support ed only a general- purpose not ifier chain. The not ifier regist rat ion funct ion notifier_chain_register() was int ernally prot ect ed using a spinlock, but t he rout ine t hat walked t he not ifier chain dispat ching event s t o not ifier handlers (notifier_call_chain()) was lockless. The lack of locking was because of t he possibilit y t hat t he handler funct ions m ay go t o sleep, unregist er t hem selves while running, or get called from int errupt cont ext . The lockless im plem ent at ion int roduced race condit ions, however. The new not ifier API is built over t he original int erface and is int ended t o overcom e it s lim it at ions.

Com ple t ion I n t e r fa ce Many part s of t he kernel init iat e cert ain act ivit ies as separat e execut ion t hreads and t hen wait for t hem t o com plet e. The com plet ion int erface is an efficient and easy way t o im plem ent such code pat t erns. Som e exam ple usage scenarios include t he following:

Your driver m odule is assist ed by a kernel t hread. I f you rm m od t he m odule, t he release() m et hod is invoked before rem oving t he m odule code from kernel space. The release rout ine asks t he t hread t o kill it self and blocks unt il t he t hread com plet es it s exit . List ing 3.7 im plem ent s t his case.

You are writ ing a port ion of a block device driver ( discussed in Chapt er 14, " Block Drivers" ) t hat queues a read request t o a device. This t riggers a st at e m achine change im plem ent ed as a separat e t hread or work queue. The driver want s t o wait unt il t he operat ion com plet es before proceeding wit h anot her act ivit y. Look at drivers/ block/ floppy.c for an exam ple.

An applicat ion request s an Analog- t o- Digit al Convert er ( ADC) driver for a dat a sam ple. The driver init iat es a conversion request wait s, unt il an int errupt signals com plet ion of conversion, and ret urns t he dat a.

List in g 3 .7 . Syn ch r on izin g Usin g Com ple t ion Fu n ct ion s Code View: static DECLARE_COMPLETION(my_thread_exit); /* Completion */ static DECLARE_WAIT_QUEUE_HEAD(my_thread_wait); /* Wait Queue */ int pink_slip = 0; /* Exit Flag */ /* Helper thread */ static int my_thread(void *unused) { DECLARE_WAITQUEUE(wait, current); daemonize("my_thread"); add_wait_queue(&my_thread_wait, &wait); while (1) { /* Relinquish processor until event occurs */

set_current_state(TASK_INTERRUPTIBLE); schedule(); /* Control gets here when the thread is woken up from the my_thread_wait wait queue */ /* Quit if let go */ if (pink_slip) { break; } /* Do the real work */ /* ... */ } /* Bail out of the wait queue */ __set_current_state(TASK_RUNNING); remove_wait_queue(&my_thread_wait, &wait); /* Atomically signal completion and exit */ complete_and_exit(&my_thread_exit, 0); } /* Module Initialization */ static int __init my_init(void) { /* ... */ /* Kick start the thread */ kernel_thread(my_thread, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD); /* ... */ } /* Module Release */ static void __exit my_release(void) { /* ... */ pink_slip = 1; /* my_thread must go */ wake_up(&my_thread_wait); /* Activate my_thread */ wait_for_completion(&my_thread_exit); /* Wait until my_thread quits */ /* ... */ }

A com plet ion obj ect can be declared st at ically using DECLARE_COMPLETION() or creat ed dynam ically wit h init_completion(). A t hread can signal com plet ion wit h t he help of complete() or complete_all(). A caller can wait for com plet ion via wait_for_completion(). I n List ing 3.7, my_release() raises an exit request flag by set t ing pink_slip before waking up my_thread(). I t t hen calls wait_for_completion() t o wait unt il my_thread() com plet es it s exit . my_thread(), on it s part , wakes up t o find pink_slip set , and does t he following:

1 . Signals com plet ion t o my_release()

2 . Kills it self

my_thread() accom plishes t hese t wo st eps at om ically using complete_and_exit(). Using complete_and_exit() shut s t he window bet ween m odule exit and t hread exit t hat opens if you separat ely invoke complete() and exit(). We will use t he com plet ion API when we develop an exam ple t elem et ry driver in Chapt er 11.

Kt h r e a d H e lpe r s Kt hread helpers add a coat ing over t he raw t hread creat ion rout ines and sim plify t he t ask of t hread m anagem ent . List ing 3.8 rewrit es List ing 3.7 using t he kt hread helper int erface. my_init() now uses kthread_create() rat her t han kernel_thread(). You can pass t he t hread's nam e t o kthread_create() rat her t han explicit ly call daemonize() wit hin t he t hread. The kt hread int erface provides you free access t o a built - in exit synchronizat ion m echanism im plem ent ed using t he com plet ion int erface. So, as my_release() does in List ing 3.8, you m ay direct ly call kthread_stop() inst ead of laboriously set t ing pink_slip, waking up my_thread(), and wait ing for it t o com plet e using wait_for_completion(). Sim ilarly, my_thread() can m ake a neat call t o kthread_should_stop() t o check whet her it ought t o call it a day. List in g 3 .8 . Syn ch r on izin g Usin g Kt h r e a d H e lpe r s Code View: /* '+' and '-' show the differences from Listing 3.7 */ #include /* Assistant Thread */ static int my_thread(void *unused) { DECLARE_WAITQUEUE(wait, current); daemonize("my_thread"); + + + -

while (1) { /* Continue work if no other thread has * invoked kthread_stop() */ while (!kthread_should_stop()) { /* ... */ /* Quit if let go */ if (pink_slip) { break; } /* ... */ } __set_current_state(TASK_RUNNING); remove_wait_queue(&my_thread_wait, &wait);

+

complete_and_exit(&my_thread_exit, 0); return 0; }

+

struct task_struct *my_task;

/* Module Initialization */ static int __init my_init(void) { /* ... */ kernel_thread(my_thread, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD); + my_task = kthread_create(my_thread, NULL, "%s", "my_thread"); + if (my_task) wake_up_process(my_task); /* ... */ } /* Module Release */ static void __exit my_release(void) { /* ... */ pink_slip = 1; wake_up(&my_thread_wait); wait_for_completion(&my_thread_exit); + kthread_stop(my_task); /* ... */ }

I nst ead of creat ing t he t hread using kthread_create() and act ivat ing it via wake_up_process() as done in List ing 3.8, you m ay use t he following single call: kthread_run(my_thread, NULL, "%s", "my_thread");

Er r or - H a n dlin g Aids Several kernel funct ions ret urn point er values. Whereas callers usually check for failure by com paring t he ret urn value wit h NULL, t hey t ypically need m ore inform at ion t o decipher t he exact nat ure of t he error t hat has occurred. Because kernel addresses have redundant bit s, t hey can be overloaded t o encode error sem ant ics. This is done wit h t he help of a set of helper rout ines. List ing 3.9 im plem ent s a sim ple usage exam ple. List in g 3 .9 . Usin g Er r or - H a n dlin g Aids

Code View: #include char * collect_data(char *userbuffer) { char *buffer; /* ... */ buffer = kmalloc(100, GFP_KERNEL); if (!buffer) { /* Out of memory */ return ERR_PTR(-ENOMEM); } /* ... */ if (copy_from_user(buffer, userbuffer, 100)) { return ERR_PTR(-EFAULT); } /* ... */ return(buffer); }

int my_function(char *userbuffer) { char *buf; /* ... */ buf = collect_data(userbuffer); if (IS_ERR(buf)) { printk("Error returned is %d!\n", PTR_ERR(buf)); } /* ... */ }

I f kmalloc() fails inside collect_data() in List ing 3.9, you will get t he following m essage: Error returned is -12!

However, if collect_data() is successful, it ret urns a valid point er t o a dat a buffer. As anot her exam ple, let 's add error handling using IS_ERR() and PTR_ERR() t o t he t hread creat ion code in List ing 3.8: my_task = kthread_create(my_thread, NULL, "%s", "mydrv"); + + + +

if (!IS_ERR(my_task)) { /* Success */ wake_up_process(my_task); } else { /* Failure */

+ +

printk("Error value returned=%d\n", PTR_ERR(my_task)); }

Look in g a t t h e Sou r ce s The ksoft irqd, pdflush, and khubd kernel t hreads live in kernel/ soft irq.c, m m / pdflush.c, and drivers/ usb/ core/ hub.c, respect ively. The daemonize() funct ion can be found in kernel/ exit .c. For t he im plem ent at ion of user m ode helpers, look at kernel/ km od.c. The list and hlist library rout ines reside in include/ linux/ list .h. They are used all over t he kernel, so you will find usage exam ples in m ost subdirect ories. An exam ple is t he request_queue st ruct ure defined in include/ linux/ blkdev.h, which holds a linked list of disk I / O request s. We look at t his dat a st ruct ure in Chapt er 14. Go t o www.ussg.iu.edu/ hyperm ail/ linux/ kernel/ 0007.3/ 0805.ht m l and follow t he discussion t hread in t he m ailing list for an int erest ing debat e bet ween Linus Torvalds and Andi Kleen about t he pros and cons of com plem ent ing t he list library wit h hlist helper rout ines. The kernel work queue im plem ent at ion lives in kernel/ workqueue.c. To underst and t he real- world use of work queues, look at t he PRO/ Wireless 2200 net work driver, drivers/ net / wireless/ ipw2200.c. The kernel not ifier chain im plem ent at ion lives in kernel/ sys.c and include/ linux/ not ifier.h. Look at kernel/ sched.c and include/ linux/ com plet ion.h for t he gut s of t he com plet ion int erface. kernel/ kt hread.c cont ains t he source code for kt hread helpers, and include/ linux/ err.h includes definit ions of error handling aids. Table 3.3 cont ains a sum m ary of t he m ain dat a st ruct ures used in t his chapt er and t he locat ion of t heir definit ions in t he source t ree. Table 3.4 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 3 .3 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

wait_queue_t

include/ linux/ wait .h

Used by t hreads t hat desire t o wait for an event or a syst em resource

list_head

include/ linux/ list .h

Kernel st ruct ure t o weave a doubly linked list of dat a st ruct ures

hlist_head

include/ linux/ list .h

Kernel st ruct ure t o im plem ent hash t ables

work_struct

include/ linux/ workqueue.h I m plem ent s work queues, which are a way t o defer work inside t he kernel

notifier_block

include/ linux/ not ifier.h

completion

include/ linux/ com plet ion.h Used t o init iat e act ivit ies as separat e t hreads and t hen wait for t hem t o com plet e

I m plem ent s not ifier chains, which are used t o send st at us changes t o code regions t hat request t hem

Ta ble 3 .4 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

DECLARE_WAITQUEUE()

include/ linux/ wait .h

Declares a wait queue.

add_wait_queue()

ker nel/ w ait .c

Queues a t ask t o a wait queue. The t ask goes t o sleep unt il it 's woken up by anot her t hread or int errupt handler.

remove_wait_queue()

ker nel/ w ait .c

Dequeues a t ask from a wait queue.

wake_up_interruptible()

include/ linux/ wait .h kernel/ sched.c

Wakes up a t ask sleeping inside a wait queue and put s it back int o t he scheduler run queue.

schedule()

kernel/ sched.c

Relinquishes t he processor and allows ot her part s of t he kernel t o run.

set_current_state()

include/ linux/ sched.h

Set s t he run st at e of a process. The st at e can be one of TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, TASK_STOPPED, TASK_TRACED, EXIT_ZOMBIE, or EXIT_DEAD.

kernel_thread()

arch/ yourarch/ kernel/ process.c

Creat es a kernel t hread.

daemonize()

kernel/ exit .c

Act ivat es a kernel t hread wit hout at t aching user resources and changes t he parent of t he calling t hread t o kt hreadd.

allow_signal()

kernel/ exit .c

Enables delivery of a specified signal.

signal_pending()

include/ linux/ sched.h

Checks whet her a signal has been delivered. There are no signal handlers inside t he kernel, so you have t o explicit ly check whet her a signal has been delivered.

call_usermodehelper()

include/ linux/ km od.h kernel/ km od.c

Execut es a user m ode program .

Linked list library functions

include/ linux/ list .h

See Table 3.1.

register_die_notifier()

arch/ yourarch/ kernel/ t raps.c

Regist ers a die not ifier.

register_netdevice_notifier()

net / core/ dev.c

Regist ers a net device not ifier.

register_inetaddr_notifier()

net / ipv4/ devinet .c

Regist ers an inet addr not ifier.

BLOCKING_NOTIFIER_HEAD()

include/ linux/ not ifier.h

Creat es a user- defined blocking not ifier.

blocking_notifier_chain_register() kernel/ sys.c

Regist ers a blocking not ifier.

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

blocking_notifier_call_chain()

kernel/ sys.c

Dispat ches an event t o a blocking not ifier chain.

ATOMIC_NOTIFIER_HEAD()

include/ linux/ not ifier.h

Creat es an at om ic not ifier.

atomic_notifier_chain_register()

kernel/ sys.c

Regist ers an at om ic not ifier.

DECLARE_COMPLETION()

include/ linux/ com plet ion.h St at ically declares a com plet ion obj ect .

init_completion()

include/ linux/ com plet ion.h Dynam ically declares a com plet ion obj ect .

complete()

kernel/ sched.c

Announces com plet ion.

wait_for_completion()

kernel/ sched.c

Wait s unt il t he com plet ion obj ect com plet es.

complete_and_exit()

kernel/ exit .c

At om ically signals com plet ion and exit .

kthread_create()

kernel/ kt hread.c

Creat es a kernel t hread.

kthread_stop()

kernel/ kt hread.c

Asks a kernel t hread t o st op.

kthread_should_stop()

kernel/ kt hread.c

A kernel t hread can poll on t his funct ion t o det ect whet her anot her t hread has asked it t o st op via kthread_stop().

IS_ERR()

include/ linux/ err.h

Finds out whet her t he ret urn value is an error code.

Ch a pt e r 4 . La yin g t h e Gr ou n dw or k I n Th is Ch a pt e r 90 I nt roducing Devices and Drivers 92 I nt errupt Handling 103 The Linux Device Model 114 Mem ory Barriers 114 Power Managem ent 115 Looking at t he Sources

We are now wit hin whispering dist ance of writ ing a device driver. Before doing t hat , however, let 's equip ourselves wit h som e driver concept s. We st art t he chapt er by get t ing an idea of t he book's problem st at em ent ; we will look at t he t ypical devices and I / O int erfaces present on PC- com pat ible syst em s and em bedded com put ers. I nt errupt handling is an int egral part of m ost drivers, so we next cover t he art of writ ing int errupt handlers. We t hen t urn our at t ent ion t o t he new device m odel int roduced in t he 2.6 kernel. The new m odel is built around abst ract ions such as sysfs, kobj ect s, device classes, and udev, which dist ill com m onalit ies from device drivers. The new device m odel also weeds policies out of kernel space and pushes t hem t o user space, result ing in a t ot al revam p of feat ures such as / dev node m anagem ent , hot plug, coldplug, m odule aut oload, and firm ware download.

I n t r odu cin g D e vice s a n d D r ive r s User applicat ions cannot direct ly com m unicat e wit h hardware because t hat ent ails possessing privileges such as execut ing special inst ruct ions and handling int errupt s. Device drivers assum e t he burden of int eract ing wit h hardware and export int erfaces t hat applicat ions and t he rest of t he kernel can use t o access devices. Applicat ions operat e on devices via nodes in t he / dev direct ory and glean device inform at ion using nodes in t he / sys direct ory. [ 1]

[ 1]

As you'll learn lat er, net working applicat ions rout e t heir request s t o t he underlying driver using a different m echanism .

Figure 4.1 shows t he hardware block diagram of a t ypical PC- com pat ible syst em . As you can see, t he syst em support s diverse devices and int erface t echnologies such as m em ory, video, audio, USB, PCI , WiFi, PCMCI A, I 2 C, I DE, Et hernet , serial port , keyboard, m ouse, floppy drive, parallel port , and I nfrared. The m em ory cont roller and t he graphics cont roller are part of a Nort h Bridge chipset in t he PC archit ect ure, whereas peripheral buses are sourced out of a Sout h Bridge.

Figu r e 4 .1 . H a r dw a r e block dia gr a m of a PC- com pa t ible syst e m .

Figure 4.2 illust rat es a sim ilar block diagram for a hypot het ical em bedded device. This diagram cont ains several int erfaces not t ypical in t he PC world such as flash m em ory, LCD, t ouch screen, and cellular m odem .

Figu r e 4 .2 . H a r dw a r e block dia gr a m of a n e m be dde d syst e m .

[ View full size im age]

Nat urally, t he capabilit y t o access peripheral devices is a crucial part of a syst em 's funct ioning. Device drivers provide t he engine t o achieve t his. The rest of t he chapt ers in t his book will zoom in on a device int erface and t each you how t o im plem ent t he corresponding device driver.

Ch a pt e r 4 . La yin g t h e Gr ou n dw or k I n Th is Ch a pt e r 90 I nt roducing Devices and Drivers 92 I nt errupt Handling 103 The Linux Device Model 114 Mem ory Barriers 114 Power Managem ent 115 Looking at t he Sources

We are now wit hin whispering dist ance of writ ing a device driver. Before doing t hat , however, let 's equip ourselves wit h som e driver concept s. We st art t he chapt er by get t ing an idea of t he book's problem st at em ent ; we will look at t he t ypical devices and I / O int erfaces present on PC- com pat ible syst em s and em bedded com put ers. I nt errupt handling is an int egral part of m ost drivers, so we next cover t he art of writ ing int errupt handlers. We t hen t urn our at t ent ion t o t he new device m odel int roduced in t he 2.6 kernel. The new m odel is built around abst ract ions such as sysfs, kobj ect s, device classes, and udev, which dist ill com m onalit ies from device drivers. The new device m odel also weeds policies out of kernel space and pushes t hem t o user space, result ing in a t ot al revam p of feat ures such as / dev node m anagem ent , hot plug, coldplug, m odule aut oload, and firm ware download.

I n t r odu cin g D e vice s a n d D r ive r s User applicat ions cannot direct ly com m unicat e wit h hardware because t hat ent ails possessing privileges such as execut ing special inst ruct ions and handling int errupt s. Device drivers assum e t he burden of int eract ing wit h hardware and export int erfaces t hat applicat ions and t he rest of t he kernel can use t o access devices. Applicat ions operat e on devices via nodes in t he / dev direct ory and glean device inform at ion using nodes in t he / sys direct ory. [ 1]

[ 1]

As you'll learn lat er, net working applicat ions rout e t heir request s t o t he underlying driver using a different m echanism .

Figure 4.1 shows t he hardware block diagram of a t ypical PC- com pat ible syst em . As you can see, t he syst em support s diverse devices and int erface t echnologies such as m em ory, video, audio, USB, PCI , WiFi, PCMCI A, I 2 C, I DE, Et hernet , serial port , keyboard, m ouse, floppy drive, parallel port , and I nfrared. The m em ory cont roller and t he graphics cont roller are part of a Nort h Bridge chipset in t he PC archit ect ure, whereas peripheral buses are sourced out of a Sout h Bridge.

Figu r e 4 .1 . H a r dw a r e block dia gr a m of a PC- com pa t ible syst e m .

Figure 4.2 illust rat es a sim ilar block diagram for a hypot het ical em bedded device. This diagram cont ains several int erfaces not t ypical in t he PC world such as flash m em ory, LCD, t ouch screen, and cellular m odem .

Figu r e 4 .2 . H a r dw a r e block dia gr a m of a n e m be dde d syst e m .

[ View full size im age]

Nat urally, t he capabilit y t o access peripheral devices is a crucial part of a syst em 's funct ioning. Device drivers provide t he engine t o achieve t his. The rest of t he chapt ers in t his book will zoom in on a device int erface and t each you how t o im plem ent t he corresponding device driver.

I n t e r r u pt H a n dlin g Because of t he indet erm inat e nat ure of I / O, and speed m ism at ches bet ween I / O devices and t he processor, devices request t he processor's at t ent ion by assert ing cert ain hardware signals asynchronously. These hardware signals are called int er r upt s. Each int errupt ing device is assigned an associat ed ident ifier called an int errupt r equest ( I RQ) num ber. When t he processor det ect s t hat an int errupt has been generat ed on an I RQ, it abrupt ly st ops what it 's doing and invokes an int errupt service rout ine ( I SR) regist ered for t he corresponding I RQ. I nt errupt handlers ( I SRs) execut e in int errupt cont ext .

I n t e r r u pt Con t e x t I SRs are crit ical pieces of code t hat direct ly converse wit h t he hardware. They are given t he privilege of inst ant execut ion in t he larger int erest of syst em perform ance. However, if I SRs are not quick and light weight , t hey cont radict t heir own philosophy. VI Ps are given preferent ial t reat m ent , but it 's incum bent on t hem t o m inim ize t he result ing inconvenience t o t he public. To com pensat e for rudely int errupt ing t he current t hread of execut ion, I SRs have t o polit ely execut e in a rest rict ed environm ent called int errupt cont ext ( or at om ic cont ext ) . Here is a list of do's and don't s for code execut ing in int errupt cont ext :

1 . I t 's a j ailable offense if your int errupt cont ext code goes t o sleep. I nt errupt handlers cannot relinquish t he processor by calling sleepy funct ions such as schedule_timeout(). Before invoking a kernel API from your int errupt handler, penet rat e t he nest ed invocat ion t rain and ensure t hat it does not int ernally t rigger a blocking wait . For exam ple, input_register_device() looks harm less from t he surface, but t osses a call t o kmalloc() under t he hood specifying GFP_KERNEL as an argum ent . As you saw in Chapt er 2 , " A Peek I nside t he Kernel," if your syst em 's free m em ory dips below a wat erm ark, kmalloc() sleep- wait s for m em ory t o get freed up by t he swapper, if you invoke it in t his m anner.

2 . For prot ect ing crit ical sect ions inside int errupt handlers, you can't use m ut exes because t hey m ay go t o sleep. Use spinlocks inst ead, and use t hem only if you m ust .

3 . I nt errupt handlers cannot direct ly exchange dat a wit h user space because t hey are not connect ed t o user land via process cont ext s. This brings us t o anot her reason why int errupt handlers cannot sleep: The scheduler works at t he granularit y of processes, so if int errupt handlers sleep and are scheduled out , how can t hey be put back int o t he run queue?

4 . I nt errupt handlers are supposed t o get out of t he way quickly but are expect ed t o get t he j ob done. To circum vent t his Cat ch- 22, int errupt handlers usually split t heir work int o t wo. The slim t op half of t he handler flags an acknowledgm ent claim ing t hat it has serviced t he int errupt but , in realit y, offloads all t he hard work t o a fat bot t om half. Execut ion of t he bot t om half is deferred t o a lat er point in t im e when all int errupt s are enabled. You will learn t o develop bot t om halves while discussing soft irqs and t asklet s lat er.

5 . You need not design int errupt handlers t o be reent rant . When an int errupt handler is running, t he corresponding I RQ is disabled unt il t he handler ret urns. So, unlike process cont ext code, different inst ances of t he sam e handler will not run sim ult aneously on m ult iple processors.

6 . I nt errupt handlers can be int errupt ed by handlers associat ed wit h I RQs t hat have higher priorit y. You can prevent t his nest ed int errupt ion by specifically request ing t he kernel t o t reat your int errupt handler as a

fast handler. Fast handlers run wit h all int errupt s disabled on t he local processor. Before disabling int errupt s or labeling your int errupt handler as fast , be aware t hat int errupt - off t im es are bad for syst em perform ance. More t he int errupt - off t im es, m ore is t he int errupt lat ency, or t he delay before a generat ed int errupt is serviced. I nt errupt lat ency is inversely proport ional t o t he real t im e responsiveness of t he syst em .

A funct ion can check t he value ret urned by in_interrupt() t o find out whet her it 's execut ing in int errupt cont ext . Unlike asynchronous int errupt s generat ed by ext ernal hardware, t here are classes of int errupt s t hat arrive synchronously. Synchronous int errupt s are so called because t hey don't occur unexpect edly—t he processor it self generat es t hem by execut ing an inst ruct ion. Bot h ext ernal and synchronous int errupt s are handled by t he kernel using ident ical m echanism s. Exam ples of synchronous int errupt s include t he following:

Except ions, which are used t o report grave runt im e errors

Soft ware int errupt s such as t he int 0x80 inst ruct ion used t o im plem ent syst em calls on t he x86 archit ect ure

Assign in g I RQs Device drivers have t o connect t heir I RQ num ber t o an int errupt handler. For t his, t hey need t o know t he I RQ assigned t o t he device t hey're driving. I RQ assignm ent s can be st raight forward or m ay require com plex probing. I n t he PC archit ect ure, for exam ple, t im er int errupt s are assigned I RQ 0, and RTC int errupt s answer t o I RQ 8. Modern bus t echnologies such as PCI are sophist icat ed enough t o respond t o queries regarding t heir I RQs ( assigned by t he BI OS when it walks t he bus during boot ) . PCI drivers can poke int o earm arked regions in t he device's configurat ion space and figure out t he I RQ. For older devices such as I ndust ries St andard Archit ect ure ( I SA) - based cards, t he driver m ight have t o leverage hardware- specific knowledge t o probe and decipher t he I RQ. Take a look at / proc/ int errupt s for a list of act ive I RQs on your syst em .

D e vice Ex a m ple : Rolle r W h e e l Now t hat you have learned t he basics of int errupt handling, let 's im plem ent an int errupt handler for an exam ple roller wheel device. Roller wheels can be found on som e phones and PDAs for easy m enu navigat ion and are capable of t hree m ovem ent s: clockwise rot at ion, ant iclockwise rot at ion, and key- press. Our im aginary roller wheel is wired such t hat any of t hese m ovem ent s int errupt t he processor on I RQ 7. Three low order bit s of General Purpose I / O ( GPI O) Port D of t he processor are connect ed t o t he roller device. The waveform s generat ed on t hese pins corresponding t o different wheel m ovem ent s are shown in Figure 4.3. The j ob of t he int errupt handler is t o decipher t he wheel m ovem ent s by looking at t he Port D GPI O dat a regist er.

Figu r e 4 .3 . Sa m ple w a ve for m s ge n e r a t e d by t h e r olle r w h e e l.

The driver has t o first request t he I RQ and associat e an int errupt handler wit h it : #define ROLLER_IRQ 7 static irqreturn_t roller_interrupt(int irq, void *dev_id);

if (request_irq(ROLLER_IRQ, roller_interrupt, IRQF_DISABLED | IRQF_TRIGGER_RISING, "roll", NULL)) { printk(KERN_ERR "Roll: Can't register IRQ %d\n", ROLLER_IRQ); return -EIO; }

Let 's look at t he argum ent s passed t o request_irq(). The I RQ num ber is not queried or probed but hard- coded t o ROLLER_IRQ in t his sim ple case as per t he hardware connect ion. The second argum ent , roller_interrupt(), is t he int errupt handler rout ine. I t s prot ot ype specifies a ret urn t ype of irqreturn_t, which can be IRQ_HANDLED if t he int errupt is handled successfully or IRQ_NONE if it isn't . The ret urn value assum es m ore significance for I / O t echnologies such as PCI , where m ult iple devices can share t he sam e I RQ. The IRQF_DISABLED flag specifies t hat t his int errupt handler has t o be t reat ed as a fast handler, so t he kernel

has t o disable int errupt s while invoking t he handler. IRQF_TRIGGER_RISING announces t hat t he roller wheel generat es a rising edge on t he int errupt line when it want s t o signal an int errupt . I n ot her words, t he roller wheel is an edge- sensit ive device. Som e devices are inst ead level- sensit ive and keep t he int errupt line assert ed unt il t he CPU services it . To flag an int errupt as level- sensit ive, use t he IRQF_TRIGGER_HIGH flag. Ot her possible values for t his argum ent include IRQF_SAMPLE_RANDOM ( used in t he sect ion, " Pseudo Char Drivers" in Chapt er 5 , " Charact er Drivers" ) and IRQF_SHARED ( used t o specify t hat t his I RQ is shared am ong m ult iple devices) . The next argum ent , "roll", is used t o ident ify t his device in dat a generat ed by files such as / proc/ int errupt s. The final param et er, set t o NULL in t his case, is relevant only for shared int errupt handlers and is used t o ident ify each device sharing t he I RQ line.

St art ing wit h t he 2.6.19 kernel, t here have been som e changes t o t he int errupt handler int erface. I nt errupt handlers used t o t ake a t hird argum ent ( struct pt_regs *) t hat cont ained a point er t o CPU regist ers, but t his has been rem oved st art ing wit h t he 2.6.19 kernel. Also, t he IRQF_xxx fam ily of int errupt flags replaced t he SA_xxx fam ily. For exam ple, wit h earlier kernels, you had t o use SA_INTERRUPT rat her t han IRQF_DISABLED t o m ark an int errupt handler as fast .

Driver init ializat ion is not a good place for request ing an I RQ because t hat can hog t hat valuable resource even when t he device is not in use. So, device drivers usually request t he I RQ when t he device is opened by an applicat ion. Sim ilarly, t he I RQ is freed when t he applicat ion closes t he device and not while exit ing t he driver m odule. Freeing an I RQ is done as follows: free_irq(int irq, void *dev_id);

List ing 4.1 shows t he im plem ent at ion of t he roller int errupt handler. roller_interrupt() t akes t wo argum ent s: t he I RQ and t he device ident ifier passed as t he final argum ent t o t he associat ed request_irq(). Look at Figure 4.3 side by side wit h t his list ing. List in g 4 .1 . Th e Rolle r I n t e r r u pt H a n dle r Code View: spinlock_t roller_lock = SPIN_LOCK_UNLOCKED; static DECLARE_WAIT_QUEUE_HEAD(roller_poll); static irqreturn_t roller_interrupt(int irq, void *dev_id) { int i, PA_t, PA_delta_t, movement = 0; /* Get the waveforms from bits 0, 1 and 2 of Port D as shown in Figure 4.3 */ PA_t = PORTD & 0x07; /* Wait until the state of the pins change. (Add some timeout to the loop) */ for (i=0; (PA_t==PA_delta_t); i++){ PA_delta_t = PORTD & 0x07; } movement = determine_movement(PA_t, PA_delta_t); /* See below */

spin_lock(&roller_lock); /* Store the wheel movement in a buffer for later access by the read()/poll() entry points */ store_movements(movement); spin_unlock(&roller_lock); /* Wake up the poll entry point that might have gone to sleep, waiting for a wheel movement */ wake_up_interruptible(&roller_poll); return IRQ_HANDLED; } int determine_movement(int PA_t, int PA_delta_t) { switch (PA_t){ case 0: switch (PA_delta_t){ case 1: movement = ANTICLOCKWISE; break; case 2: movement = CLOCKWISE; break; case 4: movement = KEYPRESSED; break; } break; case 1: switch (PA_delta_t){ case 3: movement = ANTICLOCKWISE; break; case 0: movement = CLOCKWISE; break; } break; case 2: switch (PA_delta_t){ case 0: movement = ANTICLOCKWISE; break; case 3: movement = CLOCKWISE; break; } break; case 3: switch (PA_delta_t){ case 2: movement = ANTICLOCKWISE; break; case 1: movement = CLOCKWISE; break;

} case 4: movement = KEYPRESSED; break; } }

Driver ent ry point s such as read() and poll() operat e in t andem wit h roller_interrupt(). For exam ple, when t he handler deciphers wheel m ovem ent , it wakes up any wait ing poll() t hreads t hat m ay have gone t o sleep in response t o a select() syst em call issued by an applicat ion such as X Windows. Revisit List ing 4.1 and im plem ent t he com plet e roller driver aft er learning t he int ernals of charact er drivers in Chapt er 5 . List ing 7.3 in Chapt er 7 , " I nput Drivers," t akes advant age of t he kernel's input int erface t o convert t his roller wheel int o a roller m ouse. Let 's end t his sect ion by int roducing som e funct ions t hat enable and disable int errupt s on a part icular I RQ. enable_irq(ROLLER_IRQ) enables int errupt generat ion when t he roller wheel m oves, while disable_irq(ROLLER_IRQ) does t he reverse. disable_irq_nosync(ROLLER_IRQ) disables roller int errupt s but does not wait for any current ly execut ing inst ance of roller_interrupt() t o ret urn. This nosync flavor of disable_irq() is fast er but can pot ent ially cause race condit ions. Use t his only when you know t hat t here can be no races. An exam ple user of disable_irq_nosync() is drivers/ ide/ ide- io.c, which blocks int errupt s during init ializat ion, because som e syst em s have t rouble wit h t hat .

Soft ir qs a n d Ta sk le t s As discussed previously, int errupt handlers have t wo conflict ing requirem ent s: They are responsible for t he bulk of device dat a processing, but t hey have t o exit as fast as possible. To bail out of t his sit uat ion, int errupt handlers are designed in t wo part s: a hurried and harried t op half t hat int eract s wit h t he hardware, and a relaxed bot t om half t hat does m ost of t he processing wit h all int errupt s enabled. Unlike int errupt s, bot t om halves are synchronous because t he kernel decides when t o execut e t hem . The following m echanism s are available in t he kernel t o defer work t o a bot t om half: soft irqs, t asklet s, and work queues. Soft irqs are t he basic bot t om half m echanism and have st rong locking requirem ent s. They are used only by a few perform ance- sensit ive subsyst em s such as t he net working layer, SCSI layer, and kernel t im ers. Tasklet s are built on t op of soft irqs and are easier t o use. I t 's recom m ended t o use t asklet s unless you have crucial scalabilit y or speed requirem ent s. A prim ary difference bet ween a soft irq and a t asklet is t hat t he form er is reent rant whereas t he lat t er isn't . Different inst ances of a soft irq can run sim ult aneously on different processors, but t hat is not t he case wit h t asklet s. To illust rat e t he usage of soft irqs and t asklet s, assum e t hat t he roller wheel in t he previous exam ple has inherent hardware problem s due t o t he presence of m oving part s ( say, t he wheel get s st uck occasionally) result ing in t he generat ion of out - of- spec waveform s. A st uck wheel can cont inuously generat e spurious int errupt s and pot ent ially freeze t he syst em . To get around t his problem , capt ure t he wave st ream , run som e analysis on it , and dynam ically swit ch from int errupt m ode t o a polled m ode if t he wheel looks st uck, and vice versa if it 's unst uck. Capt ure t he wave st ream from t he int errupt handler and perform t he analysis from a bot t om half. List ing 4.2 im plem ent s t his using soft irqs, and List ing 4.3 uses t asklet s. Bot h are sim plified variant s of List ing 4.1. This reduces t he handler t o t wo funct ions: roller_capture() t hat obt ains a wave snippet from GPI O Port D, and roller_analyze() t hat runs an algorit hm ic analysis on t he wave and swit ches t o polled m ode if required. List in g 4 .2 . Usin g Soft ir qs t o Offloa d W or k fr om I n t e r r u pt H a n dle r s

Code View: void __init roller_init() { /* ... */ /* Open the softirq. Add an entry for ROLLER_SOFT_IRQ in the enum list in include/linux/interrupt.h */ open_softirq(ROLLER_SOFT_IRQ, roller_analyze, NULL); }

/* The bottom half */ void roller_analyze() { /* Analyze the waveforms and switch to polled mode if required */ } /* The interrupt handler */ static irqreturn_t roller_interrupt(int irq, void *dev_id) { /* Capture the wave stream */ roller_capture(); /* Mark softirq as pending */ raise_softirq(ROLLER_SOFT_IRQ); return IRQ_HANDLED; }

To define a soft irq, you have t o st at ically add an ent ry t o include/ linux/ int errupt .h. You can't define one dynam ically. raise_softirq() announces t hat t he corresponding soft irq is pending execut ion. The kernel will execut e it at t he next available opport unit y. This can be during exit from an int errupt handler or via t he ksoft irqd kernel t hread. List in g 4 .3 . Usin g Ta sk le t s t o Offloa d W or k fr om I n t e r r u pt H a n dle r s

Code View: struct roller_device_struct { /* Device-specific structure */ /* ... */ struct tasklet_struct tsklt; /* ... */ } void __init roller_init() { struct roller_device_struct *dev_struct; /* ... */ /* Initialize tasklet */ tasklet_init(&dev_struct->tsklt, roller_analyze, dev); }

/* The bottom half */ void roller_analyze() { /* Analyze the waveforms and switch to polled mode if required */ } /* The interrupt handler */ static irqreturn_t roller_interrupt(int irq, void *dev_id) { struct roller_device_struct *dev_struct; /* Capture the wave stream */ roller_capture(); /* Mark tasklet as pending */ tasklet_schedule(&dev_struct->tsklt); return IRQ_HANDLED; }

tasklet_init() dynam ically init ializes a t asklet . The funct ion does not allocat e m em ory for a tasklet_struct, rat her you have t o pass t he address of an allocat ed one. tasklet_schedule() announces t hat t he corresponding t asklet is pending execut ion. Like for int errupt s, t he kernel offers a bunch of funct ions t o cont rol t he execut ion st at e of t asklet s on syst em s having m ult iple processors:

tasklet_enable() enables t asklet s.

tasklet_disable() disables t asklet s and wait s unt il any current ly execut ing t asklet inst ance has exit ed.

tasklet_disable_nosync() has sem ant ics sim ilar t o disable_irq_nosync(). The funct ion does not wait for act ive inst ances of t he t asklet t o finish execut ion.

You have seen t he differences bet ween int errupt handlers and bot t om halves, but t here are a few sim ilarit ies, t oo. I nt errupt handlers and t asklet s are bot h not reent rant . And neit her of t hem can go t o sleep. Also, int errupt handlers, t asklet s, and soft irqs cannot be preem pt ed. Work queues are a t hird way t o defer work from int errupt handlers. They execut e in process cont ext and are allowed t o sleep, so t hey can use drowsy funct ions such as m ut exes. We discussed work queues in t he preceding chapt er when we looked at various kernel helper facilit ies. Table 4.1 com pares soft irqs, t asklet s, and work queues.

Ta ble 4 .1 . Com pa r in g Soft ir qs, Ta sk le t s, a n d W or k Qu e u e s Soft ir qs

Ta sk le t s

W or k Qu e u e s

Ex e cu t ion con t e x t

Deferred work runs in int errupt cont ext .

Deferred work runs in int errupt cont ext .

Deferred work runs in process cont ext .

Re e n t r a n cy

Can run sim ult aneously on different CPUs.

Cannot run Can run sim ult aneously sim ult aneously on on different CPUs. different CPUs. Different CPUs can run different t asklet s, however.

Sle e p se m a n t ics

Cannot go t o sleep.

Cannot go t o sleep.

May go t o sleep.

Pr e e m pt ion

Cannot be preem pt ed/ scheduled.

Cannot be preem pt ed/ scheduled.

May be preem pt ed/ scheduled.

Ea se of u se

Not easy t o use.

Easy t o use.

Easy t o use.

W h e n t o u se

I f deferred work will not go t o sleep and if you have crucial scalabilit y or speed requirem ent s.

I f deferred work will not go t o sleep.

I f deferred work m ay go t o sleep.

There is an ongoing debat e in LKML on t he feasibilit y of get t ing rid of t he t asklet int erface. Tasklet s enj oy m ore priorit y t han process cont ext code, so t hey present lat ency problem s. Moreover, as you learned, t hey are const rained not t o sleep and t o execut e on t he sam e CPU. I t 's being suggest ed t hat all exist ing t asklet s be convert ed t o soft irqs or work queues on a case- by- case basis. The –rt pat ch- set alluded t o in Chapt er 2 m oves int errupt handling t o kernel t hreads t o achieve wider preem pt ion coverage.

Th e Lin u x D e vice M ode l The new Linux device m odel int roduces C+ + - like abst ract ions t hat fact or out com m onalit ies from device drivers int o bus and core layers. Let 's look at t he different com ponent s const it ut ing t he device m odel such as udev, sysfs, kobj ect s, and device classes and t heir effect s on key kernel subsyst em s such as / dev node m anagem ent , hot plug, firm ware download, and m odule aut oload. Udev is t he best vant age point t o view t he benefit s of t he device m odel, so let 's st art from t here.

Ude v Years ago when Linux was young, it was not fun t o adm inist er device nodes. All t he needed nodes ( which could run int o t housands) had t o be st at ically creat ed under t he / dev direct ory. This problem , in fact , dat ed all t he way back t o original UNI X syst em s. Wit h t he advent of t he 2.4 kernels cam e devfs, which int roduced dynam ic device node creat ion. Devfs provided services t o generat e device nodes in an in- m em ory filesyst em , but t he onus of nam ing t he nodes st ill rest ed wit h device drivers. Device nam ing policy is adm inist rat ive and does not m ix well wit h t he kernel, however. The place for policy is in header files, kernel m odule param et ers, or user space. Udev arrived on t he scene t o push device m anagem ent t o user space. Udev depends on t he following t o do it s work:

1 . Kernel sysfs support , which is an im port ant part of t he Linux device m odel. Sysfs is an in- m em ory filesyst em m ount ed under / sys at boot t im e ( look at / et c/ fst ab for t he specifier) . We will look at sysfs in t he next sect ion, but for now, t ake t he corresponding sysfs file accesses for grant ed.

2 . A set of user- space daem ons and ut ilit ies such as udevd and udevinfo.

3 . User- specified rules locat ed in t he / et c/ udev/ rules.d/ direct ory. You m ay fram e rules t o get a consist ent view of your devices.

To underst and how t o use udev, let 's look at an exam ple. Assum e t hat you have a USB DVD drive and a USB CD- RW drive. Depending on t he order in which you hot plug t hese devices, one of t hem is assigned t he nam e / dev/ sr0 , and t he ot her get s t he nam e / dev/ sr1. During pre- udev days, you had t o figure out t he associat ed nam es before you could use t he devices. But wit h udev, you can consist ent ly view t he DVD ( as say, / dev/ usbdvd) and t he CD- RW ( as say, / dev/ usbcdrw ) irrespect ive of t he order in which t hey are plugged in or out . First , pull product at t ribut es from corresponding files in sysfs. Assum e t hat t he ( Targus) DVD drive has been assigned t he device node / dev/ sr0 and t hat t he ( Addonics) CD- RW drive has been given t he nam e / dev/ sr1 . Use udevinfo t o collect device inform at ion: Code View: bash> udevinfo -a -p /sys/block/sr0 ... looking at the device chain at '/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-4': BUS=»usb» ID=»1-4» SYSFS{bConfigurationValue}=»1» ...

SYSFS{idProduct}=»0701» SYSFS{idVendor}=»05e3» SYSFS{manufacturer}=»Genesyslogic» SYSFS{maxchild}=»0» SYSFS{product}=»USB Mass Storage Device» ... bash> udevinfo -a -p /sys/block/sr1 ... looking at the device chain at '/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-3': BUS=»usb» ID=»1-3» SYSFS{bConfigurationValue}=»2» ... SYSFS{idProduct}=»0302» SYSFS{idVendor}=»0dbf» SYSFS{manufacturer}=»Addonics» SYSFS{maxchild}=»0» SYSFS{product}=»USB to IDE Cable» ...

Next , let 's use t he product inform at ion gleaned t o ident ify t he devices and add udev nam ing rules. Creat e a file called / et c/ udev/ rules.d/ 40- cdvd.rules and add t he following rules t o it : BUS="usb", SYSFS{idProduct}="0701", SYSFS{idVendor}="05e3", KERNEL="sr[0-9]*", NAME="%k", SYMLINK="usbdvd" BUS="usb", SYSFS{idProduct}="0302", SYSFS{idVendor}="0dbf", KERNEL="sr[0-9]*", NAME="%k", SYMLINK="usbcdrw"

The first rule t ells udev t hat whenever it finds a USB device wit h a product I D of 0x0701, vendor I D of 0x05e3, and a nam e st art ing wit h sr, it should creat e a node of t he sam e nam e under / dev and produce a sym bolic link nam ed usbdvd t o t he creat ed node. Sim ilarly, t he second rule orders creat ion of a sym bolic link nam ed usbcdrw for t he CD- RW drive. To t est for synt ax errors in your rules, run udevt est on / sys/ block/ sr* . To t urn on verbose m essages in / var/ log/ m essages, set udev_log t o " yes" in / et c/ udev/ udev.conf. To repopulat e t he / dev direct ory wit h newly added rules on- t he- fly, rest art udev using udevst art . When t his is done, your DVD drive consist ent ly appears t o t he syst em as / dev/ usbdvd, and your CD- RW drive always appears as / dev/ usbcdrw. You can det erm inist ically m ount t hem from shell script s using com m ands such as t hese: mount /dev/usbdvd /mnt/dvd

Consist ent nam ing of device nodes ( and net work int erfaces) is not t he sole capabilit y of udev. I t has m et am orphed int o t he Linux hot plug m anager, t oo. Udev is also in charge of aut om at ically loading m odules on dem and and downloading m icrocode ont o devices t hat need t hem . But before digging int o t hose capabilit ies, let 's obt ain a basic underst anding of t he innards of t he device m odel.

Sysfs, Kobj e ct s, a n d D e vice Cla sse s

Sysfs, kobj ect s, and device classes are t he building blocks of t he device m odel but are publicit y shy and prefer t o rem ain behind t he scenes. They are m ost ly in t he usage dom ain of bus and core im plem ent at ions, and hide inside API s t hat provide services t o device drivers. Sysfs is t he user- space m anifest at ion of t he kernel's st ruct ured device m odel. I t 's sim ilar t o procfs in t hat bot h are in- m em ory filesyst em s cont aining inform at ion about kernel dat a st ruct ures. Whereas procfs is a generic window int o kernel int ernals, sysfs is specific t o t he device m odel. Sysfs is, hence, not a replacem ent for procfs. I nform at ion such as process descript ors and sysct l param et ers belong t o procfs and not sysfs. As will be apparent soon, udev depends on sysfs for m ost of it s ext ended funct ions. Kobj ect s int roduce an encapsulat ion of com m on obj ect propert ies such as usage reference count s. They are usually em bedded wit hin larger st ruct ures. The following are t he m ain fields of a kobj ect , which is defined in include/ linux/ kobj ect .h:

1 . A kref obj ect t hat perform s reference count m anagem ent . The kref_init() int erface init ializes a kref, kref_get() increm ent s t he reference count associat ed wit h t he kref, and kref_put() decrem ent s t he reference count and frees t he obj ect if t here are no rem aining references. The URB st ruct ure ( explained in Chapt er 11, " Universal Serial Bus" ) , for exam ple, cont ains a kref t o t rack t he num ber of references t o it . [ 2 ]

[ 2]

The usb_alloc_urb() int erface calls kref_init(), usb_submit_urb() invokes kref_get(), and usb_free_urb() calls kref_put().

2 . A point er t o a kset , which is an obj ect set t o which t he kobj ect belongs.

3 . A kobj _t ype, which is an obj ect t ype t hat describes t he kobj ect .

Kobj ect s are int ert wined wit h sysfs. Every kobj ect inst ant iat ed wit hin t he kernel has a sysfs represent at ion. The concept of device classes is anot her feat ure of t he device m odel and is an int erface you're m ore likely t o use in a driver. The class int erface abst ract s t he idea t hat each device falls under a broader class ( or cat egory) of devices. A USB m ouse, a PS/ 2 keyboard, and a j oyst ick all fall under t he input class and own ent ries under / sys/ class/ input / . Figure 4.4 shows t he sysfs hierarchy on a lapt op t hat has an ext ernal USB m ouse connect ed t o it . The t op- level bus, class, and dev ice direct ories are expanded t o show t hat sysfs provides a view of t he USB m ouse based on it s device t ype as well as it s physical connect ion. The m ouse is an input class device but is physically a USB device answering t o t wo endpoint addresses, a cont rol endpoint ep00, and an int errupt endpoint , ep81. The USB port in quest ion belongs t o t he USB host cont roller on bus 2, and t he USB host cont roller it self is bridged t o t he CPU via t he PCI bus. I f t hese det ails are not m aking m uch sense at t his point , don't worry; rewind t o t his sect ion aft er reading t he chapt ers t hat t each input drivers (Chapt er 7 ) , PCI drivers ( Chapt er 10, " Peripheral Com ponent I nt erconnect " ) , and USB drivers ( Chapt er 11) . Figu r e 4 .4 . Sysfs h ie r a r ch y of a USB m ou se .

Code View: [/sys] +[block] -[bus]—[usb]—[devices]—[usb2]—[2-2]—[2-2:1.0]-[usbendpoint:usbdev2.2-ep81] -[class]-[input]—[mouse2]—[device]—[bus]—[usbendpoint:usbdev2.2-ep81] -[usb_device]—[usbdev2.2]—[device]—[bus] -[usb_endpoint]—[usbdev2.2-ep00]—[device] —[usbdev2.2-ep81]—[device] -[devices]—[pci0000:00]—[0000:00:1d:1]—[usb2]—[2-2]—[2-2:1.0] +[firmware] +[fs] +[kernel] +[module] +[power]

Browse t hrough / sys looking for ent ries t hat associat e wit h anot her device ( for exam ple, your net work card) t o get a bet t er feel of it s hierarchical organizat ion. The sect ion "Addressing and I dent ificat ion" in Chapt er 10 illust rat es how sysfs m irrors t he physical connect ion of a CardBus Et hernet - Modem card on a lapt op. The class program m ing int erface is built on t op of kobj ect s and sysfs, so it 's a good place t o st art digging t o underst and t he end- t o- end int eract ions bet ween t he com ponent s of t he device m odel. Let 's t urn t o t he RTC driver for an exam ple. The RTC driver (drivers/ char/ rt c.c) is a m iscellaneous ( or " m isc" ) driver. We discuss m isc drivers in det ail when we look at charact er device drivers in Chapt er 5 . I nsert t he RTC driver m odule and look at t he nodes creat ed under / sys and / dev: bash> modprobe rtc bash> ls -lR /sys/class/misc drwr-xr-x 2 root root 0 Jan 15 01:23 rtc /sys/class/misc/rtc: total 0 -r--r--r-- 1 root root 4096 Jan 15 01:23 dev --w------- 1 root root 4096 Jan 15 01:23 uevent bash> ls -l /dev/rtc crw-r--r-- 1 root root 10, 135 Jan 15 01:23 /dev/rtc

/ sys/ class/ m isc/ rt c/ dev cont ains t he m aj or and m inor num bers ( discussed in t he next chapt er) assigned t o t his device, / sys/ class/ m isc/ rt c/ uevent is used for coldplugging ( discussed in t he next sect ion) , and / dev/ rt c is used by applicat ions t o access t he RTC driver. Let 's underst and t he code flow t hrough t he device m odel. Misc drivers ut ilize t he services of misc_register() during init ializat ion, which looks like t his if you peel off som e code: /* ... */ dev = MKDEV(MISC_MAJOR, misc->minor); misc->class = class_device_create(misc_class, NULL, dev, misc->dev, "%s", misc->name); if (IS_ERR(misc->class)) { err = PTR_ERR(misc->class); goto out;

} /* ... */

Figure 4.5 cont inues t o peel off m ore layers t o get t o t he bot t om of t he device m odeling. I t illust rat es t he t ransit ions t hat ripple t hrough classes, kobj ect s, sysfs, and udev, which result in t he generat ion of t he / sys and / dev files list ed previously.

Figu r e 4 .5 . Tyin g t h e pie ce s of t h e de vice m ode l.

[ View full size im age]

Look at t he parallel port LED driver ( List ing 5.6 in t he sect ion "Talking t o t he Parallel Port " in Chapt er 5 ) and t he virt ual m ouse input driver ( List ing 7.2 in t he sect ion "Device Exam ple: Virt ual Mouse" in Chapt er 7 ) for exam ples on creat ing device cont rol files inside sysfs. Anot her abst ract ion t hat is part of t he device m odel is t he bus- device- driver program m ing int erface. Kernel device support is cleanly st ruct ured int o buses, devices, and drivers. This renders t he individual driver im plem ent at ions sim pler and m ore general. Bus im plem ent at ions can, for exam ple, search for drivers t hat can handle a part icular device. Consider t he kernel's I 2 C subsyst em ( explored in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol" ) . The I 2 C layer consist s of a core infrast ruct ure, device drivers for bus adapt ers, and drivers for client devices. The I 2 C

core layer regist ers each det ect ed I 2 C bus adapt er using bus_register(). When an I 2 C client device ( say, an Elect rically Erasable Program m able Read- Only Mem ory [ EEPROM] chip) is probed and det ect ed, it s exist ence is recorded via device_register(). Finally, t he I 2 C EEPROM client driver regist ers it self using driver_register(). These regist rat ions are perform ed indirect ly using service funct ions offered by t he I 2 C core. bus_register() adds a corresponding ent ry t o / sys/ bus/ , while device_register() adds ent ries under / sys/ devices/ . struct bus_type, struct device, and struct device_driver are t he m ain dat a st ruct ures used respect ively by buses, devices, and drivers. Take a peek inside include/ linux/ device.h for t heir definit ions.

H ot plu g a n d Coldplu g Devices connect ed t o a running syst em on- t he- fly are said t o be hot plugged , whereas t hose connect ed prior t o syst em boot are considered t o be coldplugged. Earlier, t he kernel used t o not ify user space about hot plug event s by invoking a helper program regist ered via t he / proc filesyst em . But when current kernels det ect hot plug, t hey dispat ch uevent s t o user space via net link socket s. Net link socket s are an efficient m echanism t o com m unicat e bet ween kernel space and user space using socket API s. At t he user- space end, udevd, t he daem on t hat m anages device node creat ion and rem oval, receives t he uevent s and m anages hot plug.

To see how hot plug handling has evolved recent ly, let 's consider progressive levels of udev running different versions of t he 2.6 kernel:

1 . Wit h a udev- 039 package and a 2.6.9 kernel, when t he kernel det ect s a hot plug event , it invokes t he user space helper regist ered wit h / proc/ sys/ kernel/ hot plug. This default s t o / sbin/ hot plug, which receives at t ribut es of t he hot plugged device in it s environm ent . / sbin/ hot plug looks inside t he hot plug configurat ion direct ory ( usually / et c/ hot plug.d/ default / ) and runs, for exam ple, / et c/ hot plug.d/ default / 10- udev.hot plug, aft er execut ing ot her script s under / et c/ hot plug/ . bash> ls -l /etc/hotplug.d/default/ ... lrwcrwxrwx 1 root root 14 May 11 2005 10-udev.hotplug -> /sbin/udevsend ... When / sbin/ udevsend t hus get s execut ed, it passes t he hot plugged device inform at ion t o udevd.

2 . Wit h udev- 058 and a 2.6.11 kernel, t he st ory changes som ewhat . The udevsend ut ilit y replaces / sbin/ hot plug: bash> cat /proc/sys/kernel/hotplug /sbin/udevsend 3 . Wit h t he lat est levels of udev and t he kernel, udevd assum es full responsibilit y of m anaging hot plug wit hout depending on udevsend. I t now pulls hot plug event s direct ly from t he kernel via net link socket s ( see Figure 4.4) . / proc/ sys/ kernel/ hot plug cont ains not hing: bash> cat /proc/sys/kernel/hotplug bash>

Udev also handles coldplug. Because udev is part of user space and is st art ed only aft er t he kernel boot s, a special m echanism is needed t o em ulat e hot plug event s over coldplugged devices. At boot t im e, t he kernel creat es a file nam ed uevent under sysfs for all devices and em it s coldplug event s t o t hose files. When udev st art s, it reads all t he uevent files from / sys and generat es hot plug uevent s for each coldplugged device.

M icr ocode D ow n loa d You have t o feed m icrocode t o som e devices before t hey can get ready for act ion. The m icrocode get s execut ed by an on- card m icrocont roller. Device drivers used t o st ore m icrocode inside st at ic arrays in header files. But t his has becom e unt enable because m icrocode is usually dist ribut ed as propriet ary binary im ages by device vendors, and t hat doesn't m ix hom ogeneously wit h t he GPL- ed kernel. Anot her reason against m ixing firm ware wit h kernel sources is t hat t hey run on different release t im e lines. The solut ion apparent ly is t o separat ely m aint ain m icrocode in user space and pass it down t o t he kernel when required. Sysfs and udev provide an infrast ruct ure t o achieve t his. Let 's t ake t he exam ple of t he I nt el PRO/ Wireless 2100 WiFi m ini PCI card found on several lapt ops. The card is built around a m icrocont roller t hat needs t o execut e ext ernally supplied m icrocode for norm al operat ion. Let 's walk t hrough t he st eps t hat t he Linux driver follows t o download m icrocode t o t he card. Assum e t hat you have obt ained t he required m icrocode im age ( ipw2100- 1.3.fw ) from ht t p: / / ipw2100.sourceforge.net / firm ware.php and saved it under / lib/ firm ware/ on your syst em and t hat you have insert ed t he driver m odule ipw2100.ko: 1.

During init ializat ion, t he driver invokes t he following:

request_firmware(..,"ipw2100-1.3.fw",..); 2.

This dispat ches a hot plug uevent t o user space, along wit h t he ident it y of t he request ed m icrocode im age.

3.

Udevd receives t he uevent and responds by invoking / sbin/ firm ware_helper. For t his, it uses a rule sim ilar t o t he following from a file under / et c/ udev/ rules.d/ :

ACTION=="add", SUBSYSTEM=="firmware", RUN="/sbin/firmware_helper" 4.

/ sbin/ firm ware_helper looks inside / lib/ firm ware/ and locat es t he request ed m icrocode im age ipw21001.3.fw. I t dum ps t he im age t o / sys/ class/ 0000: 02: 02.0/ dat a. ( 0000: 02: 02 is t he PCI bus: device: funct ion ident ifier of t he WiFi card in t his case.)

5.

The driver receives t he m icrocode and downloads it ont o t he device. When done, it calls release_firmware() t o free t he corresponding dat a st ruct ures.

6.

The driver goes t hrough t he rest of t he init ializat ions and t he WiFi adapt er beacons.

M odu le Au t oloa d Aut om at ically loading kernel m odules on dem and is a convenient feat ure t hat Linux support s. To underst and how t he kernel em it s a " m odule fault " and how udev handles it , let 's insert a Xircom CardBus Et hernet adapt er int o a lapt op's PC Card slot :

1.

During com pile t im e, t he ident it y of support ed devices is generat ed as part of t he driver m odule obj ect . Take a peek at t he driver t hat support s t he Xircom CardBus Et hernet com bo card (drivers/ net / t ulip/ xircom _cb.c) and find t his snippet :

static struct pci_device_id xircom_pci_table[] = { {0x115D, 0x0003, PCI_ANY_ID, PCI_ANY_ID,}, {0,}, }; /* Mark the device table */ MODULE_DEVICE_TABLE(pci, xircom_pci_table); This declares t hat t he driver can support any card having a PCI vendor I D of 0x115D and a PCI device I D of 0x0003 ( m ore on t his in Chapt er 10) . When you inst all t he driver m odule, t he depm od ut ilit y looks inside t he m odule im age and deciphers t he I Ds present in t he device t able. I t t hen adds t he following ent ry t o / lib/ m odules/ kernel- version/ m odules.alias:

alias pci:v0000115Dd00000003sv*sd*bc*sc*i* xircom_cb where v st ands for VendorI D, d for DeviceI D, sv for subvendorI D, and * for wildcard m at ch. 2.

When you hot plug t he Xircom card int o a CardBus slot , t he kernel generat es a uevent t hat announces t he ident it y of t he newly insert ed device. You m ay look at t he generat ed uevent using udevm onit or:

bash> udevmonitor --env ... MODALIAS=pci:v0000115Dd00000003sv0000115Dsd00001181bc02sc00i00 ... 3.

Udevd receives t he uevent via a net link socket and invokes m odprobe wit h t he above MODALI AS t hat t he kernel passed up t o it :

modprobe pci:v0000115Dd00000003sv0000115Dsd00001181bc02sc00i00 4.

Modprobe finds t he m at ching ent ry in / lib/ m odules/ kernel- version/ m odules.alias creat ed during St ep 1, and proceeds t o insert xircom _cb:

bash> lsmod Module Size xircom_cb 10433 ...

Used by 0

The card is now ready t o surf. You m ay want t o revisit t his sect ion aft er reading Chapt er 10.

Udev on Em bedded Devices One school of t hought deprecat es t he use of udev in favor of st at ically creat ed device nodes on em bedded devices for t he following reasons:

Udev creat es / dev nodes during each reboot , com pared t o st at ic nodes t hat are creat ed only once during soft ware inst all. I f your em bedded device uses flash st orage, flash pages t hat hold / dev nodes suffer an erase- writ e cycle on each boot in t he case of t he form er, and t his reduces flash life span. ( Flash m em ory is discussed in det ail in Chapt er 17, " Mem ory Technology Devices." ) You do have t he opt ion of m ount ing / dev over a RAM- based filesyst em , however.

Udev cont ribut es t o increased boot t im e.

Udev feat ures such as dynam ic creat ion of / dev nodes and aut oloading of m odules creat e a degree of indet erm inism t hat som e solut ion designers prefer t o avoid on special- purpose em bedded devices, especially ones t hat do not int eract wit h t he out side world via hot pluggable buses. According t o t his point of view, st at ic node creat ion and boot - t im e insert ion of any m odules provide m ore cont rol over t he syst em and m ake it easier t o t est .

M e m or y Ba r r ie r s Many processors and com pilers reorder inst ruct ions t o achieve opt im al execut ion speeds. The reordering is done such t hat t he new inst ruct ion st ream is sem ant ically equivalent t o t he original one. However, if you are, for exam ple, writ ing t o m em ory m apped regist ers on an I / O device, inst ruct ion reordering can generat e unexpect ed side effect s. To prevent t he processor from reordering inst ruct ions, you can insert a barrier in your code. The wmb() funct ion insert s a road block t hat prevent s writ es from m oving t hrough it , rmb() provides a read barricade t hat disallows reads from crossing it , and mb() result s in a read- writ e barrier. I n addit ion t o t he CPU- t o- hardware int eract ions referred t o previously, m em ory barriers are also relevant for CPU- t o- CPU int eract ions on SMP syst em s. I f your CPU's dat a cache is operat ing in writ e- back m ode ( in which dat a is not copied from cache t o m em ory unt il it 's absolut ely necessary) , you m ight want t o st all t he inst ruct ion st ream unt il t he cache- t o- m em ory queue is drained. This is relevant , for exam ple, when you encount er inst ruct ions t hat acquire or release locks. Barriers are used in t his scenario t o obt ain a consist ent percept ion across CPUs. We revisit m em ory barriers when we discuss PCI drivers in Chapt er 10 and flash m ap drivers in Chapt er 17. I n t he m eanwhile, st op by Docum ent at ion/ m em ory- barriers.t xt for an explanat ion of different kinds of m em ory barriers.

Pow e r M a n a ge m e n t Power m anagem ent is crit ical on devices running on bat t ery, such as lapt ops and handhelds. Linux drivers need t o be aware of power st at es and have t o t ransit ion across st at es in response t o event s such as st andby, sleep, and low bat t ery. Drivers ut ilize power- saving feat ures support ed by t he underlying hardware when t hey swit ch t o m odes t hat consum e less power. For exam ple, t he st orage driver spins down t he disk, whereas t he video driver blanks t he display. Power- aware code in device drivers is only one piece of t he overall power m anagem ent fram ework. Power m anagem ent also feat ures part icipat ion from user space daem ons, ut ilit ies, configurat ion files, and boot firm ware. Two popular power m anagem ent m echanism s are APM ( discussed in t he sect ion, "Prot ect ed Mode Calls" in Appendix B, " Linux and t he BI OS" ) and Advanced Configurat ion and Power I nt erface ( ACPI ) . APM is get t ing obsolet e, and ACPI has em erged as t he de fact o power m anagem ent st rat egy on Linux syst em s. ACPI is furt her discussed in Chapt er 20, " More Devices and Drivers."

Look in g a t t h e Sou r ce s The core int errupt handling code is generic and is in t he kernel/ irq/ direct ory. The archit ect ure- specific port ions can be found in arch/ your- arch/ kernel/ irq.c. The funct ion do_IRQ() defined in t his file is a good place t o st art your j ourney int o t he kernel int errupt handling m echanism . The kernel soft irq and t asklet im plem ent at ions live in kernel/ soft irq.c. This file also cont ains addit ional funct ions t hat offer m ore fine- grained cont rol over soft irqs and t asklet s. Look at include/ linux/ int errupt .h for soft irq vect or enum erat ions and prot ot ypes required t o im plem ent your int errupt handler. For a real- life exam ple of writ ing int errupt handlers and bot t om halves, st art from t he handler t hat is part of drivers/ net / lib8390.c and follow t he t rail int o t he net working st ack. The kobj ect im plem ent at ion and relat ed program m ing int erfaces live in lib/ kobj ect .c and include/ linux/ kobj ect .h. Look at drivers/ base/ sys.c for t he sysfs im plem ent at ion. You will find device class API s in drivers/ base/ class.c. Dispat ching hot plug uevent s via net link socket s is done by lib/ kobj ect _uevent .c. You m ay download udev sources and docum ent at ion from www.kernel.org/ pub/ linux/ ut ils/ kernel/ hot plug/ udev.ht m l. For a fuller underst anding of how APM is im plem ent ed on x86 Linux, look at arch/ x86/ kernel/ apm _32.c, include/ linux/ apm _bios.h , and include/ asm - x86/ m ach- default / apm .h in t he kernel t ree. I f you are curious t o know how APM is im plem ent ed on BI OS- less archit ect ures such as ARM, look at include/ linux/ apm - em ulat ion.h and it s users. The kernel's ACPI im plem ent at ion lives in drivers/ acpi/ . Table 4.2 cont ains a sum m ary of t he m ain dat a st ruct ures used in t his chapt er and t he locat ion of t heir definit ions in t he source t ree. Table 4.3 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 4 .2 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e Loca t ion

D e scr ipt ion

tasklet_struct include/ linux/ int errupt .h

Manages a t asklet , which is a m et hod t o im plem ent bot t om halves

kobject

include/ linux/ kobj ect .h

Encapsulat es com m on propert ies of a kernel obj ect

kset

include/ linux/ kobj ect .h

An obj ect set t o which a kobj ect belongs

kobj_type

include/ linux/ kobj ect .h

An obj ect t ype t hat describes a kobj ect

class

include/ linux/ device.h

Abst ract s t he idea t hat a driver falls under a broader cat egory

bus device device_driver

include/ linux/ device.h

St ruct ures t hat form t he pillars under t he Linux device m odel

Ta ble 4 .3 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

request_irq()

kernel/ irq/ m anage.c

Request s an I RQ and associat es an int errupt handler wit h it

free_irq()

kernel/ irq/ m anage.c

Frees an I RQ

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

disable_irq()

kernel/ irq/ m anage.c

Disables t he int errupt associat ed wit h a supplied I RQ

disable_irq_nosync()

kernel/ irq/ m anage.c

Disables t he int errupt associat ed wit h a supplied I RQ wit hout wait ing for any current ly execut ing inst ances of t he int errupt handler t o ret urn

enable_irq()

kernel/ irq/ m anage.c

Re- enables t he int errupt t hat has been disabled using disable_irq() or disable_irq_nosync()

open_softirq()

kernel/ soft irq.c

Opens a soft irq

raise_softirq()

kernel/ soft irq.c

Marks t he soft irq as pending execut ion

tasklet_init()

kernel/ soft irq.c

Dynam ically init ializes a t asklet

tasklet_schedule()

include/ linux/ int errupt .hkernel/ soft irq.c Marks a t asklet as pending execut ion

tasklet_enable()

include/ linux/ int errupt .h

Enables a t asklet

tasklet_disable()

include/ linux/ int errupt .h

Disables a t asklet

tasklet_disable_nosync()

include/ linux/ int errupt .h

Disables a t asklet wit hout wait ing for act ive inst ances t o finish execut ion

class_device_register()

drivers/ base/ class.c

kobject_add()

lib/ kobj ect .c

sysfs_create_dir()

lib/ kobj ect _uevent .c

Fam ily of funct ions in t he Linux device m odel t hat creat e/ dest roy a class, device class, and associat ed kobj ect s and sysfs files

class_device_create()

fs/ sysfs/ dir.c

class_device_destroy() class_create() class_destroy() class_device_create_file() sysfs_create_file() class_device_add_attrs() kobject_uevent()

fs/ sysfs/ file.c

This finishes our explorat ion of device driver concept s. You m ight want t o dip back int o t his chapt er while developing your driver.

Ch a pt e r 5 . Ch a r a ct e r D r ive r s I n Th is Ch a pt e r 120 Char Driver Basics 121 Device Exam ple: Syst em CMOS 139 Sensing Dat a Availabilit y 145 Talking t o t he Parallel Port 156 RTC Subsyst em 157 Pseudo Char Drivers 160 Misc Drivers 166 Charact er Caveat s 167 Looking at t he Sources

You are now all set t o m ake a foray int o writ ing sim ple, albeit real- world, device drivers. I n t his chapt er, let 's look at t he int ernals of a charact er ( or char) device driver, which is kernel code t hat sequent ially accesses dat a from a device. Char drivers can capt ure raw dat a from several t ypes of devices: print ers, m ice, wat chdogs, t apes, m em ory, RTCs, and so on. They are however, not suit able for m anaging dat a residing on block devices capable of random access such as hard disks, floppies, or com pact discs.

Ch a r D r ive r Ba sics

Let 's st art wit h a t op- down view. To access a char device, a syst em user invokes a suit able applicat ion program . The applicat ion is responsible for t alking t o t he device, but t o do t hat , it needs t o elicit t he ident it y of a suit able driver. The cont act det ails of t he driver are export ed t o user space via t he / dev direct ory: bash> ls -l /dev total 0 crw------1 root ... lrwxrwxrwx 1 root ... brw-rw---1 root brw-rw---1 root ... crw------1 root crw------1 root

root

5,

root

1 Jul 16 10:02 console 3 Oct 6 10:02

cdrom -> hdc

disk disk

3, 3,

0 Oct 6 2007 1 Oct 6 2007

hda hda1

tty tty

4, 4,

1 Oct 6 10:20 2 Oct 6 10:02

tty1 tty2

The first charact er in each line of t he ls out put denot es t he driver t ype: c signifies a char driver, b st ands for a block driver, and l denot es a sym bolic link. The num bers in t he fift h colum n are called m aj or num bers, and t hose in t he sixt h colum n are m inor num bers. A m aj or num ber broadly ident ifies t he driver, whereas a m inor num ber pinpoint s t he exact device serviced by t he driver. For exam ple, t he I DE block st orage driver / dev/ hda owns a m aj or num ber of 3 and is in charge of handling t he hard disk on your syst em , but when you furt her specify a m inor num ber of 1 (/ dev/ hda1 ) , t hat narrows it down t o t he first disk part it ion. Char and block drivers occupy different spaces, so you can have sam e m aj or num ber assigned t o a char as well as a block driver. Let 's t ake a st ep furt her and peek inside a char driver. From a code- flow perspect ive, char drivers have t he following:

An init ializat ion ( or init()) rout ine t hat is responsible for init ializing t he device and seam lessly t ying t he driver t o t he rest of t he kernel via regist rat ion funct ions. A set of ent ry point s ( or m et hods) such as open(), read(), ioctl(), llseek(), and write(), which direct ly correspond t o I / O syst em calls invoked by user applicat ions over t he associat ed / dev node. I nt errupt rout ines, bot t om halves, t im er handlers, helper kernel t hreads, and ot her support infrast ruct ure. These are largely t ransparent t o user applicat ions.

From a dat a- flow perspect ive, char drivers own t he following key dat a st ruct ures:

1 . A per- device st ruct ure. This is t he inform at ion reposit ory around which t he driver revolves.

2 . struct cdev, a kernel abst ract ion for charact er drivers. This st ruct ure is usually em bedded inside t he perdevice st ruct ure referred previously.

3 . struct file_operations, which cont ains t he addresses of all driver ent ry point s.

4 . struct file, which cont ains inform at ion about t he associat ed / dev node.

Ch a pt e r 5 . Ch a r a ct e r D r ive r s I n Th is Ch a pt e r 120 Char Driver Basics 121 Device Exam ple: Syst em CMOS 139 Sensing Dat a Availabilit y 145 Talking t o t he Parallel Port 156 RTC Subsyst em 157 Pseudo Char Drivers 160 Misc Drivers 166 Charact er Caveat s 167 Looking at t he Sources

You are now all set t o m ake a foray int o writ ing sim ple, albeit real- world, device drivers. I n t his chapt er, let 's look at t he int ernals of a charact er ( or char) device driver, which is kernel code t hat sequent ially accesses dat a from a device. Char drivers can capt ure raw dat a from several t ypes of devices: print ers, m ice, wat chdogs, t apes, m em ory, RTCs, and so on. They are however, not suit able for m anaging dat a residing on block devices capable of random access such as hard disks, floppies, or com pact discs.

Ch a r D r ive r Ba sics

Let 's st art wit h a t op- down view. To access a char device, a syst em user invokes a suit able applicat ion program . The applicat ion is responsible for t alking t o t he device, but t o do t hat , it needs t o elicit t he ident it y of a suit able driver. The cont act det ails of t he driver are export ed t o user space via t he / dev direct ory: bash> ls -l /dev total 0 crw------1 root ... lrwxrwxrwx 1 root ... brw-rw---1 root brw-rw---1 root ... crw------1 root crw------1 root

root

5,

root

1 Jul 16 10:02 console 3 Oct 6 10:02

cdrom -> hdc

disk disk

3, 3,

0 Oct 6 2007 1 Oct 6 2007

hda hda1

tty tty

4, 4,

1 Oct 6 10:20 2 Oct 6 10:02

tty1 tty2

The first charact er in each line of t he ls out put denot es t he driver t ype: c signifies a char driver, b st ands for a block driver, and l denot es a sym bolic link. The num bers in t he fift h colum n are called m aj or num bers, and t hose in t he sixt h colum n are m inor num bers. A m aj or num ber broadly ident ifies t he driver, whereas a m inor num ber pinpoint s t he exact device serviced by t he driver. For exam ple, t he I DE block st orage driver / dev/ hda owns a m aj or num ber of 3 and is in charge of handling t he hard disk on your syst em , but when you furt her specify a m inor num ber of 1 (/ dev/ hda1 ) , t hat narrows it down t o t he first disk part it ion. Char and block drivers occupy different spaces, so you can have sam e m aj or num ber assigned t o a char as well as a block driver. Let 's t ake a st ep furt her and peek inside a char driver. From a code- flow perspect ive, char drivers have t he following:

An init ializat ion ( or init()) rout ine t hat is responsible for init ializing t he device and seam lessly t ying t he driver t o t he rest of t he kernel via regist rat ion funct ions. A set of ent ry point s ( or m et hods) such as open(), read(), ioctl(), llseek(), and write(), which direct ly correspond t o I / O syst em calls invoked by user applicat ions over t he associat ed / dev node. I nt errupt rout ines, bot t om halves, t im er handlers, helper kernel t hreads, and ot her support infrast ruct ure. These are largely t ransparent t o user applicat ions.

From a dat a- flow perspect ive, char drivers own t he following key dat a st ruct ures:

1 . A per- device st ruct ure. This is t he inform at ion reposit ory around which t he driver revolves.

2 . struct cdev, a kernel abst ract ion for charact er drivers. This st ruct ure is usually em bedded inside t he perdevice st ruct ure referred previously.

3 . struct file_operations, which cont ains t he addresses of all driver ent ry point s.

4 . struct file, which cont ains inform at ion about t he associat ed / dev node.

D e vice Ex a m ple : Syst e m CM OS Let 's im plem ent a char driver t o access t he syst em CMOS. The BI OS on PC- com pat ible hardware ( see Figure 5.1 ) uses t he CMOS t o st ore inform at ion such as st art up opt ions, boot order, and t he syst em dat e, which you can configure via t he BI OS set up m enu. Our exam ple CMOS driver let s you access t he t wo PC CMOS banks as t hough t hey are regular files. Applicat ions can operat e on / dev/ cm os/ 0 and / dev/ cm os/ 1, and use I / O syst em calls t o access dat a from t he t wo banks. Because t he BI OS assigns sem ant ics t o t he CMOS area at bit - level granularit y, t he driver is capable of bit - level access. So, a read() obt ains t he specified num ber of bit s and advances t he int ernal file point er by t he num ber of bit s read.

Figu r e 5 .1 . CM OS on a PC- com pa t ible syst e m .

The CMOS is accessed via t wo I / O addresses, an index regist er and a dat a regist er, as shown in Table 5.1. You have t o specify t he desired CMOS m em ory offset in t he index regist er and exchange inform at ion via t he dat a regist er.

Ta ble 5 .1 . Re gist e r La you t on t h e CM OS Re gist e r N a m e

D e scr ipt ion

CMOS_BANK0_INDEX_PORT

Specify t he desired CMOS bank 0 offset in t his regist er.

CMOS_BANK0_DATA_PORT

Read/ writ e dat a from / t o t he address specified in CMOS_BANK0_INDEX_PORT.

CMOS_BANK1_INDEX_PORT

Specify t he desired CMOS bank 1 offset in t his regist er.

CMOS_BANK1_DATA_PORT

Read/ writ e dat a from / t o t he address specified in CMOS_BANK1_INDEX_PORT.

Because each driver m et hod has a syst em call count erpart t hat applicat ions use, we will look at t he syst em calls and t he m at ching driver m et hods in t andem .

D r ive r I n it ia liza t ion The driver init() m et hod is t he bedrock of t he regist rat ion m echanism . I t 's responsible for t he following:

Request ing allocat ion of device m aj or num bers.

Allocat ing m em ory for t he per- device st ruct ure.

Connect ing t he ent ry point s (open(), read(), and so on) wit h t he char driver's cdev abst ract ion.

Associat ing t he device m aj or num ber wit h t he driver's cdev.

Creat ing nodes under / dev and / sys. As discussed in Chapt er 4 , " Laying t he Groundwork," / dev m anagem ent has m eandered from st at ic device nodes in t he 2.2 kernels, t o dynam ic nam es in 2.4, and furt her t o a user- space policy daem on ( udevd) in 2.6.

I nit ializing t he hardware. This is not relevant for our sim ple CMOS.

List ing 5.1 im plem ent s t he CMOS driver's init() m et hod. List in g 5 .1 . CM OS D r ive r I n it ia liza t ion Code View: #include /* Per-device (per-bank) structure */ struct cmos_dev { unsigned short current_pointer; /* Current pointer within the bank */ unsigned int size; /* Size of the bank */ int bank_number; /* CMOS bank number */ struct cdev cdev; /* The cdev structure */ char name[10]; /* Name of I/O region */ /* ... */ /* Mutexes, spinlocks, wait queues, .. */ } *cmos_devp; /* File operations structure. Defined in linux/fs.h */ static struct file_operations cmos_fops = { .owner = THIS_MODULE, /* Owner */ .open = cmos_open, /* Open method */ .release = cmos_release, /* Release method */ .read = cmos_read, /* Read method */ .write = cmos_write, /* Write method */ .llseek = cmos_llseek, /* Seek method */ .ioctl = cmos_ioctl, /* Ioctl method */ }; static dev_t cmos_dev_number; struct class *cmos_class;

/* Allotted device number */ /* Tie with the device model */

#define #define #define #define #define #define #define

NUM_CMOS_BANKS CMOS_BANK_SIZE DEVICE_NAME CMOS_BANK0_INDEX_PORT CMOS_BANK0_DATA_PORT CMOS_BANK1_INDEX_PORT CMOS_BANK1_DATA_PORT

2 (0xFF*8) "cmos" 0x70 0x71 0x72 0x73

unsigned char addrports[NUM_CMOS_BANKS] = {CMOS_BANK0_INDEX_PORT, CMOS_BANK1_INDEX_PORT,}; unsigned char dataports[NUM_CMOS_BANKS] = {CMOS_BANK0_DATA_PORT, CMOS_BANK1_DATA_PORT,}; /* * Driver Initialization */ int __init cmos_init(void) { int i; /* Request dynamic allocation of a device major number */ if (alloc_chrdev_region(&cmos_dev_number, 0, NUM_CMOS_BANKS, DEVICE_NAME) < 0) { printk(KERN_DEBUG "Can't register device\n"); return -1; } /* Populate sysfs entries */ cmos_class = class_create(THIS_MODULE, DEVICE_NAME); for (i=0; iname, "cmos%d", i); if (!(request_region(addrports[i], 2, cmos_devp->name)) { printk("cmos: I/O port 0x%x is not free.\n", addrports[i]); return –EIO; } /* Fill in the bank number to correlate this device with the corresponding CMOS bank */ cmos_devp->bank_number = i; /* Connect the file operations with the cdev */ cdev_init(&cmos_devp->cdev, &cmos_fops); cmos_devp->cdev.owner = THIS_MODULE; /* Connect the major/minor number to the cdev */ if (cdev_add(&cmos_devp->cdev, (dev_number + i), 1)) { printk("Bad cdev\n"); return 1; } /* Send uevents to udev, so it'll create /dev nodes */

class_device_create(cmos_class, NULL, (dev_number + i), NULL, "cmos%d", i); } printk("CMOS Driver Initialized.\n"); return 0; }

/* Driver Exit */ void __exit cmos_cleanup(void) { int i; /* Remove the cdev */ cdev_del(&cmos_devp->cdev); /* Release the major number */ unregister_chrdev_region(MAJOR(dev_number), NUM_CMOS_BANKS); /* Release I/O region */ for (i=0; i cat /proc/devices | grep cmos 253 cmos

253 is t he dynam ically allocat ed m aj or num ber for t he CMOS device. During pre- 2.6 days, dynam ic device node allocat ion was not support ed, so char drivers m ade calls t o register_chrdev() t o st at ically request specific m aj or num bers. Before proceeding furt her down t he code pat h, let 's t ake a peek at t he dat a st ruct ures used in List ing 5.1. cmos_dev is t he per- device dat a st ruct ure referred t o earlier. cmos_fops is t he file_operations st ruct ure t hat cont ains t he address of driver ent ry point s. cmos_fops also has a field called owner t hat is set t o THIS_MODULE,

t he address of t he driver m odule in quest ion. Knowing t he ident it y of t he st ruct ure owner enables t he kernel t o offload from t he driver t he burden of som e housekeeping funct ions such as t racking t he use- count when processes open or release t he device. As you saw, t he kernel uses an abst ract ion called cdev t o int ernally represent char devices. Char drivers usually em bed t heir cdev inside t heir per- device st ruct ure. I n our exam ple, cdev sit s inside cmos_dev. cmos_init() loops over each support ed m inor device ( CMOS bank in t his case) allocat ing m em ory for t he associat ed perdevice st ruct ure and, hence, for t he cdev st ruct ure living inside it . cdev_init() associat es t he file operat ions (cmos_fops) wit h t he cdev, and cdev_add() connect s t he m aj or/ m inor num bers allocat ed by alloc_chrdev_region() t o t he cdev. class_create() populat es a sysfs ent ry for t his device, and class_device_create() result s in t he generat ion of t wo uevent s: cm os0 and cm os1. As you learned in Chapt er 4 , udevd list ens t o uevent s and generat es device nodes aft er consult ing it s rules dat abase. Add t he following t o t he udev rules direct ory ( / et c/ udev/ rules.d/ ) t o produce device nodes corresponding t o t he t wo CMOS banks (/ dev/ cm os/ 0 and / dev/ cm os/ 1) on receiving t he respect ive uevent s ( cm os0 and cm os1) : KERNEL="cmos[0-1]*", NAME="cmos/%n"

Device drivers t hat need t o operat e on a range of I / O addresses st ake claim t o t he addresses via a call t o request_region(). This regulat ory m echanism ensures t hat request s by ot hers for t he sam e region fail unt il t he occupant releases it via a call t o release_region(). request_region() is com m only invoked by I / O bus drivers such as PCI and I SA t o m ark ownership of on- card m em ory in t he processor's address space ( m ore on t his in Chapt er 10, " Peripheral Com ponent I nt erconnect " ) . cmos_init() request s access t o t he I / O region of each CMOS bank by calling request_region(). The last argum ent t o request_region() is an ident ifier used by / pr oc/ iopor t s, so you will see t his if you peek at t hat file: bash> grep cmos /proc/ioports 0070-0071 : cmos0 0072-0073 : cmos1

This com plet es t he regist rat ion process, and cmos_init() print s out a m essage signaling it s happiness.

Ope n a n d Re le a se The kernel invokes t he driver's open() m et hod when an applicat ion opens t he corresponding device node. You can t rigger execut ion of cmos_open() by doing t his: bash> cat /dev/cmos/0

The kernel calls t he release() m et hod when an applicat ion closes an open device. So when cat closes t he file descript or at t ached t o / dev/ cm os/ 0 aft er reading t he cont ent s of CMOS bank 0, t he kernel invokes cmos_release(). List ing 5.2 shows t he im plem ent at ion of cmos_open() and cmos_release(). Let 's t ake a closer look at cmos_open(). There are a couple of t hings wort hy of not e here. The first is t he ext ract ion of cmos_dev. The inode passed as an argum ent t o cmos_open() cont ains t he address of t he cdev st ruct ure allocat ed during init ializat ion. As shown in List ing 5.1, cdev is em bedded inside cmos_dev. To elicit t he address of t he cont ainer st ruct ure cmos_dev, cmos_open() uses t he kernel helper funct ion, container_of(). The ot her not able operat ion in cmos_open() is t he usage of t he private_data field t hat is part of struct file, t he second argum ent . You can use t his field (file->private_data) as a placeholder t o convenient ly correlat e inform at ion from inside ot her driver m et hods. The CMOS driver uses t his field t o st ore t he address of cmos_dev.

Look at cmos_release() ( and t he rest of t he m et hods) t o see how private_data is used t o direct ly obt ain a handle on t he cmos_dev st ruct ure belonging t o t he corresponding CMOS bank. List in g 5 .2 . Ope n a n d Re le a se Code View: /* * Open CMOS bank */ int cmos_open(struct inode *inode, struct file *file) { struct cmos_dev *cmos_devp; /* Get the per-device structure that contains this cdev */ cmos_devp = container_of(inode->i_cdev, struct cmos_dev, cdev); /* Easy access to cmos_devp from rest of the entry points */ file->private_data = cmos_devp; /* Initialize some fields */ cmos_devp->size = CMOS_BANK_SIZE; cmos_devp->current_pointer = 0; return 0; } /* * Release CMOS bank */ int cmos_release(struct inode *inode, struct file *file) { struct cmos_dev *cmos_devp = file->private_data; /* Reset file pointer */ cmos_devp->current_pointer = 0; return 0; }

Ex ch a n gin g D a t a read() and write() are t he basic char driver m et hods responsible for exchanging dat a bet ween user space and t he device. The ext ended read()/ write() fam ily cont ains several ot her m et hods, t oo: fsync(), aio_read(), aio_write(), and mmap(). The CMOS driver operat es on a sim ple m em ory device and does not have t o work t hrough som e of t he com plexit ies faced by usual char drivers:

CMOS dat a access rout ines do not need t o sleep- wait for device I / O t o com plet e, whereas read() and write() m et hods belonging t o m any char drivers have t o support bot h blocking and nonblocking m odes of operat ion. Unless a device file is opened in t he nonblocking (O_NONBLOCK) m ode, read() and write() are allowed t o put t he calling process t o sleep unt il t he corresponding operat ion com plet es.

CMOS driver operat ions com plet e synchronously and do not depend on int errupt s. However, dat a access m et hods belonging t o m any drivers depend on int errupt s for dat a collect ion and have t o com m unicat e wit h int errupt cont ext code via dat a st ruct ures such as wait queues.

List ing 5.3 cont ains t he read() and write() m et hods belonging t o t he CMOS driver. You cannot direct ly access user buffers from kernel space and vice versa, so t o copy CMOS m em ory cont ent s t o user space, cmos_read() uses t he services of copy_to_user(). cmos_write() does t he reverse using copy_from_user(). Because copy_to_user() and copy_from_user() m ay fall asleep on t he j ob, you cannot hold spinlocks while calling t hem . As you saw earlier, accessing CMOS m em ory is accom plished by operat ing on a pair of I / O addresses. To read different sizes of dat a from an I / O address, t he kernel provides a fam ily of archit ect ure- independent funct ions: in[b|w|l|sb|sl](). Sim ilarly, a clust er of rout ines, out[b|w|l|sb|sl](), are available for writ ing t o I / O regions. port_data_in() and port_data_out() in List ing 5.3 use inb() and oub() for dat a t ransfer. List in g 5 .3 . Re a d a n d W r it e Code View: /* * Read from a CMOS Bank at bit-level granularity */ ssize_t cmos_read(struct file *file, char *buf, size_t count, loff_t *ppos) { struct cmos_dev *cmos_devp = file->private_data; char data[CMOS_BANK_SIZE]; unsigned char mask; int xferred = 0, i = 0, l, zero_out; int start_byte = cmos_devp->current_pointer/8; int start_bit = cmos_devp->current_pointer%8; if (cmos_devp->current_pointer >= cmos_devp->size) { return 0; /*EOF*/ } /* Adjust count if it edges past the end of the CMOS bank */ if (cmos_devp->current_pointer + count > cmos_devp->size) { count = cmos_devp->size - cmos_devp->current_pointer; } /* Get the specified number of bits from the CMOS */ while (xferred < count) { data[i] = port_data_in(start_byte, cmos_devp->bank_number) >> start_bit; xferred += (8 - start_bit); if ((start_bit) && (count + start_bit > 8)) { data[i] |= (port_data_in (start_byte + 1, cmos_devp->bank_number) count) {

/* Zero out (xferred-count) bits from the MSB of the last data byte */ zero_out = xferred - count; mask = 1 current_pointer/8; int start_bit = cmos_devp->current_pointer%8; if (cmos_devp->current_pointer >= cmos_devp->size) { return 0; /* EOF */ } /* Adjust count if it edges past the end of the CMOS bank */ if (cmos_devp->current_pointer + count > cmos_devp->size) { count = cmos_devp->size - cmos_devp->current_pointer; } kbuf = kmalloc((count/8)+1,GFP_KERNEL); if (kbuf==NULL) return -ENOMEM; /* Get the bits from the user buffer */ if (copy_from_user(kbuf,buf,(count/8)+1)) { kfree(kbuf); return -EFAULT; } /* Write the specified number of bits to the CMOS bank */ while (xferred < count) { tmp_data = port_data_in(start_byte, cmos_devp->bank_number);

mask = 1 bank_number); xferred += start_l; } start_byte++; i++; } if (!xferred) return -EIO; /* Push the offset pointer forward */ cmos_devp->current_pointer += xferred; return xferred; /* Return the number of written bits */ }

/* * Read data from specified CMOS bank */ unsigned char port_data_in(unsigned char offset, int bank) { unsigned char data; if (unlikely(bank >= NUM_CMOS_BANKS)) { printk("Unknown CMOS Bank\n");

return 0; } else { outb(offset, addrports[bank]); /* Read a byte */ data = inb(dataports[bank]); } return data;

} /* * Write data to specified CMOS bank */ void port_data_out(unsigned char offset, unsigned char data, int bank) { if (unlikely(bank >= NUM_CMOS_BANKS)) { printk("Unknown CMOS Bank\n"); return; } else { outb(offset, addrports[bank]); /* Output a byte */ outb(data, dataports[bank]); } return; }

I f a char driver's write() m et hod ret urns successfully, it im plies t hat t he driver has assum ed responsibilit y for t he dat a passed down t o it by t he applicat ion. However it does not guarant ee t hat t he dat a has been successfully writ t en t o t he device. I f an applicat ion needs t his assurance, it can invoke t he fsync() syst em call. The corresponding fsync() driver m et hod ensures t hat applicat ion dat a is flushed from driver buffers and writ t en t o t he device. The CMOS driver does not need an fsync() m et hod because, in t his case, driver- writ es are synonym ous wit h device- writ es. I f a user applicat ion has dat a sit t ing on m ult iple buffers t hat it needs t o send t o a device, it can request m ult iple driver writ es, but t hat is inefficient for t he following reasons:

1 . The overhead of m ult iple syst em calls and relat ed cont ext swit ches.

2 . The driver is t he one who knows t he device int im at ely, so it can probably do a m ore clever j ob of efficient ly gat hering dat a from different buffers and dispat ching it t o t he device.

Because of t his, vect ored versions of read() and write() are support ed on Linux and ot her UNI X flavors. The Linux char driver infrast ruct ure used t o offer t wo dedicat ed m et hods t o perform vect or operat ions: readv() and writev(). St art ing wit h t he 2.6.19 kernel release, t hese t wo m et hods have been folded int o t he generic Linux Asynchronous I / O ( AI O) layer, however. Linux AI O is a broad t opic and is out side t he scope of t his discussion, so we j ust concent rat e on t he synchronous vect or capabilit ies offered by AI O. The prot ot ypes of t he vect or driver m et hods are as follows:

ssize_t aio_read(struct kiocb *iocb, const struct iovec *vector, unsigned long count, loff_t offset); ssize_t aio_write(struct kiocb *iocb, const struct iovec *vector, unsigned long count, loff_t offset);

The first argum ent t o aio_read()/aio_write() describes t he AI O operat ion, and t he second argum ent is an array of iovecs. The lat t er is t he principal dat a st ruct ure used by t he vect or funct ions and cont ains t he addresses and lengt hs of buffers t hat hold t he dat a. I n fact , t his m echanism is t he user space equivalent of scat t er- gat her DMA discussed in Chapt er 10. Look at include/ linux/ uio.h for t he definit ion of iovecs and at drivers/ net / t un.c[ 1] for an exam ple im plem ent at ion of vect ored char driver m et hods.

[ 1]

Discussed in t he sidebar " TUN/ TAP Driver" in Chapt er 15, " Net work I nt erface Cards."

Anot her dat a access m et hod is mmap(), which associat es device m em ory wit h user virt ual m em ory. Applicat ions m ay call t he corresponding syst em call, also called mmap(), and direct ly operat e on t he ret urned m em ory region t o access device- resident m em ory. Not m any drivers im plem ent mmap(), so we won't delve int o t hat here. I nst ead, have a look at drivers/ char/ m em .c for an exam ple mmap() im plem ent at ion. The sect ion " Accessing Mem ory Regions" in Chapt er 19, " Drivers in User Space," illust rat es how applicat ions use mmap(). Our exam ple CMOS driver does not im plem ent mmap(). You m ight have not iced t hat port_data_in() and port_data_out() envelop t he bank num ber sanit y check wit hin a m acro called unlikely(). Two m acros, likely() and unlikely(), inform GCC about t he probabilit y of success of t he associat ed condit ional evaluat ion. This inform at ion is used by GCC while predict ing branches. Because we m ark it unlikely t hat t he bank sanit y check will fail, GCC generat es int elligent code t hat gels t he else{} clause sequent ially wit h t he code flow. Branching is done for t he if{} clause. The reverse happens if you use likely() rat her t han unlikely().

Se e k The kernel uses an int ernal point er t o keep t rack of t he current file access posit ion. Applicat ions use t he lseek() syst em call t o request reposit ioning of t his int ernal file point er. Using t he services of lseek(), you can reset t he file point er t o any offset wit hin t he file. The char driver count erpart of lseek() is t he llseek() m et hod. cmos_llseek() im plem ent s t his m et hod in t he CMOS driver. As we saw previously, t he int ernal file point er for t he CMOS m oves bit - wise rat her t han byt e- wise. I f a byt e of dat a is read from t he CMOS driver, t he file point er has t o be m oved by 8, so applicat ions have t o seek accordingly. cmos_llseek() also im plem ent s end- of- file sem ant ics depending on t he size of t he CMOS bank. To underst and t he sem ant ics of llseek(), let 's st art by looking at t he com m ands support ed by t he lseek() syst em call:

1 . SEEK_SET, which set s t he file point er t o a supplied fixed offset .

2 . SEEK_CUR, which calculat es t he offset relat ive t o t he current locat ion.

3 . SEEK_END, which calculat es t he offset relat ive t o t he end- of- file. This com m and can m aneuver t he file point er beyond t he end of t he file, but does not change t he file size. Reads beyond t he end- of- file m arker ret urn naught if no dat a is explicit ly writ t en. This t echnique is oft en used t o creat e big files. The CMOS driver does not support SEEK_END.

Look at cmos_llseek() in List ing 5.4 and co- relat e wit h t he preceding definit ions. List in g 5 .4 . Se e k Code View: /* * Seek to a bit offset within a CMOS bank */ static loff_t cmos_llseek(struct file *file, loff_t offset, int orig) { struct cmos_dev *cmos_devp = file->private_data; switch (orig) { case 0: /* SEEK_SET */ if (offset >= cmos_devp->size) { return -EINVAL; } cmos_devp->current_pointer = offset; /* Bit Offset */ break; case 1: /* SEEK_CURR */ if ((cmos_devp->current_pointer + offset) >= cmos_devp->size) { return -EINVAL; } cmos_devp->current_pointer = offset; /* Bit Offset */ break; case 2: /* SEEK_END - Not supported */ return -EINVAL; default: return -EINVAL; } return(cmos_devp->current_pointer); }

Con t r ol Anot her com m on char driver m et hod is called I / O Cont rol ( or ioct l) . This rout ine is used t o receive and im plem ent applicat ion com m ands t hat request device- specific act ions. Because CMOS m em ory is used by t he BI OS t o st ore crucial inform at ion such as t he boot device order, it 's usually prot ect ed via cyclic redundancy check ( CRC) algorit hm s. To det ect dat a corrupt ion, t he CMOS driver support s t wo ioct l com m ands:

1 . Adj ust checksum , which is used t o recalculat e t he CRC aft er t he CMOS cont ent s have been m odified. The calculat ed checksum is st ored at a predet erm ined offset in CMOS bank 1.

2.

2 . Verify checksum , which is used t o check whet her t he CMOS cont ent s are healt hy. This is done by com paring t he CRC of t he current cont ent s wit h t he value previously st ored.

Applicat ions send t hese com m ands down t o t he driver via t he ioctl() syst em call when t hey want t o request it t o perform checksum operat ions. Look at cmos_ioctl() in List ing 5.5 for t he im plem ent at ion of t he CMOS driver's ioctl m et hod. adjust_cmos_crc(int bank, unsigned short seed) im plem ent s t he st andard CRC algorit hm and is not shown in t he list ing.

List in g 5 .5 . I / O Con t r ol Code View: #define CMOS_ADJUST_CHECKSUM 1 #define CMOS_VERIFY_CHECKSUM 2 #define CMOS_BANK1_CRC_OFFSET 0x1E /* * Ioctls to adjust and verify CRC16s. */ static int cmos_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { unsigned short crc = 0; unsigned char buf; switch (cmd) { case CMOS_ADJUST_CHECKSUM: /* Calculate the CRC of bank0 using a seed of 0 */ crc = adjust_cmos_crc(0, 0); /* Seed bank1 with CRC of bank0 */ crc = adjust_cmos_crc(1, crc); /* Store calculated CRC */ port_data_out(CMOS_BANK1_CRC_OFFSET, (unsigned char)(crc & 0xFF), 1); port_data_out((CMOS_BANK1_CRC_OFFSET + 1), (unsigned char) (crc >> 8), 1); break; case CMOS_VERIFY_CHECKSUM: /* Calculate the CRC of bank0 using a seed of 0 */ crc = adjust_cmos_crc(0, 0); /* Seed bank1 with CRC of bank0 */ crc = adjust_cmos_crc(1, crc); /* Compare the calculated CRC with the stored CRC */ buf = port_data_in(CMOS_BANK1_CRC_OFFSET, 1); if (buf != (unsigned char) (crc & 0xFF)) return -EINVAL; buf = port_data_in((CMOS_BANK1_CRC_OFFSET+1), 1); if (buf != (unsigned char)(crc >> 8)) return -EINVAL; break; default:

return -EIO; } return 0; }

Se n sin g D a t a Ava ila bilit y Many user applicat ions are sophist icat ed and are not sat isfied wit h t he vint age open()/ read()/ write()/ close() calls. They desire synchronous or asynchronous not ificat ions t hat alert t hem when new dat a is available from t he device or when t he driver is ready t o accept new dat a. I n t his sect ion, we exam ine t wo char driver m et hods t hat sense dat a availabilit y: poll() and fasync(). The form er is synchronous, whereas t he lat t er is asynchronous. Because t hese m echanism s are relat ively advanced, let 's first underst and how applicat ions use t hese feat ures before finding out how t he underlying driver im plem ent s t hem . Sensing dat a availabilit y is not relevant for t he sim ple CMOS m em ory device discussed previously, so let 's t ake a few usage scenarios from a popular user space applicat ion: t he X Windows server.

Poll Consider t he following code snippet from t he X Windows source t ree ( downloadable from www.xfree86.org) t hat handles m ice event s: xc/programs/Xserver/hw/xfree86/input/mouse/mouse.c: case PROT_THINKING: /* ThinkingMouse */ /* This mouse may send a PnP ID string, ignore it. */ usleep(200000); xf86FlushInput(pInfo->fd); /* Send the command to initialize the beast. */ for (s = "E5E5"; *s; ++s) { xf86WriteSerial(pInfo->fd, s, 1); if ((xf86WaitForInput(pInfo->fd, 1000000) fd, &c, 1); if (c != *s) break; } break;

Essent ially, t he code sends an init ializat ion com m and t o t he m ouse, polls unt il it senses input dat a, and reads t he response from t he device. I f you peel t he envelope off Xf86WaitForInput() used previously, you will find a call t o t he select() syst em call: Code View: xc/programs/Xserver/hw/xfree86/os-support/shared/posix_tty.c: int xf86WaitForInput(int fd, int timeout) { fd_set readfds; struct timeval to; int r; FD_ZERO(&readfds); if (fd >= 0) { FD_SET(fd, &readfds); } to.tv_sec = timeout / 1000000; to.tv_usec = timeout % 1000000; if (fd >= 0) {

SYSCALL (r = select(FD_SETSIZE, &readfds, NULL, NULL, &to)); } else { SYSCALL (r = select(FD_SETSIZE, NULL, NULL, NULL, &to)); } if (xf86Verbose >= 9) ErrorF ("select returned %d\n", r); return (r); }

You m ay supply a bunch of file descript ors t o select() and ask it t o keep an eye on t hem in t he associat ed dat a st at e. You m ay also request a t im eout t o override dat a availabilit y. t im eout of NULL, select() blocks forever. Refer t o t he m an or info pages of select() for docum ent at ion. The call t o select() in t he preceding snippet induces t he X server t o poll connect ed m ouse wit hin a t im eout .

unt il t here is a change I f you ask for a det ailed for dat a from a

Linux support s anot her syst em call, poll(), which has sem ant ics sim ilar t o select(). The 2.6 kernel support s a new non- POSI X syst em call nam ed epoll() t hat is a m ore scalable superset of poll(). All t hese syst em calls rely on t he sam e underlying char driver m et hod, poll().

Most I / O syst em calls are POSI X- com pliant and are not Linux- specific ( program s such as X Windows aft er all, run on m any UNI X flavors, not j ust on Linux) , but t he int ernal driver m et hods are operat ing syst em - specific. On Linux, t he poll() driver m et hod is t he pillar under t he select() syst em call. I n t he previous X server scenario, t he m ouse driver's poll() m et hod looks like t his: static DECLARE_WAIT_QUEUE_HEAD(mouse_wait); /* Wait Queue */ static unsigned int mouse_poll(struct file *file, poll_table *wait) { poll_wait(file, &mouse_wait, wait); spin_lock_irq(&mouse_lock); /* See if data has arrived from the device or if the device is ready to accept more data */ /* ... */ spin_unlock_irq(&mouse_lock); /* Availability of data is detected from interrupt context */ if (data_is_available()) return(POLLIN | POLLRDNORM); /* Data can be written. Not relevant for mice */ if (data_can_be_written()) return(POLLOUT | POLLWRNORM); return 0; }

When Xf86WaitForInput() invokes select(), t he generic kernel poll im plem ent at ion ( defined in fs/ select .c) calls mouse_poll(). mouse_poll() t akes t wo argum ent s, t he usual file point er ( struct file *) and a point er t o a kernel dat a st ruct ure called t he poll_table. The poll_table is a t able of wait queues owned by device drivers t hat are being polled for dat a. mouse_poll() uses t he library funct ion, poll_wait(), t o add a wait queue ( mouse_wait) t o t he kernel poll_table and go t o sleep. As you saw in Chapt er 3 , " Kernel Facilit ies," device drivers usually own several wait queues t hat block unt il t hey det ect a change in a dat a condit ion. This condit ion can be t he arrival of new dat a from t he device, willingness of t he driver t o pass new dat a t o t he applicat ion, or t he readiness of t he device ( or t he driver) t o accept new dat a. Such condit ions are usually ( but not always) det ect ed by t he driver's int errupt handler. When t he m ouse driver's int errupt handler senses m ouse m ovem ent , it calls wake_up_interruptible(&mouse_wait) t o wake up t he sleeping mouse_poll(). I f t here is no change in t he dat a condit ion, t he poll() m et hod ret urns 0. I f t he driver is ready t o send at least one byt e of dat a t o t he applicat ion, it ret urns POLLIN|POLLRDNORM. I f t he driver is ready t o accept at least a byt e of dat a from t he applicat ion, it ret urns POLLOUT|POLLWRNORM.[ 2] Thus, if t here is no m ouse m ovem ent , mouse_poll() ret urns 0, and t he calling t hread is put t o sleep. The kernel invokes mouse_poll() again when t he m ouse int errupt handler senses device dat a and wakes up t he mouse_wait queue. This t im e around, mouse_poll() ret urns POLLIN|POLLRDNORM, so t he select() call and hence Xf86WaitForInput() ret urn posit ive values. The X server's m ouse handler ( xc/ program s/ Xserver/ hw/ xfree86/ input / m ouse/ m ouse.c) goes on t o read dat a from t he m ouse.

[ 2]

The full list of ret urn codes is defined in include/ asm - generic/ poll.h. Som e of t hem are used only by t he net working st ack.

User applicat ions t hat poll a driver are usually m ore int erest ed in driver charact erist ics t han device charact erist ics. For exam ple, depending on t he healt h of it s buffers, a driver m ight be ready t o accept new dat a from t he applicat ion before t he device it self is.

Fa syn c Som e applicat ions, for perform ance reasons, desire asynchronous not ificat ions from t he device driver. Assum e t hat an applicat ion on a Linux pacem aker program m er device is busy perform ing com plex com put at ions but want s t o be not ified as soon as dat a arrives from an im plant ed pacem aker via a t elem et ry int erface. The select()/poll() m echanism is not of use in t his case because it blocks t he com put at ions. What t he applicat ion needs is an asynchronous event report . I f t he t elem et ry driver can asynchronously dispat ch a signal ( usually SIGIO) as soon as it det ect s dat a from t he pacem aker, t he applicat ion can cat ch it using a signal handler and accordingly st eer t he code flow. For a real- world exam ple of asynchronous not ificat ion, let 's revert t o a region of t he X server t hat request s alert s when dat a is det ect ed from input devices. Take a look at t his snippet from t he X server sources: Code View: xc/programs/Xserver/hw/xfree86/os-support/shared/sigio.c: int xf86InstallSIGIOHandler(int fd, void (*f)(int, void *), void *closure) { struct sigaction sa; struct sigaction osa;

if (fcntl(fd, F_SETOWN, getpid()) == -1) { blocked = xf86BlockSIGIO(); /* O_ASYNC is defined as SIGIO elsewhere by the X server */ if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_ASYNC) == -1) { xf86UnblockSIGIO(blocked); return 0; } sigemptyset(&sa.sa_mask); sigaddset(&sa.sa_mask, SIGIO); sa.sa_flags = 0; sa.sa_handler = xf86SIGIO; sigaction(SIGIO, &sa, &osa); /* ... */ return 0; } static void xf86SIGIO(int sig) { /* Identify the device that triggered generation of this SIGIO and handle the data arriving from it */ /* ... */ }

As you can decipher from t he above snippet , t he X server does t he following:

Calls fcntl(F_SETOWN). The fcntl() syst em call is used t o m anipulat e file descript or behavior. F_SETOWN set s t he ownership of t he descript or t o t he calling process. This is required since t he kernel needs t o know where t o send t he asynchronous signal. This st ep is t ransparent t o t he device driver.

I nvokes fcntl(F_SETFL). F_SETFL request s t he driver t o deliver SIGIO t o t he applicat ion whenever t here is dat a t o be read, or if t he driver is ready t o receive m ore applicat ion dat a. The invocat ion of fcntl(F_SETFL) result s in t he invocat ion of t he fasync() driver m et hod. I t 's t his m et hod's responsibilit y t o add or rem ove ent ries from t he list of processes t hat are t o be delivered SIGIO. To t his end, fasync() ut ilizes t he services of a kernel library funct ion called fasync_helper().

I m plem ent s t he SIGIO signal handler, xf86SIGIO(), as per it s code archit ect ure and inst alls it using t he sigaction() syst em call. When t he underlying input device driver det ect s a change in dat a st at us, it dispat ches SIGIO t o regist ered request ers and t his t riggers execut ion of xf86SIGIO(). [ 3] Char drivers call kill_fasync() t o send SIGIO t o regist ered processes. To not ify a read event , POLLIN is passed as t he argum ent t o kill_fasync(). To not ify a writ e event , t he argum ent is POLLOUT.

[ 3]

I f your signal handler services asynchronous event s from m ult iple devices, you will need addit ional m echanism s, such as a select() call inside t he handler, t o figure out t he ident it y of t he device responsible for t he event .

To see how t he driver- side of t he asynchronous not ificat ion chain is im plem ent ed, let 's look at a fict it ious

fasync() m et hod belonging t o t he driver of an input device: Code View: /* This is invoked by the kernel when the X server opens this * input device and issues fcntl(F_SETFL) on the associated file * descriptor. fasync_helper() ensures that if the driver issues a * kill_fasync(), a SIGIO is dispatched to the owning application. */ static int inputdevice_fasync(int fd, struct file *filp, int on) { return fasync_helper(fd, filp, on, &inputdevice_async_queue); } /* Interrupt Handler */ irqreturn_t inputdevice_interrupt(int irq, void *dev_id) { /* ... */ /* Dispatch a SIGIO using kill_fasync() when input data is detected. Output data is not relevant since this is a read-only device */ wake_up_interruptible(&inputdevice_wait); kill_fasync(&inputdevice_async_queue, SIGIO, POLL_IN); /* ... */ return IRQ_HANDLED; }

To see how SIGIO delivery can be com plex, consider t he case of a t t y driver ( discussed in Chapt er 6 , " Serial Drivers" ) . I nt erest ed applicat ions get not ified under different scenarios:

I f t he underlying driver is not ready t o accept applicat ion dat a, it put s t he calling process t o sleep. When t he driver int errupt handler subsequent ly decides t hat t he device can accept m ore dat a, it wakes t he applicat ion and invokes kill_fasync(POLLOUT).

I f a newline charact er is received, t he t t y layer calls kill_fasync(POLLIN).

When t he driver wakes up a sleeping reader t hread aft er det ect ing t hat sufficient dat a byt es beyond a t hreshold have arrived from a device, it sends t hat inform at ion t o st akeholder processes by invoking kill_fasync(POLLIN).

Ta lk in g t o t h e Pa r a lle l Por t The parallel port is a ubiquit ous 25- pin int erface popularly found on PC- com pat ible syst em s. The capabilit y of a parallel port ( whet her it 's unidirect ional, bidirect ional, support s DMA, and so on) depends on t he underlying chipset . Look at Figure 4.1 in Chapt er 4 t o find out how t he PC archit ect ure support s parallel port s. The drivers/ parport / direct ory cont ains code ( called parport ) t hat im plem ent s I EEE 1284 parallel port com m unicat ion. Several devices t hat connect t o t he parallel port such as print ers and scanners use parport 's services. Parport has an archit ect ure- independent m odule called parport .ko and an archit ect ure- dependent one (par por t _pc.ko for t he PC archit ect ure) t hat provide program m ing int erfaces t o drivers of devices t hat int erface via t he parallel port . Let 's t ake t he exam ple of t he parallel print er driver, drivers/ char/ lp.c. These are t he high- level st eps needed t o print a file: 1.

The print er driver creat es char device nodes / dev/ lp0 t o / dev/ lpN, one per connect ed print er.

2.

The Com m on UNI X Print ing Syst em ( CUPS) is t he fram ework t hat provides print capabilit ies on Linux. The CUPS configurat ion file ( / et c/ print ers.conf on som e dist ribut ions) m aps print ers wit h t heir char device nodes (/ dev/ lpX) .

3.

CUPS ut ilit ies consult t his file and st ream dat a t o t he corresponding device node. So, if you have a print er connect ed t o t he first parallel port on your syst em and you issue t he com m and, lpr m yfile, it 's st ream ed via / dev/ lp0 t o t he print er's write() m et hod, lp_write(), defined in drivers/ char/ lp.c.

4.

lp_write() uses t he services of parport t o send t he dat a t o t he print er.

Apple I nc. has acquired ownership of CUPS soft ware. The code cont inues t o be licensed under GPLv2.

A char driver called ppdev (drivers/ char/ ppdev.c) export s t he / dev/ parport X device nodes t hat let user applicat ions direct ly com m unicat e wit h t he parallel port . ( We t alk m ore about ppdev in Chapt er 19.)

D e vice Ex a m ple : Pa r a lle l Por t LED Boa r d To learn how t o use t he services offered by parport , let 's writ e a sim ple driver. Consider a board t hat has eight light - em it t ing diodes ( LEDs) int erfaced t o a st andard 25- pin parallel port connect or. Because t he 8- bit parallel port dat a regist er on t he PC is direct ly m apped t o pins 2 t o 9 of t he parallel port connect or, t hose pins are wired t o t he LEDs on t he board. Writ ing t o t he parallel port dat a regist er cont rols t he volt age levels of t hese pins and t urns t he LEDs on or off. List ing 5.6 im plem ent s a char driver t hat com m unicat es wit h t his board over t he syst em parallel port . Em bedded com m ent s explain t he parport service rout ines t hat List ing 5.6 uses.

List in g 5 .6 . D r ive r for t h e Pa r a lle l LED Boa r d ( le d.c) Code View: #include #include

#include #include #include #define DEVICE_NAME

"led"

static dev_t dev_number; static struct class *led_class; struct cdev led_cdev; struct pardevice *pdev;

/* Allotted device number */ /* Class to which this device belongs */ /* Associated cdev */ /* Parallel port device */

/* LED open */ int led_open(struct inode *inode, struct file *file) { return 0; } /* Write to the LED */ ssize_t led_write(struct file *file, const char *buf, size_t count, loff_t *ppos) { char kbuf; if (copy_from_user(&kbuf, buf, 1)) return -EFAULT; /* Claim the port */ parport_claim_or_block(pdev); /* Write to the device */ parport_write_data(pdev->port, kbuf); /* Release the port */ parport_release(pdev); return count; } /* Release the device */ int led_release(struct inode *inode, struct file *file) { return 0; } /* File Operations */ static struct file_operations led_fops = { .owner = THIS_MODULE, .open = led_open, .write = led_write, .release = led_release, }; static int led_preempt(void *handle) { return 1; }

/* Parport attach method */ static void led_attach(struct parport *port) { /* Register the parallel LED device with parport */ pdev = parport_register_device(port, DEVICE_NAME, led_preempt, NULL, NULL, 0, NULL); if (pdev == NULL) printk("Bad register\n"); } /* Parport detach method */ static void led_detach(struct parport *port) { /* Do nothing */ } /* Parport driver operations */ static struct parport_driver led_driver = { .name = "led", .attach = led_attach, .detach = led_detach, }; /* Driver Initialization */ int __init led_init(void) { /* Request dynamic allocation of a device major number */ if (alloc_chrdev_region(&dev_number, 0, 1, DEVICE_NAME) < 0) { printk(KERN_DEBUG "Can't register device\n"); return -1; } /* Create the led class */ led_class = class_create(THIS_MODULE, DEVICE_NAME); if (IS_ERR(led_class)) printk("Bad class create\n"); /* Connect the file operations with the cdev */ cdev_init(&led_cdev, &led_fops); led_cdev.owner = THIS_MODULE; /* Connect the major/minor number to the cdev */ if (cdev_add(&led_cdev, dev_number, 1)) { printk("Bad cdev add\n"); return 1; } class_device_create(led_class, NULL, dev_number, NULL, DEVICE_NAME); /* Register this driver with parport */ if (parport_register_driver(&led_driver)) { printk(KERN_ERR "Bad Parport Register\n"); return -EIO; }

printk("LED Driver Initialized.\n"); return 0; } /* Driver Exit */ void __exit led_cleanup(void) { unregister_chrdev_region(MAJOR(dev_number), 1); class_device_destroy(led_class, MKDEV(MAJOR(dev_number), 0)); class_destroy(led_class); return; } module_init(led_init); module_exit(led_cleanup); MODULE_LICENSE("GPL");

led_init() is sim ilar t o cmos_init() developed in List ing 5.1, but for a couple of t hings:

1 . As you saw in Chapt er 4 , t he new device m odel dist inguishes bet ween drivers and devices. led_init() regist ers t he LED driver wit h parport via a call t o parport_register_driver().When t he kernel finds t he LED board during led_attach(), it regist ers t he device by invoking parport_register_device().

2 . led_init() creat es t he device node / dev/ led , which you can use t o cont rol t he st at e of individual LEDs.

Com pile and insert t he driver m odule int o t he kernel: bash> make –C /path/to/kerneltree/ M=$PWD modules bash> insmod ./led.ko LED Driver Initialized

To select ively drive som e parallel port pins and glow t he corresponding LEDs, echo t he appropriat e value t o / dev/ led : bash> echo 1 > /dev/led

Because ASCI I for 1 is 31 ( or 00110001) , t he first , fift h, and sixt h LEDs should t urn on. The preceding com m and t riggers invocat ion of led_write(). This driver m et hod first copies user m em ory ( t he value 31 in t his case) t o kernel buffers via copy_from_user(). I t t hen claim s t he parallel port , writ es dat a, and releases t he port , all using parport int erfaces. Sysfs is a bet t er place t han / dev t o cont rol device st at e, so it 's a good idea t o ent rust LED cont rol t o sysfs files. List ing 5.7 cont ains t he driver im plem ent at ion t hat achieves t his. The sysfs m anipulat ion code in t he list ing can

serve as a t em plat e t o achieve device cont rol from ot her drivers, t oo. List in g 5 .7 . Usin g Sysfs t o Con t r ol t h e Pa r a lle l LED Boa r d Code View: #include #include #include #include #include static static struct struct

dev_t dev_number; struct class *led_class; cdev led_cdev; pardevice *pdev;

struct kobject kobj;

/* /* /* /*

Allotted Device Number */ Class Device Model */ Character dev struct */ Parallel Port device */

/* Sysfs directory object */

/* Sysfs attribute of the leds */ struct led_attr { struct attribute attr; ssize_t (*show)(char *); ssize_t (*store)(const char *, size_t count); }; #define glow_show_led(number) static ssize_t glow_led_##number(const char *buffer, size_t count) { unsigned char buf; int value; sscanf(buffer, "%d", &value); parport_claim_or_block(pdev); buf = parport_read_data(pdev->port); if (value) { parport_write_data(pdev->port, buf | (1store ? lattr->store(buf, count) : -EIO; return ret; } /* Sysfs operations structure */ static struct sysfs_ops sysfs_ops = { .show = l_show, .store = l_store, };

/* Attributes of the /sys/class/pardevice/led/control/ kobject. Each file in this directory corresponds to one LED. Control each LED by writing or reading the associated sysfs file */ static struct attribute *led_attrs[] = { &led0.attr, &led1.attr, &led2.attr, &led3.attr, &led4.attr, &led5.attr, &led6.attr, &led7.attr, NULL }; /* This describes the kobject. The kobject has 8 files, one corresponding to each LED. This representation is called the ktype of the kobject */ static struct kobj_type ktype_led = { .sysfs_ops = &sysfs_ops, .default_attrs = led_attrs, }; /* Parport methods. We don't have a detach method */ static struct parport_driver led_driver = { .name = "led", .attach = led_attach, }; /* Driver Initialization */ int __init led_init(void) { struct class_device *c_d; /* Create the pardevice class - /sys/class/pardevice */ led_class = class_create(THIS_MODULE, "pardevice"); if (IS_ERR(led_class)) printk("Bad class create\n"); /* Create the led class device - /sys/class/pardevice/led/ */ c_d = class_device_create(led_class, NULL, dev_number, NULL, DEVICE_NAME); /* Register this driver with parport */ if (parport_register_driver(&led_driver)) { printk(KERN_ERR "Bad Parport Register\n"); return -EIO; } /* Instantiate a kobject to control each LED on the board */ /* Parent is /sys/class/pardevice/led/ */ kobj.parent = &c_d->kobj; /* The sysfs file corresponding to kobj is /sys/class/pardevice/led/control/ */ strlcpy(kobj.name, "control", KOBJ_NAME_LEN); /* Description of the kobject. Specifies the list of attribute

files in /sys/class/pardevice/led/control/ */ kobj.ktype = &ktype_led; /* Register the kobject */ kobject_register(&kobj); printk("LED Driver Initialized.\n"); return 0; } /* Driver Exit */ void led_cleanup(void) { /* Unregister kobject corresponding to /sys/class/pardevice/led/control */ kobject_unregister(&kobj); /* Destroy class device corresponding to /sys/class/pardevice/led/ */ class_device_destroy(led_class, MKDEV(MAJOR(dev_number), 0)); /* Destroy /sys/class/pardevice */ class_destroy(led_class); return; } module_init(led_init); module_exit(led_cleanup); MODULE_LICENSE("GPL");

The m acro definit ion of glow_show_led() in List ing 5.7 uses a t echnique popular in kernel source files t o com pact ly define several sim ilar funct ions. The definit ion produces read() and write() m et hods ( called show() and store() in sysfs t erm inology) at t ached t o eight / sys files, one per LED on t he board. Thus, glow_show_led(0) at t aches glow_led_0() and show_led_0() t o t he / sys file corresponding t o t he first LED. These funct ions are respect ively responsible for glowing/ ext inguishing t he first LED and reading it s st at us. ## glues a value t o a st ring, so glow_led_##number t ranslat es t o glow_led_0() when t he com piler processes t he st at em ent , glow_show_led(0). This sysfs- aware version of t he driver uses a kobj ect t o represent a " cont rol" abst ract ion, which em ulat es a soft ware knob t o cont rol t he LEDs. Each kobj ect is represent ed by a direct ory nam e in sysfs, so kobject_register() in List ing 5.7 result s in t he creat ion of t he / sys/ class/ pardevice/ led/ cont rol/ direct ory. A kt ype describes a kobj ect . The " cont rol" kobj ect is described via t he ktype_led st ruct ure, which cont ains a point er t o t he at t ribut e array, led_attrs[]. This array cont ains t he addresses of t he device at t ribut es of each LED. The at t ribut es of each LED are t ied t oget her by t he st at em ent : static struct led_attr led##number = __ATTR(led##number, 0644, show_led_##number, glow_led_##number);

This result s in inst ant iat ing t he cont rol file for each LED, / sys/ class/ pardevice/ led/ cont rol/ ledX, where X is t he LED num ber. To change t he st at e of ledX, echo a 1 ( or a 0) t o t he corresponding cont rol file. To glow t he first LED on t he board, do t his: bash> echo 1 > /sys/class/pardevice/led/control/led0

During m odule exit , t he driver unregist ers t he kobj ect s and classes using kobject_unregister(), class_device_destroy(), and class_destroy(). List ing 7.2 in Chapt er 7 , " I nput Drivers," uses anot her rout e t o creat e files in sysfs. Writ ing a char driver is no longer as sim ple as it used t o be in t he days of t he 2.4 kernel. To develop t he sim ple LED driver above, we used half a dozen abst ract ions: cdev, sysfs, kobj ect s, classes, class device, and parport . The abst ract ions, however, bring several advant ages t o t he t able such as bug- free building blocks, code reuse, and elegant design.

RTC Su bsyst e m RTC support in t he kernel is archit ect ed int o t wo layers: a hardware- independent t op- layer char driver t hat im plem ent s t he kernel RTC API , and a hardware- dependent bot t om - layer driver t hat com m unicat es wit h t he underlying bus. The RTC API , specified in Docum ent at ion/ rt c.t xt , is a set of st andard ioct ls t hat conform ing applicat ions such as hwclock leverage by operat ing on / dev/ rt c. The API also specifies at t ribut es in sysfs (/ sys/ class/ rt c/ ) and procfs (/ proc/ driver/ rt c) . The RTC API guarant ees t hat user space t ools are independent of t he underlying plat form and t he RTC chip. The bot t om - layer RTC driver is bus- specific. The em bedded device discussed in t he sect ion "Device Exam ple: Real Tim e Clock " in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol," has an RTC chip connect ed t o t he I 2 C bus, which is driven by an I 2 C client driver. The kernel has a dedicat ed RTC subsyst em t hat provides t he t op- layer char driver and a core infrast ruct ure t hat bot t om - layer RTC drivers can use t o t ie in wit h t he t op layer. The m ain com ponent s of t his infrast ruct ure are t he rtc_class_ops st ruct ure and t he regist rat ion funct ions, rtc_device_[register|unregister](). Bot t om - layer RTC drivers scat t ered under different bus- specific direct ories are being unified wit h t his subsyst em under drivers/ rt c/ . The RTC subsyst em allows t he possibilit y t hat a syst em can have m ore t han one RTC. I t does t his by export ing m ult iple int erfaces, / dev/ rt cN and / sys/ class/ rt c/ rt cN, where N is t he num ber of RTCs on your syst em . Som e em bedded syst em s, for exam ple, have t wo RTCs: one built in t o t he m icrocont roller t o support sophist icat ed operat ions such as periodic int errupt generat ion, and anot her no- frills low- power bat t ery- backed ext ernal RTC for t im ekeeping. Because RTC- aware applicat ions operat e over / dev/ rt c, set up a sym bolic link so t hat one of t he creat ed / dev/ rt cX nodes can be accessed as / dev/ rt c. To enable t he RTC subsyst em , t urn on CONFIG_RTC_CLASS during kernel configurat ion.

The Legacy PC RTC Driver On PC syst em s, you have t he opt ion of bypassing t he RTC subsyst em by using t he legacy RTC driver, drivers/ char/ rt c.c. This driver provides t op and bot t om layers for t he RTC on PC- com pat ible syst em s and export s / dev/ rt c and / proc/ driver/ rt c t o user applicat ions. To enable t his driver, t urn on CONFIG_RTC during kernel configurat ion.

Pse u do Ch a r D r ive r s Several com m only used kernel facilit ies are not connect ed wit h any physical hardware, and t hese are elegant ly im plem ent ed as char devices. The null sink, t he perpet ual zero source, and t he kernel random num ber generat or are t reat ed as virt ual devices and are accessed using pseudo char device drivers. The / dev/ null char device sinks dat a t hat you don't want t o display on your screen. So if you need t o check out source files from a Concurrent Versioning Syst em ( CVS) reposit ory wit hout spewing filenam es all over t he screen, do t his: bash> cvs co kernel > /dev/null

This redirect s com m and out put t o t he writ e ent ry point belonging t o t he / dev/ null driver. The driver's read() and write() m et hods sim ply ret urn success ignoring t he cont ent s of t he input and out put buffers, respect ively. I f you want t o fill an im age file wit h zeros, call upon / dev / zer o t o com e t o your service: bash> dd if=/dev/zero of=file.img bs=1024 count=1024

This sources a st ream of zeros from t he read() m et hod belonging t o t he / dev / zer o driver. The driver has no write() m et hod. The kernel has a built - in random num ber generat or. For t he benefit of kernel users who desire t o use random sequences, t he random num ber generat or export s API s such as get_random_bytes(). For user m ode program s, it export s t wo char int erfaces: / dev/ random and / dev/ urandom . The qualit y of random ness is higher for reads from / dev/ random com pared t o t hat from / dev/ urandom . When a user program reads from / dev/ random , it get s st rong ( or t rue) random num bers, but reads from / dev/ urandom yield pseudo random num bers. The / dev/ random driver does not use form ulae t o generat e st rong random num bers. I nst ead, it gat hers " environm ent al noise" ( int erval bet ween int errupt s, key clicks, and so on) for m aint aining a reservoir of disorder ( called an ent ropy pool) t hat seeds t he random st ream . To see t he kernel's input subsyst em ( discussed in Chapt er 7 ) cont ribut ing t o t he ent ropy pool when it det ect s a keyboard press or m ouse m ovem ent , look at input_event() defined in drivers/ input / input .c: void input_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) { /* ... */ add_input_randomness(type, code, value); /* Contribute to entropy pool */ /* ... */ }

To see how t he core int errupt handling layer cont ribut es int er- int errupt periods t o t he ent ropy pool, look at handle_IRQ_event() defined in kernel/ irq/ handle.c: irqreturn_t handle_IRQ_event(unsigned int irq, struct irqaction *action) { /* ... */ if (status & IRQF_SAMPLE_RANDOM)

add_interrupt_randomness(irq); /* Contribute to entropy pool */ /* ... */ }

The generat ion of st rongly random num bers depends on t he size of t he ent ropy pool: bash> od –x /dev/random 0000000 7331 9028 7c89 4791 0000020 ebb9 e806 221a b8f9 0000040 68d8 0bbf 68a4 0898 0000060 b01d 8714 b1e1 19b9

7f64 af12 528e 0a86

3deb cb30 1557 9f60

86b3 9a0e d8b3 646c

7564 cc28 57ec c269

The out put st ops aft er a few lines, signaling t hat t he ent ropy pool is exhaust ed. To replenish t he ent ropy pool and rest art t he random st ream , j ab t he keyboard several t im es aft er swit ching t o an unused t erm inal or push t he m ouse around t he screen. A dum p of / dev/ urandom , however, produces a cont inuous pseudo random st ream t hat never st ops. / dev/ m em and / dev/ km em are classic pseudo char devices t hat are t ools t hat let you peek inside syst em m em ory. These char nodes export raw int erfaces connect ed t o physical m em ory and kernel virt ual m em ory, respect ively. To m anipulat e syst em m em ory, you m ay mmap() t hese nodes and operat e on t he ret urned regions. As an exercise, change t he host nam e of your syst em by accessing / dev/ m em . All t he char devices discussed in t his sect ion ( null, zero, random , urandom , m em , and km em ) have different m inor num bers but t he sam e st at ically assigned m aj or num ber, 1. Look at drivers/ char/ m em .c and drivers/ char/ random .c for t heir im plem ent at ion. Two ot her pseudo drivers belong t o t he sam e m aj or num ber fam ily: / dev/ full, which em ulat es an always full device; and / dev/ port, which peeks at syst em I / O port s. We use t he lat t er in Chapt er 19.

M isc D r ive r s Misc ( or m iscellaneous) drivers are sim ple char drivers t hat share cert ain com m on charact erist ics. The kernel abst ract s t hese com m onalit ies int o an API ( im plem ent ed in drivers/ char/ m isc.c) , and t his sim plifies t he way t hese drivers are init ialized. All m isc devices are assigned a m aj or num ber of 10, but each can choose a single m inor num ber. So, if a char driver needs t o drive m ult iple devices as in t he CMOS exam ple discussed earlier, it 's probably not a candidat e for being a m isc driver. Consider t he sequence of init ializat ion st eps t hat a char driver perform s:

Allocat es m aj or/ m inor num bers via alloc_chrdev_region() and friends Creat es / dev and / sys nodes using class_device_create() Regist ers it self as a char driver using cdev_init() and cdev_add()

A m isc driver accom plishes all t his wit h a single call t o misc_register(): static struct miscdevice mydrv_dev = { MYDRV_MINOR, "mydrv", &mydrv_fops }; misc_register(&mydrv_dev);

I n t he preceding exam ple, MYDRV_MINOR is t he m inor num ber t hat you want t o st at ically assign t o your m isc driver. You m ay also request a m inor num ber t o be dynam ically assigned by specifying MISC_DYNAMIC_MINOR rat her t han MYDRV_MINOR in t he mydrv_dev st ruct ure. Each m isc driver aut om at ically appears under / sys/ class/ m isc/ wit hout explicit effort from t he driver writ er. Because m isc drivers are char drivers, t he earlier discussion on char driver ent ry point s hold for m isc drivers, t oo. Let 's now look at an exam ple m isc driver.

D e vice Ex a m ple : W a t ch dog Tim e r A wat chdog's funct ion is t o ret urn an unresponsive syst em t o operat ional st at e. I t does t his by periodically checking t he syst em 's pulse and issuing a reset [ 4] if it can't det ect any. Applicat ion soft ware is responsible for regist ering t his pulse ( or " heart beat " ) by periodically st robing ( or " pet t ing" ) t he wat chdog using t he services of a wat chdog device driver. Most em bedded cont rollers support int ernal wat chdog m odules. Ext ernal wat chdog chips are also available. An exam ple is t he Net winder W83977AF chip.

[ 4] A wat chdog m ay issue audible beeps rat her t han a syst em reset . An exam ple scenario is when a t im eout occurs due t o a power supply problem , assum ing t hat t he wat chdog circuit is backed up using a bat t ery or a super capacit or.

Linux wat chdog drivers are im plem ent ed as m isc drivers and live inside drivers/ char/ wat chdog/ . Wat chdog drivers, like RTC drivers, export a st andard device int erface t o user land, so conform ing applicat ions are rendered independent of t he int ernals of wat chdog hardware. This API is specified in Docum ent at ion/ wat chdog/ wat chdog- api.t xt in t he kernel source t ree. Program s t hat desire t he services of a

wat chdog operat e on / dev/ w at chdog, a device node having a m isc m inor num ber of 130. List ing 5.9 im plem ent s a device driver for a fict it ious wat chdog m odule built in t o an em bedded cont roller. The exam ple wat chdog cont ains t wo m ain regist ers as shown in Table 5.2: a service regist er (WD_SERVICE_REGISTER) and a cont rol regist er (WD_CONTROL_REGISTER) . To pet t he wat chdog, t he driver writ es a specific sequence ( 0xABCD in t his case) t o t he service regist er. To program wat chdog t im eout , t he driver writ es t o specified bit posit ions in t he cont rol regist er.

Ta ble 5 .2 . Re gist e r La you t on t h e W a t ch dog M odu le Re gist e r N a m e

D e scr ipt ion

WD_SERVICE_REGISTER

Writ e a specific sequence t o t his regist er t o pet t he wat chdog.

WD_CONTROL_REGISTER

Writ e t he wat chdog t im eout t o t his regist er.

St robing t he wat chdog is usually done from user space because t he goal of having a wat chdog is t o det ect and respond t o bot h applicat ion and kernel hangs. A crit ical applicat ion [ 5] such as t he graphics engine in List ing 5.10 opens t he wat chdog driver in List ing 5.9 and periodically writ es t o it . I f no writ e occurs wit hin t he wat chdog t im eout due t o an applicat ion hang or a kernel crash, t he wat chdog t riggers a syst em reset . I n t he case of List ing 5.10, t he wat chdog will reboot t he syst em if

[ 5] I f you need t o m onit or t he healt h of several applicat ions, you m ay im plem ent a m ult iplexer in t he wat chdog device driver. I f any one of t he processes t hat open t he driver becom es unresponsive, t he wat chdog at t em pt s t o self- correct t he syst em .

The applicat ion hangs inside process_graphics()

The kernel, and consequent ly t he applicat ion, dies

The wat chdog st art s t icking when an applicat ion opens / dev/ wat chdog. Closing t his device node st ops t he wat chdog unless you set CONFIG_WATCHDOG_NOWAYOUT during kernel configurat ion. Set t ing t his opt ion helps you t ide over t he possibilit y t hat t he wat chdog m onit oring process ( such as List ing 5.10) get s killed by a signal while t he syst em cont inues running. List in g 5 .9 . An Ex a m ple W a t ch dog D r ive r Code View: #include #include #define DEFAULT_WATCHDOG_TIMEOUT 10 #define TIMEOUT_SHIFT 5 #define WENABLE_SHIFT

3

/* 10-second timeout */ /* To get to the timeout field in WD_CONTROL_REGISTER */ /* To get to the watchdog-enable field in WD_CONTROL_REGISTER */

/* Misc structure */ static struct miscdevice my_wdt_dev = { .minor = WATCHDOG_MINOR, /* defined as 130 in include/linux/miscdevice.h*/

.name = "watchdog", .fops = &my_wdt_dog };

/* /dev/watchdog */ /* Watchdog driver entry points */

/* Driver methods */ struct file_operations my_wdt_dog = { .owner = THIS_MODULE, .open = my_wdt_open, .release = my_wdt_close, .write = my_wdt_write, .ioctl = my_wdt_ioctl } /* Module Initialization */ static int __init my_wdt_init(void) { /* ... */ misc_register(&my_wdt_dev); /* ... */ } /* Open watchdog */ static void my_wdt_open(struct inode *inode, struct file *file) { /* Set the timeout and enable the watchdog */ WD_CONTROL_REGISTER |= DEFAULT_WATCHDOG_TIMEOUT xmit.tail] as is evident from t he UART driver's start_tx() ent ry point , usb_uart_start_tx() . I n t he receive pat h, t he driver pushes dat a collect ed from t he USB_UART t o t he associat ed t t y driver using tty_insert_flip_char() and tty_flip_buffer_push() . This is done in t he receive int errupt handler, usb_uart_rxint() . Revisit t his rout ine aft er reading t he next sect ion, " TTY Drivers ." List ing 6.1 uses com m ent s t o explain t he purpose of driver ent ry point s and t heir operat ion. I t leaves som e of t he ent ry point s in t he uart_ops st ruct ure unim plem ent ed t o cut out ext ra det ail.

List in g 6 .1 . USB_UART D r ive r for t h e Lin u x Ce ll Ph on e Code View: #include #include #include #include #include #include #include #include #define #define #define #define

USB_UART_MAJOR USB_UART_MINOR_START USB_UART_PORTS PORT_USB_UART

200 70 2 30

/* /* /* /*

You've to get this assigned */ Start minor numbering here */ The phone has 2 USB_UARTs */ UART type. Add this to include/linux/serial_core.h */

/* Each USB_UART has a 3-byte register set consisting of UU_STATUS_REGISTER at offset 0, UU_READ_DATA_REGISTER at offset 1, and UU_WRITE_DATA_REGISTER at offset 2 as shown in Table 6.1 */ #define USB_UART1_BASE 0xe8000000 /* Memory base for USB_UART1 */ #define USB_UART2_BASE 0xe9000000 /* Memory base for USB_UART2 */ #define USB_UART_REGISTER_SPACE 0x3 /* Semantics of bits in the status register */ #define USB_UART_TX_FULL 0x20 /* TX FIFO is full */ #define USB_UART_RX_EMPTY 0x10 /* TX FIFO is empty */ #define USB_UART_STATUS 0x0F /* Parity/frame/overruns? */ #define #define #define #define

USB_UART1_IRQ USB_UART2_IRQ USB_UART_FIFO_SIZE USB_UART_CLK_FREQ

3 /* USB_UART1 IRQ */ 4 /* USB_UART2 IRQ */ 32 /* FIFO size */ 16000000

static struct uart_port usb_uart_port[]; /* Defined later on */ /* Write a character to the USB_UART port */ static void usb_uart_putc(struct uart_port *port, unsigned char c) { /* Wait until there is space in the TX FIFO of the USB_UART. Sense this by looking at the USB_UART_TX_FULL bit in the status register */ while (__raw_readb(port->membase) & USB_UART_TX_FULL); /* Write the character to the data port*/ __raw_writeb(c, (port->membase+1)); } /* Read a character from the USB_UART */ static unsigned char usb_uart_getc(struct uart_port *port) { /* Wait until data is available in the RX_FIFO */ while (__raw_readb(port->membase) & USB_UART_RX_EMPTY); /* Obtain the data */

return(__raw_readb(port->membase+2)); } /* Obtain USB_UART status */ static unsigned char usb_uart_status(struct uart_port *port) { return(__raw_readb(port->membase) & USB_UART_STATUS); } /* * Claim the memory region attached to USB_UART port. Called * when the driver adds a USB_UART port via uart_add_one_port(). */ static int usb_uart_request_port(struct uart_port *port) { if (!request_mem_region(port->mapbase, USB_UART_REGISTER_SPACE, "usb_uart")) { return -EBUSY; } return 0; } /* Release the memory region attached to a USB_UART port. * Called when the driver removes a USB_UART port via * uart_remove_one_port(). */ static void usb_uart_release_port(struct uart_port *port) { release_mem_region(port->mapbase, USB_UART_REGISTER_SPACE); } /* * Configure USB_UART. Called when the driver adds a USB_UART port. */ static void usb_uart_config_port(struct uart_port *port, int flags) { if (flags & UART_CONFIG_TYPE && usb_uart_request_port(port) == 0) { port->type = PORT_USB_UART; } } /* Receive interrupt handler */ static irqreturn_t usb_uart_rxint(int irq, void *dev_id) { struct uart_port *port = (struct uart_port *) dev_id; struct tty_struct *tty = port->info->tty; unsigned int status, data; /* ... */ do { /* ... */ /* Read data */ data = usb_uart_getc(port);

/* Normal, overrun, parity, frame error? */ status = usb_uart_status(port); /* Dispatch to the tty layer */ tty_insert_flip_char(tty, data, status); /* ... */ } while (more_chars_to_be_read()); /* More chars */ /* ... */ tty_flip_buffer_push(tty); return IRQ_HANDLED; } /* Called when an application opens a USB_UART */ static int usb_uart_startup(struct uart_port *port) { int retval = 0; /* ... */ /* Request IRQ */ if ((retval = request_irq(port->irq, usb_uart_rxint, 0, "usb_uart", (void *)port))) { return retval; } /* ... */ return retval; } /* Called when an application closes a USB_UART */ static void usb_uart_shutdown(struct uart_port *port) { /* ... */ /* Free IRQ */ free_irq(port->irq, port); /* Disable interrupts by writing to appropriate registers */ /* ... */ } /* Set UART type to USB_UART */ static const char * usb_uart_type(struct uart_port *port) { return port->type == PORT_USB_UART ? "USB_UART" : NULL; } /* Start transmitting bytes */ static void usb_uart_start_tx(struct uart_port *port) { while (1) { /* Get the data from the UART circular buffer and write it to the USB_UART's WRITE_DATA register */ usb_uart_putc(port, port->info->xmit.buf[port->info->xmit.tail]); /* Adjust the tail of the UART buffer */ port->info->xmit.tail = (port->info->xmit.tail + 1) & (UART_XMIT_SIZE - 1); /* Statistics */

port->icount.tx++; /* Finish if no more data available in the UART buffer */ if (uart_circ_empty(&port->info->xmit)) break; } /* ... */ } /* The UART operations structure */ static struct uart_ops usb_uart_ops = { .start_tx = usb_uart_start_tx, .startup = usb_uart_startup, .shutdown = usb_uart_shutdown, .type = usb_uart_type, .config_port = usb_uart_config_port,

/* /* /* /* /*

Start transmitting */ App opens USB_UART */ App closes USB_UART */ Set UART type */ Configure when driver adds a USB_UART port */ .request_port = usb_uart_request_port,/* Claim resources associated with a USB_UART port */ .release_port = usb_uart_release_port,/* Release resources associated with a USB_UART port */ #if 0 /* Left unimplemented for the USB_UART */ .tx_empty = usb_uart_tx_empty, /* Transmitter busy? */ .set_mctrl = usb_uart_set_mctrl, /* Set modem control */ .get_mctrl = usb_uart_get_mctrl, /* Get modem control .stop_tx = usb_uart_stop_tx, /* Stop transmission */ .stop_rx = usb_uart_stop_rx, /* Stop reception */ .enable_ms = usb_uart_enable_ms, /* Enable modem status signals */ .set_termios = usb_uart_set_termios, /* Set termios */ #endif }; static struct uart_driver usb_uart_reg = { .owner = THIS_MODULE, /* .driver_name = "usb_uart", /* .dev_name = "ttyUU", /* .major = USB_UART_MAJOR, /* .minor = USB_UART_MINOR_START, /* .nr = USB_UART_PORTS, /* .cons = &usb_uart_console, /*

Owner */ Driver name */ Node name */ Major number */ Minor number start */ Number of UART ports */ Pointer to the console structure. Discussed in Chapter 12, "Video Drivers" */

}; /* Called when the platform driver is unregistered */ static int usb_uart_remove(struct platform_device *dev) { platform_set_drvdata(dev, NULL); /* Remove the USB_UART port from the serial core */ uart_remove_one_port(&usb_uart_reg, &usb_uart_port[dev->id]); return 0; } /* Suspend power management event */ static int usb_uart_suspend(struct platform_device *dev, pm_message_t state)

{ uart_suspend_port(&usb_uart_reg, &usb_uart_port[dev->id]); return 0; } /* Resume after a previous suspend */ static int usb_uart_resume(struct platform_device *dev) { uart_resume_port(&usb_uart_reg, &usb_uart_port[dev->id]); return 0; } /* Parameters static struct { .mapbase .iotype .irq .uartclk .fifosize .ops .flags .line }, { .mapbase .iotype .irq .uartclk .fifosize .ops .flags .line } };

of each supported USB_UART port */ uart_port usb_uart_port[] = { = = = = = = = =

(unsigned int) USB_UART1_BASE, UPIO_MEM, /* Memory mapped */ USB_UART1_IRQ, /* IRQ */ USB_UART_CLK_FREQ, /* Clock HZ */ USB_UART_FIFO_SIZE, /* Size of the FIFO */ &usb_uart_ops, /* UART operations */ UPF_BOOT_AUTOCONF, /* UART port flag */ 0, /* UART port number */

= = = = = = = =

(unsigned int)USB_UART2_BASE, UPIO_MEM, /* Memory mapped */ USB_UART2_IRQ, /* IRQ */ USB_UART_CLK_FREQ, /* CLock HZ */ USB_UART_FIFO_SIZE, /* Size of the FIFO */ &usb_uart_ops, /* UART operations */ UPF_BOOT_AUTOCONF, /* UART port flag */ 1, /* UART port number */

/* Platform driver probe */ static int __init usb_uart_probe(struct platform_device *dev) { /* ... */ /* Add a USB_UART port. This function also registers this device with the tty layer and triggers invocation of the config_port() entry point */ uart_add_one_port(&usb_uart_reg, &usb_uart_port[dev->id]); platform_set_drvdata(dev, &usb_uart_port[dev->id]); return 0; } struct platform_device *usb_uart_plat_device1; /* Platform device for USB_UART 1 */ struct platform_device *usb_uart_plat_device2; /* Platform device for USB_UART 2 */ static struct platform_driver usb_uart_driver = { .probe = usb_uart_probe, /* Probe method */ .remove = __exit_p(usb_uart_remove), /* Detach method */

.suspend = usb_uart_suspend, .resume = usb_uart_resume, .driver = { .name = "usb_uart", }, }; /* Driver Initialization */ static int __init usb_uart_init(void) { int retval;

/* Power suspend */ /* Resume after a suspend */ /* Driver name */

/* Register the USB_UART driver with the serial core */ if ((retval = uart_register_driver(&usb_uart_reg))) { return retval; } /* Register platform device for USB_UART 1. Usually called during architecture-specific setup */ usb_uart_plat_device1 = platform_device_register_simple("usb_uart", 0, NULL, 0); if (IS_ERR(usb_uart_plat_device1)) { uart_unregister_driver(&usb_uart_reg); return PTR_ERR(usb_uart_plat_device1); } /* Register platform device for USB_UART 2. Usually called during architecture-specific setup */ usb_uart_plat_device2 = platform_device_register_simple("usb_uart", 1, NULL, 0); if (IS_ERR(usb_uart_plat_device2)) { uart_unregister_driver(&usb_uart_reg); platform_device_unregister(usb_uart_plat_device1); return PTR_ERR(usb_uart_plat_device2); } /* Announce a matching driver for the platform devices registered above */ if ((retval = platform_driver_register(&usb_uart_driver))) { uart_unregister_driver(&usb_uart_reg); platform_device_unregister(usb_uart_plat_device1); platform_device_unregister(usb_uart_plat_device2); } return 0; } /* Driver Exit */ static void __exit usb_uart_exit(void) { /* The order of unregistration is important. Unregistering the UART driver before the platform driver will crash the system */ /* Unregister the platform driver */ platform_driver_unregister(&usb_uart_driver); /* Unregister the platform devices */ platform_device_unregister(usb_uart_plat_device1);

platform_device_unregister(usb_uart_plat_device2); /* Unregister the USB_UART driver */ uart_unregister_driver(&usb_uart_reg); } module_init(usb_uart_init); module_exit(usb_uart_exit);

RS- 4 8 5 RS- 485 is not a st andard PC int erface like RS- 232, but in t he em bedded space, you m ay com e across com put ers t hat use RS- 485 connect ions t o reliably com m unicat e wit h cont rol syst em s. RS- 485 uses different ial signals t hat let it exchange dat a over dist ances of up t o a few t housand feet , unlike RS- 232 t hat has a range of only a few dozen feet . On t he processor side, t he RS- 485 int erface is a UART operat ing in half- duplex m ode. So, before sending dat a from t he t ransm it FI FO t o t he wire, t he UART device driver needs t o addit ionally enable t he RS485 t ransm it t er and disable t he receiver, possibly by wiggling associat ed GPI O pins. To obt ain dat a from t he wire t o t he receive FI FO, t he UART driver has t o perform t he reverse operat ion. You have t o enable/ disable t he RS- 485 t ransm it t er/ receiver at t he right places in t he serial layer. I f you disable t he t ransm it t er t oo soon, it m ight not get sufficient t im e t o drain t he last byt es from t he t ransm it FI FO, and t his can result in dat a t runcat ion. I f you disable t he t ransm it t er t oo lat e, on t he ot her hand, you prevent dat a recept ion for t hat m uch t im e, which m ight lead t o receive dat a loss. RS- 485 support s m ult idrop, so t he higher- layer prot ocol m ust im plem ent a suit able addressing m echanism if you have m ult iple devices connect ed t o t he bus. RS- 485 does not support hardware flow cont rol lines using Request To Send ( RTS) and Clear To Send ( CTS) .

TTY D r ive r s Let 's now t ake a look at t he st ruct ures and regist rat ion funct ions t hat lie at t he heart of t t y drivers. Three st ruct ures are im port ant for t heir operat ion:

1 . struct tty_struct defined in include/ linux/ t t y.h. This st ruct ure cont ains all st at e inform at ion associat ed wit h an open t t y. I t 's an enorm ous st ruct ure, but here are som e im port ant fields: struct tty_struct { int magic; struct tty_driver *driver; struct tty_ldisc ldisc; /* ... */ struct tty_flip_buffer flip;

/* Magic marker */ /* Pointer to the tty driver */ /* Attached Line discipline */ /* Flip Buffer. See below. */

/* ... */ wait_queue_head_t write_wait; wait_queue_head_t read_wait;

/* See the section "Line Disciplines" */ /* See the section "Line Disciplines" */

/* ... */ }; 2 . struct tty_flip_buffer or t he flip buffer em bedded inside tty_struct. This is t he cent erpiece of t he dat a collect ion and processing m echanism : struct tty_flip_buffer { /* ... */ struct semaphore pty_sem; char *char_buf_ptr;

/* Serialize */ /* Pointer to the flip buffer */

/* ... */ unsigned char char_buf[2*TTY_FLIPBUF_SIZE]; /* The flip buffer */ /* ... */ }; The low- level serial driver uses one half of t he flip buffer for gat hering dat a, while t he line discipline uses t he ot her half for processing t he dat a. The buffer point ers used by t he serial driver and t he line discipline are t hen flipped, and t he process cont inues. Have a look at t he funct ion flush_to_ldisc() in drivers/ char/ t t y_io.c t o see t he flip in act ion. I n recent kernels, t he tty_flip_buffer st ruct ure has been som ewhat redesigned. I t 's now m ade up of a buffer header (tty_bufhead) and a buffer list ( tty_buffer) : struct tty_bufhead { /* ... */ struct semaphore pty_sem;

/* Serialize */

struct tty_buffer *head, tail, free; /* See below */ /* ... */ }; struct tty_buffer { struct tty_buffer *next; char *char_buf_ptr; /* ... */ unsigned long data[0];

/* Pointer to the flip buffer */ /* The flip buffer, memory for which is dynamically allocated */

}; 3 . struct tty_driver defined in include/ linux/ t t y_driver.h. This specifies t he program m ing int erface bet ween t t y drivers and higher layers: struct tty_driver { int magic; /* Magic number */ /* ... */ int major; /* Major number */ int minor_start; /* Start of minor number */ /* ... */ /* Interface routines between a tty driver and higher layers */ int (*open)(struct tty_struct *tty, struct file *filp); void (*close)(struct tty_struct *tty, struct file *filp); int (*write)(struct tty_struct *tty, const unsigned char *buf, int count); void (*put_char)(struct tty_struct *tty, unsigned char ch); /* ... */ }; Like a UART driver, a t t y driver needs t o perform t wo st eps t o regist er it self wit h t he kernel: 1.

Call tty_register_driver(struct tty_driver *tty_d) t o regist er it self wit h t he t t y core.

2.

Call

tty_register_device(struct tty_driver *tty_d, unsigned device_index, struct device *device) t o regist er each individual t t y t hat it support s.

We won't develop a sam ple t t y driver, but here are som e com m on ones used on Linux:

Serial em ulat ion over Bluet oot h, discussed in Chapt er 16, is im plem ent ed in t he form of a t t y driver. This driver ( drivers/ net / bluet oot h/ rfcom m / t t y.c) calls tty_register_driver() during init ializat ion and tty_register_device() while handling each incom ing Bluet oot h connect ion.

To work wit h a syst em console on a Linux deskt op, you need t he services of virt ual t erm inals ( VTs) if you

are in t ext m ode or pseudo t erm inals ( PTYs) if you are in graphics m ode. VTs and PTYs are im plem ent ed as t t y drivers and live in drivers/ char/ vt .c and drivers/ char/ pt y.c, respect ively.

The t t y driver used over convent ional UARTs resides in drivers/ serial/ serial_core.c.

The USB- serial t t y driver is in drivers/ usb/ serial/ usb- serial.c.

Lin e D isciplin e s Line disciplines provide an elegant m echanism t hat let s you use t he sam e serial driver t o run different t echnologies. The low- level physical driver and t he t t y driver handle t he t ransfer of dat a t o and from t he hardware, while line disciplines are responsible for processing t he dat a and t ransferring it bet ween kernel space and user space. The serial subsyst em support s 17 st andard line disciplines. The default line discipline t hat get s at t ached when you open a serial port is N_TTY, which im plem ent s t erm inal I / O processing. N_TTY is responsible for " cooking" charact ers received from t he keyboard. Depending on user request , it m aps t he cont rol charact er t o newline, convert s lowercase t o uppercase, expands t abs, and echoes charact ers t o t he associat ed VT. N_TTY also support s a raw m ode used by edit ors, which leaves all t he preceding processing t o user applicat ions. See Figure 7.3 in t he next chapt er, " I nput Drivers," t o learn how t he keyboard subsyst em is connect ed t o N_TTY. The exam ple t t y drivers list ed at t he end of t he previous sect ion, "TTY Drivers," use N_TTY by default . Line disciplines also im plem ent net work int erfaces over serial t ransport prot ocols. For exam ple, line disciplines t hat are part of t he Point - t o- Point Prot ocol ( N_PPP) and t he Serial Line I nt ernet Prot ocol ( N_SLIP) subsyst em s, fram e packet s, allocat e and populat e associat ed net working dat a st ruct ures, and pass t he dat a on t o t he corresponding net work prot ocol st ack. Ot her line disciplines handle I nfrared Dat a ( N_IRDA) and t he Bluet oot h Host Cont rol I nt erface ( N_HCI) .

D e vice Ex a m ple : Tou ch Con t r olle r Let 's t ake a look at t he int ernals of a line discipline by im plem ent ing a sim ple line discipline for a serial t ouchscreen cont roller. Figure 6.6 shows how t he t ouch cont roller is connect ed on an em bedded lapt op derivat ive. The Finit e St at e Machine ( FSM) of t he t ouch cont roller is a candidat e for being im plem ent ed as a line discipline because it can leverage t he facilit ies and int erfaces offered by t he serial layer.

Figu r e 6 .6 . Con n e ct ion dia gr a m of a t ou ch con t r olle r on a PC- de r iva t ive .

Ope n a n d Close To creat e a line discipline, define a struct tty_ldisc and regist er a prescribed set of ent ry point s wit h t he kernel. List ing 6.2 cont ains a code snippet t hat perform s bot h t hese operat ions for t he exam ple t ouch cont roller. List in g 6 .2 . Lin e D isciplin e Ope r a t ion s Code View: struct tty_ldisc n_touch_ldisc = { TTY_LDISC_MAGIC, /* Magic */ "n_tch", /* Name of the line discipline */ N_TCH, /* Line discipline ID number */ n_touch_open, /* Open the line discipline */ n_touch_close, /* Close the line discipline */ n_touch_flush_buffer, /* Flush the line discipline's read buffer */ n_touch_chars_in_buffer, /* Get the number of processed characters in the line discipline's read buffer */ n_touch_read, /* Called when data is requested from user space */ n_touch_write, /* Write method */ n_touch_ioctl, /* I/O Control commands */ NULL, /* We don't have a set_termios routine */ n_touch_poll, /* Poll */ n_touch_receive_buf, /* Called by the low-level driver to pass data to user space*/ n_touch_receive_room, /* Returns the room left in the line discipline's read buffer */ n_touch_write_wakeup /* Called when the low-level device driver is ready to transmit more data */ }; /* ... */ if ((err = tty_register_ldisc(N_TCH, &n_touch_ldisc))) { return err; }

I n List ing 6.2, n_tch is t he nam e of t he line discipline, and N_TCH is t he line discipline ident ifier num ber. You have t o specify t he value of N_TCH in include/ linux/ t t y.h, t he header file t hat cont ains all line discipline definit ions. Line disciplines act ive on your syst em can be found in / proc/ t t y/ ldiscs. Line disciplines gat her dat a from t heir half of t he t t y flip buffer, process it , and copy t he result ing dat a t o a local read buffer. For N_TCH, n_touch_receive_room() ret urns t he m em ory left in t he read buffer, while n_touch_chars_in_buffer() ret urns t he num ber of processed charact ers in t he read buffer t hat are ready t o be delivered t o user space. n_touch_write() and n_touch_write_wakeup() do not hing because N_TCH is a read- only device. n_touch_open() t akes care of allocat ing m em ory for t he m ain line discipline dat a st ruct ures, as shown in List ing 6.3. List in g 6 .3 . Ope n in g t h e Lin e D isciplin e

Code View: /* Private structure used to implement the Finite State Machine (FSM) for the touch controller. The controller and the processor communicate using a specific protocol that the FSM implements */ struct n_touch { int current_state; /* Finite State Machine */ spinlock_t touch_lock; /* Spinlock */ struct tty_struct *tty; /* Associated tty */ /* Statistics and other housekeeping */ /* ... */ } *n_tch;

/* Device open() */ static int n_touch_open(struct tty_struct *tty) { /* Allocate memory for n_tch */ if (!(n_tch = kmalloc(sizeof(struct n_touch), GFP_KERNEL))) { return -ENOMEM; } memset(n_tch, 0, sizeof(struct n_touch)); tty->disc_data = n_tch; /* Other entry points now have direct access to n_tch */ /* Allocate the line discipline's local read buffer used for copying data out of the tty flip buffer */ tty->read_buf = kmalloc(BUFFER_SIZE, GFP_KERNEL); if (!tty->read_buf) return -ENOMEM; /* Clear the read buffer */ memset(tty->read_buf, 0, BUFFER_SIZE); /* Initialize lock */ spin_lock_init(&ntch->touch_lock); /* Initialize other necessary tty fields. See drivers/char/n_tty.c for an example */ /* ... */ return 0; }

You m ight want t o set N_TCH as t he default line discipline ( rat her t han N_TTY) when- ever t he serial port connect ed t o t he t ouch cont roller is opened. See t he sect ion " Changing Line Disciplines" t o see how t o change line disciplines from user space.

Re a d Pa t h For int errupt - driven devices, t he read dat a pat h usually consist s of t wo t hreads working in t andem :

1 . A t op t hread originat ing from t he user process request ing t he read

2 . A bot t om t hread springing from t he int errupt service rout ine t hat collect s dat a from t he device

Figure 6.7 shows t hese t hreads associat ed wit h t he read dat a flow. The int errupt handler queues t he receive_buf() m et hod ( n_touch_receive_buf() in our exam ple) as a t ask. You can override t his behavior by set t ing t he tty->low_latency flag.

Figu r e 6 .7 . Lin e disciplin e r e a d pa t h .

[ View full size im age]

The t ouch cont roller and t he processor com m unicat e using a specific prot ocol described in t he cont roller's dat a sheet . The driver im plem ent s t his com m unicat ion prot ocol using an FSM as indicat ed earlier. List ing 6.4 encodes t his FSM as part of t he receive_buf() ent ry point , n_touch_receive_buf().

List in g 6 .4 . Th e n_touch_receive_buf() M e t h od

Code View: static void n_touch_receive_buf(struct tty_struct *tty, const unsigned char *cp, char *fp, int count) { /* Work on the data in the line discipline's half of the flip buffer pointed to by cp */ /* ... */ /* Implement the Finite State Machine to interpret commands/data arriving from the touch controller and put the processed data into the local read buffer */ .................................................................................... /* Datasheet-dependent Code Region */ switch (tty->disc_data->current_state) { case RESET: /* Issue a reset command to the controller */ tty->driver->write(tty, 0, mode_stream_command, sizeof(mode_stream_command)); tty->disc_data->current_state = STREAM_DATA; /* ... */ break; case STREAM_DATA: /* ... */ break; case PARSING: /* ... */ tty->disc_data->current_state = PARSED; break; case PARSED: /* ... */ } .................................................................................... if (tty->disc_data->current_state == PARSED) { /* If you have a parsed packet, copy the collected coordinate and direction information into the local read buffer */ spin_lock_irqsave(&tty->disc_data->touch_lock, flags); for (i=0; i < PACKET_SIZE; i++) { tty->disc_data->read_buf[tty->disc_data->read_head] = tty->disc_data->current_pkt[i]; tty->disc_data->read_head = (tty->disc_data->read_head + 1) & (BUFFER_SIZE - 1); tty->disc_data->read_cnt++; } spin_lock_irqrestore(&tty->disc_data->touch_lock, flags); /* ... */ /* See Listing 6.5 */ } }

n_touch_receive_buf() processes t he dat a arriving from t he serial driver. I t exchanges a series of com m ands and responses wit h t he t ouch cont roller and put s t he received coordinat e and direct ion ( press/ release)

inform at ion int o t he line discipline's read buffer. Accesses t o t he read buffer have t o be serialized using a spinlock because it 's used by bot h ldisc.receive_buf() and ldisc.read() t hreads shown in Figure 6.7 (n_touch_receive_buf() and n_touch_read(), respect ively, in our exam ple) . As you can see in List ing 6.4, n_touch_receive_buf() dispat ches com m ands t o t he t ouch cont roller by direct ly calling t he write() ent ry point of t he serial driver. n_touch_receive_buf() needs t o do a couple m ore t hings:

1 . The t op read() t hread in Figure 6.7 put s t he calling process t o sleep if no dat a is available. So, n_touch_receive_buf() has t o wake up t hat t hread and let it read t he dat a t hat was j ust processed.

2 . I f t he line discipline is running out of read buffer space, n_touch_receive_buf() has t o request t he serial driver t o t hrot t le dat a arriving from t he device. ldisc.read() is responsible for request ing t he corresponding unt hrot t ling when it ferries t he dat a t o user space and frees m em ory in t he read buffer. The serial driver uses soft ware or hardware flow cont rol m echanism s t o achieve t he t hrot t ling and unt hrot t ling.

List ing 6.5 perform s t hese t wo operat ions. List in g 6 .5 . W a k in g Up t h e Re a d Th r e a d a n d Th r ot t lin g t h e Se r ia l D r ive r /* n_touch_receive_buf() continued.. */ /* Wake up any threads waiting for data */ if (waitqueue_active(&tty->read_wait) && (tty->read_cnt >= tty->minimum_to_wake)) wake_up_interruptible(&tty->read_wait); } /* If we are running out of buffer space, request the serial driver to throttle incoming data */ if (n_touch_receive_room(tty) < TOUCH_THROTTLE_THRESHOLD) { tty->driver.throttle(tty); } /* ... */

A wait queue (tty->read_wait) is used t o synchronize bet ween t he ldisc.read() and ldisc.receive_buf() t hreads. ldisc.read() adds t he calling process t o t he wait queue when it does not find dat a t o read, while ldisc.receive_buf() wakes t he ldisc.read() t hread when t here is dat a available t o be read. So, n_touch_read() does t he following:

I f t here is no dat a t o be read yet , put t he calling process t o sleep on t he read_wait queue. The process get s woken by n_touch_receive_buf() when dat a arrives.

I f dat a is available, collect it from t he local read buffer ( tty->read_buf[tty->read_tail]) and dispat ch it t o user space.

I f t he serial driver has been t hrot t led and if enough space is available in t he read buffer aft er t his read, ask t he serial driver t o unt hrot t le.

Net working line disciplines usually allocat e an sk_buff ( t he basic Linux net working dat a st ruct ure discussed in Chapt er 15, " Net work I nt erface Cards" ) and use t his as t he read buffer. They don't have a read() m et hod, because t he corresponding receive_buf() copies received dat a int o t he allocat ed sk_buff and direct ly passes it t o t he associat ed prot ocol st ack.

W r it e Pa t h A line discipline's write() ent ry point perform s any post processing t hat is required before passing t he dat a down t o t he low- level driver. I f t he underlying driver is not able t o accept all t he dat a t hat t he line discipline offers, t he line discipline put s t he request ing t hread t o sleep. The driver's int errupt handler wakes t he line discipline when it is ready t o receive m ore dat a. To do t his, t he driver calls t he write_wakeup() m et hod regist ered by t he line discipline. The associat ed synchronizat ion is done using a wait queue (tty->write_wait) , and t he operat ion is sim ilar t o t hat of t he read_wait queue described in t he previous sect ion. Many net working line disciplines have no write() m et hods. The prot ocol im plem ent at ion direct ly t ransm it s t he fram es down t o t he serial device driver. However, t hese line disciplines usually st ill have a write_wakeup() ent ry point t o respond t o t he serial driver's request for m ore t ransm it dat a. N_TCH does not need a write() m et hod eit her, because t he t ouch cont roller is a read- only device. As you saw in List ing 6.4, rout ines in t he receive pat h direct ly t alk t o t he low- level UART driver when t hey need t o send com m and fram es t o t he cont roller.

I / O Con t r ol A user program can send com m ands t o a device via ioctl() calls, as discussed in Chapt er 5 , " Charact er Drivers." When an applicat ion opens a serial device, it can usually issue t hree classes of ioct ls t o it :

Com m ands support ed by t he serial device driver, such as TIOCMSET t hat set s m odem inform at ion

Com m ands support ed by t he t t y driver, such as TIOCSETD t hat changes t he at t ached line discipline

Com m ands support ed by t he at t ached line discipline, such as a com m and t o reset t he t ouch cont roller in t he case of N_TCH

The ioctl() im plem ent at ion for N_TCH is largely st andard. Support ed com m ands depend on t he prot ocol described in t he t ouch cont roller's dat a sheet .

M or e Ope r a t ion s Anot her line discipline operat ion is flush_buffer(), which is used t o flush any dat a pending in t he read buffer. flush_buffer() is also called when a line discipline is closed. I t wakes up any read t hreads t hat are wait ing for m ore dat a as follows: if (tty->link->packet){ wake_up_interruptible(&tty->disc_data->read_wait); }

Yet anot her ent ry point ( not support ed by N_TCH) is set_termios(). The N_TTY line discipline support s t he

set_termios() int erface, which is used t o set opt ions specific t o line discipline dat a processing. For exam ple, you m ay use set_termios() t o put t he line discipline in raw m ode or " cooked" m ode. Som e opt ions specific t o t he t ouch cont roller ( such as changing t he baud rat e, parit y, and num ber of st op bit s) are im plem ent ed by t he set_termios() m et hod of t he underlying device driver. The rem aining ent ry point s such as poll() are fairly st andard, and you can ret urn t o Chapt er 5 in case you need assist ance. You m ay com pile your line discipline as part of t he kernel or dynam ically load it as a m odule. I f you choose t o com pile it as a m odule, you m ust , of course, also provide funct ions t o be called during m odule init ializat ion and exit . The form er is usually t he sam e as t he init() m et hod. The lat t er needs t o clean up privat e dat a st ruct ures and unregist er t he line discipline. Unregist ering t he discipline is a one- liner: tty_unregister_ldisc(N_TCH);

An easier way t o drive a serial t ouch cont roller is by leveraging t he services offered by t he kernel's input subsyst em and t he built - in serport line discipline. We look at t hat t echnique in t he next chapt er.

Ch a n gin g Lin e D isciplin e s N_TCH get s bound t o t he low- level serial driver when a user space program opens t he serial port connect ed t o t he t ouch cont roller. But som et im es, a user- space applicat ion m ight want t o at t ach a different line discipline t o t he device. For inst ance, you m ight want t o writ e a program t hat dum ps raw dat a received from t he t ouch cont roller wit hout processing it . List ing 6.6 opens t he t ouch cont roller and changes t he line discipline t o N_TTY t o dum p t he dat a t hat is com ing in. List in g 6 .6 . Ch a n gin g a Lin e D isciplin e fr om Use r Spa ce fd = open("/dev/ttySX", O_RDONLY | O_NOCTTY); /* At this point, N_TCH is attached to /dev/ttySX, the serial port used by the touch controller. Switch to N_TTY */ ldisc = N_TTY; ioctl(fd, TIOCSETD, &ldisc); /* Set termios to raw mode and dump the data coming in */ /* ... */

The TIOCSETD ioctl() closes t he current line discipline and opens t he newly request ed line discipline.

Look in g a t t h e Sou r ce s The serial core resides in drivers/ serial/ , but t t y im plem ent at ions and low- level drivers are scat t ered across t he source t ree. The driver files referred t o in Figure 6.3, for exam ple, live in four different direct ories: drivers/ serial/ , drivers/ char/ , drivers/ usb/ serial/ , and drivers/ net / irda/ . The drivers/ serial/ direct ory, which now also cont ains UART drivers, didn't exist in t he 2.4 kernel; UART- specific code used t o be dispersed bet ween drivers/ char/ and arch/ your- arch/ direct ories. The present code part it ioning is m ore logical because UART drivers are not t he only folks t hat access t he serial layer—devices such as USB- t o- serial convert ers and I rDA dongles also need t o t alk t o t he serial core. Look at drivers/ serial/ im x.c for a real- world, low- level UART driver. I t handles UARTs t hat are part of Freescale's i.MX series of em bedded cont rollers. For a list of line disciplines support ed on Linux, see include/ linux/ t t y.h. To get a feel of net working line disciplines, look at t he corresponding source files for PPP (drivers/ net / ppp_async.c) , Bluet oot h (drivers/ bluet oot h/ hci_ldisc.c) , I nfrared ( drivers/ net / irda/ irt t y- sir.c) , and SLI P (drivers/ net / slip.c) . Table 6.3 cont ains a sum m ary of t he m ain dat a st ruct ures used in t his chapt er and t he locat ion of t heir definit ions in t he source t ree. Table 6.4 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 6 .3 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

uart_driver

include/ linux/ serial_core.h

Represent at ion of a low- level UART driver.

uart_port

include/ linux/ serial_core.h

Represent at ion of a UART port .

uart_ops

include/ linux/ serial_core.h

Ent ry point s support ed by UART drivers.

platform_device

include/ linux/ plat form _device.h

Represent at ion of a plat form device.

platform_driver

include/ linux/ plat form _device.h

Represent at ion of a plat form driver.

tty_struct

include/ linux/ t t y.h

St at e inform at ion about a t t y.

tty_bufhead, tty_buffer

include/ linux/ t t y.h

These t wo st ruct ures im plem ent t he flip buffer associat ed wit h a t t y.

tty_driver

include/ linux/ t t y_driver.h

Program m ing int erface bet ween t t y drivers and higher layers.

tty_ldisc

include/ linux/ t t y_ldisc.h

Ent ry point s support ed by a line discipline.

Ta ble 6 .4 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

uart_register_driver()

drivers/ serial/ sderial_core.c

Regist ers a UART driver wit h t he serial core

uart_add_one_port()

drivers/ serial/ sderial_core.c

Regist ers a UART port support ed by t he UART driver

uart_unregister_driver()

drivers/ serial/ sderial_core.c

Rem oves a UART driver from t he serial core

platform_device register() platform_device_register_simple() platform_add_devices()

drivers/ base/ plat form .c

Regist ers a plat form device

platform_device_unregister()

drivers/ base/ plat form .c

Unregist ers a plat form device

platform_driver_register()/ platform_driver_unregister()

drivers/ base/ plat form .c

Regist ers/ unregist ers a plat form driver

tty_insert_flip_char()

include/ linux/ t t y_flip.h

Adds a charact er t o t he t t y flip buffer

tty_flip_buffer_push()

drivers/ char/ t t y_io.c

Queues a request t o push t he flip buffer t o t he line discipline

tty_register_driver()

drivers/ char/ t t y_io.c

Regist ers a t t y driver wit h t he serial core

tty_unregister_driver()

drivers/ char/ t t y_io.c

Rem oves a t t y driver from t he serial core

tty_register_ldisc()

drivers/ char/ t t y_io.c

Creat es a line discipline by regist ering prescribed ent ry point s

tty_unregister_ldisc()

drivers/ char/ t t y_io.c

Rem oves a line discipline from t he serial core

Som e serial dat a t ransfer scenarios are com plex. You m ight need t o m ix and m at ch different serial layer blocks, as you saw in Figure 6.3. Som e sit uat ions m ay necessit at e a dat a pat h passing t hrough m ult iple line disciplines. For exam ple, set t ing up a dialup connect ion over Bluet oot h involves t he m ovem ent of dat a t hrough t he HCI line discipline as well as t he PPP line discipline. I f you can, est ablish such a connect ion and st ep t hrough t he code flow using a kernel debugger.

Ch a pt e r 7 . I n pu t D r ive r s I n Th is Ch a pt e r 210 I nput Event Drivers 216 I nput Device Drivers 230 Debugging 231 Looking at t he Sources

The kernel's input subsyst em was creat ed t o unify scat t ered drivers t hat handle diverse classes of dat a- input devices such as keyboards, m ice, t rackballs, j oyst icks, roller wheels, t ouch screens, accelerom et ers, and t ablet s. The input subsyst em brings t he following advant ages t o t he t able:

Uniform handling of funct ionally sim ilar input devices even when t hey are physically different . For exam ple, all m ice, such as PS/ 2, USB or Bluet oot h, are t reat ed alike.

An easy event int erface for dispat ching input report s t o user applicat ions. Your driver does not have t o creat e and m anage / dev nodes and relat ed access m et hods. I nst ead, it can sim ply invoke input API s t o send m ouse m ovem ent s, key presses, or t ouch event s upst ream t o user land. Applicat ions such as X Windows work seam lessly over t he event int erfaces export ed by t he input subsyst em .

Ext ract ion of com m on port ions out of input drivers and a result ing abst ract ion t hat sim plifies t he drivers and int roduces consist ency. For exam ple, t he input subsyst em offers a collect ion of low- level drivers called serio t hat provides access t o input hardware such as serial port s and keyboard cont rollers.

Figure 7.1 illust rat es t he operat ion of t he input subsyst em . The subsyst em cont ains t wo classes of drivers t hat work in t andem : event drivers and dev ice drivers. Event drivers are responsible for int erfacing wit h applicat ions, whereas device drivers are responsible for low- level com m unicat ion wit h input devices. The m ouse event generat or, m ousedev , is an exam ple of t he form er, and t he PS/ 2 m ouse driver is an exam ple of t he lat t er. Bot h event drivers and device drivers can avail t he services of an efficient , bug- free, reusable core, which lies at t he heart of t he input subsyst em .

Figu r e 7 .1 . Th e in pu t su bsyst e m .

[ View full size im age]

Because event drivers are st andardized and available for all input classes, you are m ore likely t o im plem ent a device driver t han an event driver. Your device driver can use a suit able exist ing event driver via t he input core t o int erface wit h user applicat ions. Not e t hat t his chapt er uses t he t erm dev ice driver t o refer t o an input device driver as opposed t o an input event driver.

I n pu t Eve n t D r ive r s The event int erfaces export ed by t he input subsyst em have evolved int o a st andard t hat m any graphical windowing syst em s underst and. Event drivers offer a hardware- independent abst ract ion t o t alk t o input devices, j ust as t he fram e buffer int erface ( discussed in Chapt er 12, " Video Drivers" ) present s a generic m echanism t o com m unicat e wit h display devices. Event drivers, in t andem wit h fram e buffer drivers, insulat e graphical user int erfaces ( GUI s) from t he vagaries of t he underlying hardware.

Th e Evde v I n t e r fa ce Evdev is a generic input event driver. Each event packet produced by evdev has t he following form at , defined in include/ linux/ input .h: struct input_event { struct timeval time; __u16 type; __u16 code; __s32 value; };

/* /* /* /*

Timestamp */ Event Type */ Event Code */ Event Value */

To learn how t o use evdev, let 's im plem ent an input device driver for a virt ual m ouse.

D e vice Ex a m ple : Vir t u a l M ou se This is how our virt ual m ouse works: An applicat ion (coord.c) em ulat es m ouse m ovem ent s and dispat ches coordinat e inform at ion t o t he virt ual m ouse driver ( vm s.c) via a sysfs node, / sys/ devices/ plat form / vm s/ coordinat es. The virt ual m ouse driver (vm s driver for short ) channels t hese m ovem ent s upst ream via evdev. Figure 7.2 shows t he det ails.

Figu r e 7 .2 . An in pu t dr ive r for a vir t u a l m ou se .

[ View full size im age]

General- purpose m ouse ( gpm ) is a server t hat let s you use a m ouse in t ext m ode wit hout assist ance from an X server. Gpm underst ands evdev m essages, so t he vm s driver can direct ly com m unicat e wit h it . Aft er you have everyt hing in place, you can see t he cursor dancing over your screen t o t he t une of t he virt ual m ouse m ovem ent s st ream ed by coord.c. List ing 7.1 cont ains coord.c, which cont inuously generat es random X and Y coordinat es. Mice, unlike j oyst icks or t ouch screens, produce relat ive coordinat es, so t hat is what coord.c does. The vm s driver is shown in List ing 7.2 .

List in g 7 .1 . Applica t ion t o Sim u la t e M ou se M ove m e n t s ( coor d.c)

Code View: #include int main(int argc, char *argv[]) { int sim_fd; int x, y; char buffer[10]; /* Open the sysfs coordinate node */ sim_fd = open("/sys/devices/platform/vms/coordinates", O_RDWR); if (sim_fd < 0) { perror("Couldn't open vms coordinate file\n"); exit(-1); } while (1) { /* Generate random relative coordinates */ x = random()%20; y = random()%20; if (x%2) x = -x; if (y%2) y = -y; /* Convey simulated coordinates to the virtual mouse driver */ sprintf(buffer, "%d %d %d", x, y, 0); write(sim_fd, buffer, strlen(buffer)); fsync(sim_fd); sleep(1); } close(sim_fd); }

List in g 7 .2 . I n pu t D r ive r for t h e Vir t u a l M ou se ( vm s.c) Code View: #include #include #include #include #include struct input_dev *vms_input_dev; /* Representation of an input device */ static struct platform_device *vms_dev; /* Device structure */ /* Sysfs method to input simulated coordinates to the virtual mouse driver */ static ssize_t write_vms(struct device *dev, struct device_attribute *attr, const char *buffer, size_t count) { int x,y; sscanf(buffer, "%d%d", &x, &y);

/* Report relative coordinates via the event interface */ input_report_rel(vms_input_dev, REL_X, x); input_report_rel(vms_input_dev, REL_Y, y); input_sync(vms_input_dev); return count; } /* Attach the sysfs write method */ DEVICE_ATTR(coordinates, 0644, NULL, write_vms); /* Attribute Descriptor */ static struct attribute *vms_attrs[] = { &dev_attr_coordinates.attr, NULL }; /* Attribute group */ static struct attribute_group vms_attr_group = { .attrs = vms_attrs, }; /* Driver Initialization */ int __init vms_init(void) { /* Register a platform device */ vms_dev = platform_device_register_simple("vms", -1, NULL, 0); if (IS_ERR(vms_dev)) { PTR_ERR(vms_dev); printk("vms_init: error\n"); } /* Create a sysfs node to read simulated coordinates */ sysfs_create_group(&vms_dev->dev.kobj, &vms_attr_group); /* Allocate an input device data structure */ vms_input_dev = input_allocate_device(); if (!vms_input_dev) { printk("Bad input_alloc_device()\n"); } /* Announce that the virtual mouse will generate relative coordinates */ set_bit(EV_REL, vms_input_dev->evbit); set_bit(REL_X, vms_input_dev->relbit); set_bit(REL_Y, vms_input_dev->relbit); /* Register with the input subsystem */ input_register_device(vms_input_dev); printk("Virtual Mouse Driver Initialized.\n"); return 0; } /* Driver Exit */ void

vms_cleanup(void) { /* Unregister from the input subsystem */ input_unregister_device(vms_input_dev); /* Cleanup sysfs node */ sysfs_remove_group(&vms_dev->dev.kobj, &vms_attr_group); /* Unregister driver */ platform_device_unregister(vms_dev); return; } module_init(vms_init); module_exit(vms_cleanup);

Let 's t ake a closer look at List ing 7.2. During init ializat ion, t he vm s driver regist ers it self as an input device driver. For t his, it first allocat es an input_dev st ruct ure using t he core API , input_allocate_device(): vms_input_dev = input_allocate_device();

I t t hen announces t hat t he virt ual m ouse generat es relat ive event s: set_bit(EV_REL, vms_input_dev->evbit);

/* Event Type is EV_REL */

Next , it declares t he event codes t hat t he virt ual m ouse produces: set_bit(REL_X, vms_input_dev->relbit); /* Relative 'X' movement */ set_bit(REL_Y, vms_input_dev->relbit); /* Relative 'Y' movement */

I f your virt ual m ouse is also capable of generat ing but t on clicks, you need t o add t his t o vms_init(): set_bit(EV_KEY, vms_input_dev->evbit); /* Event Type is EV_KEY */ set_bit(BTN_0, vms_input_dev->keybit); /* Event Code is BTN_0 */

Finally, t he regist rat ion: input_register_device(vms_input_dev);

write_vms() is t he sysfs store() m et hod t hat at t aches t o / sys/ devices/ plat form / vm s/ coordinat es. When coord.c writ es an X/ Y pair t o t his file, write_vms() does t he following: input_report_rel(vms_input_dev, REL_X, x); input_report_rel(vms_input_dev, REL_Y, y); input_sync(vms_input_dev);

The first st at em ent generat es a REL_X event or a relat ive device m ovem ent in t he X direct ion. The second produces a REL_Y event or a relat ive m ovem ent in t he Y direct ion. input_sync() indicat es t hat t his event is com plet e, so t he input subsyst em collect s t hese t wo event s int o a single evdev packet and sends it out of t he door t hrough / dev/ input / event X, where X is t he int erface num ber assigned t o t he vm s driver. An applicat ion reading t his file will receive event packet s in t he input_event form at described earlier. To request gpm t o at t ach t o t his event int erface and accordingly chase t he cursor around your screen, do t his: bash> gpm -m /dev/input/eventX -t evdev

The ADS7846 t ouch cont roller driver and t he accelerom et er driver, discussed respect ively under t he sect ions "Touch Cont rollers" and " Accelerom et ers" lat er, are also evdev users.

M or e Eve n t I n t e r fa ce s The vm s driver ut ilizes t he generic evdev event int erface, but input devices such as keyboards, m ice, and t ouch cont rollers have cust om event drivers. We will look at t hem when we discuss t he corresponding device drivers. To writ e your own event driver and export it t o user space via / dev/ input / m ydev, you have t o populat e a st ruct ure called input_handler and regist er it wit h t he input core as follows: Code View: static struct input_handler my_event_handler = { .event = mydev_event, /* Handle event reports sent by input device drivers that use this event driver's services */ .fops = &mydev_fops, /* Methods to manage /dev/input/mydev */ .minor = MYDEV_MINOR_BASE, /* Minor number of /dev/input/mydev */ .name = "mydev", /* Event driver name */ .id_table = mydev_ids, /* This event driver can handle requests from these IDs */ .connect = mydev_connect, /* Invoked if there is an ID match */ .disconnect = mydev_disconnect, /* Called when the driver unregisters */ }; /* Driver Initialization */ static int __init mydev_init(void) { /* ... */ input_register_handler(&my_event_handler); /* ... */ return 0; }

Look at t he im plem ent at ion of m ousedev ( drivers/ input / m ousedev.c) for a com plet e exam ple.

Ch a pt e r 7 . I n pu t D r ive r s I n Th is Ch a pt e r 210 I nput Event Drivers 216 I nput Device Drivers 230 Debugging 231 Looking at t he Sources

The kernel's input subsyst em was creat ed t o unify scat t ered drivers t hat handle diverse classes of dat a- input devices such as keyboards, m ice, t rackballs, j oyst icks, roller wheels, t ouch screens, accelerom et ers, and t ablet s. The input subsyst em brings t he following advant ages t o t he t able:

Uniform handling of funct ionally sim ilar input devices even when t hey are physically different . For exam ple, all m ice, such as PS/ 2, USB or Bluet oot h, are t reat ed alike.

An easy event int erface for dispat ching input report s t o user applicat ions. Your driver does not have t o creat e and m anage / dev nodes and relat ed access m et hods. I nst ead, it can sim ply invoke input API s t o send m ouse m ovem ent s, key presses, or t ouch event s upst ream t o user land. Applicat ions such as X Windows work seam lessly over t he event int erfaces export ed by t he input subsyst em .

Ext ract ion of com m on port ions out of input drivers and a result ing abst ract ion t hat sim plifies t he drivers and int roduces consist ency. For exam ple, t he input subsyst em offers a collect ion of low- level drivers called serio t hat provides access t o input hardware such as serial port s and keyboard cont rollers.

Figure 7.1 illust rat es t he operat ion of t he input subsyst em . The subsyst em cont ains t wo classes of drivers t hat work in t andem : event drivers and dev ice drivers. Event drivers are responsible for int erfacing wit h applicat ions, whereas device drivers are responsible for low- level com m unicat ion wit h input devices. The m ouse event generat or, m ousedev , is an exam ple of t he form er, and t he PS/ 2 m ouse driver is an exam ple of t he lat t er. Bot h event drivers and device drivers can avail t he services of an efficient , bug- free, reusable core, which lies at t he heart of t he input subsyst em .

Figu r e 7 .1 . Th e in pu t su bsyst e m .

[ View full size im age]

Because event drivers are st andardized and available for all input classes, you are m ore likely t o im plem ent a device driver t han an event driver. Your device driver can use a suit able exist ing event driver via t he input core t o int erface wit h user applicat ions. Not e t hat t his chapt er uses t he t erm dev ice driver t o refer t o an input device driver as opposed t o an input event driver.

I n pu t Eve n t D r ive r s The event int erfaces export ed by t he input subsyst em have evolved int o a st andard t hat m any graphical windowing syst em s underst and. Event drivers offer a hardware- independent abst ract ion t o t alk t o input devices, j ust as t he fram e buffer int erface ( discussed in Chapt er 12, " Video Drivers" ) present s a generic m echanism t o com m unicat e wit h display devices. Event drivers, in t andem wit h fram e buffer drivers, insulat e graphical user int erfaces ( GUI s) from t he vagaries of t he underlying hardware.

Th e Evde v I n t e r fa ce Evdev is a generic input event driver. Each event packet produced by evdev has t he following form at , defined in include/ linux/ input .h: struct input_event { struct timeval time; __u16 type; __u16 code; __s32 value; };

/* /* /* /*

Timestamp */ Event Type */ Event Code */ Event Value */

To learn how t o use evdev, let 's im plem ent an input device driver for a virt ual m ouse.

D e vice Ex a m ple : Vir t u a l M ou se This is how our virt ual m ouse works: An applicat ion (coord.c) em ulat es m ouse m ovem ent s and dispat ches coordinat e inform at ion t o t he virt ual m ouse driver ( vm s.c) via a sysfs node, / sys/ devices/ plat form / vm s/ coordinat es. The virt ual m ouse driver (vm s driver for short ) channels t hese m ovem ent s upst ream via evdev. Figure 7.2 shows t he det ails.

Figu r e 7 .2 . An in pu t dr ive r for a vir t u a l m ou se .

[ View full size im age]

General- purpose m ouse ( gpm ) is a server t hat let s you use a m ouse in t ext m ode wit hout assist ance from an X server. Gpm underst ands evdev m essages, so t he vm s driver can direct ly com m unicat e wit h it . Aft er you have everyt hing in place, you can see t he cursor dancing over your screen t o t he t une of t he virt ual m ouse m ovem ent s st ream ed by coord.c. List ing 7.1 cont ains coord.c, which cont inuously generat es random X and Y coordinat es. Mice, unlike j oyst icks or t ouch screens, produce relat ive coordinat es, so t hat is what coord.c does. The vm s driver is shown in List ing 7.2 .

List in g 7 .1 . Applica t ion t o Sim u la t e M ou se M ove m e n t s ( coor d.c)

Code View: #include int main(int argc, char *argv[]) { int sim_fd; int x, y; char buffer[10]; /* Open the sysfs coordinate node */ sim_fd = open("/sys/devices/platform/vms/coordinates", O_RDWR); if (sim_fd < 0) { perror("Couldn't open vms coordinate file\n"); exit(-1); } while (1) { /* Generate random relative coordinates */ x = random()%20; y = random()%20; if (x%2) x = -x; if (y%2) y = -y; /* Convey simulated coordinates to the virtual mouse driver */ sprintf(buffer, "%d %d %d", x, y, 0); write(sim_fd, buffer, strlen(buffer)); fsync(sim_fd); sleep(1); } close(sim_fd); }

List in g 7 .2 . I n pu t D r ive r for t h e Vir t u a l M ou se ( vm s.c) Code View: #include #include #include #include #include struct input_dev *vms_input_dev; /* Representation of an input device */ static struct platform_device *vms_dev; /* Device structure */ /* Sysfs method to input simulated coordinates to the virtual mouse driver */ static ssize_t write_vms(struct device *dev, struct device_attribute *attr, const char *buffer, size_t count) { int x,y; sscanf(buffer, "%d%d", &x, &y);

/* Report relative coordinates via the event interface */ input_report_rel(vms_input_dev, REL_X, x); input_report_rel(vms_input_dev, REL_Y, y); input_sync(vms_input_dev); return count; } /* Attach the sysfs write method */ DEVICE_ATTR(coordinates, 0644, NULL, write_vms); /* Attribute Descriptor */ static struct attribute *vms_attrs[] = { &dev_attr_coordinates.attr, NULL }; /* Attribute group */ static struct attribute_group vms_attr_group = { .attrs = vms_attrs, }; /* Driver Initialization */ int __init vms_init(void) { /* Register a platform device */ vms_dev = platform_device_register_simple("vms", -1, NULL, 0); if (IS_ERR(vms_dev)) { PTR_ERR(vms_dev); printk("vms_init: error\n"); } /* Create a sysfs node to read simulated coordinates */ sysfs_create_group(&vms_dev->dev.kobj, &vms_attr_group); /* Allocate an input device data structure */ vms_input_dev = input_allocate_device(); if (!vms_input_dev) { printk("Bad input_alloc_device()\n"); } /* Announce that the virtual mouse will generate relative coordinates */ set_bit(EV_REL, vms_input_dev->evbit); set_bit(REL_X, vms_input_dev->relbit); set_bit(REL_Y, vms_input_dev->relbit); /* Register with the input subsystem */ input_register_device(vms_input_dev); printk("Virtual Mouse Driver Initialized.\n"); return 0; } /* Driver Exit */ void

vms_cleanup(void) { /* Unregister from the input subsystem */ input_unregister_device(vms_input_dev); /* Cleanup sysfs node */ sysfs_remove_group(&vms_dev->dev.kobj, &vms_attr_group); /* Unregister driver */ platform_device_unregister(vms_dev); return; } module_init(vms_init); module_exit(vms_cleanup);

Let 's t ake a closer look at List ing 7.2. During init ializat ion, t he vm s driver regist ers it self as an input device driver. For t his, it first allocat es an input_dev st ruct ure using t he core API , input_allocate_device(): vms_input_dev = input_allocate_device();

I t t hen announces t hat t he virt ual m ouse generat es relat ive event s: set_bit(EV_REL, vms_input_dev->evbit);

/* Event Type is EV_REL */

Next , it declares t he event codes t hat t he virt ual m ouse produces: set_bit(REL_X, vms_input_dev->relbit); /* Relative 'X' movement */ set_bit(REL_Y, vms_input_dev->relbit); /* Relative 'Y' movement */

I f your virt ual m ouse is also capable of generat ing but t on clicks, you need t o add t his t o vms_init(): set_bit(EV_KEY, vms_input_dev->evbit); /* Event Type is EV_KEY */ set_bit(BTN_0, vms_input_dev->keybit); /* Event Code is BTN_0 */

Finally, t he regist rat ion: input_register_device(vms_input_dev);

write_vms() is t he sysfs store() m et hod t hat at t aches t o / sys/ devices/ plat form / vm s/ coordinat es. When coord.c writ es an X/ Y pair t o t his file, write_vms() does t he following: input_report_rel(vms_input_dev, REL_X, x); input_report_rel(vms_input_dev, REL_Y, y); input_sync(vms_input_dev);

The first st at em ent generat es a REL_X event or a relat ive device m ovem ent in t he X direct ion. The second produces a REL_Y event or a relat ive m ovem ent in t he Y direct ion. input_sync() indicat es t hat t his event is com plet e, so t he input subsyst em collect s t hese t wo event s int o a single evdev packet and sends it out of t he door t hrough / dev/ input / event X, where X is t he int erface num ber assigned t o t he vm s driver. An applicat ion reading t his file will receive event packet s in t he input_event form at described earlier. To request gpm t o at t ach t o t his event int erface and accordingly chase t he cursor around your screen, do t his: bash> gpm -m /dev/input/eventX -t evdev

The ADS7846 t ouch cont roller driver and t he accelerom et er driver, discussed respect ively under t he sect ions "Touch Cont rollers" and " Accelerom et ers" lat er, are also evdev users.

M or e Eve n t I n t e r fa ce s The vm s driver ut ilizes t he generic evdev event int erface, but input devices such as keyboards, m ice, and t ouch cont rollers have cust om event drivers. We will look at t hem when we discuss t he corresponding device drivers. To writ e your own event driver and export it t o user space via / dev/ input / m ydev, you have t o populat e a st ruct ure called input_handler and regist er it wit h t he input core as follows: Code View: static struct input_handler my_event_handler = { .event = mydev_event, /* Handle event reports sent by input device drivers that use this event driver's services */ .fops = &mydev_fops, /* Methods to manage /dev/input/mydev */ .minor = MYDEV_MINOR_BASE, /* Minor number of /dev/input/mydev */ .name = "mydev", /* Event driver name */ .id_table = mydev_ids, /* This event driver can handle requests from these IDs */ .connect = mydev_connect, /* Invoked if there is an ID match */ .disconnect = mydev_disconnect, /* Called when the driver unregisters */ }; /* Driver Initialization */ static int __init mydev_init(void) { /* ... */ input_register_handler(&my_event_handler); /* ... */ return 0; }

Look at t he im plem ent at ion of m ousedev ( drivers/ input / m ousedev.c) for a com plet e exam ple.

I n pu t D e vice D r ive r s Let 's t urn our at t ent ion t o drivers for com m on input devices such as keyboards, m ice, and t ouch screens. But first , let 's t ake a quick look at an off- t he- shelf hardware access facilit y available t o input drivers.

Se r io The serio layer offers library rout ines t o access legacy input hardware such as i8042- com pat ible keyboard cont rollers and t he serial port . PS/ 2 keyboards and m ice int erface wit h t he form er, whereas serial t ouch cont rollers connect t o t he lat t er. To com m unicat e wit h hardware serviced by serio, for exam ple, t o send a com m and t o a PS/ 2 m ouse, regist er prescribed callback rout ines wit h serio using serio_register_driver(). To add a new driver as part of serio, regist er open()/close()/start()/stop()/write() ent ry point s using serio_register_port (). Look at drivers/ input / serio/ serport .c for an exam ple. As you can see in Figure 7.1, serio is only one rout e t o access low- level hardware. Several input device drivers inst ead rely on low- level support from bus layers such as USB or SPI .

Ke yboa r ds Keyboards com e in different flavors—legacy PS/ 2, USB, Bluet oot h, I nfrared, and so on. Each t ype has a specific input device driver, but all use t he sam e keyboard event driver, t hus ensuring a consist ent int erface t o t heir users. The keyboard event driver, however, has a dist inguishing feat ure com pared t o ot her event drivers: I t passes dat a t o anot her kernel subsyst em ( t he t t y layer) , rat her t han t o user space via / dev nodes.

PC Ke yboa r ds The PC keyboard ( also called PS/ 2 keyboard or AT keyboard) int erfaces wit h t he processor via an i8042com pat ible keyboard cont roller. Deskt ops usually have a dedicat ed keyboard cont roller, but on lapt ops, keyboard int erfacing is one of t he responsibilit ies of a general- purpose em bedded cont roller ( see t he sect ion "Em bedded Cont rollers" in Chapt er 20, " More Devices and Drivers" ) . When you press a key on a PC keyboard, t his is t he road it t akes:

1 . The keyboard cont roller ( or t he em bedded cont roller) scans and decodes t he keyboard m at rix and t akes care of nuances such as key debouncing.

2 . The keyboard device driver, wit h t he help of serio, reads raw scancodes from t he keyboard cont roller for each key press and release. The difference bet ween a press and a release is in t he m ost significant bit , which is set for t he lat t er. A push on t he " a" key, for exam ple, yields a pair of scancodes, 0x1e and 0x9e. Special keys are escaped using 0xE0, so a j ab on t he right - arrow key produces t he sequence, ( 0xE0 0x4D 0xE0 0xCD) . You m ay use t he showkey ut ilit y t o observe scancodes em anat ing from t he cont roller ( t he sym bol at t aches explanat ions) : bash> showkey -s kb mode was UNICODE [ if you are trying this under X, it might not work since the X server is also reading /dev/console ] press any key (program terminates 10s after last keypress)...

... 0x1e 0x9e

A push of the "a" key

3 . The keyboard device driver convert s received scancodes t o keycodes, based on t he input m ode. To see t he keycode corresponding t o t he " a" key: bash> showkey ... keycode 30 press keycode 30 release

A push of the "a" key Release of the "a" key

To report t he keycode upst ream , t he driver generat es an input event , which passes cont rol t o t he keyboard event driver.

4 . The keyboard event driver undert akes keycode t ranslat ion depending on t he loaded key m ap. ( See m an pages of loadkeys and t he m ap files present in / lib/ kbd/ keym aps. ) I t checks whet her t he t ranslat ed keycode is t ied t o act ions such as swit ching t he virt ual console or reboot ing t he syst em . To glow t he CAPSLOCK and NUMLOCK LEDs inst ead of reboot ing t he syst em in response t o a Ct rl+ Alt + Del push, add t he following t o t he Ct rl+ Alt + Del handler of t he keyboard event driver, drivers/ char/ keyboard.c: static void fn_boot_it(struct vc_data *vc, struct pt_regs *regs) { + + }

set_vc_kbd_led(kbd, VC_CAPSLOCK); set_vc_kbd_led(kbd, VC_NUMLOCK); ctrl_alt_del();

5 . For regular keys, t he t ranslat ed keycode is sent t o t he associat ed virt ual t erm inal and t he N_TTY line discipline. ( We discussed virt ual t erm inals and line disciplines in Chapt er 6 , " Serial Drivers." ) This is done as follows by drivers/ char/ keyboard.c: /* Add the keycode to flip buffer */ tty_insert_flip_char(tty, keycode, 0); /* Schedule */ con_schedule_flip(tty); The N_TTY line discipline processes t he input t hus received from t he keyboard, echoes it t o t he virt ual console, and let s user- space applicat ions read charact ers from t he / dev/ t t yX node connect ed t o t he virt ual t erm inal. Figure 7.3 shows t he dat a flow from t he t im e you push a key on your keyboard unt il t he t im e it 's echoed on your virt ual console. The left half of t he figure is hardware- specific, and t he right half is generic. As per t he design goal of t he input subsyst em , t he underlying hardware int erface is t ransparent t o t he keyboard event driver and t he t t y layer. The input core and t he clearly defined event int erfaces t hus insulat e input users from t he int ricacies of t he hardware.

Figu r e 7 .3 . D a t a flow fr om a PS/ 2 - com pa t ible k e yboa r d.

[ View full size im age]

USB a n d Blu e t oot h Ke yboa r ds The USB specificat ions relat ed t o hum an int erface devices ( HI D) st ipulat e t he prot ocol t hat USB keyboards, m ice, keypads, and ot her input peripherals use for com m unicat ion. On Linux, t his is im plem ent ed via t he usbhid USB client driver, which is responsible for t he USB HI D class (0x03) . Usbhid regist ers it self as an input device driver. I t conform s t o t he input API and report s input event s appropriat e t o t he connect ed HI D. To underst and t he code pat h for a USB keyboard, revert t o Figure 7.3 and m odify t he hardware- specific left half. Replace t he keyboard cont roller in t he I nput Hardware box wit h a USB cont roller, serio wit h t he USB core layer, and t he I nput Device Driver box wit h t he usbhid driver.

For a Bluet oot h keyboard, replace t he keyboard cont roller in Figure 7.3 wit h a Bluet oot h chipset , serio wit h t he Bluet oot h core layer, and t he I nput Device Driver box wit h t he Bluet oot h hidp driver. USB and Bluet oot h are discussed in det ail in Chapt er 11, " Universal Serial Bus," and Chapt er 16, " Linux Wit hout Wires," respect ively.

M ice Mice, like keyboards, com e wit h different capabilit ies and have different int erfacing opt ions. Let 's look at t he com m on ones.

PS/ 2 M ice Mice generat e relat ive m ovem ent s in t he X and Y axes. They also possess one or m ore but t ons. Som e have scroll wheels, t oo. The input device driver for PS/ 2- com pat ible legacy m ice relies on t he serio layer t o t alk t o t he underlying cont roller. The input event driver for m ice, called m ousedev , report s m ouse event s t o user applicat ions via / dev/ input / m ice.

D e vice Ex a m ple : Rolle r M ou se To get a feel of a real- world m ouse device driver, let 's convert t he roller wheel discussed in Chapt er 4 , " Laying t he Groundwork," int o a variat ion of t he generic PS/ 2 m ouse. The " roller m ouse" generat es one- dim ensional m ovem ent in t he Y- axis. Clockwise and ant iclockwise t urns of t he wheel produce posit ive and negat ive relat ive Y coordinat es respect ively ( like t he scroll wheel in m ice) , while pressing t he roller wheel result s in a left but t on m ouse event . The roller m ouse is t hus ideal for navigat ing m enus in devices such as sm art phones, handhelds, and m usic players. The roller m ouse device driver im plem ent ed in List ing 7.3 works wit h windowing syst em s such as X Windows. Look at roller_mouse_init() t o see how t he driver declares it s m ouse- like capabilit ies. Unlike t he roller wheel driver in List ing 4.1 of Chapt er 4 , t he roller m ouse driver needs no read() or poll() m et hods because event s are report ed using input API s. The roller int errupt handler roller_isr() also changes accordingly. Gone are t he housekeepings done in t he int errupt handler using a wait queue, a spinlock, and t he store_movement() rout ine t o support read() and poll(). I n List ing 7.3, t he leading + and - denot e t he differences from t he roller wheel driver im plem ent ed in List ing 4.1 of Chapt er 4 . List in g 7 .3 . Th e Rolle r M ou se D r ive r Code View: + #include + #include + + + + +

/* Device structure */ struct { /* ... */ struct input_dev dev; } roller_mouse;

+ + + + + +

static int __init roller_mouse_init(void) { /* Allocate input device structure */ roller_mouse->dev = input_allocate_device();

+ +

/* Can generate a click and a relative movement */ roller_mouse->dev->evbit[0] = BIT(EV_KEY) | BIT(EV_REL);

+ + + + + +

/* Can move only in the Y-axis */ roller_mouse->dev->relbit[0] = BIT(REL_Y);

+ + + + + + +

roller_mouse->dev->name = "roll";

/* My click should be construed as the left button press of a mouse */ roller_mouse->dev->keybit[LONG(BTN_MOUSE)] = BIT(BTN_LEFT);

/* For entries in /sys/class/input/inputX/id/ */ roller_mouse->dev->id.bustype = ROLLER_BUS; roller_mouse->dev->id.vendor = ROLLER_VENDOR; roller_mouse->dev->id.product = ROLLER_PROD; roller_mouse->dev->id.version = ROLLER_VER;

+ /* Register with the input subsystem */ + input_register_device(roller_mouse->dev); +} /* Global variables */ - spinlock_t roller_lock = SPIN_LOCK_UNLOCKED; - static DECLARE_WAIT_QUEUE_HEAD(roller_poll); /* The Roller Interrupt Handler */ static irqreturn_t roller_interrupt(int irq, void *dev_id) { int i, PA_t, PA_delta_t, movement = 0; /* Get the waveforms from bits 0, 1 and 2 of Port D as shown in Figure 7.1 */ PA_t = PORTD & 0x07; /* Wait until the state of the pins change. (Add some timeout to the loop) */ for (i=0; (PA_t==PA_delta_t); i++){ PA_delta_t = PORTD & 0x07; } movement = determine_movement(PA_t, PA_delta_t); + + +

spin_lock(&roller_lock); /* Store the wheel movement in a buffer for later access by the read()/poll() entry points */ store_movements(movement); spin_unlock(&roller_lock); /* Wake up the poll entry point that might have gone to sleep, waiting for a wheel movement */ wake_up_interruptible(&roller_poll); if (movement == CLOCKWISE) { input_report_rel(roller_mouse->dev, REL_Y, 1); } else if (movement == ANTICLOCKWISE) {

+ input_report_rel(roller_mouse->dev, REL_Y, -1); + } else if (movement == KEYPRESSED) { + input_report_key(roller_mouse->dev, BTN_LEFT, 1); + } + input_sync(roller_mouse->dev); return IRQ_HANDLED; }

Tr a ck poin t s A t rackpoint is a point ing device t hat com es int egrat ed wit h t he PS/ 2- t ype keyboard on several lapt ops. This device includes a j oyst ick locat ed am ong t he keys and m ouse but t ons posit ioned under t he spacebar. A t rackpoint essent ially funct ions as a m ouse, so you can operat e it using t he PS/ 2 m ouse driver. Unlike a regular m ouse, a t rackpoint offers m ore m ovem ent cont rol. You can com m and t he t rackpoint cont roller t o change propert ies such as sensit ivit y and inert ia. The kernel has a special driver, drivers/ input / m ouse/ t rackpoint .c, t o creat e and m anage associat ed sysfs nodes. For t he full set of t rack point configurat ion opt ions, look under / sys/ devices/ plat form / i8042/ serioX/ serioY/ .

Tou ch pa ds A t ouchpad is a m ouse- like point ing device com m only found on lapt ops. Unlike convent ional m ice, a t ouchpad does not have m oving part s. I t can generat e m ouse- com pat ible relat ive coordinat es but is usually used by operat ing syst em s in a m ore powerful m ode t hat produces absolut e coordinat es. The com m unicat ion prot ocol used in absolut e m ode is sim ilar t o t he PS/ 2 m ouse prot ocol, but not com pat ible wit h it . The basic PS/ 2 m ouse driver is capable of support ing devices t hat conform t o different variat ions of t he bare PS/ 2 m ouse prot ocol. You m ay add support for a new m ouse prot ocol t o t he base driver by supplying a prot ocol driver via t he psmouse st ruct ure. I f your lapt op uses t he Synapt ics t ouchpad in absolut e m ode, for exam ple, t he base PS/ 2 m ouse driver uses t he services of a Synapt ics prot ocol driver t o int erpret t he st ream ing dat a. For an end- t o- end underst anding of how t he Synapt ics prot ocol works in t andem wit h t he base PS/ 2 driver, look at t he following four code regions collect ed in List ing 7.4:

The PS/ 2 m ouse driver, drivers/ input / m ouse/ psm ouse- base.c, inst ant iat es a psmouse_protocol st ruct ure wit h inform at ion regarding support ed m ouse prot ocols ( including t he Synapt ics t ouchpad prot ocol) .

The psmouse st ruct ure, defined in drivers/ input / m ouse/ psm ouse.h, t ies various PS/ 2 prot ocols t oget her.

synaptics_init() populat es t he psmouse st ruct ure wit h t he address of associat ed prot ocol funct ions.

The prot ocol handler funct ion synaptics_process_byte(), populat ed in synaptics_init(), get s called from int errupt cont ext when serio senses m ouse m ovem ent . I f you unfold synaptics_process_byte(), you will see t ouchpad m ovem ent s being report ed t o user applicat ions via m ousedev.

List in g 7 .4 . PS/ 2 M ou se Pr ot ocol D r ive r for t h e Syn a pt ics Tou ch pa d

Code View: drivers/input/mouse/psmouse-base.c: /* List of supported PS/2 mouse protocols */ static struct psmouse_protocol psmouse_protocols[] = { { .type = PSMOUSE_PS2, /* The bare PS/2 handler */ .name = "PS/2", .alias = "bare", .maxproto = 1, .detect = ps2bare_detect, }, /* ... */ { .type = PSMOUSE_SYNAPTICS, /* Synaptics TouchPad Protocol */ .name = "SynPS/2", .alias = "synaptics", .detect = synaptics_detect, /* Is the protocol detected? */ .init = synaptics_init, /* Initialize Protocol Handler */ }, /* ... */ }

drivers/input/mouse/psmouse.h: /* The structure that ties various mouse protocols together */ struct psmouse { struct input_dev *dev; /* The input device */ /* ... */ /* Protocol Methods */ psmouse_ret_t (*protocol_handler) (struct psmouse *psmouse, struct pt_regs *regs); void (*set_rate)(struct psmouse *psmouse, unsigned int rate); void (*set_resolution) (struct psmouse *psmouse, unsigned int resolution); int (*reconnect)(struct psmouse *psmouse); void (*disconnect)(struct psmouse *psmouse); /* ... */ }; drivers/input/mouse/synaptics.c: /* init() method of the Synaptics protocol */ int synaptics_init(struct psmouse *psmouse) { struct synaptics_data *priv; psmouse->private = priv = kmalloc(sizeof(struct synaptics_data), GFP_KERNEL); /* ... */ /* This is called in interrupt context when mouse movement is sensed */ psmouse->protocol_handler = synaptics_process_byte; /* More protocol methods */ psmouse->set_rate = synaptics_set_rate; psmouse->disconnect = synaptics_disconnect; psmouse->reconnect = synaptics_reconnect; /* ... */

} drivers/input/mouse/synaptics.c: /* If you unfold synaptics_process_byte() and look at synaptics_process_packet(), you can see the input events being reported to user applications via mousedev */ static void synaptics_process_packet(struct psmouse *psmouse) { /* ... */ if (hw.z > 0) { /* Absolute X Coordinate */ input_report_abs(dev, ABS_X, hw.x); /* Absolute Y Coordinate */ input_report_abs(dev, ABS_Y, YMAX_NOMINAL + YMIN_NOMINAL - hw.y); } /* Absolute Z Coordinate */ input_report_abs(dev, ABS_PRESSURE, hw.z); /* ... */ /* Left TouchPad button */ input_report_key(dev, BTN_LEFT, hw.left); /* Right TouchPad button */ input_report_key(dev, BTN_RIGHT, hw.right); /* ... */ }

USB a n d Blu e t oot h M ice USB m ice are handled by t he sam e input driver ( usbhid) t hat drives USB keyboards. Sim ilarly, t he hidp driver t hat im plem ent s support for Bluet oot h keyboards also t akes care of Bluet oot h m ice. As you would expect , USB and Bluet oot h m ice drivers channel device dat a t hrough m ousedev.

Tou ch Con t r olle r s I n Chapt er 6 , we im plem ent ed a device driver for a serial t ouch cont roller in t he form of a line discipline called N_TCH. The input subsyst em offers a bet t er and easier way t o im plem ent t hat driver. Refashion t he finit e st at e m achine in N_TCH as an input device driver wit h t he following changes:

1 . Serio offers a line discipline called serport for accessing devices connect ed t o t he serial port . Use serport 's services t o t alk t o t he t ouch cont roller.

2 . I nst ead of passing coordinat e inform at ion t o t he t t y layer, generat e input report s via evdev as you did in List ing 7.2 for t he virt ual m ouse.

Wit h t his, t he t ouch screen is accessible t o user space via / dev/ input / event X. The act ual driver im plem ent at ion is left as an exercise.

An exam ple of a t ouch cont roller t hat does not int erface via t he serial port is t he Analog Devices ADS7846 chip, which com m unicat es over a Serial Peripheral I nt erface ( SPI ) . The driver for t his device uses t he services of t he SPI core rat her t han serio. The sect ion " The Serial Peripheral I nt erface Bus" in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol," discusses SPI . Like m ost t ouch drivers, t he ADS7846 driver uses t he evdev int erface t o dispat ch t ouch inform at ion t o user applicat ions. Som e t ouch cont rollers int erface over USB. An exam ple is t he 3M USB t ouch cont roller, driven by drivers/ input / t ouchscreen/ usbt ouchscreen.c.

Many PDAs have four- wire resist ive t ouch panels superim posed on t heir LCDs. The X and Y plat es of t he panel ( t wo wires for eit her axes) connect t o an analog- t o- digit al convert er ( ADC) , which provides a digit al readout of t he analog volt age difference arising out of t ouching t he screen. An input driver collect s t he coordinat es from t he ADC and dispat ches it t o user space.

Different inst ances of t he sam e t ouch panel m ay produce slight ly different coordinat e ranges ( m axim um values in t he X and Y direct ions) due t o t he nuances of m anufact uring processes. To insulat e applicat ions from t his variat ion, t ouch screens are calibr at ed prior t o use. Calibrat ion is usually init iat ed by t he GUI by displaying cross- m arks at screen boundaries and ot her vant age point s, and request ing t he user t o t ouch t hose point s. The generat ed coordinat es are program m ed back int o t he t ouch cont roller using appropriat e com m ands if it support s self- calibrat ion, or used t o scale t he coordinat e st ream in soft ware ot herwise. The input subsyst em also cont ains an event driver called t sdev t hat generat es coordinat e inform at ion according t o t he Com paq t ouch- screen prot ocol. I f your syst em report s t ouch event s via t sdev , applicat ions t hat underst and t his prot ocol can elicit t ouch input from / dev/ input / t sX. This driver is, however, scheduled for rem oval from t he m ainline kernel in favor of t he user space t slib library. Docum ent at ion/ feat ure- rem ovalschedule.t xt list s feat ures t hat are going away from t he kernel source t ree.

Acce le r om e t e r s An accelerom et er m easures accelerat ion. Several I BM/ Lenovo lapt ops have an accelerom et er t hat det ect s sudden m ovem ent . The generat ed inform at ion is used t o prot ect t he hard disk from dam age using a m echanism called Hard Drive Act ive Prot ect ion Syst em ( HDAPS) , analogous t o t he way a car airbag shields a passenger from inj ury. The HDAPS driver is im plem ent ed as a plat form driver t hat regist ers wit h t he input subsyst em . I t uses evdev t o st ream t he X and Y com ponent s of t he det ect ed accelerat ion. Applicat ions can read accelerat ion event s via / dev/ input / event X t o det ect condit ions, such as shock and vibe, and perform a defensive act ion, such as parking t he hard drive's head. The following com m and spews out put if you m ove t he lapt op ( assum e t hat event 3 is assigned t o HDAPS) : bash> od –x /dev/input/event3 0000000 a94d 4599 1f19 0007 0003 0000 ffed ffff ...

The accelerom et er also provides inform at ion such as t em perat ure, keyboard act ivit y, and m ouse act ivit y, all of which can be gleaned via files in / sys/ devices/ plat form / hdaps/ . Because of t his, t he HDAPS driver is part of t he hardware m onit oring ( hwm on) subsyst em in t he kernel sources. We t alk about hardware m onit oring in t he sect ion "Hardware Monit oring wit h LM- Sensors" in t he next chapt er.

Ou t pu t Eve n t s Som e input device drivers also handle out put event s. For exam ple, t he keyboard driver can glow t he CAPSLOCK

LED, and t he PC speaker driver can sound a beep. Let 's zoom in on t he lat t er. During init ializat ion, t he speaker driver declares it s out put capabilit y by set t ing appropriat e evbit s and regist ering a callback rout ine t o handle t he out put event : Code View: drivers/input/misc/pcspkr.c: static int __devinit pcspkr_probe(struct platform_device *dev) { /* ... */ /* Capability Bits */ pcspkr_dev->evbit[0] = BIT(EV_SND); pcspkr_dev->sndbit[0] = BIT(SND_BELL) | BIT(SND_TONE); /* The Callback routine */ pcspkr_dev->event = pcspkr_event; err = input_register_device(pcspkr_dev); /* ... */ } /* The callback routine */ static int pcspkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) { /* ... */ /* I/O programming to sound a beep */ outb_p(inb_p(0x61) | 3, 0x61); /* set command for counter 2, 2 byte write */ outb_p(0xB6, 0x43); /* select desired HZ */ outb_p(count & 0xff, 0x42); outb((count >> 8) & 0xff, 0x42); /* ... */ }

To sound t he beeper, t he keyboard event driver generat es a sound event ( EV_SND) as follows: input_event(handle->dev, EV_SND, SND_TONE, hz

/* Type */ /* Code */ /* Value */);

This t riggers execut ion of t he callback rout ine, pcspkr_event(), and you hear t he beep.

D e bu ggin g You can use t he evbug m odule as a debugging aid if you're developing an input driver. I t dum ps t he ( t y pe, code, value) t uple ( see struct input_event defined previously) corresponding t o event s generat ed by t he input subsyst em . Figure 7.4 cont ains dat a capt ured by evbug while operat ing som e input devices: Figu r e 7 .4 . Evbu g ou t pu t . Code View: /* Touchpad Movement */ evbug.c Event. Dev: isa0060/serio1/input0: Type: 3, Code: 28, Value: 0 evbug.c Event. Dev: isa0060/serio1/input0: Type: 1, Code: 325, Value: 0 evbug.c Event. Dev: isa0060/serio1/input0: Type: 0, Code: 0, Value: 0 /* Trackpoint Movement */ evbug.c Event. Dev: synaptics-pt/serio0/input0: Type: 2, Code: 0, Value: -1 evbug.c Event. Dev: synaptics-pt/serio0/input0: Type: 2, Code: 1, Value: -2 evbug.c Event. Dev: synaptics-pt/serio0/input0: Type: 0, Code: 0, Value: 0 /* USB Mouse Movement */ evbug.c Event. Dev: usb-0000:00:1d.1-2/input0: evbug.c Event. Dev: usb-0000:00:1d.1-2/input0: evbug.c Event. Dev: usb-0000:00:1d.1-2/input0: evbug.c Event. Dev: usb-0000:00:1d.1-2/input0: /* PS/2 evbug.c evbug.c evbug.c

Type: Type: Type: Type:

2, 0, 2, 0,

Code: Code: Code: Code:

1, 0, 0, 0,

Value: Value: Value: Value:

-1 0 1 0

Keyboard keypress 'a' */ Event. Dev: isa0060/serio0/input0: Type: 4, Code: 4, Value: 30 Event. Dev: isa0060/serio0/input0: Type: 1, Code: 30, Value: 0 Event. Dev: isa0060/serio0/input0: Type: 0, Code: 0, Value: 0

/* USB keyboard keypress 'a' */ evbug.c Event. Dev: usb-0000:00:1d.1-1/input0: evbug.c Event. Dev: usb-0000:00:1d.1-1/input0: evbug.c Event. Dev: usb-0000:00:1d.1-2/input0: evbug.c Event. Dev: usb-0000:00:1d.1-2/input0:

Type: Type: Type: Type:

1, 0, 1, 0,

Code: Code: Code: Code:

30, Value: 1 0, Value: 0 30, Value: 0 0, Value: 0

To m ake sense of t he dum p in Figure 7.4, rem em ber t hat t ouchpads generat e absolut e coordinat es (EV_ABS) or event t ype 0x03, t rackpoint s produce relat ive coordinat es ( EV_REL) or event t ype 0x02, and keyboards em it key event s ( EV_KEY) or event t ype 0x01. Event t ype 0x0 corresponds t o an invocat ion of input_sync(), which does t he following: input_event(dev, EV_SYN, SYN_REPORT, 0);

This t ranslat es t o a ( t ype, code, value) t uple of (0x0, 0x0, 0x0) and com plet es each input event .

Look in g a t t h e Sou r ce s Most input event drivers are present in t he drivers/ input / direct ory. The keyboard event driver, however, lives in drivers/ char/ keyboard.c, because it 's connect ed t o virt ual t erm inals and not t o device nodes under / dev/ input / . You can find input device drivers in several places. Drivers for legacy keyboards, m ice, and j oyst icks, reside in separat e subdirect ories under drivers/ input / . Bluet oot h input drivers live in net / bluet oot h/ hidp/ . You can also find input drivers in regions such as drivers/ hwm on/ and drivers/ m edia/ video/ . Event t ypes, codes, and values, are defined in include/ linux/ input .h. The serio subsyst em st ays in drivers/ input / serio/ . Sources for t he serport line discipline is in drivers/ input / serio/ serport .c. Docum ent at ion/ input / cont ains m ore det ails on different input int erfaces. Table 7.1 sum m arizes t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion inside t he source t ree. Table 7.2 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 7 .1 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

input_event

include/ linux/ input .h

Each event packet produced by evdev has t his form at .

input_dev

include/ linux/ input .h

Represent at ion of an input device.

input_handler

include/ linux/ serial_core.h

Cont ains t he ent ry point s support ed by an event driver.

psmouse_protocol drivers/ input / m ouse/ psm ouse- base.c psmouse

drivers/ input / m ouse/ psm ouse.h

I nform at ion about a support ed PS/ 2 m ouse prot ocol driver. Met hods support ed by a PS/ 2 m ouse driver.

Ta ble 7 .2 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

input_register_device()

drivers/ input / input .c

Regist ers a device wit h t he input core

input_unregister_device()

drivers/ input / input .c

Rem oves a device from t he input core

input_report_rel()

include/ linux/ input .h

Generat es a relat ive m ovem ent in a specified direct ion

input_report_abs()

include/ linux/ input .h

Generat es an absolut e m ovem ent in a specified direct ion

input_report_key()

include/ linux/ input .h

Generat es a key or a but t on press

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

input_sync()

include/ linux/ input .h

I ndicat es t hat t he input subsyst em can collect previously generat ed event s int o an evdev packet and send it t o user space via / dev/ input / input X

input_register_handler()

drivers/ input / input .c

Regist ers a cust om event driver

sysfs_create_group()

fs/ sysfs/ group.c

Creat es a sysfs node group wit h specified at t ribut es

sysfs_remove_group()

fs/ sysfs/ group.c

Rem oves a sysfs group creat ed using sysfs_create_group()

tty_insert_flip_char()

include/ linux/ t t y_flip.h

Sends a charact er t o t he line discipline layer

platform_device_register_simple() drivers/ base/ plat form .c platform_device_unregister()

drivers/ base/ plat form .c

Creat es a sim ple plat form device Unregist ers a plat form device

Ch a pt e r 8 . Th e I n t e r - I n t e gr a t e d Cir cu it Pr ot ocol I n Th is Ch a pt e r 234 What 's I 2 C/ SMBus? 235 I 2 C Core 237 Bus Transact ions 238 Device Exam ple: EEPROM 247 Device Exam ple: Real Tim e Clock 251 I 2C- dev 251 Hardware Monit oring Using LMSensors 251 The Serial Peripheral I nt erface Bus 254 The 1- Wire Bus 254 Debugging 255 Looking at t he Sources

The I nt er- I nt egrat ed Circuit , or I 2 C ( pronounced I squared C) bus and it s subset , t he Syst em Managem ent Bus ( SMBus) , are synchronous serial int erfaces t hat are ubiquit ous on deskt ops and em bedded devices. Let 's find out how t he kernel support s I 2 C/ SMBus host adapt ers and client devices by im plem ent ing exam ple drivers t o access an I 2 C EEPROM and an I 2 C RTC. And before wrapping up t his chapt er, let 's also peek at t wo ot her serial int erfaces support ed by t he kernel: t he Serial Peripheral I nt erface or SPI ( oft en pronounced spy ) bus and t he 1- wire bus. All t hese serial int erfaces ( I 2 C, SMBus, SPI , and 1- wire) share t wo com m on charact erist ics:

The am ount of dat a exchanged is sm all.

The required dat a t ransfer rat e is low.

W h a t 's I 2 C/ SM Bu s? I 2 C is a serial bus t hat is widely used in deskt ops and lapt ops t o int erface t he processor wit h devices such as EEPROMs, audio codecs, and specialized chips t hat m onit or param et ers such as t em perat ure and power- supply volt age. I n addit ion, I 2 C is widely used in em bedded devices t o com m unicat e wit h RTCs, sm art bat t ery circuit s, m ult iplexers, port expanders, opt ical t ransceivers, and ot her sim ilar devices. Because I 2 C is support ed by a large num ber of m icrocont rollers, t here are loads of cheap I 2 C devices available in t he m arket t oday. I 2 C and SMBus are m ast er- slave prot ocols where com m unicat ion t akes place bet ween a host adapt er ( or host cont roller) and client devices ( or slaves) . The host adapt er is usually part of t he Sout h Bridge chipset on deskt ops and part of t he m icrocont roller on em bedded devices. Figure 8.1 shows an exam ple I 2 C bus on PCcom pat ible hardware.

Figu r e 8 .1 . I 2 C/ SM Bu s on PC- com pa t ible h a r dw a r e .

I 2 C and it s subset SMBus are 2- wire int erfaces originally developed by Philips and I nt el, respect ively. The t wo wires are clock and bidirect ional dat a, and t he corresponding lines are called Serial CLock ( SCL) and Serial DAt a ( SDA) . Because t he I 2 C bus needs only a pair of wires, it consum es less space on t he circuit board. However, t he support ed bandwidt hs are also low. I 2 C allows up t o 100Kbps in t he st andard m ode and 400Kbps in a fast m ode. ( SMBus support s only up t o 100Kbps, however.) The bus is t hus suit able only for slow peripherals. Even t hough I 2 C support s bidirect ional exchange, t he com m unicat ion is half duplex because t here is only a single dat a wire. I 2 C and SMBus devices own 7- bit addresses. The prot ocol also support s 10- bit addresses, but m any devices respond only t o 7- bit addressing, which yields a m axim um of 127 devices on t he bus. Due t o t he m ast er- slave nat ure of t he prot ocol, device addresses are also known as slave addresses.

Ch a pt e r 8 . Th e I n t e r - I n t e gr a t e d Cir cu it Pr ot ocol I n Th is Ch a pt e r 234 What 's I 2 C/ SMBus? 235 I 2 C Core 237 Bus Transact ions 238 Device Exam ple: EEPROM 247 Device Exam ple: Real Tim e Clock 251 I 2C- dev 251 Hardware Monit oring Using LMSensors 251 The Serial Peripheral I nt erface Bus 254 The 1- Wire Bus 254 Debugging 255 Looking at t he Sources

The I nt er- I nt egrat ed Circuit , or I 2 C ( pronounced I squared C) bus and it s subset , t he Syst em Managem ent Bus ( SMBus) , are synchronous serial int erfaces t hat are ubiquit ous on deskt ops and em bedded devices. Let 's find out how t he kernel support s I 2 C/ SMBus host adapt ers and client devices by im plem ent ing exam ple drivers t o access an I 2 C EEPROM and an I 2 C RTC. And before wrapping up t his chapt er, let 's also peek at t wo ot her serial int erfaces support ed by t he kernel: t he Serial Peripheral I nt erface or SPI ( oft en pronounced spy ) bus and t he 1- wire bus. All t hese serial int erfaces ( I 2 C, SMBus, SPI , and 1- wire) share t wo com m on charact erist ics:

The am ount of dat a exchanged is sm all.

The required dat a t ransfer rat e is low.

W h a t 's I 2 C/ SM Bu s? I 2 C is a serial bus t hat is widely used in deskt ops and lapt ops t o int erface t he processor wit h devices such as EEPROMs, audio codecs, and specialized chips t hat m onit or param et ers such as t em perat ure and power- supply volt age. I n addit ion, I 2 C is widely used in em bedded devices t o com m unicat e wit h RTCs, sm art bat t ery circuit s, m ult iplexers, port expanders, opt ical t ransceivers, and ot her sim ilar devices. Because I 2 C is support ed by a large num ber of m icrocont rollers, t here are loads of cheap I 2 C devices available in t he m arket t oday. I 2 C and SMBus are m ast er- slave prot ocols where com m unicat ion t akes place bet ween a host adapt er ( or host cont roller) and client devices ( or slaves) . The host adapt er is usually part of t he Sout h Bridge chipset on deskt ops and part of t he m icrocont roller on em bedded devices. Figure 8.1 shows an exam ple I 2 C bus on PCcom pat ible hardware.

Figu r e 8 .1 . I 2 C/ SM Bu s on PC- com pa t ible h a r dw a r e .

I 2 C and it s subset SMBus are 2- wire int erfaces originally developed by Philips and I nt el, respect ively. The t wo wires are clock and bidirect ional dat a, and t he corresponding lines are called Serial CLock ( SCL) and Serial DAt a ( SDA) . Because t he I 2 C bus needs only a pair of wires, it consum es less space on t he circuit board. However, t he support ed bandwidt hs are also low. I 2 C allows up t o 100Kbps in t he st andard m ode and 400Kbps in a fast m ode. ( SMBus support s only up t o 100Kbps, however.) The bus is t hus suit able only for slow peripherals. Even t hough I 2 C support s bidirect ional exchange, t he com m unicat ion is half duplex because t here is only a single dat a wire. I 2 C and SMBus devices own 7- bit addresses. The prot ocol also support s 10- bit addresses, but m any devices respond only t o 7- bit addressing, which yields a m axim um of 127 devices on t he bus. Due t o t he m ast er- slave nat ure of t he prot ocol, device addresses are also known as slave addresses.

I 2 C Cor e The I 2 C core is a code base consist ing of rout ines and dat a st ruct ures available t o host adapt er drivers and client drivers. Com m on code in t he core m akes t he driver developer's j ob easier. The core also provides a level of indirect ion t hat renders client drivers independent of t he host adapt er, allowing t hem t o work unchanged even if t he client device is used on a board t hat has a different I 2 C host adapt er. This philosophy of a core layer and it s at t endant benefit s is also relevant for m any ot her device driver classes in t he kernel, such as PCMCI A, PCI , and USB. I n addit ion t o t he core, t he kernel I 2 C infrast ruct ure consist s of t he following:

Device drivers for I 2 C host adapt ers. They fall in t he realm of bus drivers and usually consist of an adapt er driver and an algorit hm driver. The form er uses t he lat t er t o t alk t o t he I 2 C bus.

Device drivers for I 2 C client devices.

i2c- dev, which allows t he im plem ent at ion of user m ode I 2 C client drivers.

You are m ore likely t o im plem ent client drivers t han adapt er or algorit hm drivers because t here are a lot m ore I 2 C devices t han t here are I 2 C host adapt ers. So, we will confine ourselves t o client drivers in t his chapt er. Figure 8.2 illust rat es t he Linux I 2 C subsyst em . I t shows I 2 C kernel m odules t alking t o a host adapt er and a client device on an I 2 C bus.

Figu r e 8 .2 . Th e Lin u x I 2 C su bsyst e m .

[ View full size im age]

Because SMBus is a subset of I 2 C, using only SMBus com m ands t o t alk t o your device yields a driver t hat works wit h bot h SMBus and I 2 C adapt ers. Table 8.1 list s t he SMBus- com pat ible dat a t ransfer rout ines provided by t he I 2 C core.

Ta ble 8 .1 . SM Bu s- Com pa t ible D a t a Acce ss Fu n ct ion s Pr ovide d by t h e I 2 C Cor e Fu n ct ion

Pu r pose

i2c_smbus_read_byte()

Reads a single byt e from t he device wit hout specifying a locat ion offset . Uses t he sam e offset as t he previously issued com m and.

i2c_smbus_write_byte()

Sends a single byt e t o t he device at t he sam e m em ory offset as t he previously issued com m and.

i2c_smbus_write_quick()

Sends a single bit t o t he device ( in place of t he Rd/ Wr bit shown in List ing 8.1) .

i2c_smbus_read_byte_data()

Reads a single byt e from t he device at a specified offset .

i2c_smbus_write_byte_data()

Sends a single byt e t o t he device at a specified offset .

i2c_smbus_read_word_data()

Reads 2 byt es from t he specified offset .

i2c_smbus_write_word_data()

Sends 2 byt es t o t he specified offset .

Fu n ct ion

Pu r pose

i2c_smbus_read_block_data()

Reads a block of dat a from t he specified offset .

i2c_smbus_write_block_data()

Sends a block of dat a ( < = 32 byt es) t o t he specified offset .

Bu s Tr a n sa ct ion s Before im plem ent ing an exam ple driver, let 's get a bet t er underst anding of t he I 2 C prot ocol by peering at t he wires t hrough a m agnifying glass. List ing 8.1 shows a code snippet t hat t alks t o an I 2 C EEPROM and t he corresponding t ransact ions t hat occur on t he bus. The t ransact ions were capt ured by connect ing an I 2 C bus analyzer while running t he code snippet . The code uses user m ode I 2 C funct ions. ( We t alk m ore about user m ode I 2 C program m ing in Chapt er 19, " Drivers in User Space." ) List in g 8 .1 . Tr a n sa ct ion s on t h e I 2 C Bu s Code View: /* ... */ /* * Connect to the EEPROM. 0x50 is the device address. * smbus_fp is a file pointer into the SMBus device. */ ioctl(smbus_fp, 0x50, slave); /* Write a byte (0xAB) at memory offset 0 on the EEPROM */ i2c_smbus_write_byte_data(smbus_fp, 0, 0xAB); /* * This is the corresponding transaction observed * on the bus after the write: * S 0x50 Wr [A] 0 [A] 0xAB [A] P * * S is the start bit, 0x50 is the 7-bit slave address (0101000b), * Wr is the write command (0b), A is the Accept bit (or * acknowledgment) received by the host from the slave, 0 is the * address offset on the slave device where the byte is to be * written, 0xAB is the data to be written, and P is the stop bit. * The data enclosed within [] is sent from the slave to the * host, while the rest of the bits are sent by the host to the * slave. */ /* Read a byte from offset 0 on the EEPROM */ res = i2c_smbus_read_byte_data(smbus_fp, 0); /* * This is the corresponding transaction observed * on the bus after the read: * S 0x50 Wr [A] 0 [A] S 0x50 Rd [A] [0xAB] NA P * * The explanation of the bits is the same as before, except that * Rd stands for the Read command (1b), 0xAB is the data received * from the slave, and NA is the Reverse Accept bit (or the * acknowledgment sent by the host to the slave). */

D e vice Ex a m ple : EEPROM Our first exam ple client device is an EEPROM sit t ing on an I 2 C bus, as shown in Figure 8.1. Alm ost all lapt ops and deskt ops have such an EEPROM for st oring BI OS configurat ion inform at ion. The exam ple EEPROM has t wo m em ory banks. The driver export s / dev int erfaces corresponding t o each bank: / dev / eep/ 0 and / dev/ eep/ 1. Applicat ions operat e on t hese nodes t o exchange dat a wit h t he EEPROM. Each I 2 C/ SMBus client device is assigned a slave address t hat funct ions as t he device ident ifier. The EEPROM in t he exam ple answers t o t wo slave addresses, SLAVE_ADDR1 and SLAVE_ADDR2, one per bank. The exam ple driver uses I 2 C com m ands t hat are com pat ible wit h SMBus, so it works wit h bot h I 2 C and SMBus EEPROMs.

I n it ia lizin g As is t he case wit h all driver classes, I 2 C client drivers also own an init() ent ry point . I nit ializat ion ent ails allocat ing dat a st ruct ures, regist ering t he driver wit h t he I 2 C core, and hooking up wit h sysfs and t he Linux device m odel. This is done in List ing 8.2. List in g 8 .2 . I n it ia lizin g t h e EEPROM D r ive r Code View: /* Driver entry points */ static struct file_operations eep_fops = { .owner = THIS_MODULE, .llseek = eep_llseek, .read = eep_read, .ioctl = eep_ioctl, .open = eep_open, .release = eep_release, .write = eep_write, }; static dev_t dev_number; static struct class *eep_class;

/* Allotted Device Number */ /* Device class */

/* Per-device client data structure for each * memory bank supported by the driver */ struct eep_bank { struct i2c_client *client; unsigned int addr; unsigned short current_pointer; int bank_number; /* ... */

/* /* /* /* /*

I2C client for this bank */ Slave address of this bank */ File pointer */ Actual memory bank number */ Spinlocks, data cache for slow devices,.. */

}; #define NUM_BANKS 2 #define BANK_SIZE 2048

/* Two supported banks */ /* Size of each bank */

struct ee_bank *ee_bank_list;

/* List of private data structures, one per bank */

/* * Device Initialization */ int __init eep_init(void) { int err, i; /* Allocate the per-device data structure, ee_bank */ ee_bank_list = kmalloc(sizeof(struct ee_bank)*NUM_BANKS, GFP_KERNEL); memset(ee_bank_list, 0, sizeof(struct ee_bank)*NUM_BANKS); /* Register and create the /dev interfaces to access the EEPROM banks. Refer back to Chapter 5, "Character Drivers" for more details */ if (alloc_chrdev_region(&dev_number, 0, NUM_BANKS, "eep") < 0) { printk(KERN_DEBUG "Can't register device\n"); return -1; } eep_class = class_create(THIS_MODULE, DEVICE_NAME); for (i=0; i < NUM_BANKS;i++) { /* Connect the file operations with cdev */ cdev_init(&ee_bank[i].cdev, &ee_fops); /* Connect the major/minor number to the cdev */ if (cdev_add(&ee_bank[i].cdev, (dev_number + i), 1)) { printk("Bad kmalloc\n"); return 1; } class_device_create(eep_class, NULL, (dev_number + i), NULL, "eeprom%d", i); } /* Inform the I2C core about our existence. See the section "Probing the Device" for the definition of eep_driver */ err = i2c_add_driver(&eep_driver); if (err) { printk("Registering I2C driver failed, errno is %d\n", err); return err; } printk("EEPROM Driver Initialized.\n"); return 0; }

List ing 8.2 init iat es creat ion of t he device nodes, but t o com plet e t heir product ion, add t he following t o an appropriat e rule file under / et c/ udev/ rules.d/ :

KERNEL:"eeprom[0-1]*", NAME="eep/%n"

This creat es / dev / eep/ 0 and / dev / eep/ 1 in response t o recept ion of t he corresponding uevent s from t he kernel. A user m ode program t hat needs t o read from t he n t h m em ory bank can t hen operat e on / dev / eep/ n. List ing 8.3 im plem ent s t he open() m et hod for t he EEPROM driver. The kernel calls eep_open() when an applicat ion opens / dev/ eep/ X. eep_open() st ores t he per- device dat a st ruct ure in a privat e area so t hat it 's direct ly accessible from t he rest of t he driver m et hods. List in g 8 .3 . Ope n in g t h e EEPROM D r ive r int eep_open(struct inode *inode, struct file *file) { /* The EEPROM bank to be opened */ n = MINOR(file->f_dentry->d_inode->i_rdev); file->private_data = (struct ee_bank *)ee_bank_list[n]; /* Initialize the fields in ee_bank_list[n] such as size, slave address, and the current file pointer */ /* ... */ }

Pr obin g t h e D e vice The I 2 C client driver, in part nership wit h t he host cont roller driver and t he I 2 C core, at t aches it self t o a slave device as follows:

1 . During init ializat ion, it regist ers a probe() m et hod, which t he I 2 C core invokes when an associat ed host cont roller is det ect ed. I n List ing 8.2, eep_init() regist ered eep_probe() by invoking i2c_add_driver(): static struct i2c_driver eep_driver = { .driver = { .name = "EEP", }, .id = I2C_DRIVERID_EEP, .attach_adapter = eep_probe, .detach_client = eep_detach, }; i2c_add_driver(&eep_driver);

/* Name */ /* ID */ /* Probe Method */ /* Detach Method */

`

The driver ident ifier, I2C_DRIVERID_EEP, should be unique for t he device and should be defined in include/ linux/ i2c- id.h.

2 . When t he core calls t he driver's probe() m et hod signifying t he presence of a host adapt er, it , in t urn, invokes i2c_probe() wit h argum ent s specifying t he addresses of t he slave devices t hat t he driver is responsible for and an associat ed attach() rout ine.

List ing 8.4 im plem ent s eep_probe(), t he probe() m et hod of t he EEPROM driver. normal_i2c specifies t he EEPROM bank addresses and is populat ed as part of t he i2c_client_address_data st ruct ure. Addit ional fields in t his st ruct ure can be used t o request finer addressing cont rol. You can ask t he I 2 C core t o ignore a range of addresses using t he ignore field. Or you m ay use t he probe field t o specify ( adapt er, slave address) pairs if you want t o bind a slave address t o a part icular host adapt er. This will be useful, for exam ple, if your processor support s t wo I 2 C host adapt ers, and you have an EEPROM on bus 1 and a t em perat ure sensor on bus 2, bot h answering t o t he sam e slave address.

3 . The host cont roller walks t he bus looking for t he slave devices specified in St ep 2. To do t his, it generat es a bus t ransact ion such as S SLAVE_ADDR Wr, where S is t he st art bit , SLAVE_ADDR is t he associat ed 7- bit slave address as specified in t he device's dat asheet , and Wr is t he writ e com m and, as described in t he sect ion "Bus Transact ions." I f a working slave device exist s on t he bus, it 'll respond by sending an acknowledgm ent bit ( [A]) .

4 . I f t he host adapt er det ect s a slave in St ep 3, t he I 2 C core invokes t he attach() rout ine supplied via t he t hird argum ent t o i2c_probe() in St ep 2. For t he EEPROM driver, t his rout ine is eep_attach(), which regist ers a per- device client dat a st ruct ure, as shown in List ing 8.5. I f your device expect s an init ial program m ing sequence ( for exam ple, regist ers on an I 2 C Digit al Visual I nt erface t ransm it t er chip have t o be init ialized before t he chip can st art funct ioning) , perform t hose operat ions in t his rout ine.

List in g 8 .4 . Pr obin g t h e Pr e se n ce of EEPROM Ba n k s #include /* The EEPROM has two memory banks having addresses SLAVE_ADDR1 * and SLAVE_ADDR2, respectively */ static unsigned short normal_i2c[] = { SLAVE_ADDR1, SLAVE_ADDR2, I2C_CLIENT_END }; static struct .normal_i2c .probe .ignore .forces };

i2c_client_address_data addr_data = { = normal_i2c, = ignore, = ignore, = ignore,

static int eep_probe(struct i2c_adapter *adapter) { /* The callback function eep_attach(), is shown * in Listing 8.5 */ return i2c_probe(adapter, &addr_data, eep_attach); }

List in g 8 .5 . At t a ch in g a Clie n t

int eep_attach(struct i2c_adapter *adapter, int address, int kind) { static struct i2c_client *eep_client; eep_client = kmalloc(sizeof(*eep_client), GFP_KERNEL); eep_client->driver = &eep_driver; /* Registered in Listing 8.2 */ eep_client->addr = address; /* Detected Address */ eep_client->adapter = adapter; /* Host Adapter */ eep_client->flags = 0; strlcpy(eep_client->name, "eep", I2C_NAME_SIZE); /* Populate fields in the associated per-device data structure */ /* ... */ /* Attach */ i2c_attach_client(new_client); }

Ch e ck in g Ada pt e r Ca pa bilit ie s Each host adapt er m ight be lim it ed by a set of const raint s. An adapt er m ight not support all t he com m ands t hat Table 8.1 cont ains. For exam ple, it m ight allow t he SMBus read_word com m and but not t he read_block com m and. A client driver has t o check whet her a com m and is support ed by t he adapt er before using it . The I 2 C core provides t wo funct ions t o do t his:

1 . i2c_check_functionality() checks whet her a part icular funct ion is support ed.

2 . i2c_get_functionality() ret urns a m ask cont aining all support ed funct ions.

See include/ linux/ i2c.h for t he list of possible funct ionalit ies.

Acce ssin g t h e D e vice To read dat a from t he EEPROM, first glean inform at ion about it s invocat ion t hread from t he privat e dat a field associat ed wit h t he device node. Next , use SMBus- com pat ible dat a access rout ines provided by t he I 2 C core (Table 8.1 shows t he available funct ions) t o read t he dat a. Finally, send t he dat a t o user space and increm ent t he int ernal file point er so t hat t he next read()/ write() operat ion st art s from where t he last one ended. These st eps are perform ed by List ing 8.6. The list ing om it s sanit y and error checks for convenience.

List in g 8 .6 . Re a din g fr om t h e EEPROM

Code View: ssize_t eep_read(struct file *file, char *buf, size_t count, loff_t *ppos) { int i, transferred, ret, my_buf[BANK_SIZE]; /* Get the private client data structure for this bank */ struct ee_bank *my_bank = (struct ee_bank *)file->private_data; /* Check whether the smbus_read_word() functionality is supported */ if (i2c_check_functionality(my_bank->client, I2C_FUNC_SMBUS_READ_WORD_DATA)) { /* Read the data */ while (transferred < count) { ret = i2c_smbus_read_word_data(my_bank->client, my_bank->current_pointer+i); my_buf[i++] = (u8)(ret & 0xFF); my_buf[i++] = (u8)(ret >> 8); transferred += 2; } /* Copy data to user space and increment the internal file pointer. Sanity checks are omitted for simplicity */ copy_to_user(buffer, (void *)my_buf, transferred); my_bank->current_pointer += transferred; } return transferred; }

Writ ing t o t he device is done sim ilarly, except t hat an i2c_smbus_write_XXX() funct ion is used inst ead.

Som e EEPROM chips have a Radio Frequency I dent ificat ion ( RFI D) t ransm it t er t o wirelessly t ransm it st ored inform at ion. This is used t o aut om at e supply- chain processes such as invent ory m onit oring and asset t racking. Such EEPROMs usually im plem ent safeguards via an access prot ect ion bank t hat cont rols access perm issions t o t he dat a banks. I n such cases, t he driver has t o wiggle corresponding bit s in t he access prot ect ion bank before it can operat e on t he dat a banks.

To access t he EEPROM banks from user space, develop applicat ions t hat operat e on / dev/ eep/ n. To dum p t he cont ent s of t he EEPROM banks, use od: bash> od –a /dev/eep/0 0000000 S E R # dc4 0000020 @ 1 3 R 1 0000040 5 1 0 H sp

ff soh 1 5 1 S

R 3 2

P nul nul nul nul nul nul nul Z J 1 V 1 L 4 6 8 8 8 7 J U 9 9

0000060 H 0 0 6 6 nul nul nul bs 3 8 L 5 0 0 3 0000100 Z J 1 N U B 4 6 8 6 V 7 nul nul nul nul 0000120 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul * 0000400

As an exercise, t ake a st ab at m odifying t he EEPROM driver t o creat e / sys int erfaces t o t he EEPROM banks rat her t han t he / dev int erfaces. You m ay reuse code from List ing 5.7, " Using Sysfs t o Cont rol t he Parallel LED Board," in Chapt er 5 t o help you in t his endeavor.

M or e M e t h ods To obt ain a fully funct ional driver, you need t o add a few rem aining ent ry point s. These are hardly different from t hose of norm al charact er drivers discussed in Chapt er 5 , so t he code list ings are not shown:

To support t he lseek() syst em call t hat assigns a new value t o t he int ernal file point er, im plem ent t he llseek() driver m et hod. The int ernal file point er st ores st at e inform at ion about EEPROM access.

To verify dat a int egrit y, t he EEPROM driver can support an ioctl() m et hod t o adj ust and verify checksum s of st ored dat a.

The poll() and fsync() m et hods are not relevant for t he EEPROM.

I f you choose t o com pile t he driver as a m odule, you have t o supply an exit() m et hod t o unregist er t he device and clean up client - specific dat a st ruct ures. Unregist ering t he driver from t he I 2 C core is a oneliner: i2c_del_driver(&eep_driver);

D e vice Ex a m ple : Re a l Tim e Clock Let 's now t ake t he exam ple of an RTC chip connect ed t o an em bedded cont roller over t he I 2 C bus. The connect ion diagram is shown in Figure 8.3.

Figu r e 8 .3 . An I 2 C RTC on a n e m be dde d syst e m .

Assum e t hat t he I 2 C slave address of t he RTC is 0x60 and t hat it s regist er space is organized as shown in Table 8.2 .

Ta ble 8 .2 . Re gist e r La you t on t h e I 2 C RTC Re gist e r N a m e

D e scr ipt ion

Offse t

RTC_HOUR_REG

Hour count er

0x0

RTC_MINUTE_REG

Minut e count er

0x1

RTC_SECOND_REG

Second count er

0x2

RTC_STATUS_REG

Flags and int errupt st at us

0x3

RTC_CONTROL_REG

Enable/ disable RTC

0x4

Let 's base our driver for t his chip on t he EEPROM driver discussed previously. We will t ake t he I 2 C client driver archit ect ure, slave regist rat ion, and I 2 C core funct ions for grant ed and im plem ent only t he code t hat com m unicat es wit h t he RTC. When t he I 2 C core det ect s a device having t he RTC's slave address ( 0x60) on t he I 2 C bus, it invokes myrtc_attach(). The invocat ion t rain is sim ilar t o t hat for eep_attach() in List ing 8.5. Assum e t hat you have t o perform t he following chip init ializat ions in myrtc_attach():

1 . Clear t he RTC st at us regist er (RTC_STATUS_REG) .

2 . St art t he RTC ( if it is not already running) by t urning on appropriat e bit s in t he RTC cont rol regist er (RTC_CONTROL_REG) .

To do t his, let 's build an i2c_msg and generat e I 2 C t ransact ions on t he bus using i2c_transfer(). This t ransfer m echanism is exclusive t o I 2 C and is not SMBus- com pat ible. To writ e t o t he t wo RTC regist ers referred t o previously, you have t o build t wo i2c_msg m essages. The first m essage set s t he regist er offset . I n our case, it 's 3, t he offset of RTC_STATUS_REG. The second m essage carries t he desired num ber of byt es t o t he specified offset . I n t his cont ext , it ferries 2 byt es, one each t o RTC_STATUS_REG and RTC_CONTROL_REG. Code View: #include /* For struct i2c_msg */ int myrtc_attach(struct i2c_adapter *adapter, int addr, int kind) { u8 buf[2]; int offset = RTC_STATUS_REG; /* Status register lives here */ struct i2c_msg rtc_msg[2]; /* Write 1 byte of offset information to the RTC */ rtc_msg[0].addr = addr; /* Slave address. In our case, this is 0x60 */ rtc_msg[0].flags = I2C_M_WR; /* Write Command */ rtc_msg[0].buf = &offset; /* Register offset for the next transaction */ rtc_msg[0].len = 1; /* Offset is 1 byte long */ /* Write 2 bytes of data (the contents of the status and control registers) at the offset programmed by the previous i2c_msg */ rtc_msg[1].addr = addr; /* Slave address */ rtc_msg[1].flags = I2C_M_WR; /* Write command */ rtc_msg[1].buf = &buf[0]; /* Data to be written to control and status registers */ rtc_msg[1].len = 2; /* Two register values */ buf[0] = 0; /* Zero out the status register */ buf[1] |= ENABLE_RTC; /* Turn on control register bits that start the RTC */ /* Generate bus transactions corresponding to the two messages */ i2c_transfer(adapter, rtc_msg, 2); /* ... */ printk("My RTC Initialized\n"); }

Now t hat t he RTC is init ialized and t icking, you can glean t he current t im e by reading t he cont ent s of RTC_HOUR_REG, RTC_MINUTE_REG, and RTC_SECOND_REG. This is done as follows: Code View: #include /* For struct rtc_time */ int

myrtc_gettime(struct i2c_client *client, struct rtc_time *r_t) { u8 buf[3]; /* Space to carry hour/minute/second */ int offset = 0; /* Time-keeping registers start at offset 0 */ struct i2c_msg rtc_msg[2]; /* Write 1 byte of offset information to the RTC */ rtc_msg[0].addr = addr; /* Slave address */ rtc_msg[0].flags = 0; /* Write Command */ rtc_msg[0].buf = &offset; /* Register offset for the next transaction */ rtc_msg[0].len = 1; /* Offset is 1 byte long */ /* Read current time by getting 3 bytes of data from offset 0 (i.e., from RTC_HOUR_REG, RTC_MINUTE_REG, and RTC_SECOND_REG) */ rtc_msg[1].addr = addr; /* Slave address */ rtc_msg[1].flags = I2C_M_RD; /* Read command */ rtc_msg[1].buf = &buf[0]; /* Data to be read from hour, minute and second registers */ rtc_msg[1].len = 3; /* Three registers to read */ /* Generate bus transactions corresponding to the above two messages */ i2c_transfer(adapter, rtc_msg, 2); /* Read the time */ r_t->tm_hour = BCD2BIN(buf[0]); /* Hour */ r_t->tm_min = BCD2BIN(buf[1]); /* Minute */ r_t->tm_sec = BCD2BIN(buf[2]); /* Second */ return(0); }

myrtc_gettime() im plem ent s t he bus- specific bot t om layer of t he RTC driver. The t op layer of t he RTC driver should conform t o t he kernel RTC API , as discussed in t he sect ion "RTC Subsyst em " in Chapt er 5 . The advant age of t his schem e is t hat applicat ions can run unchanged irrespect ive of whet her your RTC is int ernal t o t he Sout h Bridge of a PC or ext ernally connect ed t o an em bedded cont roller as in t his exam ple. RTCs usually st ore t im e in Binary Coded Decim al ( BCD) , where each nibble represent s a num ber bet ween 0 and 9 ( rat her t han bet ween 0 and 15) . The kernel provides a m acro called BCD2BIN() t o convert from BCD encoding t o decim al and BIN2BCD() t o perform t he reverse operat ion. myrtc_gettime() uses t he form er while reading t im e from RTC regist ers. Look at drivers/ rt c/ rt c- ds1307.c for an exam ple RTC driver t hat handles t he - Dallas/ Maxim DS13XX series of I 2 C RTC chips. Being a 2- wire bus, t he I 2 C bus does not have an int errupt request line t hat slave devices can assert , but som e I 2 C host adapt ers have t he capabilit y t o int errupt t he processor t o signal com plet ion of dat a- t ransfer request s. This int errupt - driven operat ion is, however, t ransparent t o I 2 C client drivers and is hidden inside t he service rout ines offered by t he I 2 C core. Assum ing t hat t he I 2 C host cont roller t hat is part of t he em bedded SoC in Figure 8.3 has t he capabilit y t o int errupt t he processor, t he invocat ion of i2c_transfer() in myrtc_attach() m ight be doing t he following under t he hood:

Build a t ransact ion corresponding t o rtc_msg[0] and writ e it t o t he bus using t he services of t he host

cont roller device driver.

Wait unt il t he host cont roller assert s a t ransm it com plet e int errupt signaling t hat rtc_msg[0] is now on t he wire.

I nside t he int errupt handler, look at t he I 2 C host cont roller st at us regist er t o see whet her an acknowledgm ent has been received from t he RTC slave.

Ret urn error if t he host cont roller st at us and cont rol regist ers indicat e t hat all's not well.

Repeat t he sam e for rtc_msg[1].

I 2 C- de v Som et im es, when you need t o enable support for a large num ber of slow I 2 C devices, it m akes sense t o drive t hem wholly from user space. The I 2 C layer support s a driver called i2c- dev t o achieve t his. Fast forward t o t he sect ion "User Mode I 2 C" in Chapt er 19 for an exam ple t hat im plem ent s a user m ode I 2 C driver using i2c- dev.

H a r dw a r e M on it or in g Usin g LM - Se n sor s The LM- Sensors proj ect , host ed at www.lm - sensors.org, brings hardware- m onit oring capabilit ies t o Linux. Several com put er syst em s use sensor chips t o m onit or param et ers such as t em perat ure, power supply volt age, and fan speed. Periodically t racking t hese param et ers can be crit ical. A blown CPU fan can m anifest in t he form of st range and random soft ware problem s. I m agine t he consequences if t he syst em is a m edical grade device! LM- Sensors com es t o t he rescue wit h device drivers for m any sensor chips, a ut ilit y called sensors t o generat e a healt h report , and a script called sensors- det ect t o scan your syst em and help you generat e appropriat e configurat ion files. Most chips t hat offer hardware m onit oring int erface t o t he CPU via I 2 C/ SMBus. Device drivers for such devices are norm al I 2 C client drivers but reside in t he drivers/ hwm on/ direct ory, rat her t han drivers/ i2c/ chips/ . An exam ple is Nat ional Sem iconduct or's LM87 chip, which can m onit or m ult iple volt ages, t em perat ures, and fans. Have a look at drivers/ hwm on/ lm 87.c for it s driver im plem ent at ion. I 2 C driver I Ds from 1000 t o 1999 are reserved for sensor chips ( look at include/ linux/ i2c- id.h) . Several sensor chips int erface t o t he CPU via t he I SA/ LPC bus rat her t han I 2 C/ SMBus. Ot hers em it analog out put t hat reaches t he CPU t hrough an Analog- t o- Digit al Convert er ( ADC) . Drivers for such chips share t he drivers/ hwm on/ direct ory wit h I 2 C sensor drivers. An exam ple of a non- I 2 C sensor driver is drivers/ hwm on/ hdaps.c, t he driver for t he accelerom et er present in several I BM/ Lenovo lapt ops t hat we discussed in Chapt er 7 , " I nput Drivers." Anot her exam ple of a non- I 2 C sensor is t he Winbond 83627HF Super I / O chip, which is driven by drivers/ hwm on/ w83627hf.c.

Th e Se r ia l Pe r iph e r a l I n t e r fa ce Bu s The Serial Peripheral I nt erface ( SPI ) bus is a serial m ast er- slave int erface sim ilar t o I 2 C and com es built in on m any m icrocont rollers. I t uses four wires ( com pared t o t wo on I 2 C) : Serial CLocK ( SCLK) , Chip Select ( CS) , Mast er Out Slave I n ( MOSI ) , and Mast er I n Slave Out ( MI SO) . MOSI is used for shift ing dat a int o t he slave device, and MI SO is used for shift ing dat a out of t he slave device. Because t he SPI bus has dedicat ed wires for t ransm it t ing and receiving dat a, it can operat e in full- duplex m ode, unlike t he I 2 C bus. The t ypical speed of operat ion of SPI is in t he low- m egahert z range, unlike t he m id- kilohert z range on I 2 C, so t he form er yields higher t hroughput . SPI peripherals available in t he m arket t oday include Radio Frequency ( RF) chips, sm art card int erfaces, EEPROMs, RTCs, t ouch sensors, and ADCs. The kernel provides a core API for exchanging m essages over t he SPI bus. A t ypical SPI client driver does t he following:

1 . Regist ers probe() and remove() m et hods wit h t he SPI core. Opt ionally regist ers suspend() and resume() m et hods: #include static struct spi_driver myspi_driver = { .driver = { .name = "myspi", .bus = &spi_bus_type, .owner = THIS_MODULE, }, .probe = myspidevice_probe, .remove = __devexit_p(myspidevice_remove), } spi_register_driver(&myspi_driver); The SPI core creat es an spi_device st ruct ure corresponding t o t his device and passes t his as an argum ent when it invokes t he regist ered driver m et hods.

2 . Exchanges m essages wit h t he SPI device using access funct ions such as spi_sync() and spi_async(). The form er wait s for t he operat ion t o com plet e, whereas t he lat t er asynchronously t riggers invocat ion of a regist ered callback rout ine when m essage t ransfer com plet es. These dat a access rout ines are invoked from suit able places such as t he SPI int errupt handler, a sysfs m et hod, or a t im er handler. The following code snippet illust rat es SPI m essage subm ission: #include struct spi_device *spi;

/* Representation of a SPI device */ struct spi_transfer xfer; /* Contains transfer buffer details */ struct spi_message sm; /* Sequence of spi_transfer segments */ u8 *command_buffer; /* Data to be transferred */

int len;

/* Length of data to be transferred */

spi_message_init(&sm); xfer.tx_buf = command_buffer; xfer.len = len; spi_message_add_tail(&xfer, &sm); spi_sync(spi, &sm);

/* /* /* /* /*

Initialize spi_message */ Device-specific data */ Data length */ Add the message */ Blocking transfer request */

For an exam ple SPI device, consider t he ADS7846 t ouch- screen cont roller t hat we briefly discussed in Chapt er 7 . This driver does t he following:

1 . Regist ers probe(), remove(), suspend(), and resume() m et hods wit h t he SPI core using spi_register_driver().

2 . The probe() m et hod regist ers t he driver wit h t he input subsyst em using input_register_device() and request s an I RQ using request_irq().

3 . The driver gat hers t ouch coordinat es from it s int errupt handler using spi_async(). This funct ion t riggers invocat ion of a regist ered callback rout ine when t he SPI m essage t ransfer com plet es.

4 . The callback funct ion in t urn, report s t ouch coordinat es and clicks via t he input event int erface, / dev/ input / event X, using input_report_abs() and input_report_key(), as discussed in Chapt er 7 . Applicat ions such as X Windows and gpm seam lessly work wit h t he event int erface and respond t o t ouch input .

A driver t hat wiggles I / O pins t o get t hem t o t alk a prot ocol is called a bit - banging driver. For an exam ple SPI bit - banging driver, look at drivers/ spi/ spi_but t erfly.c, which is a driver t o t alk t o Dat aFlash chips t hat are present on But t erfly boards built by At m el around t heir AVR processor fam ily. For t his, connect your host syst em 's parallel port t o t he AVR But t erfly using a specially m ade dongle and use t he spi_but t erfly driver do t he bit banging. Look at Docum ent at ion/ spi/ but t erfly for a det ailed descript ion of t his driver. Current ly t here is no support for user space SPI drivers à la i2c- dev. You only have t he opt ion of writ ing a kernel driver t o t alk t o your SPI device.

I n t he em bedded world, you m ay com e across solut ions where t he processor uses a com panion chip t hat int egrat es various funct ions. An exam ple is t he Freescale MC13783 Power Managem ent and Audio Com ponent ( PMAC) used in t andem wit h t he ARM9- based i.MX27 cont roller. The PMAC int egrat es an RTC, a bat t ery charger, a t ouch- screen int erface, an ADC m odule, and an audio codec. The processor and t he PMAC com m unicat e over SPI . The SPI bus does not cont ain an int errupt line, so t he PMAC has t he capabilit y t o ext ernally int errupt t he processor using a GPI O pin configured for t his purpose.

Th e 1 - W ir e Bu s The 1- wire prot ocol developed by Dallas/ Maxim uses a 1- wire ( or w1) bus t hat carries bot h power and signal; t he ret urn ground pat h is provided using som e ot her m eans. I t provides a sim ple way t o int erface wit h slow devices by reducing space, cost , and com plexit y. An exam ple device t hat works using t his prot ocol is t he ibut t on (www.ibut t on.com ) , which is used for sensing t em perat ure, carrying dat a, or holding unique I Ds. Anot her w1 chip t hat int erfaces t hrough a single port pin of an em bedded cont roller is t he DS2433 4kb 1- wire EEPROM from Dallas/ Maxim . The driver for t his chip, drivers/ w1/ slaves/ w1_ds2433.c, export s access t o t he EEPROM via a sysfs node. The m ain dat a st ruct ures associat ed wit h a w1 device driver are w1_family and w1_family_ops, bot h defined in w1_fam ily.h.

D e bu ggin g To collect I2 C- specific debugging m essages, t urn on a relevant com binat ion of I 2C Core debugging m essages, I 2C Algorit hm debugging m essages, I 2C Bus debugging m essages, and I 2C Chip debugging m essages under Device Drivers I 2C Support in t he kernel configurat ion m enu. Sim ilarly, for SPI debugging, t urn on Debug SPI Support Support for SPI drivers under Device Drivers To underst and t he flow of I 2 C packet s on t he bus, connect an I 2 C bus analyzer t o your board as we did while running List ing 8.1. The lm - sensors package cont ains a t ool called i2cdum p t hat dum ps regist er cont ent s of devices on t he I 2 C bus. There is a m ailing list dedicat ed t o Linux I 2 C at ht t p: / / list s.lm - sensors.org/ m ailm an/ list info/ i2c.

Look in g a t t h e Sou r ce s I n t he 2.4 kernel source t ree, a single direct ory ( drivers/ i2c/ ) cont ained all t he I 2 C/ SMBus- relat ed sources. The I 2 C code in 2.6 kernels is organized hierarchically: The drivers/ i2c/ busses/ direct ory cont ains adapt er drivers, t he drivers/ i2c/ algos/ direct ory has algorit hm drivers, and t he drivers/ i2c/ chips/ direct ory cont ains client drivers. You can find client drivers in ot her regions of t he source t ree, t oo. The drivers/ sound/ direct ory, for exam ple, includes drivers for audio chipset s t hat use an I 2 C cont rol int erface. Take a look inside t he Docum ent at ion/ i2c/ direct ory for t ips and m ore exam ples. Kernel SPI service funct ions live in drivers/ spi/ spi.c. The SPI driver for t he ADS7846 t ouch cont roller is im plem ent ed in drivers/ input / t ouchscreen/ ads7846.c. The MTD subsyst em discussed in Chapt er 17, " Mem ory Technology Devices," im plem ent s drivers for SPI flash chips. An exam ple is drivers/ m t d/ devices/ m t d_dat aflash.c, t he driver t o access At m el Dat aFlash SPI chips. The drivers/ w1/ direct ory cont ains kernel support for t he w1 prot ocol. Drivers for t he host cont roller side of t he w1 int erface live in drivers/ w1/ m ast ers/ , and drivers for w1 slaves reside in drivers/ w1/ slaves/ . Table 8.3 sum m arizes t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he kernel t ree. Table 8.4 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 8 .3 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

i2c_driver

include/ linux/ i2c.h

Represent at ion of an I 2 C driver

i2c_client_address_data

include/ linux/ i2c.h

Slave addresses t hat an I 2 C client driver is responsible for

i2c_client

include/ linux/ i2c.h

I dent ifies a chip connect ed t o an I 2 C bus

i2c_msg

include/ linux/ i2c.h

I nform at ion pert aining t o a t ransact ion t hat you want t o generat e on t he I 2 C bus

spi_driver

include/ linux/ spi/ spi.h

Represent at ion of an SPI driver

spi_device

include/ linux/ spi/ spi.h

Represent at ion of an SPI device

spi_transfer

include/ linux/ spi/ spi.h

Det ails of an SPI t ransfer buffer

spi_message

include/ linux/ spi/ spi.h

Sequence of spi_transfer segm ent s

w1_family

drivers/ w1/ w1_fam ily.h Represent at ion of a w1 slave driver

w1_family_ops

drivers/ w1/ w1_fam ily.h A w1 slave driver's ent ry point s

Ta ble 8 .4 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

i2c_add_driver()

include/ linux/ i2c.h drivers/ i2c/ i2c- core.c

Regist ers driver ent ry point s wit h t he I 2 C core.

i2c_del_driver()

drivers/ i2c/ i2c- core.c

Rem oves a driver from t he I 2 C core.

i2c_probe()

drivers/ i2c/ i2c- core.c

Specifies t he addresses of slave devices t hat t he driver is responsible for and an associat ed attach() rout ine t o be invoked if one of t he specified addresses is det ect ed by t he I 2 C core.

i2c_attach_client()

drivers/ i2c/ i2c- core.c

Adds a new client t o t he list of client s serviced by t he associat ed I 2 C host adapt er.

i2c_detach_client()

drivers/ i2c/ i2c- core.c

Det aches an act ive client . Usually done when t he client driver or t he associat ed host adapt er unregist ers.

i2c_check_functionality()

include/ linux/ i2c.h

Verifies whet her a part icular funct ion is support ed by t he host adapt er.

i2c_get_functionality()

include/ linux/ i2c.h

Obt ains a m ask cont aining all funct ions support ed by t he host adapt er.

i2c_add_adapter()

drivers/ i2c/ i2c- core.c

Regist ers a host adapt er.

i2c_del_adapter()

drivers/ i2c/ i2c- core.c

Unregist ers a host adapt er.

SMBus- com pat ible I 2 C dat a access rout ines

drivers/ i2c/ i2c- core.c

See Table 8.1.

i2c_transfer()

drivers/ i2c/ i2c- core.c

Sends an i2c_msg over t he I 2 C bus. This funct ion is not SMBuscom pat ible.

spi_register_driver()

drivers/ spi/ spi.c

Regist ers driver ent ry point s wit h t he SPI core.

spi_unregister_driver()

include/ linux/ spi/ spi.h

Unregist ers an SPI driver.

spi_message_init()

include/ linux/ spi/ spi.h

I nit ializes an SPI m essage.

spi_message_add_tail()

include/ linux/ spi/ spi.h

Adds an SPI m essage t o a t ransfer list .

spi_sync()

drivers/ spi/ spi.c

Synchronously t ransfers dat a over t he SPI bus. This funct ion blocks unt il com plet ion.

spi_async()

include/ linux/ spi/ spi.h

Asynchronously t ransfers dat a over t he SPI bus using a com plet ion callback m echanism .

Ch a pt e r 9 . PCM CI A a n d Com pa ct Fla sh I n Th is Ch a pt e r 258 What 's PCMCI A/ CF? 260 Linux- PCMCI A Subsyst em 262 Host Cont roller Drivers 263 PCMCI A Core 263 Driver Services 264 Client Drivers 271 Tying t he Pieces Toget her 272 PCMCI A St orage 272 Serial PCMCI A 273 Debugging 275 Looking at t he Sources

Today's popular t echnologies such as wireless and wired Et hernet , General Packet Radio Service ( GPRS) , Global Posit ioning Syst em ( GPS) , m iniat ure st orage, and m odem s are ubiquit ous in t he form fact or of PCMCI A ( an acronym for Personal Com put er Mem ory Card I nt ernat ional Associat ion) or CF ( Com pact Flash) cards. Most lapt ops and m any em bedded devices support PCMCI A or CF int erfaces, t hus inst ant ly enabling t hem t o t ake advant age of t hese t echnologies. On em bedded syst em s, PCMCI A/ CF slot s offer a t echnology upgrade pat h wit hout t he need t o re- spin t he board. A cost - reduced version of an I nt ernet - enabled device can, for exam ple, use a PCMCI A dialup m odem , while a higher- end flavor can have WiFi. The Linux kernel support s PCMCI A devices on a variet y of archit ect ures. I n t his chapt er, let 's explore t he support present in t he kernel for PCMCI A/ CF host adapt ers and client devices.

W h a t 's PCM CI A/ CF? PCMCI A is a 16- bit dat a- t ransfer int erface specificat ion originally used by m em ory cards. CF cards are sm aller, but com pat ible wit h PCMCI A, and are frequent ly used in handheld devices such as PDAs and digit al cam eras. CF cards have only 50 pins but can be slipped int o your lapt op's 68- pin PCMCI A slot using a passive CF- t o- PCMCI A adapt er. PCMCI A and CF have been confined t o t he lapt op and handheld space and have not m ade inroads int o deskt ops and higher- end m achines. The PCMCI A specificat ion has now grown t o include support for higher speeds in t he form of 32- bit CardBus cards. The t erm PC Card is used while referring t o eit her PCMCI A or CardBus devices. CardBus is closer t o t he PCI bus, so t he kernel has m oved support for CardBus devices from t he PCMCI A layer t o t he PCI layer. The lat est t echnology specificat ion from t he PCMCI A indust ry st andards group is t he ExpressCard, which is com pat ible wit h PCI Express, a new bus t echnology based on PCI concept s. We look at CardBus and ExpressCard when we discuss PCI in t he next chapt er. PC cards com e in t hree flavors in t he increasing order of t hickness: Type I ( 3.3m m ) , Type I I ( 5m m ) , and Type I I I ( 10.5m m ) . Figure 9.1 shows PCMCI A bus connect ion on a lapt op, and Figure 9.2 illust rat es PCMCI A on an em bedded device. As you m ight have not iced, t he PCMCI A host cont roller bridges t he PCMCI A card wit h t he syst em bus. Lapt ops and t heir derivat ives generally have a PCMCI A host cont roller chip connect ed t o t he PCI bus, while several em bedded cont rollers have a PCMCI A host cont roller built in t o t heir silicon. The cont roller m aps card m em ory t o host I / O and m em ory windows and rout es int errupt s generat ed by t he card t o a suit able processor int errupt line.

Figu r e 9 .1 . PCM CI A on a la pt op.

Figu r e 9 .2 . PCM CI A on a n e m be dde d syst e m .

Ch a pt e r 9 . PCM CI A a n d Com pa ct Fla sh I n Th is Ch a pt e r 258 What 's PCMCI A/ CF? 260 Linux- PCMCI A Subsyst em 262 Host Cont roller Drivers 263 PCMCI A Core 263 Driver Services 264 Client Drivers 271 Tying t he Pieces Toget her 272 PCMCI A St orage 272 Serial PCMCI A 273 Debugging 275 Looking at t he Sources

Today's popular t echnologies such as wireless and wired Et hernet , General Packet Radio Service ( GPRS) , Global Posit ioning Syst em ( GPS) , m iniat ure st orage, and m odem s are ubiquit ous in t he form fact or of PCMCI A ( an acronym for Personal Com put er Mem ory Card I nt ernat ional Associat ion) or CF ( Com pact Flash) cards. Most lapt ops and m any em bedded devices support PCMCI A or CF int erfaces, t hus inst ant ly enabling t hem t o t ake advant age of t hese t echnologies. On em bedded syst em s, PCMCI A/ CF slot s offer a t echnology upgrade pat h wit hout t he need t o re- spin t he board. A cost - reduced version of an I nt ernet - enabled device can, for exam ple, use a PCMCI A dialup m odem , while a higher- end flavor can have WiFi. The Linux kernel support s PCMCI A devices on a variet y of archit ect ures. I n t his chapt er, let 's explore t he support present in t he kernel for PCMCI A/ CF host adapt ers and client devices.

W h a t 's PCM CI A/ CF? PCMCI A is a 16- bit dat a- t ransfer int erface specificat ion originally used by m em ory cards. CF cards are sm aller, but com pat ible wit h PCMCI A, and are frequent ly used in handheld devices such as PDAs and digit al cam eras. CF cards have only 50 pins but can be slipped int o your lapt op's 68- pin PCMCI A slot using a passive CF- t o- PCMCI A adapt er. PCMCI A and CF have been confined t o t he lapt op and handheld space and have not m ade inroads int o deskt ops and higher- end m achines. The PCMCI A specificat ion has now grown t o include support for higher speeds in t he form of 32- bit CardBus cards. The t erm PC Card is used while referring t o eit her PCMCI A or CardBus devices. CardBus is closer t o t he PCI bus, so t he kernel has m oved support for CardBus devices from t he PCMCI A layer t o t he PCI layer. The lat est t echnology specificat ion from t he PCMCI A indust ry st andards group is t he ExpressCard, which is com pat ible wit h PCI Express, a new bus t echnology based on PCI concept s. We look at CardBus and ExpressCard when we discuss PCI in t he next chapt er. PC cards com e in t hree flavors in t he increasing order of t hickness: Type I ( 3.3m m ) , Type I I ( 5m m ) , and Type I I I ( 10.5m m ) . Figure 9.1 shows PCMCI A bus connect ion on a lapt op, and Figure 9.2 illust rat es PCMCI A on an em bedded device. As you m ight have not iced, t he PCMCI A host cont roller bridges t he PCMCI A card wit h t he syst em bus. Lapt ops and t heir derivat ives generally have a PCMCI A host cont roller chip connect ed t o t he PCI bus, while several em bedded cont rollers have a PCMCI A host cont roller built in t o t heir silicon. The cont roller m aps card m em ory t o host I / O and m em ory windows and rout es int errupt s generat ed by t he card t o a suit able processor int errupt line.

Figu r e 9 .1 . PCM CI A on a la pt op.

Figu r e 9 .2 . PCM CI A on a n e m be dde d syst e m .

Lin u x - PCM CI A Su bsyst e m Linux- PCMCI A support is available on I nt el- based lapt ops as well as on archit ect ures such as ARM, MI PS, and PowerPC. The PCMCI A subsyst em consist s of device drivers for PCMCI A host cont rollers, client drivers for different cards, a daem on t hat aids hot plugging, user m ode ut ilit ies, and a Card Services m odule t hat int eract s wit h all of t hese. Figure 9.3 illust rat es t he int eract ion bet ween t he m odules t hat const it ut e t he Linux- PCMCI A subsyst em .

Figu r e 9 .3 . Th e Lin u x - PCM CI A su bsyst e m .

[ View full size im age]

The Old Linux- PCMCI A Subsyst em The Linux- PCMCI A subsyst em has recent ly undergone an overhaul. To get PCMCI A working wit h 2.6.13 and newer kernels, you need t he pcm ciaut ils package (ht t p: / / kernel.org/ pub/ linux/ ut ils/ kernel/ pcm cia/ howt o.ht m l) , which obsolet es t he pcm cia- cs package ( ht t p: / / pcm cia- cs.sourceforge.net ) used wit h earlier kernels. I nt ernal kernel program m ing int erfaces and dat a st ruct ures have also changed. Earlier kernels relied on a user space daem on called cardm gr t o support hot plugging, but t he new PCMCI A im plem ent at ion handles hot plug using udev, j ust as ot her bus subsyst em s do. So wit h new set ups, you don't need cardm gr and should m ake sure t hat it is not st art ed. There is a m igrat ion guide at ht t p: / / kernel.org/ pub/ linux/ ut ils/ kernel/ pcm cia/ cardm gr- t o- pcm ciaut ils.ht m l.

Figure 9.3 cont ains t he following com ponent s:

Host cont roller device drivers t hat im plem ent low- level rout ines for com m unicat ing wit h t he PCMCI A host cont roller. Your handheld and lapt op have different host cont rollers and, hence, use different host cont roller drivers. Each PCMCI A slot t hat t he host cont roller support s is called a socket .

PCMCI A client drivers ( XX_cs in Figure 9.3) t hat respond t o socket event s such as card insert ion and ej ect ion. This is t he driver t hat you are m ost likely t o im plem ent when you at t em pt t o Linux- enable a PCMCI A card. The XX_cs driver usually works in t andem wit h a generic driver ( XX in Figure 9.3) t hat is not PCMCI A- specific. I n relat ion t o Figure 9.3, if your device is a PCMCI A I DE disk, XX is t he I DE disk driver, XX_cs is t he ide_cs driver, XX- dependent layers are filesyst em layers, and XX- applicat ions are program s t hat access dat a files. XX_cs configures t he generic driver ( XX) wit h resources such as I RQs, I / O base addresses, and m em ory windows.

The PCMCI A core t hat provides services t o host cont roller drivers and client drivers. The core provides an infrast ruct ure t hat m akes driver im plem ent at ions sim pler and adds a level of indirect ion t hat renders client drivers independent of host cont rollers. I rrespect ive of whet her you are using your Bluet oot h CF card on an XScale- based handheld or an x86- based lapt op, t he sam e client drivers can be pressed int o service.

A driver services m odule (ds) t hat offers regist rat ion int erfaces and bus services t o client drivers.

The pcm ciaut ils package, which cont ains t ools such as pccardct l t hat cont rol t he st at e of PCMCI A socket s and select bet ween different card- configurat ion schem es.

Figure 9.4 glues kernel m odules on t op of Figure 9.1 t o illust rat e how t he Linux- PCMCI A subsyst em int eract s wit h hardware on a PC- com pat ible syst em .

Figu r e 9 .4 . Re la t in g PCM CI A dr ive r com pon e n t s w it h PC h a r dw a r e .

[ View full size im age]

I n t he following sect ions, let 's t ake a closer look at t he com ponent s const it ut ing t he Linux- PCMCI A subsyst em . To bet t er underst and t he role of t hese com ponent s and t heir int eract ion, we will insert a PCMCI A WiFi card int o a lapt op and t race t he code flow in t he sect ion "Tying t he Pieces Toget her."

H ost Con t r olle r D r ive r s Whereas t he generic card driver ( XX) is responsible for handling int errupt s generat ed by t he card funct ion ( say, receive int errupt s when a PCMCI A net work card receives dat a packet s) , t he host cont roller driver is responsible for handling bus- specific int errupt s t riggered by event s such as card insert ion and ej ect ion. Figure 9.2 shows t he block diagram of an em bedded device designed around an em bedded cont roller t hat has built - in PCMCI A support . Even if you are using a cont roller support ed by t he kernel PCMCI A layer, you m ight need t o t weak t he host cont roller driver ( for exam ple, t o configure GPI O lines used for det ect ing card insert ion event s or swit ching power t o t he socket ) depending on your board's design. I f you are port ing t he kernel t o a St rongARM- based handheld, for exam ple, t ailor drivers/ pcm cia/ sa1100_assabet .c t o suit your hardware. This chapt er does not cover t he im plem ent at ion of host cont roller device drivers.

PCM CI A Cor e Card Services is t he m ain const it uent of t he PCMCI A core. I t offers a set of services t o client drivers and host cont roller drivers. I t cont ains a kernel t hread called pccardd t hat polls for socket - relat ed event s. Pccardd not ifies t he Driver Services event handler ( discussed in t he next sect ion) when t he host cont roller report s event s such as card insert ion and card rem oval. Anot her com ponent of t he PCMCI A core is a library t hat m anipulat es t he Card I nform at ion St ruct ure ( CI S) t hat is part of PCMCI A cards. PCMCI A/ CF cards have t wo m em ory spaces: At t ribut e m em ory and Com m on m em ory. At t ribut e m em ory cont ains t he CI S and card configurat ion regist ers. At t ribut e m em ory of a PCMCI A I DE disk, for exam ple, cont ains it s CI S and regist ers t hat specify t he sect or count and t he cylinder num ber. Com m on m em ory in t his case cont ains t he m em ory array t hat holds disk dat a. The PCMCI A core offers CI S m anipulat ion rout ines such as pccard_get_first_tuple(), pccard_get_next_tuple(), and pccard_parse_tuple() t o client drivers. List ing 9.2 uses t he assist ance of som e of t hese funct ions. The PCMCI A core passes CI S inform at ion t o user space via sysfs and udev. Ut ilit ies such as pccardct l, part of t he pcm ciaut ils package, depend on sysfs and udev for t heir operat ion. This sim plifies t he earlier design approach t hat relied on a cust om infrast ruct ure when t hese facilit ies were absent in t he kernel.

D r ive r Se r vice s Driver Services provides an infrast ruct ure t hat offers t he following:

A handler t hat cat ches event alert s dispat ched by t he pccardd kernel t hread. The handler scans and validat es t he card's CI S space and t riggers t he load of an appropriat e client driver.

A layer t hat has t he t ask of com m unicat ing wit h t he kernel's bus core. To t his end, Driver Services im plem ent s t he pcmcia_bus_type and relat ed bus operat ions.

Service rout ines such as pcmcia_register_driver() t hat client drivers use t o regist er t hem selves wit h t he PCMCI A core. The exam ple driver in List ing 9.1 uses som e of t hese rout ines.

Clie n t D r ive r s The client device driver ( XX_cs in Figure 9.3) looks at t he card's CI S space and configures t he card depending on t he inform at ion it gat hers.

D a t a St r u ct u r e s Before proceeding t o develop an exam ple PCMCI A client driver, let 's m eet som e relat ed dat a st ruct ures:

1 . A PCMCI A device is ident ified by t he pcmcia_device_id st ruct ure defined in include/ linux/ m od_devicet able.h : struct pcmcia_device_id { /* ... */ __u16 manf_id; __u16 card_id; __u8 func_id; /* ... */ };

/* Manufacturer ID */ /* Card ID */ /* Function ID */

manf_id, card_id, and func_id hold t he card's m anufact urer I D, card I D, and funct ion I D, respect ively. The PCMCI A core offers a m acro called PCMCIA_DEVICE_MANF_CARD() t hat creat es a pcmcia_device_id st ruct ure from t he m anufact urer and card I Ds supplied t o it . Anot her kernel m acro called MODULE_DEVICE_TABLE() m arks t he support ed pcmcia_device_ids in t he m odule im age so t hat t he m odule can be loaded on dem and when t he card is insert ed and t he PCMCI A subsyst em gleans m at ching m anufact urer/ card/ funct ion I Ds from t he card's CI S space. We explored t his m echanism in t he sect ion "Module Aut oload" in Chapt er 4 , " Laying t he Groundwork." This procedure is analogous t o t hat used by device drivers for t wo ot her popular I / O buses t hat support hot plugging: PCI and USB. Table 9.1 gives a heads- up on t he sim ilarit ies bet ween drivers for t hese t hree bus t echnologies. Don't worry if t hat is hard t o digest ; we will have a det ailed discussion on PCI and USB in t he following chapt ers.

Ta ble 9 .1 . D e vice I D s a n d H ot plu g M e t h ods for PCM CI A, PCI , a n d USB PCM CI A

PCI

USB

Device I D t able st ruct ure

pcmcia_device_id

pci_device_id

usb_device_id

Macro t o creat e a device I D

PCMCIA_DEVICE_MANF_CARD() PCI_DEVICE()

USB_DEVICE()

Device represent at ion

struct pcmcia_device

struct pci_dev

struct usb_device

Driver represent at ion

struct pcmcia_driver

struct pci_driver struct usb_driver

Hot plug m et hods

probe() and remove()

probe() and remove()

probe() and disconnect()

Hot plug event det ect ion

pccardd kt hread

PCI - fam ilydependent

khubd kt hread

2 . PCMCI A client drivers need t o associat e t heir pcmcia_device_id t able wit h t heir probe() and remove() m et hods. This t ie up is achieved by t he pcmcia_driver st ruct ure: struct pcmcia_driver { int (*probe)(struct pcmcia_device *dev);

/* Probe method */ void (*remove)(struct pcmcia_device *dev); /* Remove method */ /* ... */ struct pcmcia_device_id *id_table; /* Device ID table */ /* ... */

}; 3 . struct pcmcia_device int ernally represent s a PCMCI A device and is defined as follows in drivers/ pcm cia/ ds.h: struct pcmcia_device { /* ... */ io_req_t io; irq_req_t irq; config_req_t conf; /* ... */ struct device dev; /* ... */ };

/* I/O attributes*/ /* IRQ settings */ /* Configuration */ /* Connection to device model */

4 . CI S m anipulat ion rout ines use a tuple_t st ruct ure defined in include/ pcm cia/ cist pl.h t o hold a CI S inform at ion unit . A CISTPL_LONGLINK_MFC t uple t ype, for exam ple, cont ains inform at ion relat ed t o a m ult ifunct ion card. For t he full list of t uples and t heir descript ions, look at include/ pcm cia/ cist pl.h and ht t p: / / pcm cia- cs.sourceforge.net / ft p/ doc/ PCMCI A- PROG.ht m l. typedef struct tuple_t { /* ... */ cisdata_t TupleCode; /* ... */ cisdata_t DesiredTuple; /* ... */ cisdata_t *TupleData;

/* See include/pcmcia/cistpl.h */ /* Identity of the desired tuple */ /* Buffer space */

}; 5 . The CI S cont ains configurat ion t able ent ries for each configurat ion t hat t he card support s. cistpl_cftable_entry_t, defined in include/ pcm cia/ cist pl.h, holds such an ent ry: typedef struct cistpl_cftable_entry_t { /* ... */ cistpl_power_t vcc, vpp1, vpp2; /* Voltage level */ cistpl_io_t io; /* I/O attributes */ cistpl_irq_t irq; /* IRQ settings */ cistpl_mem_t mem; /* Memory window */ /* ... */ }; 6.

6 . cisparse_t, also defined in include/ pcm cia/ cist pl.h, holds a t uple parsed by t he PCMCI A core: typedef union cisparse_t { /* ... */ cistpl_manfid_t manfid; /* ... */ cistpl_cftable_entry_t cftable_entry;

/* Manf ID */ /* Configuration table entry */

/* ... */ } cisparse_t;

D e vice Ex a m ple : PCM CI A Ca r d Let 's develop a skelet al client device driver ( because t oo m any det ails will m ake it a loaded discussion) t o learn t he workings of t he PCMCI A subsyst em . The im plem ent at ion is general, so you m ay use it as a t em plat e irrespect ive of whet her your card im plem ent s net working, st orage, or som e ot her t echnology. Only t he XX_cs driver is im plem ent ed; t he generic XX driver is assum ed t o be available off t he shelf. As alluded t o earlier, PCMCI A drivers cont ain probe() and remove() m et hods t o support hot plugging. List ing 9.1 regist ers t he driver's probe() m et hod, remove() m et hod, and pcmcia_device_id t able wit h t he PCMCI A core. XX_probe() get s invoked when t he associat ed PCMCI A card is insert ed, and XX_remove() is called when t he card is ej ect ed. List in g 9 .1 . Re gist e r in g a Clie n t D r ive r Code View: #include /* Definition of struct pcmcia_device */ static struct pcmcia_driver XX_cs_driver = { .owner = THIS_MODULE, .drv = { .name = "XX_cs", /* Name */ }, .probe = XX_probe, /* Probe */ .remove = XX_remove, /* Release */ .id_table = XX_ids, /* ID table */ .suspend = XX_suspend, /* Power management */ .resume = XX_resume, /* Power management */ }; #define XX_MANFUFACTURER_ID #define XX_CARD_ID

0xABCD 0xCDEF

/* Device's manf_id */ /* Device's card_id */

/* Identity of supported cards */ static struct pcmcia_device_id XX_ids[] = { PCMCIA_DEVICE_MANF_CARD(XX_MANFUFACTURER_ID, XX_CARD_ID), PCMCIA_DEVICE_NULL, }; MODULE_DEVICE_TABLE(pcmcia, XX_ids); /* For module autoload */ /* Initialization */ static int __init init_XX_cs(void) { return pcmcia_register_driver(&XX_cs_driver); }

/* Probe Method */ static int XX_probe(struct pcmcia_device *link) { /* Populate the pcmcia_device structure allotted for this card by the core. First fill in general information */ /* ... */ /* Fill in attributes related to I/O windows and interrupt levels */ XX_config(link); /* See Listing 9.2 */ }

List ing 9.2 shows t he rout ine t hat configures t he generic device driver ( XX) wit h resource inform at ion such as I / O and m em ory window base addresses. Aft er t his st ep, dat a flow t o and from t he PCMCI A card passes t hrough XX and is t ransparent t o t he rest of t he layers. Any int errupt s generat ed by t he PCMCI A card, such as t hose relat ed t o dat a recept ion or t ransm it com plet ion for net work cards, are handled by t he int errupt handler t hat is part of XX. List ing 9.2 is loosely based on drivers/ net / wireless/ airo_cs.c, t he client driver for t he Cisco Aironet 4500 and 4800 series of PCMCI A WiFi cards. The list ing uses t he services of t he PCMCI A core t o do t he following:

Obt ain a suit able configurat ion t able ent ry t uple from t he card's CI S Parse t he t uple Glean card configurat ion inform at ion such as I / O base addresses and power set t ings from t he parsed t uple Request allocat ion of an int errupt line

I t t hen configures t he chipset - specific driver ( XX) wit h t he inform at ion previously obt ained.

List in g 9 .2 . Con figu r in g t h e Ge n e r ic D e vice D r ive r Code View: #include #include #include #include /* This makes the XX device available to the system. XX_config() is based on airo_config(), defined in drivers/net/wireless/airo_cs.c */ static int XX_config(struct pcmcia_device *link) { tuple_t tuple; cisparse_t parse; u_char buf[64]; /* Populate a tuple_t structure with the identity of the desired tuple. In this case, we're looking for a configuration table

entry */ tuple.DesiredTuple tuple.Attributes tuple.TupleData tuple.TupleDataMax

= = = =

CISTPL_CFTABLE_ENTRY; 0; buf; sizeof(buf);

/* Walk the CIS for a matching tuple and glean card configuration information such as I/O window base addresses */ /* Get first tuple */ CS_CHECK(GetFirstTuple, pcmcia_get_first_tuple(link, &tuple)); while (1){ cistpl_cftable_entry_t dflt = {0}; cistpl_cftable_entry_t *cfg = &(parse.cftable_entry); /* Read a configuration tuple from the card's CIS space */ if (pcmcia_get_tuple_data(link, &tuple) != 0 || pcmcia_parse_tuple(link, &tuple, &parse) != 0) { goto next_entry; } /* We have a matching tuple! */ /* Configure power settings in the pcmcia_device based on what was found in the parsed tuple entry */ if (cfg->vpp1.present & (1vpp1.param[CISTPL_POWER_VNOM]/10000; /* ... */ /* Configure I/O window settings in the pcmcia_device based on what was found in the parsed tuple entry */ if ((cfg->io.nwin > 0) || (dflt.io.nwin > 0)) { cistpl_io_t *io = (cfg->io.nwin) ? &cfg->io : &dflt.io; /* ... */ if (!(io->flags & CISTPL_IO_8BIT)) { link->io.Attributes1 = IO_DATA_PATH_WIDTH_16; } link->io.BasePort1 = io->win[0].base; /* ... */ } /* ... */ break; next_entry: CS_CHECK(GetNextTuple, pcmcia_get_next_tuple(link, &tuple); } /* Allocate IRQ */ if (link->conf.Attributes & CONF_ENABLE_IRQ) { CS_CHECK(RequestIRQ, pcmcia_request_irq(link, &link->irq)); } /* ... */ /* Invoke init_XX_card(), which is part of the generic XX driver (so, not shown in this listing), and pass the I/O base and IRQ information obtained above */ init_XX_card(link->irq.AssignedIRQ, link->io.BasePort1, 1, &handle_to_dev(link));

/* The chip-specific (form factor independent) driver is ready to take responsibility of this card from now on! */ }

Tyin g t h e Pie ce s Toge t h e r As you saw in Figure 9.3, t he PCMCI A layer consist s of various com ponent s. The dat a- flow pat h bet ween t he com ponent s can som et im es get com plicat ed. Let 's t race t he code pat h from t he t im e you insert a PCMCI A card unt il an applicat ion st art s t ransferring dat a t o t he card. Assum e t hat a Cisco Aironet PCMCI A card is insert ed ont o a lapt op having an 82365- com pat ible PCMCI A host cont roller: 1.

The PCMCI A host cont roller driver (drivers/ pcm cia/ yent a_socket .c) det ect s t he insert ion event via it s int errupt service rout ine and m akes not e of it using suit able dat a st ruct ures.

2.

The pccardd kernel t hread t hat is part of Card Services (drivers/ pcm cia/ cs.c) sleeps on a wait queue unt il t he host cont roller driver wakes it up when it det ect s t he card insert ion in St ep 1.

3.

Card Services dispat ches an insert ion event t o Driver Services (drivers/ pcm cia/ ds.c) . This t riggers execut ion of t he event handler regist ered by Driver Services during init ializat ion.

4.

Driver Services validat es t he card's CI S space, det erm ines inform at ion about t he insert ed device such as it s m anufact urer I D and card I D, and regist ers t he device wit h t he kernel. The appropriat e client device driver ( drivers/ net / wireless/ airo_cs.c) is t hen loaded. Revisit our previous discussion on MODULE_DEVICE_TABLE() t o see how t his is accom plished.

5.

The client driver ( airo_cs.c) loaded in St ep 4 init ializes and regist ers it self using pcmcia_register_driver(), as shown in List ing 9.1. This regist rat ion int erface int ernally set s t he bus t ype of t he device t o pcmcia_bus_type. PCMCI A bus operat ions such as probe() and remove(), defined by Driver Services (ds.c) , are also int ernally regist ered.

6.

The kernel invokes t he bus probe() operat ion regist ered by Driver Services in St ep 5. This in t urn, invokes t he probe() m et hod owned by t he m at ching client driver ( airo_probe()) , also regist ered in St ep 5. The client probe() rout ine populat es set t ings, such as I / O windows and int errupt lines, and configures t he generic chipset - specific driver ( drivers/ net / wireless/ airo.c) , as shown in List ing 9.2.

7.

The chipset driver (airo.c) creat es a net work int erface ( et hX) and is responsible for norm al operat ion from t his point onward. I t 's t his driver t hat handles int errupt s generat ed by t he card in response t o packet recept ion and t ransm it com plet ion. The form fact or of t he device ( for exam ple, whet her it 's a PCMCI A or a PCI card) is t ransparent t o t he chipset driver as well as t o t he applicat ions t hat operat e over et hX.

PCM CI A St or a ge Today's PCMCI A/ CF st orage support densit ies in t he gigabyt e realm . The st orage cards com e in different flavors:

Miniat ure I DE disk drives or m icrodrives. These are t iny versions of m echanical hard drives t hat use m agnet ic m edia. Their dat a t ransfer rat es are t ypically higher t han solid st at e m em ory devices, but I DE drives have spin- up and seek lat encies before dat a can be t ransferred. The I DE Card Services driver ide_cs, in conj unct ion wit h legacy I DE drivers, is used t o com m unicat e wit h such m em ory cards.

Solid- st at e m em ory cards t hat em ulat e I DE. Such cards have no m oving part s and are usually based on flash m em ory, which is t ransparent t o t he operat ing syst em because of t he I DE em ulat ion. Because t hese drives are effect ively I DE- based, t he sam e I DE Card Services driver ( ide_cs) can be used t o t alk t o t hem .

Mem ory cards t hat use flash m em ory, but wit hout I DE em ulat ion. The m em ory_cs Card Services driver provides block and charact er int erfaces over such cards. The block int erface is used t o put a filesyst em ont o card m em ory, whereas t he charact er int erface is used t o access raw dat a. You m ay also use m em ory_cs t o read t he at t ribut e m em ory space of any PCMCI A card.

Se r ia l PCM CI A Many net working t echnologies such as General Packet Radio Service ( GPRS) , Global Syst em for Mobile Com m unicat ions ( GSM) , Global Posit ioning Syst em ( GPS) , and Bluet oot h use a serial t ransport m echanism t o com m unicat e wit h host syst em s. I n t his sect ion, let 's find out how t he PCMCI A layer handles cards t hat feat ure such t echnologies. Not e t hat t his sect ion is only t o help you underst and t he bus int erface part of GPRS, GSM, and Bluet oot h cards having a PCMCI A/ CF form fact or. The t echnologies t hem selves are discussed in det ail in Chapt er 16, " Linux Wit hout Wires." The generic serial Card Services driver, ser ial_cs, allows t he rest of t he operat ing syst em t o see t he PCMCI A/ CF card as a serial device. The first unused serial device, / dev/ t t ySX, get s allot t ed t o t he card. serial_cs t hus em ulat es a serial port over GPRS, GSM, and GPS cards. I t also allows Bluet oot h PCMCI A/ CF cards t hat use a serial t ransport t o t ransfer Host Cont rol I nt erface ( HCI ) packet s t o Bluet oot h prot ocol layers. Figure 9.5 illust rat es how kernel m odules im plem ent ing different net working t echnologies int eract wit h serial_cs t o com m unicat e wit h t heir respect ive cards.

Figu r e 9 .5 . N e t w or k in g w it h PCM CI A/ CF ca r ds t h a t u se se r ia l t r a n spor t .

[ View full size im age]

The Point - t o- Point Prot ocol ( PPP) allows net working prot ocols such as TCP/ I P t o run over a serial link. I n t he cont ext of Figure 9.5, PPP get s TCP/ I P applicat ions running over GPRS and GSM dialup. The PPP daem on, pppd, at t aches over virt ual serial port s em ulat ed by serial_cs. The PPP kernel m odules—ppp_generic, ppp_async, and slhc—have t o be loaded for pppd t o work. I nvoke pppd as follows: bash> pppd ttySX call connection-script

where connect ion- script is a file cont aining com m and sequences t hat pppd exchanges wit h t he service provider t o est ablish a link. The connect ion script depends on t he part icular card t hat is being used. A GPRS card would need a cont ext st ring t o be sent as part of t he connect ion script , whereas a GSM card m ight need an exchange of passwords. An exam ple connect ion script is described in t he sect ion "GPRS" in Chapt er 16.

D e bu ggin g To effect ively debug PCMCI A/ CF client drivers, you need t o see debug m essages em it t ed by t he PCMCI A core. For t his, enable CONFIG_PCMCIA_DEBUG ( Bus opt ions PCCARD support Enable PCCARD debugging) during kernel configurat ion. Verbosit y levels of t he debug out put can be cont rolled eit her via t he pcmcia_core.pc_debug kernel com m and- line argum ent or using t he pc_debug m odule insert ion param et er. I nform at ion about PC Card client drivers is available in t he process filesyst em ent ry, / proc/ bus/ pccard/ drivers. Look at / sys/ bus/ pcm cia/ devices/ * for card- specific inform at ion such as m anufact urer and card I Ds. Take a look inside / proc/ bus/ pci/ t o know m ore about your PCMCI A host cont roller if your syst em uses a PCI - t o- PCMCI A bridge. / proc/ int errupt s list s I RQs act ive on your syst em , including t hose used by t he PCMCI A layer. There is a m ailing list dedicat ed t o Linux- PCMCI A at ht t p: / / list s.infradead.org/ m ailm an/ list info/ linux- pcm cia.

Look in g a t t h e Sou r ce s I n t he Linux source t ree, t he drivers/ pcm cia/ direct ory cont ains t he sources for Card Services, Driver Services, and host cont roller drivers. Look at drivers/ pcm cia/ yent a_socket .c for t he host cont roller driver t hat runs on m any x86- based lapt ops. Header files present in include/ pcm cia/ cont ain PCMCI A- relat ed st ruct ure definit ions. Client drivers live alongside ot her drivers belonging t o t he associat ed device class. So, you will find drivers for PCMCI A net working cards inside drivers/ net / pcm cia/ . The client driver for PCMCI A m em ory devices t hat em ulat e I DE is drivers/ ide/ legacy/ ide- cs.c. See drivers/ serial/ serial_cs.c for t he client driver used by PCMCI A m odem s. Table 9.2 sum m arizes t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he kernel t ree. Table 9.3 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 9 .2 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

pcmcia_device_id

include/ linux/ m od_devicet able.h I dent it y of a PCMCI A card.

pcmcia_device

include/ pcm cia/ ds.h

Represent at ion of a PCMCI A device.

pcmcia_driver

include/ pcm cia/ ds.h

Represent at ion of a PCMCI A client driver.

tuple_t

include/ pcm cia/ cist pl.h

CI S m anipulat ion rout ines use a tuple_t st ruct ure t o hold inform at ion.

cistpl_cftable_entry_t include/ pcm cia/ cist pl.h include/ pcm cia/ cist pl.h

cisparse_t

D e scr ipt ion

Configurat ion t able ent ry in t he CI S space. A parsed CI S t uple.

Ta ble 9 .3 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

pcmcia_register_driver()

drivers/ pcm cia/ ds.c

Regist ers a driver wit h t he PCMCI A core

pcmcia_unregister_driver() drivers/ pcm cia/ ds.c

Unregist ers a driver from t he PCMCI A core

pcmcia_get_first_tuple() pcmcia_get_tuple_data() pcmcia_parse_tuple()

include/ pcm cia/ cist pl.h drivers/ pcm cia/ cist pl.c

Library rout ines t o m anipulat e CI S space

pcmcia_request_irq()

drivers/ pcm cia/ pcm cia_resource.c Get s an I RQ assigned for a PCMCI A card

Ch a pt e r 1 0 . Pe r iph e r a l Com pon e n t I n t e r con n e ct I n Th is Ch a pt e r 278 The PCI Fam ily 281 Addressing and I dent ificat ion 285 Accessing PCI Regions 288 Direct Mem ory Access 292 Device Exam ple: Et hernet - Modem Card 308 Debugging 308 Looking at t he Sources

Peripheral Com ponent I nt erconnect ( PCI ) is an om nipresent I / O backbone. Whet her you are backing up dat a on a st orage server, capt uring video from your deskt op, or surfing t he web from your lapt op, PCI m ight be serving you in som e avat ar or t he ot her. PCI , and form fact ors adapt ed or derived from PCI such as Mini PCI , CardBus, PCI Ext ended, PCI Express, PCI Express Mini Card, and Express Card have becom e de fact o peripheral connect ion t echnologies on t oday's com put ers.

Th e PCI Fa m ily PCI is a high- speed bus used for com m unicat ion bet ween t he CPU and I / O devices. The PCI specificat ion enables t ransfer of 32 bit s of dat a in parallel at 33MHz or 66MHz, yielding a peak t hroughput of 266MBps. CardBus is a derivat ive of PCI and has t he form fact or of a PC Card. CardBus cards are also 32- bit s wide and run at 33MHz. Even t hough CardBus and PCMCI A cards use t he sam e 68- pin connect ors, CardBus devices support 32 dat a lines com pared t o 16 for PCMCI A by m ult iplexing address and dat a lines as done in t he PCI bus.

Mini PCI , also a 33MHz 32- bit bus, is anot her adapt at ion of PCI found in sm all- foot print com put ers such as lapt ops. A PCI card can t alk via a Mini PCI slot using a com pat ible connect or. An ext ension t o PCI called PCI Ext ended ( or PCI - X) expands t he bus widt h t o 64 bit s, frequency t o 133MHz, and t he t hroughput t o about 1GBps. PCI - X 2.0 is t he current version of t he st andard. PCI Express ( PCI e or PCI - E) is t he present generat ion of t he PCI fam ily. Unlike t he parallel PCI bus, PCI e uses a serial prot ocol t o t ransfer dat a. PCI e support s a m axim um of 32 serial links. Each PCI e link ( in t he com m only used version 1.1 of t he specificat ion) yields a t hroughput of 250MBps in each t ransfer direct ion, t hus producing a m axim um PCI e dat a rat e of 8GBps in each direct ion. PCI e 2.0 is t he current version of t he st andard and support s higher dat a rat es. Serial com m unicat ion is fast er and cheaper t han parallel dat a t ransfer due t o t he absence of fact ors such as signal int erference, so t he indust ry t rend is t o m ove from parallel buses t o serial t echnologies. PCI e and it s adapt at ions aim t o replace PCI and it s derivat ives, and t his shift is also part of t he m et hodology change from parallel t o serial com m unicat ion. Several I / O int erfaces discussed in t his book, such as RS- 232, USB, FireWire, SATA, Et hernet , Fibre Channel, and I nfiniBand, are serial com m unicat ion archit ect ures. The CardBus equivalent in t he PCI e fam ily is t he Express Card. Express Cards direct ly connect t o t he syst em bus via a PCI e link or USB 2.0 ( discussed in t he next chapt er) , and circum vent m iddlem en such as CardBus cont rollers. Mini PCI 's cousin in t he PCI e fam ily is PCI Express Mini Card. Recent lapt ops support Express Card slot s inst ead of ( or in addit ion t o) CardBus, and PCI Express Mini Card slot s in place of Mini PCI . The form er t wo have sm aller foot print s and higher speeds com pared t o t he lat t er t wo. Table 10.1 sum m arizes t he im port ant relat ives of PCI . From t he kernel's perspect ive, all t hese t echnologies are com pat ible wit h one anot her. A kernel PCI driver will work wit h all relat ed t echnologies m ent ioned previously; so even t hough we base exam ple code in t his chapt er on a CardBus card, t he concept s apply t o ot her PCI derivat ives, t oo.

Ta ble 1 0 .1 . PCI 's Siblin gs, Ch ildr e n , a n d Cou sin s Bu s N a m e

Cha r a ct e r ist ics

For m Fa ct or

PCI

32- bit bus at 33MHz or 66MHz; yields up t o 266MBps.

I nt ernal slot in deskt ops and servers.

Mini PCI

32- bit bus at 33MHz.

I nt ernal slot in lapt ops.

CardBus

32- bit bus at 33MHz.

Ext ernal PC card slot in lapt ops. Com pat ible wit h PCI .

PCI Ext ended ( PCI - X)

64- bit bus at 133 MHz, yielding up t o 1GBps.

I nt ernal slot in deskt ops and servers. Wider t han PCI , but a PCI card can be plugged int o a PCI - X slot .

PCI Express ( PCI e)

250MBps per PCI e link in each t ransfer direct ion, yielding a m axim um t hroughput of 8GBps in each direct ion.

Replaces t he int ernal PCI slot in newer syst em s. PCI e is a serial prot ocol unlike nat ive PCI , which is parallel.

PCI Express Mini Card

250MBps in each direct ion if t he Replaces Mini PCI as t he int ernal int erface is based on a PCI e link; slot in newer lapt ops. Sm aller 60MBps if t he int erface is based form fact or t han Mini PCI . on USB 2.0.

Bu s N a m e

Cha r a ct e r ist ics

For m Fa ct or

Express Card

250MBps in each direct ion if t he int erface is based on a PCI e link; 60MBps if t he int erface is based on USB 2.0.

Thin ext ernal slot in newer lapt ops t hat replaces CardBus. I nt erfaces wit h t he syst em bus via PCI e or USB 2.0.

Solut ions based on t he PCI fam ily are available for a vast spect rum of hardware dom ains:

Net working t echnologies such as Gigabit Et hernet , WiFi, ATM, Token Ring, and I SDN.

Host adapt ers for st orage t echnologies, such as SCSI .

Host cont rollers for I / O buses such as USB, FireWire, I DE, I 2 C, and PCMCI A. On PC- com pat ible syst em s, t hese host cont rollers funct ion as bridges bet ween t he PCI cont roller on t he Sout h Bridge and t he bus t echnology t hey source. Verify t his by running lspci ( discussed lat er) .

Graphics, video st ream ing, and dat a capt ure.

Serial port and parallel port cards.

Sound cards.

Devices such as Wat chdogs, EDAC- capable m em ory cont rollers, and gam e port s.

For t he driver developer, t he PCI fam ily offers an at t ract ive advant age: a syst em of aut om at ic device configurat ion. Unlike drivers for t he older I SA generat ion, PCI drivers need not im plem ent com plex probing logic. During boot , t he BI OS- t ype boot firm ware ( or t he kernel it self if so configured) walks t he PCI bus and assigns resources such as int errupt levels and I / O base addresses. The device driver gleans t his assignm ent by peeking at a m em ory region called t he PCI configurat ion space. PCI devices possess 256 byt es of configurat ion m em ory. The t op 64 byt es of t he configurat ion space is st andardized and holds regist ers t hat cont ain det ails such as t he st at us, int errupt line, and I / O base addresses. PCI e and PCI - X 2.0 offer an ext ended configurat ion space of 4KB. We will learn how t o operat e on t he PCI configurat ion space lat er. Figure 10.1 shows PCI in a PC- com pat ible syst em . Com ponent s int egrat ed int o t he Sout h Bridge such as cont roller silicon for USB, I DE, I 2 C, LPC, and Et hernet reside off t he PCI bus. Som e of t hese cont rollers cont ain an int ernal PCI - t o- PCI bridge t o source a dedicat ed PCI bus for t he respect ive I / O t echnology. The Sout h Bridge addit ionally cont ains an ext ernal PCI bus t o connect I / O peripherals such as CardBus cont rollers and WiFi chipset s. Figure 10.1 also shows PCI address t uples corresponding t o each connect ed subsyst em . This will get clearer when we learn about PCI addressing next .

Figu r e 1 0 .1 . PCI in side a PC Sou t h Br idge .

Ch a pt e r 1 0 . Pe r iph e r a l Com pon e n t I n t e r con n e ct I n Th is Ch a pt e r 278 The PCI Fam ily 281 Addressing and I dent ificat ion 285 Accessing PCI Regions 288 Direct Mem ory Access 292 Device Exam ple: Et hernet - Modem Card 308 Debugging 308 Looking at t he Sources

Peripheral Com ponent I nt erconnect ( PCI ) is an om nipresent I / O backbone. Whet her you are backing up dat a on a st orage server, capt uring video from your deskt op, or surfing t he web from your lapt op, PCI m ight be serving you in som e avat ar or t he ot her. PCI , and form fact ors adapt ed or derived from PCI such as Mini PCI , CardBus, PCI Ext ended, PCI Express, PCI Express Mini Card, and Express Card have becom e de fact o peripheral connect ion t echnologies on t oday's com put ers.

Th e PCI Fa m ily PCI is a high- speed bus used for com m unicat ion bet ween t he CPU and I / O devices. The PCI specificat ion enables t ransfer of 32 bit s of dat a in parallel at 33MHz or 66MHz, yielding a peak t hroughput of 266MBps. CardBus is a derivat ive of PCI and has t he form fact or of a PC Card. CardBus cards are also 32- bit s wide and run at 33MHz. Even t hough CardBus and PCMCI A cards use t he sam e 68- pin connect ors, CardBus devices support 32 dat a lines com pared t o 16 for PCMCI A by m ult iplexing address and dat a lines as done in t he PCI bus.

Mini PCI , also a 33MHz 32- bit bus, is anot her adapt at ion of PCI found in sm all- foot print com put ers such as lapt ops. A PCI card can t alk via a Mini PCI slot using a com pat ible connect or. An ext ension t o PCI called PCI Ext ended ( or PCI - X) expands t he bus widt h t o 64 bit s, frequency t o 133MHz, and t he t hroughput t o about 1GBps. PCI - X 2.0 is t he current version of t he st andard. PCI Express ( PCI e or PCI - E) is t he present generat ion of t he PCI fam ily. Unlike t he parallel PCI bus, PCI e uses a serial prot ocol t o t ransfer dat a. PCI e support s a m axim um of 32 serial links. Each PCI e link ( in t he com m only used version 1.1 of t he specificat ion) yields a t hroughput of 250MBps in each t ransfer direct ion, t hus producing a m axim um PCI e dat a rat e of 8GBps in each direct ion. PCI e 2.0 is t he current version of t he st andard and support s higher dat a rat es. Serial com m unicat ion is fast er and cheaper t han parallel dat a t ransfer due t o t he absence of fact ors such as signal int erference, so t he indust ry t rend is t o m ove from parallel buses t o serial t echnologies. PCI e and it s adapt at ions aim t o replace PCI and it s derivat ives, and t his shift is also part of t he m et hodology change from parallel t o serial com m unicat ion. Several I / O int erfaces discussed in t his book, such as RS- 232, USB, FireWire, SATA, Et hernet , Fibre Channel, and I nfiniBand, are serial com m unicat ion archit ect ures. The CardBus equivalent in t he PCI e fam ily is t he Express Card. Express Cards direct ly connect t o t he syst em bus via a PCI e link or USB 2.0 ( discussed in t he next chapt er) , and circum vent m iddlem en such as CardBus cont rollers. Mini PCI 's cousin in t he PCI e fam ily is PCI Express Mini Card. Recent lapt ops support Express Card slot s inst ead of ( or in addit ion t o) CardBus, and PCI Express Mini Card slot s in place of Mini PCI . The form er t wo have sm aller foot print s and higher speeds com pared t o t he lat t er t wo. Table 10.1 sum m arizes t he im port ant relat ives of PCI . From t he kernel's perspect ive, all t hese t echnologies are com pat ible wit h one anot her. A kernel PCI driver will work wit h all relat ed t echnologies m ent ioned previously; so even t hough we base exam ple code in t his chapt er on a CardBus card, t he concept s apply t o ot her PCI derivat ives, t oo.

Ta ble 1 0 .1 . PCI 's Siblin gs, Ch ildr e n , a n d Cou sin s Bu s N a m e

Cha r a ct e r ist ics

For m Fa ct or

PCI

32- bit bus at 33MHz or 66MHz; yields up t o 266MBps.

I nt ernal slot in deskt ops and servers.

Mini PCI

32- bit bus at 33MHz.

I nt ernal slot in lapt ops.

CardBus

32- bit bus at 33MHz.

Ext ernal PC card slot in lapt ops. Com pat ible wit h PCI .

PCI Ext ended ( PCI - X)

64- bit bus at 133 MHz, yielding up t o 1GBps.

I nt ernal slot in deskt ops and servers. Wider t han PCI , but a PCI card can be plugged int o a PCI - X slot .

PCI Express ( PCI e)

250MBps per PCI e link in each t ransfer direct ion, yielding a m axim um t hroughput of 8GBps in each direct ion.

Replaces t he int ernal PCI slot in newer syst em s. PCI e is a serial prot ocol unlike nat ive PCI , which is parallel.

PCI Express Mini Card

250MBps in each direct ion if t he Replaces Mini PCI as t he int ernal int erface is based on a PCI e link; slot in newer lapt ops. Sm aller 60MBps if t he int erface is based form fact or t han Mini PCI . on USB 2.0.

Bu s N a m e

Cha r a ct e r ist ics

For m Fa ct or

Express Card

250MBps in each direct ion if t he int erface is based on a PCI e link; 60MBps if t he int erface is based on USB 2.0.

Thin ext ernal slot in newer lapt ops t hat replaces CardBus. I nt erfaces wit h t he syst em bus via PCI e or USB 2.0.

Solut ions based on t he PCI fam ily are available for a vast spect rum of hardware dom ains:

Net working t echnologies such as Gigabit Et hernet , WiFi, ATM, Token Ring, and I SDN.

Host adapt ers for st orage t echnologies, such as SCSI .

Host cont rollers for I / O buses such as USB, FireWire, I DE, I 2 C, and PCMCI A. On PC- com pat ible syst em s, t hese host cont rollers funct ion as bridges bet ween t he PCI cont roller on t he Sout h Bridge and t he bus t echnology t hey source. Verify t his by running lspci ( discussed lat er) .

Graphics, video st ream ing, and dat a capt ure.

Serial port and parallel port cards.

Sound cards.

Devices such as Wat chdogs, EDAC- capable m em ory cont rollers, and gam e port s.

For t he driver developer, t he PCI fam ily offers an at t ract ive advant age: a syst em of aut om at ic device configurat ion. Unlike drivers for t he older I SA generat ion, PCI drivers need not im plem ent com plex probing logic. During boot , t he BI OS- t ype boot firm ware ( or t he kernel it self if so configured) walks t he PCI bus and assigns resources such as int errupt levels and I / O base addresses. The device driver gleans t his assignm ent by peeking at a m em ory region called t he PCI configurat ion space. PCI devices possess 256 byt es of configurat ion m em ory. The t op 64 byt es of t he configurat ion space is st andardized and holds regist ers t hat cont ain det ails such as t he st at us, int errupt line, and I / O base addresses. PCI e and PCI - X 2.0 offer an ext ended configurat ion space of 4KB. We will learn how t o operat e on t he PCI configurat ion space lat er. Figure 10.1 shows PCI in a PC- com pat ible syst em . Com ponent s int egrat ed int o t he Sout h Bridge such as cont roller silicon for USB, I DE, I 2 C, LPC, and Et hernet reside off t he PCI bus. Som e of t hese cont rollers cont ain an int ernal PCI - t o- PCI bridge t o source a dedicat ed PCI bus for t he respect ive I / O t echnology. The Sout h Bridge addit ionally cont ains an ext ernal PCI bus t o connect I / O peripherals such as CardBus cont rollers and WiFi chipset s. Figure 10.1 also shows PCI address t uples corresponding t o each connect ed subsyst em . This will get clearer when we learn about PCI addressing next .

Figu r e 1 0 .1 . PCI in side a PC Sou t h Br idge .

Addr e ssin g a n d I de n t ifica t ion PCI devices are addressed using bus, dev ice, and funct ion num bers, and t hey are ident ified via vendorI Ds, deviceI Ds, and class codes. Let 's learn t hese concept s wit h t he help of t he lspci ut ilit y t hat is part of t he PCI Ut ilit ies package downloadable from ht t p: / / m j .ucw.cz/ pciut ils.sht m l. Assum e t hat you're using a Xircom Et hernet - Modem m ult ifunct ion CardBus card on a Pent ium - class lapt op served by a Texas I nst rum ent s PCI 4510 CardBus cont roller, as shown in Figure 10.1. Run lspci: Code View: bash>lspci 00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02) ... 02:00.0 CardBus bridge: Texas Instruments PCI4510 PC card Cardbus Controller (rev 03) ... 03:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03) 03:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem (rev 03)

Consider t he t uple (XX:YY.Z) at t he beginning of each ent ry in t he preceding out put . XX st ands for t he PCI bus num ber. A PCI dom ain can host up t o 256 buses. I n t he lapt op used previously, t he CardBus bridge is connect ed t o PCI bus 2. This bridge sources anot her PCI bus num bered 3 t hat host s t he Xircom card. YY is t he PCI device num ber. Each bus can connect t o a m axim um of 32 PCI devices. Each device can, in t urn, im plem ent up t o eight funct ions represent ed by Z. The Xircom card can sim ult aneously perform t wo funct ions. Thus, 03:00.0 addresses t he Et hernet funct ion of t he card, while 03:00.1 corresponds t o it s m odem com m unicat ion funct ion. I ssue lspci –t t o elicit a t ree- like layout of t he PCI buses and devices on your syst em : bash> lspci –t -[0000:00]-+-00.0 +-00.1 +-00.3 +-02.0 +-02.1 +-1d.0 +-1d.1 +-1d.2 +-1d.7 +-1e.0-[0000:02-05]--+-[0000:03]-+-00.0 | | \-00.1 | \-[0000:02]-+-00.0 | +-00.1 | +-01.0 | \-02.0 +-1f.0

As you can see from t he preceding out put ( and in Figure 10.1) , t o walk t he PCI bus and reach t he Xircom m odem ( 03:00.01) or Et hernet cont roller ( 03:00.0) , you have t o st art from your PCI dom ain ( labeled 0000 in

t he preceding out put ) , t raverse a PCI - t o- PCI bridge (00:1e.0), and t hen cross a PCI - t o- CardBus host cont roller (02:0.0) . The sysfs represent at ion of t he PCI subsyst em m irrors t his layout : bash> ls /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/0000:03:00.0/ ... net:eth2 Ethernet ... bash> ls /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/0000:03:00.1/ ... tty:ttyS1 Modem ...

As you saw earlier, PCI devices possess a 256- byt e m em ory region t hat holds configurat ion regist ers. This space is t he key t o ident ify t he m ake and capabilit ies of PCI cards. Let 's t ake a peek inside t he configurat ion spaces of t he CardBus cont roller and t he Xircom dual- funct ion card previously used. The Xircom card has t wo configurat ion spaces, one per support ed funct ion: Code View: bash> lspci –x 00:00.0 Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02) 00: 86 80 80 35 06 01 90 20 02 00 00 06 00 00 80 00 10: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 14 10 5c 05 30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 ... 02:00.0 CardBus bridge: Texas Instruments PCI4510 PC card Cardbus Controller (rev 03) 00: 4c 10 44 ac 07 00 10 02 03 00 07 06 20 a8 82 00 10: 00 00 00 b0 a0 00 00 22 02 03 04 b0 00 00 00 f0 20: 00 f0 ff f1 00 00 00 d2 00 f0 ff d3 00 30 00 00 30: fc 30 00 00 00 34 00 00 fc 34 00 00 0b 01 00 05 ... 03:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03) 00: 5d 11 03 00 07 00 10 02 03 00 00 02 00 40 80 00 10: 01 30 00 00 00 00 00 d2 00 08 00 d2 00 00 00 00 20: 00 00 00 00 00 00 00 00 07 01 00 00 5d 11 81 11 30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 14 28 03:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem (rev 03) 00: 5d 11 03 01 03 00 10 02 03 02 00 07 00 00 80 00 10: 81 30 00 00 00 10 00 d2 00 18 00 d2 00 00 00 00 20: 00 00 00 00 00 00 00 00 07 02 00 00 5d 11 81 11 30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 00 00

PCI regist ers are lit t le- endian, so fact or t hat while int erpret ing t he preceding out put . You m ay also dum p PCI configurat ion regions via sysfs. So, t o look at t he configurat ion space of t he Et hernet funct ion of t he Xircom card, do t his: Code View: bash> od -x /sys/devices/pci0000:00/0000:00:1e.0/0000:02:00.0/0000:03:00.1/config 0000000 115d 0003 0007 0210 0003 0200 4000 0080 0000020 3001 0000 0000 d200 0800 d200 0000 0000 0000040 0000 0000 0000 0000 0107 0000 115d 1181

...

Table 10.2 explains som e of t he values shown in t he preceding dum p. The first t wo byt es cont ain t he vendor I D, which ident ifies t he com pany t hat m anufact ured t he card. PCI vendor I Ds are m aint ained and assigned globally. ( Point your browser t o www.pcidat abase.com for a dat abase.) As you can decipher from t he preceding out put , I nt el, Texas I nst rum ent s, and Xircom ( now acquired by I nt el) own vendor I Ds of 0x8086, 0x104C, and 0x115D, respect ively. The next t wo byt es are specific t o t he funct ionalit y of t he card and const it ut e it s device I D. From t he preceding out put , t he Et hernet funct ionalit y of t he Xircom card owns a device I D of 0x0003, while t he m odem answers t o a device I D of 0x0103. PCI cards addit ionally possess subvendor and subdevice I Ds ( see words at offset s 44 and 46 in t he preceding dum p) t o furt her pinpoint t heir ident it y.

Ta ble 1 0 .2 . PCI Con figu r a t ion Spa ce Se m a n t ics Con figu r a t ion Spa ce Offse t

Se m a n t ics

Va lu e s fr om t h e D u m p Ou t pu t for t h e Xir com Ca r d

0

Vendor I D

0x115D

2

Device I D

0x0003

10

Class code

0x0200

16 t o 39

Base address regist er 0 ( BAR 0) t o BAR5

0x3001...0000

44

Subvendor I D

0x115D

46

Subdevice I D

0x1181

Ten byt es int o t he configurat ion space lies t he code t hat describes t he class of t he device. PCI bridges have a class code st art ing wit h 0x06, net work devices possess a class code beginning wit h 0x02, and com m unicat ion devices own a class code com m encing wit h 0x07. Thus, in t he preceding exam ple, t he CardBus bridge, t he Et hernet card, and t he serial m odem own class codes of 0x0607, 0x0200, and 0x0700, respect ively. You can find class code definit ions in include/ linux/ pci_ids.h. PCI drivers regist er t he vendor I Ds, device I Ds, and class codes t hat t hey support wit h t he PCI subsyst em . Using t his dat abase, t he PCI subsyst em binds an insert ed card t o t he appropriat e device driver aft er gleaning it s ident it y from it s configurat ion space. We will see how t his is done when we im plem ent an exam ple driver lat er.

Acce ssin g PCI Re gion s PCI devices cont ain t hree addressable regions: configurat ion space, I / O port s, and device m em ory. Let 's learn how t o access t hese m em ory regions from a device driver.

Con figu r a t ion Spa ce The kernel offers a set of six funct ions t hat your driver can use t o operat e on PCI configurat ion space: pci_read_config_[byte|word|dword](struct pci_dev *pdev, int offset, int *value); and pci_write_config_[byte|word|dword](struct pci_dev *pdev, int offset, int value);

I n t he argum ent list , struct pci_dev is t he PCI device st ruct ure, and offset is t he byt e posit ion in t he configurat ion space t hat you want t o access. For read funct ions, value is a point er t o a supplied dat a buffer, and for writ e rout ines, it cont ains t he dat a t o be writ t en. Let 's consider som e exam ples:

To decipher t he I RQ num ber assigned t o a card funct ion, use t he following: unsigned char irq; pci_read_config_byte(pdev, PCI_INTERRUPT_LINE, &irq); As per t he PCI specificat ion, offset 60 inside t he PCI configurat ion space holds t he I RQ num ber assigned t o t he card. All configurat ion regist er offset s are expressively defined in include/ linux/ pci_regs.h, so use PCI_INTERRUPT_LINE rat her t han 60 t o specify t his offset . Sim ilarly, t o read t he PCI st at us regist er ( t wo byt es at offset six in t he configurat ion space) , do t his: unsigned short status; pci_read_config_word(pdev, PCI_STATUS, &status); Only t he first 64 byt es of t he configurat ion space are st andardized. The device m anufact urer defines desired sem ant ics t o t he rest . The Xircom card used earlier, assigns four byt es at offset 64 for power m anagem ent purposes. To disable power m anagem ent , t he Xircom CardBus driver, drivers/ net / t ulip/ xircom _cb.c, does t his: #define PCI_POWERMGMT 0x40 pci_write_config_dword(pdev, PCI_POWERMGMT, 0x0000);

I / O a n d M e m or y PCI cards have up t o six I / O or m em ory regions. I / O regions cont ain regist ers, and m em ory regions hold dat a. Video cards, for exam ple, have I / O spaces t hat accom m odat e cont rol regist ers and m em ory regions t hat m ap t o fram e buffers. Not all cards have addressable m em ory regions, however. The sem ant ics of I / O and m em ory spaces are hardware- dependent and can be obt ained from t he device dat a sheet . Like for configurat ion m em ory, t he kernel offers a set of helpers t o operat e on I / O and m em ory regions of PCI devices:

Code View: unsigned long pci_resource_[start|len|end|flags] (struct pci_dev *pdev, int bar);

To operat e on an I / O region such as t he device cont rol regist ers of a PCI video card, t he driver needs t o do t he following: 1.

Get t he I / O base address from t he appropriat e base address regist er ( bar) in t he configurat ion space:

unsigned long io_base = pci_resource_start(pdev, bar); This assum es t hat t he device cont rol regist ers for t his card are m apped t o t he m em ory region associat ed wit h bar, whose value can range from 0 t hrough 5, as shown in Table 10.2. 2.

Mark t his region as being spoken for, using t he kernel's request_region() regulat ory m echanism discussed in Chapt er 5 , " Charact er Drivers" :

request_region(io_base, length, "my_driver"); Here, length is t he size of t he cont rol regist er space and my_driver ident ifies t he region's owner. Look for t he ent ry cont aining my_driver in / pr oc/ iopor t s t o spot t his m em ory region. You m ay inst ead use t he wrapper funct ion pci_request_region(), defined in drivers/ pci/ pci.c. 3.

Add t he regist er's offset obt ained from t he dat a- sheet , t o t he base address gleaned in St ep 1. Operat e on t his address using t he inb() and outb() fam ily of funct ions discussed in Chapt er 5 :

/* Read */ register_data = inl(io_base + REGISTER_OFFSET); /* Use */ /* ... */ /* Write */ outl(register_data, iobase + REGISTER_OFFSET);

To operat e on a m em ory region such as t he fram e buffer on t he above PCI video card, follow t hese st eps:

1.

Get t he base address, lengt h, and flags associat ed wit h t he m em ory region:

unsigned long mmio_base = pci_resource_start(pdev, bar); unsigned long mmio_length = pci_resource_length(pdev, bar); unsigned long mmio_flags = pci_resource_flags(pdev, bar); This assum es t hat t his m em ory is m apped t o t he base address regist er, bar. 2.

Mark ownership of t his region using t he kernel's request_mem_region() regulat ory m echanism :

request_mem_region(mmio_base, mmio_length, "my_driver"); You m ay inst ead use t he wrapper funct ion pci_request_region(), m ent ioned previously. 3.

Obt ain CPU access t o t he device m em ory obt ained in St ep 1. Cert ain m em ory regions, such as t he ones t hat hold regist ers, need t o guard against side effect s, so t hey are m arked as not being prefet chable ( or cacheable) by t he CPU. Ot her regions, such as t he one used in t his exam ple, can be cached. Depending on t he access flag, use t he appropriat e funct ion t o obt ain kernel virt ual addresses corresponding t o t he m apped region:

void __iomem *buffer; if (flags & IORESOURCE_CACHEABLE) { buffer = ioremap(mmio_base, mmio_length); } else { buffer = ioremap_nocache(mmio_base, mmio_length); }

To be safe, and t o avoid perform ing t he preceding checks, use t he services of pci_iomap() defined in lib/ iom ap.c inst ead: buffer = pci_iomap(pdev, bar, mmio_length);

D ir e ct M e m or y Acce ss Direct Mem ory Access ( DMA) is t he capabilit y t o t ransfer dat a from a peripheral t o m ain m em ory wit hout t he CPU's int ervent ion. DMA boost s t he perform ance of peripherals m anyfold, because it doesn't burn CPU cycles t o m ove dat a. PCI net working cards and I DE disk drives are com m on exam ples of peripherals relying on DMA for dat a t ransfer. DMA is init iat ed by a DMA m ast er. The PC m ot herboard has a DMA cont roller on t he Sout h Bridge t hat can m ast er t he I / O bus and init iat e DMA t o or from a peripheral. This is usually t he case for legacy I SA cards. However, buses such as PCI can m ast er t he bus and init iat e DMA t ransfers. CardBus cards are sim ilar t o PCI and also support DMA m ast ering. PCMCI A devices, on t he ot her hand, do not support DMA m ast ering, but t he PCMCI A cont roller, which is usually wired t o a PCI bus, m ight have DMA m ast ering capabilit ies. The issue of cache coherency is synonym ous wit h DMA. For opt im um perform ance, processors cache recent ly accessed byt es, so dat a passing bet ween t he CPU and m ain m em ory st ream s t hrough t he processor cache. During DMA, however, dat a t ravels direct ly bet ween t he DMA cont roller and m ain m em ory and, hence, bypasses t he processor cache. This evasion has t he pot ent ial t o int roduce inconsist encies because t he processor m ight work on st ale dat a living in it s cache. Som e archit ect ures aut om at ically synchronize t he cache wit h m ain m em ory using a t echnique called bus snooping. Many ot hers rely on soft ware t o achieve coherency, however. We will learn how t o perform coherent DMA operat ions aft er int roducing a few m ore t opics. DMA can occur synchronously or asynchronously. An exam ple of t he form er is DMA from a syst em fram e buffer t o an LCD cont roller. A user applicat ion writ es pixel dat a t o a DMA- m apped fram e buffer via / dev/ fbX, while t he LCD cont roller uses DMA t o collect t his dat a synchronously at t im ed int ervals. We discuss m ore about t his in Chapt er 12, " Video Drivers." An exam ple of asynchronous DMA is t he t ransm it and receive of dat a fram es bet ween t he CPU and a net work card discussed in Chapt er 15, " Net work I nt erface Cards." Syst em m em ory regions t hat are t he source or dest inat ion of DMA t ransfers are called DMA buffers. I f a bus int erface has addressing lim it at ions, t hat 'll affect t he m em ory range t hat can hold DMA buffers. So, DMA buffers suit able for a 24- bit bus such as I SA can live only in t he bot t om 16MB of syst em m em ory called ZONE_DMA ( see t he sect ion " Allocat ing Mem ory" in Chapt er 2 , " A Peek I nside t he Kernel" ) . PCI buses are 32- bit s wide by default , so you won't usually face such lim it at ions on 32- bit plat form s. To inform t he kernel about any special needs of DMA- able buffers, use t he following: dma_set_mask(struct device *dev, u64 mask);

I f t his funct ion ret urns success, you m ay DMA t o any address t hat is mask bit s in lengt h. For exam ple, t he e1000 PCI - X Gigabit Et hernet driver (drivers/ net / e1000/ e1000_m ain.c) does t he following: if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK))) { /* System supports 64-bit DMA */ pci_using_dac = 1; } else { /* See if 32-bit DMA is supported */ if ((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) { /* No, let's abort */ E1000_ERR("No usable DMA configuration, aborting\n"); return err; } /* 32-bit DMA */ pci_using_dac = 0; }

I / O devices view DMA buffers t hrough t he lens of t he bus cont roller and any int ervening I / O m em ory m anagem ent unit ( I OMMU) . Because of t his, I / O devices work wit h bus addresses, rat her t han physical or kernel virt ual addresses. So, when you inform a PCI card about t he locat ion of a DMA buffer, you have t o let it know t he buffer's bus address. DMA service rout ines m ap t he kernel virt ual address of DMA buffers t o bus addresses so t hat bot h t he device and t he CPU can access t he buffers. Bus addresses are of t ype dma_addr_t, defined in include/ asm - your- arch/ t ypes.h . There are a couple m ore concept s wort h knowing about DMA. One is t he idea of bounce buffers. Bounce buffers reside in DMA- able regions and are used as t em porary m em ory when DMA is request ed t o/ from non- DMA- able m em ory regions. An exam ple is DMA t o an address higher t han 4GB from a 32- bit PCI peripheral when t here is no int ervening I OMMU. Dat a is first t ransferred t o a bounce buffer and t hen copied t o t he final dest inat ion. The second concept is a flavor of DMA called scat t er- gat her. When dat a t o be DMA'ed is spread over discont inuous regions, scat t er-gat her capabilit y enables t he hardware t o gat her cont ent s of t he scat t ered buffers at one go. The reverse occurs when dat a is DMA'ed from t he card t o buffers scat t ered in m em ory. Scat t er- gat her capabilit y boost s perform ance by elim inat ing t he need t o service m ult iple DMA request s. The kernel feat ures a healt hy API t hat m asks m any of t he int ernal det ails of configuring DMA. This API get s sim pler if you are writ ing a driver for a PCI card t hat support s bus m ast ering. ( Most PCI cards do.) PCI DMA rout ines are essent ially wrappers around t he generic DMA service rout ines and are defined in include/ asm generic/ pci- dm a- com pat .h. I n t his chapt er, we use only t he PCI DMA API . The kernel provides t wo classes of DMA service rout ines t o PCI drivers:

1 . Consist ent ( or coherent ) DMA access m et hods. These rout ines guarant ee dat a coherency in t he face of DMA act ivit y. I f bot h t he PCI device and t he CPU are likely t o frequent ly operat e on a DMA buffer, consist ency is crucial, so use t he consist ent API . The t rade- off is a degree of perform ance penalt y. To obt ain a consist ent DMA buffer, call t his service rout ine: void *pci_alloc_consistent(struct pci_dev *pdev, size_t size, dma_addr_t *dma_handle); This funct ion allocat es a DMA buffer, generat es it s bus address, and ret urns t he associat ed kernel virt ual address. The first t wo argum ent s respect ively hold t he PCI device st ruct ure ( which is discussed lat er) and t he size of t he request ed DMA buffer. The t hird argum ent , dma_handle, is a point er t o t he bus address t hat t he funct ion call generat es. The following snippet allocat es and frees a consist ent DMA buffer: /* Allocate */ void *vaddr = pci_alloc_consistent(pdev, size, &dma_handle); /* Use */ /* ... */ /* Free */ pci_free_consistent(pdev, size, vaddr, dma_handle); 2 . St r eam ing DMA access m et hods. These rout ines do not guarant ee consist ency and are fast er as a result . They are useful when t here is not m uch need for shared access bet ween t he CPU and t he I / O device. When a st ream ed buffer has been m apped for device access, t he driver has t o explicit ly unm ap ( or sync) it before t he CPU can reliably operat e on it . There are t wo fam ilies of st ream ing access rout ines: pci_[map|unmap|dma_sync]_single() and pci_[map|unmap|dma_sync]_sg(). The first funct ion fam ily m aps, unm aps, and synchronizes a single preallocat ed DMA buffer. pci_map_single() is prot ot yped as follows: dma_addr_t pci_map_single(struct pci_dev *pdev, void *ptr,

size_t size, int direction); The first t hree argum ent s respect ively hold t he PCI device st ruct ure, t he kernel virt ual address of a preallocat ed DMA buffer, and t he size of t he supplied buffer. The fourt h argum ent , direction, can be one of t he following: PCI_DMA_BIDIRECTION, PCI_DMA_TODEVICE, PCI_DMA_FROMDEVICE, or PCI_DMA_NONE. The nam es are self- explanat ory, but t he first opt ion is expensive, and t he last is for debugging. We discuss st ream ed DMA m apping furt her when we develop an exam ple driver lat er. The second funct ion fam ily m aps, unm aps, and synchronizes a scat t er- gat her list of DMA buffers. pci_map_sg() is prot ot yped as follows: int pci_map_sg(struct pci_dev *pdev, struct scatterlist *sgl, int num_entries, int direction); The scat t ered list is specified using t he second argum ent , struct scatterlist, defined in include/ asm your- arch/ scat t erlist .h. num_entries is t he num ber of ent ries in t he scatterlist. The first and last argum ent s are t he sam e as t hat described for pci_map_single(). The funct ion ret urns t he num ber of m apped ent ries: num_mapped = pci_map_sg(pdev, sgl, num_entries, PCI_DMA_TODEVICE); for (i=0; ihard_start_xmit = &mydevice_xmit; /* See Listing 10.6 */ /* More net_dev initializations */ /* ... */ /* Get the I/O address for this PCI region. All card registers specified in Table 10.3 are assumed to be in bar 0 */ ioaddr = pci_resource_start(pdev, 0); /* Claim a 128-byte I/O region */ request_region(ioaddr, 128, "ntwrk"); /* Fill in resource information obtained from the PCI layer */ net_dev->base_addr = ioaddr; net_dev->irq = pdev->irq; /* ... */ /* Setup DMA. Defined in Listing 10.5 */ dma_descriptor_setup(pdev); /* Register the driver with the network layer. This will allot an unused ethX interface */ register_netdev(net_dev); /* ... */ } /* Remove method */ static void __devexit net_driver_remove(struct pci_dev *pdev) { /* Free transmit and receive DMA buffers. Defined in Listing 10.5 */ dma_descriptor_release(pdev); /* Release memory regions */ /* ... */ /* Unregister from the networking layer */ unregister_netdev(dev); free_netdev(dev);

/* ... */ }

List in g 1 0 .4 . Pr obin g t h e M ode m Fu n ct ion Code View: /* Probe method */ static int __devinit modem_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct uart_port port; /* See Chapter 6, "Serial Drivers" */ /* Ask low-level PCI code to enable I/O and memory regions for this PCI device */ pci_enable_device(pdev); /* Get the PCI IRQ and I/O address to be used and populate the uart_port structure (see Chapter 6) with these resources. Look at include/linux/pci.h for helper functions */ /* ... */ /* Register this information with the serial layer and get an unused ttySX port allotted to the card. Look at Chapter 6 for more on serial drivers */ serial8250_register_port(&port); /* ... */ } /* Remove method */ static void __devexit modem_driver_remove(struct pci_dev *dev) { /* Unregister the port from the serial layer */ serial8250_unregister_port(&port); /* Disable device */ pci_disable_device(dev); }

To recap, let 's t race t he code pat h from t he t im e you insert t he Et hernet - Modem CardBus card unt il you are allot t ed a net work int erface ( et hX) and a serial port (/ dev/ t t ySX) :

1 . For each support ed CardBus funct ion, t he corresponding driver init ializat ion rout ine regist ers a pci_device_id t able of support ed cards and a probe() rout ine. This is shown in List ing 10.1 and List ing 10.2.

2 . The PCI hot plug layer det ect s card insert ion and gleans t he vendor I D and device I D of each device funct ion from t he card's PCI configurat ion space.

3 . Because t he I Ds m at ch wit h t hose regist ered by t he card's Et hernet and serial drivers, t he corresponding probe() m et hods are invoked. This result s in t he invocat ion of net_driver_probe() and modem_driver_probe() described respect ively in List ing 10.3 and List ing 10.4.

4 . The probe() m et hods configure t he Et hernet and m odem port ions of t he PCI driver wit h resource inform at ion. This leads t o t he allocat ion of a net working int erface ( et hX) and a serial port ( t t ySX) t o t he card. From t his point on, applicat ion dat a flows over t hese int erfaces.

D a t a Tr a n sfe r The net work funct ion belonging t o t he sam ple CardBus device uses t he following st rat egy for dat a t ransfer: The card expect s t he device driver t o supply it wit h an array of t wo receive DMA descript ors and an array of t wo t ransm it DMA descript ors. Each DMA descript or cont ains t he address of an associat ed dat a buffer, it s lengt h, and a cont rol word. You can use t he cont rol word t o t ell t he device whet her t he descript or cont ains valid dat a. For a t ransm it descript or, you m ay also program it t o request an int errupt aft er dat a t ransm ission. The card looks for a valid descript or and DMA's dat a t o/ from t he associat ed dat a buffer. To suit t his elem ent ary schem e, t he exam ple driver uses only t he coherent DMA int erface. The driver coherent ly allocat es a large buffer t hat holds t he descript ors and t heir associat ed dat a buffers. The receive and t ransm it buffers are 1536 byt es long t o m at ch t he m axim um t ransm ission unit ( MTU) of Et hernet fram es. The descript ors and buffers are pict orially shown in Figure 10.2. The t op 24 byt es of each array in t he figure hold t wo 12- byt e DMA descript ors, and t he rest of t he m em ory is occupied by t wo 1536- byt e DMA buffers. The 12- byt e descript or layout shown in t he figure is assum ed t o m at ch t he form at specified in t he card's dat a sheet .

Figu r e 1 0 .2 . D M A de scr ipt or s a n d bu ffe r s for t h e Ca r dBu s de vice .

Table 10.3 shows t he regist er layout of t he card's net work funct ion.

Ta ble 1 0 .3 . Re gist e r La you t of t h e Ca r d's N e t w or k Fu n ct ion Re gist e r N a m e

D e scr ipt ion

Offse t in t o I / O Spa ce

DMA_RX_REGISTER

Holds t he bus address of t he receive DMA descript or array ( dma_bus_rx)

0x0

DMA_TX_REGISTER

Holds t he bus address of t he t ransm it DMA descript or array ( dma_bus_tx)

0x4

CONTROL_REGISTER Cont rol word t hat com m ands t he card t o init iat e DMA, st op DMA, and so on

List in g 1 0 .5 . Se t t in g Up D M A D e scr ipt or s a n d Bu ffe r s Code View: /* Device-specific data structure for the Ethernet Function */ struct device_data { struct pci_dev *pdev; /* The PCI Device structure */ struct net_device *ndev; /* The Net Device structure */ void *dma_buffer_rx; /* Kernel virtual address of the receive descriptor */

0x8

dma_addr_t dma_bus_rx; void *dma_buffer_tx; dma_addr_t dma_bus_tx;

/* Bus address of the receive descriptor */ /* Kernel virtual address of the transmit descriptor */ /* Bus address of the transmit descriptor */

/* ... */ spin_lock_t device_lock; /* Serialize */ } *mydev_data; /* On-card registers related to DMA */ #define DMA_RX_REGISTER_OFFSET 0x0 /* Offset of the register holding the bus address of the RX descriptor */ #define DMA_TX_REGISTER_OFFSET 0x4 /* Offset of the register holding the bus address of the TX descriptor */ #define CONTROL_REGISTER 0x8 /* Offset of the control register */ /* Control Register Defines */ #define INITIATE_XMIT

0x1

/* Descriptor control word definitions */ #define FREE_FLAG 0x1 /* Free Descriptor */ #define INTERRUPT_FLAG 0x2 /* Assert interrupt after DMA */ /* Invoked from Listing 10.3 */ static void dma_descriptor_setup(struct pci_dev *pdev) { /* Allocate receive DMA descriptors and buffers */ mydev_data->dma_buffer_rx = pci_alloc_consistent(pdev, 3096, &mydev_data->dma_bus_rx); /* Fill the two receive descriptors as shown in Figure 10.2 */ /* RX descriptor 1 */ mydev_data->dma_buffer_rx[0] = cpu_to_le32((unsigned long) (mydev_data->dma_bus_rx + 24)); /* Buffer address */ mydev_data->dma_buffer_rx[1] = 1536; /* Buffer length */ mydev_data->dma_buffer_rx[2] = FREE_FLAG; /* Descriptor is free */ /* RX descriptor 2 */ mydev_data->dma_buffer_rx[3] = cpu_to_le32((unsigned long) (mydev_data->dma_bus_rx + 1560)); /* Buffer address */ mydev_data->dma_buffer_rx[4] = 1536; /* Buffer length */ mydev_data->dma_buffer_rx[5] = FREE_FLAG; /* Descriptor is free */ wmb(); /* Write Memory Barrier */ /* Write the address of the receive descriptor to the appropriate register in the card. The I/O base address, ioaddr, was populated in Listing 10.3 */ outl(cpu_to_le32((unsigned long)mydev_data->dma_bus_rx), ioaddr + DMA_RX_REGISTER_OFFSET); /* Allocate transmit DMA descriptors and buffers */ mydev_data->dma_buffer_tx = pci_alloc_consistent(pdev, 3096, &mydev_data->dma_bus_tx); /* Fill the two transmit descriptors as shown in Figure 10.2 */ /* TX descriptor 1 */

mydev_data->dma_buffer_tx[0] = cpu_to_le32((unsigned long) (mydev_data->dma_bus_tx + 24)); /* Buffer address */ mydev_data->dma_buffer_tx[1] = 1536; /* Buffer length */ /* Valid descriptor. Generate an interrupt after completing the DMA */ mydev_data->dma_buffer_tx[2] = (FREE_FLAG | INTERRUPT_FLAG); /* TX descriptor 2 */ mydev_data->dma_buffer_tx[3] = cpu_to_le32((unsigned long) (mydev_data->dma_bus_tx + 1560)); /* Buffer address */ mydev_data->dma_buffer_tx[4] = 1536; /* Buffer length */ mydev_data->dma_buffer_tx[5] = (FREE_FLAG | INTERRUPT_FLAG); wmb(); /* Write Memory Barrier */ /* Write the address of the transmit descriptor to the appropriate register in the card. The I/O base, ioaddr, was populated in Listing 10.3 */ outl(cpu_to_le32((unsigned long)mydev_data->dma_bus_tx), ioaddr + DMA_TX_REGISTER_OFFSET); } /* Invoked from Listing 10.3 */ static void dma_descriptor_release(struct pci_dev *pdev) { pci_free_consistent(pdev, 3096, mydev_data->dma_bus_tx); pci_free_consistent(pdev, 3096, mydev_data->dma_bus_rx); }

List ing 10.5 enforces a writ e barrier by calling wmb() t o prevent t he CPU from reordering t he outl() before populat ing t he DMA descript or. On an x86 processor, wmb() reduces t o a NOP because I nt el CPUs enforce writ es in program order. When writ ing t he DMA descript or address t o t he card and when populat ing t he buffer's bus address inside t he DMA descript or, t he driver convert s t he nat ive byt e order t o PCI lit t le- endian form at using cpu_to_le32(). On I nt el CPUs, t his again has no effect because bot h PCI and I nt el processors com m unicat e in lit t le- endian. On several ot her archit ect ures, for exam ple, an ARM9 CPU running in t he big- endian m ode, bot h wmb() and cpu_to_le32() assum e significance. Now t hat you have t he descript ors and buffers m apped and ready t o go, it 's t im e t o look at how dat a is exchanged bet ween t he syst em and t he CardBus device, as shown in List ing 10.6. We won't dwell on t he net work int erfaces and net working dat a st ruct ures because Chapt er 15 is devot ed t o doing t hat . List in g 1 0 .6 . Re ce ivin g a n d Tr a n sm it t in g D a t a Code View: /* The interrupt handler */ static irqreturn_t mydevice_interrupt(int irq, void *devid) { struct sk_buff *skb; /* ... */ /* If this is a receive interrupt, collect the packet and pass it on to higher layers. Look at the control word in each RX DMA descriptor to figure out whether it contains data. Assume for convenience that the first RX descriptor was used by the card

to DMA this received packet */ packet_size = mydev_data->dma_buffer_rx[1]; /* In real world drivers, the sk_buff is pre-allocated, streammapped, and supplied to the card after setting the FREE_FLAG during device open(). The received data is directly DMA'ed to this sk_buff instead of the memcpy() performed here, as you will learn in Chapter 15. The card clears the FREE_FLAG before the DMA */ skb = dev_alloc_skb(packet_size); /* See Figure 15.2 of Chapter 15 */ skb->dev = mydev_data->ndev; /* Owner network device */ memcpy(skb, mydev_data->dma_buffer_rx[24], packet_size); /* Pass the received data to higher-layer protocols */ skb_put(skb, packet_size); netif_rx(skb); /* ... */ /* Make the descriptor available to the card again */ mydev_data->dma_buffer_rx[2] |= FREE_FLAG; } /* This function is registered in Listing 10.3 and is called from the networking layer. More on network device interfaces in Chapter 15 */ static int mydevice_xmit(struct sk_buff *skb, struct net_device *dev) { /* Use a valid TX descriptor. Look at Figure 10.2 */ /* Locking has been omitted for simplicity */ if (mydev_data->dma_buffer_tx[2] & FREE_FLAG) { /* Use first TX descriptor */ /* In the real world, DMA occurs directly from the sk_buff as you will learn later on! */ memcpy(mydev_data->dma_buffer_tx[24], skb->data, skb->len); mydev_data->dma_buffer_tx[1] = skb->len; mydev_data->dma_buffer_tx[2] &= ~FREE_FLAG; mydev_data->dma_buffer_tx[2] |= INTERRUPT_FLAG; } else if (mydev_data->dma_buffer[5] & FREE_FLAG) { /* Use second TX descriptor */ memcpy(mydev_data->dma_buffer_tx[1560], skb->data, skb->len); mydev_data->dma_buffer_tx[4] = skb->len; mydev_data->dma_buffer_tx[5] &= ~FREE_FLAG; mydev_data->dma_buffer_tx[5] |= INTERRUPT_FLAG; } else { return –EIO; /* Signal error to the caller */ } wmb(); /* Don't reorder writes across this barrier */ /* Ask the card to initiate DMA. ioaddr is defined in Listing 10.3 */ outl(INITIATE_XMIT, ioaddr + CONTROL_REGISTER); }

When t he CardBus device receives an Et hernet packet , it DMAs it t o a free RX descript or and int errupt s t he CPU. The int errupt handler mydevice_interrupt() collect s t he packet from t he receive DMA buffer, copies it t o a net working dat a st ruct ure (sk_buff) , and passes it on t o higher prot ocol layers.

The t ransm it rout ine my_device_xmit() is responsible for init iat ing DMA in t he reverse direct ion. I t DMAs t ransm it packet s t o card m em ory. For t his, my_device_xmit() chooses a TX DMA descript or t hat is unused by t he card ( or whose FREE_FLAG is set ) and uses t he associat ed t ransm it buffer for dat a t ransfer. FREE_FLAG is cleared soon aft er, signaling t hat t he descript or now belongs t o t he card. The descript or is released in t he int errupt handler ( FREE_FLAG is set again) when t he card assert s an int errupt aft er com plet ing t he t ransm it ( not shown in List ing 10.6) . This exam ple driver uses a sim plified buffer m anagem ent schem e t hat is not perform ance- sensit ive. High- speed net work drivers im plem ent a m ore elaborat e plan t hat em ploys a com binat ion of coherent and st ream ing DMA m appings. They m aint ain linked list s of t ransm it and receive descript ors and im plem ent free and in- use pools for buffer m anagem ent . Their receive and t ransm it dat a st ruct ures look like t his: Code View: /* Ring of receive buffers */ struct rx_list { void *dma_buffer_rx; /* Kernel virtual address of the transmit descriptor */ dma_addr_t dma_bus_rx; /* Bus address of the transmit descriptor */ unsigned int size; /* Buffer size */ struct list_head next_desc; /* Pointer to the next element */ struct sk_buff *skb; /* Network Packet */ dma_addr_t sk_bus; /* Bus address of network packet */ } *rxlist; /* Ring of transmit buffers */ struct tx_list { void *dma_buffer_tx; /* Kernel virtual address of the receive descriptor */ dma_addr_t dma_bus_tx; /* Bus address of the transmit descriptor */ unsigned int size; /* Buffer size */ struct list_head next_desc; /* Pointer to the next element */ struct sk_buff *skb; /* Network Packet */ dma_addr_t sk_bus; /* Bus address of network packet */ } *txlist;

The receive and t ransm it DMA descript ors ( rxlist->dma_buffer_rx and txlist->dma_buffer_tx) are m apped coherent ly as done in List ing 10.5. The payload buffers ( rxlist->skb->data and txlist->skb->data) are, however, m apped using st ream ing DMA. The receive buffers are preallocat ed and st ream m apped int o a free pool during device open, while t he t ransm it buffers are m apped on- t he- fly. This avoids t he ext ra dat a copy perform ed by mydevice_interrupt() from t he coherent ly m apped receive DMA buffer t o t he net work buffer ( and t he ext ra copy done by mydevice_xmit() in t he reverse direct ion) . Code View: /* Preallocating/replenishing receive buffers. Also see the section, "Buffer Management and Concurrency Control" in Chapter 15 */ /* ... */ struct sk_buff *skb = dev_alloc_skb(); skb_reserve(skb, NET_IP_ALIGN); /* Map using streaming DMA */ rxlist->sk_bus = pci_map_single(pdev, rxlist->skb->data, rxlist->skb->len, PCI_DMA_TODEVICE);

/* Allocate a DMA descriptor and populate it with the address mapped above. Add the descriptor to the receive descriptor ring */ /* ... */

D e bu ggin g Enable Bus Opt ions PCI Support PCI Debugging in t he kernel configurat ion m enu t o ask t he PCI core t o em it debug m essages. Explore / proc/ bus/ pci/ devices and / sys/ devices/ pciX: Y/ for inform at ion about PCI devices on your syst em such as t he CardBus Et hernet - Modem card discussed in t his chapt er. / proc/ int errupt s list s I RQs act ive on your syst em , including t hose used by t he PCI layer. As you saw, lspci gleans inform at ion about all PCI buses and devices on your syst em . You m ay also use it t o dum p t he configurat ion space of PCI cards. A PCI bus analyzer can help debug low- level problem s and t une perform ance.

Look in g a t t h e Sou r ce s PCI core and bus access rout ines live in drivers/ pci/ . To obt ain a list of helper rout ines offered by t he PCI subsyst em , search for EXPORT_SYMBOL inside t his direct ory. For definit ions and prot ot ypes relat ed t o t he PCI layer, look at include/ linux/ pci* .h. You can spot several PCI device drivers in subdirect ories under drivers/ net / , dr iver s/ scsi/ , and drivers/ video/ . To locat e all PCI drivers, recursively grep t he drivers/ t ree for pci_register_driver(). I f you do not find a good st art ing point t o develop a cust om PCI net work driver, begin wit h t he skelet al PCI net work driver drivers/ net / pci- skelet on.c. For a brief t ut orial on PCI program m ing, look at Docum ent at ion/ pci.t xt . For a descript ion of t he PCI DMA API , read Docum ent at ion/ DMA- m apping.t xt . Table 10.4 sum m arizes t he m ain dat a st ruct ures used in t his chapt er. Table 10.5 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 1 0 .4 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

pci_dev

include/ linux/ pci.h

Represent at ion of a PCI device

pci_driver

include/ linux/ pci.h

Represent at ion of a PCI driver

pci_device_id

include/ linux/ m od_devicet able.h

I dent it y of a PCI card

dma_addr_t

include/ asm - your- arch/ t ypes.h

Bus address of a DMA buffer

scatterlist

include/ asm - your- arch/ scat t erlist .h

Scat t er- gat her list of DMA buffers

sk_buff

include/ linux/ skbuff.h

Main net working dat a st ruct ure ( see Chapt er 15 for m ore explanat ions)

Ta ble 1 0 .5 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

pci_read_config_byte() pci_read_config_word() pci_read_config_dword() pci_write_config_byte() pci_write_config_word() pci_write_config_dword()

include/ linux/ pci.h drivers/ pci/ access.c

Rout ines t o operat e on t he PCI configurat ion space.

pci_resource_start() pci_resource_len() pci_resource_end() pci_resource_flags()

include/ linux/ pci.h

These rout ines operat e on PCI I / O and m em ory regions t o obt ain t he base address, lengt h, end address, and cont rol flags.

pci_request_region()

drivers/ pci/ pci.c

Reserves PCI I / O or m em ory regions.

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

ioremap() ioremap_nocache() pci_iomap()

include/ asm - yourarch/ io.h

Obt ains CPU access t o device m em ory.

arch/ yourarch/ m m / iorem ap.c lib/ iom ap.c pci_set_dma_mask()

drivers/ pci/ pci.c

I f t his funct ion ret urns success, you m ay DMA t o any address wit hin t he m ask specified as argum ent .

pci_alloc_consistent()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Obt ains a cache- coherent DMA bufferm apping.

pci_free_consistent()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Unm aps a cache- coherent DMA buffer.

pci_map_single()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Obt ains a st ream ing DMA buffer m apping.

pci_unmap_single()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Unm aps a st ream ing DMA buffer.

pci_dma_sync_single()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Synchronizes a st ream ing DMA buffer so t hat t he CPU can reliably operat e on it .

pci_map_sg() pci_unmap_sg() pci_dma_sync_sg()

include/ asm generic/ pci- dm acom pat .h include/ asm your- arch/ dm am apping.h

Maps/ unm aps/ synchronizes a scat t er- gat her list of st ream ing DMA buffers.

pci_register_driver()

include/ linux/ pci.h drivers/ pci/ pci- driver.c

Regist ers a driver wit h t he PCI core.

pci_unregister_driver()

drivers/ pci/ pci- driver.c

Unregist ers a driver from t he PCI core.

pci_enable_device()

drivers/ pci/ pci.c

Asks low- level PCI code t o enable I / O and m em ory regions for t his device.

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

pci_disable_device()

drivers/ pci/ pci.c

Reverse of pci_enable_device().

pci_set_master()

drivers/ pci/ pci.c

Set s t he device in DMA busm ast ering m ode.

Ch a pt e r 1 1 . Un ive r sa l Se r ia l Bu s I n Th is Ch a pt e r 312 USB Archit ect ure 317 Linux- USB Subsyst em 317 Driver Dat a St ruct ures 324 Enum erat ion 324 Device Exam ple: Telem et ry Card 338 Class Drivers 348 Gadget Drivers 349 Debugging 351 Looking at t he Sources

Universal serial bus ( USB) is t he de fact o ext ernal bus in t oday's com put ers. USB, wit h it s support for hot plugging, generic class drivers, and versat ile dat a- t ransfer m odes, is t he usual rout e in t he consum er elect ronics space t o bring a diverse spect rum of t echnologies t o com put er syst em s. I t s sweeping popularit y and t he accom panying econom ics of volum e have played a part in fueling t he adopt ion and accept ance of com put er peripheral t echnologies around t he world.

USB Ar ch it e ct u r e

USB is a m ast er- slave prot ocol where a host cont roller com m unicat es wit h client devices. Figure 11.1 shows USB in t he PC environm ent . The USB host cont roller is part of t he Sout h Bridge chipset and com m unicat es wit h t he processor over t he PCI bus.

Figu r e 1 1 .1 . USB in t h e PC e n vir on m e n t .

Figure 11.2 illust rat es USB on an em bedded device. The SoC in t he figure has built - in USB cont roller silicon t hat support s four buses and t hree m odes of operat ion:

Bus 1 runs in host m ode and is wired t o an A- t ype recept acle via a USB t ransceiver ( see t he sidebar "USB Recept acles and Transceivers" ) . You can connect a USB pen drive or a keyboard t o t his port .

Bus 2 also funct ions in host m ode but t he associat ed t ransceiver is connect ed t o an int ernal USB device rat her t han t o a recept acle. Exam ples of int ernal USB devices are biom et ric scanners, crypt ographic engines, print ers, Disk- On- Chips ( DOCs) , t ouch cont rollers, and t elem et ry cards.

Bus 3 runs in device m ode and is wired t o a B- t ype recept acle t hrough a t ransceiver. The B- t ype recept acle connect s t o a host com put er via a B- t o- A cable. I n t his m ode, t he em bedded device funct ions as, for exam ple, a USB pen drive, and export s a st orage part it ion t o t he out side world. Em bedded devices such as MP3 players and cell phones are m ore likely t han PC syst em s t o be at t he device side of USB, so m any em bedded SoCs support a USB device cont roller in addit ion t o a host cont roller.

Bus 4 is driven by an On- The- Go ( OTG) cont roller. You can use t his port , for exam ple, t o eit her connect a pen drive t o your syst em or t o t urn your syst em int o a pen drive and connect it t o a host . Unlike buses 1

t o 3, bus 4 uses an int elligent t ransceiver t hat exchanges cont rol inform at ion wit h t he processor over I 2 C. The t ransceiver is wired t o a Mini- AB OTG recept acle. I f t wo em bedded devices support OTG, t hey can direct ly com m unicat e wit hout t he int ervent ion of a host com put er.

Figu r e 1 1 .2 . USB on a n e m be dde d syst e m .

[ View full size im age]

Most of t his chapt er is writ t en from t he perspect ive of a syst em residing at t he host - side of USB. We briefly look at t he device funct ion in t he sect ion "Gadget Drivers." Mainst ream host cont roller drivers ( HCDs) are already available, so in t his chapt er we furt her confine ourselves t o drivers for USB devices ( also called client drivers) .

USB Recept acles and Transceivers USB host s use four- pin A- t ype rect angular recept acles, whereas USB devices connect via four- pin B- t ype square recept acles. I n bot h cases, t he four pins are different ial dat a signals D+ and D- , a volt age line VBUS, and ground. VBUS is used t o supply power from USB host s t o USB devices. VBUS is t hus pulled high on an A connect or but receives power on a B connect or. USB OTG cont rollers connect t o five- pin Mini- AB rect angular recept acles having a sm aller form fact or. Four of t he Mini- AB pins are ident ical t o what we discussed previously; t he fift h is an I D pin used t o det ect whet her t he connect ed peripheral is a host or a device. The sam e t ransceiver chip ( such as TUSB1105 from Texas I nst rum ent s) can be used on USB host s and devices. You m ay t hus choose t o use t he sam e t ransceiver part on buses 1 t hrough 3 in Figure 11.2. OTG requires a special- purpose t ransceiver chip ( such as I SP1301 from Philips Sem iconduct ors) , however.

Bu s Spe e ds USB support s t hree operat ional speeds. The original USB 1.0 specificat ion support s 1.5MBps, referred t o as lowspeed USB. USB 1.1, t he next version of t he specificat ion, handles 12MBps, called full- speed USB. The current level of t he specificat ion is USB 2.0, which support s 480MBps, or high- speed USB. USB 2.0 is backwardcom pat ible wit h t he earlier versions of t he specificat ion. Peripherals such as USB keyboards and m ice are exam ples of low- speed devices, and USB st orage drives are exam ples of full- speed and high- speed devices. Today's PC syst em s are USB 2.0- com pliant and allow all t hree t arget speeds, but som e em bedded cont rollers adhere t o USB 1.1 and support only full- speed and low- speed m odes of operat ion.

H ost Con t r olle r s USB host cont rollers conform t o one of a few st andards:

Un ive r sa l H ost Con t r olle r I n t e r fa ce ( UH CI ) : The UHCI specificat ion was init iat ed by I nt el, so your PC is likely t o have t his cont roller if it 's I nt el- based.

Ope n H ost Con t r olle r I n t e r fa ce ( OH CI ) : The OHCI specificat ion originat ed from com panies such as Com paq and Microsoft . An OHCI - com pat ible cont roller has m ore int elligence built in t o hardware t han UHCI , so an OHCI HCD is relat ively sim pler t han a UHCI HCD.

En h a n ce d H ost Con t r olle r I n t e r fa ce ( EH CI ) : This is t he host cont roller t hat support s high- speed USB 2.0 devices. EHCI cont rollers usually have eit her a UHCI or OHCI com panion cont roller t o handle slower devices.

USB OTG con t r olle r s: They are get t ing increasingly popular in em bedded m icrocont rollers. Wit h OTG support , each com m unicat ing end can act as a dual- role device ( DRD) . By init iat ing a dialog using t he Host Negot iat ion Prot ocol ( HNP) , a DRD can swit ch it self t o host m ode or device m ode based on t he desired funct ionalit y.

I n addit ion t o t hese m ainst ream USB host cont rollers, Linux support s a few m ore cont rollers. An exam ple is t he HCD for t he I SP116x chip. Host cont rollers have a built - in hardware com ponent called t he root hub. The root hub is a virt ual hub t hat sources USB port s. The port s, in t urn, can connect t o ext ernal or int ernal physical hubs and source m ore port s,

yielding a t ree t opology.

Tr a n sfe r Type s Dat a exchange wit h a USB device can be one of four t ypes:

Cont rol t ransfers, used t o carry configurat ion and cont rol inform at ion

Bulk t ransfers t hat ferry large quant it ies of t im e- insensit ive dat a

I nt errupt t ransfers t hat exchange sm all quant it ies of t im e- sensit ive dat a

I sochronous t ransfers for real- t im e dat a at predict able bit rat es

A USB st orage drive, for exam ple, uses cont rol t ransfers t o issue disk access com m ands and bulk t ransfers t o exchange dat a. A keyboard uses int errupt t ransfers t o carry key st rokes wit hin predict able delays. A device t hat needs t o st ream audio dat a in real t im e uses isochronous t ransfers. The responsibilit ies of t he four t ransfer t ypes for USB Bluet oot h devices are discussed in t he sect ion " Device Exam ple: USB Adapt er" in Chapt er 16, " Linux Wit hout Wires."

Addr e ssin g Each addressable unit in a USB device is called an endpoint . The address assigned t o an endpoint is called an endpoint address. Each endpoint address has an associat ed dat a t ransfer t ype. I f an endpoint is responsible for bulk dat a t ransfer, for exam ple, it 's called a bulk endpoint . Endpoint address 0 is used exclusively for device configurat ion. A cont rol pipe is at t ached t o t his endpoint for device enum erat ion ( see t he sect ion "Enum erat ion" ) . An endpoint can be associat ed wit h upst ream or downst ream dat a t ransfer. Dat a arriving upst ream from a device is called an IN t ransfer, whereas dat a flowing downst ream t o a device is an OUT t ransfer. IN and OUT t ransfers own separat e address spaces. So, you can have a bulk IN endpoint and a bulk OUT endpoint answering t o t he sam e address. USB resem bles I 2 C on som e count s and PCI on ot hers as sum m arized in Table 11.1. USB's device addressing is sim ilar t o I 2 C, while it support s hot plugging like PCI . USB device addresses, like st andard I 2 C, do not consum e a port ion of t he CPU's address space. Rat her, t hey reside in a privat e space ranging from 1 t o 127.

Ta ble 1 1 .1 . USB's Sim ila r it ie s w it h I 2 C a n d PCI

USB's sim ilarit ies wit h I 2 C:

USB and I 2 C are m ast er- slave prot ocols.

Device addresses reside in a privat e 7- bit space.

Device- resident m em ory is not m apped t o t he CPU's m em ory or I / O space, so it does not consum e CPU resources. USB's sim ilarit ies wit h PCI :

Devices can be hot plugged.

Device driver archit ect ure resem bles PCI drivers. Bot h classes of drivers own probe()/disconnect()[ 1 ] m et hods and I D t ables ident ifying t he devices t hey support .

Support s high speeds. Slower t han PCI , however. See Table 10.1 in Chapt er 10, " Peripheral Com ponent I nt erconnect ," for t he speeds support ed by different m em bers of t he PCI fam ily.

USB host cont rollers, like PCI cont rollers, usually have built - in DMA engines t hat can m ast er t he bus.

Support s m ult ifunct ion devices. USB support s int erface descript ors per funct ion. Each PCI device funct ion has it s own device I D and configurat ion space.

[ 1]

disconnect() is called remove() in PCI parlance.

Ch a pt e r 1 1 . Un ive r sa l Se r ia l Bu s I n Th is Ch a pt e r 312 USB Archit ect ure 317 Linux- USB Subsyst em 317 Driver Dat a St ruct ures 324 Enum erat ion 324 Device Exam ple: Telem et ry Card 338 Class Drivers 348 Gadget Drivers 349 Debugging 351 Looking at t he Sources

Universal serial bus ( USB) is t he de fact o ext ernal bus in t oday's com put ers. USB, wit h it s support for hot plugging, generic class drivers, and versat ile dat a- t ransfer m odes, is t he usual rout e in t he consum er elect ronics space t o bring a diverse spect rum of t echnologies t o com put er syst em s. I t s sweeping popularit y and t he accom panying econom ics of volum e have played a part in fueling t he adopt ion and accept ance of com put er peripheral t echnologies around t he world.

USB Ar ch it e ct u r e

USB is a m ast er- slave prot ocol where a host cont roller com m unicat es wit h client devices. Figure 11.1 shows USB in t he PC environm ent . The USB host cont roller is part of t he Sout h Bridge chipset and com m unicat es wit h t he processor over t he PCI bus.

Figu r e 1 1 .1 . USB in t h e PC e n vir on m e n t .

Figure 11.2 illust rat es USB on an em bedded device. The SoC in t he figure has built - in USB cont roller silicon t hat support s four buses and t hree m odes of operat ion:

Bus 1 runs in host m ode and is wired t o an A- t ype recept acle via a USB t ransceiver ( see t he sidebar "USB Recept acles and Transceivers" ) . You can connect a USB pen drive or a keyboard t o t his port .

Bus 2 also funct ions in host m ode but t he associat ed t ransceiver is connect ed t o an int ernal USB device rat her t han t o a recept acle. Exam ples of int ernal USB devices are biom et ric scanners, crypt ographic engines, print ers, Disk- On- Chips ( DOCs) , t ouch cont rollers, and t elem et ry cards.

Bus 3 runs in device m ode and is wired t o a B- t ype recept acle t hrough a t ransceiver. The B- t ype recept acle connect s t o a host com put er via a B- t o- A cable. I n t his m ode, t he em bedded device funct ions as, for exam ple, a USB pen drive, and export s a st orage part it ion t o t he out side world. Em bedded devices such as MP3 players and cell phones are m ore likely t han PC syst em s t o be at t he device side of USB, so m any em bedded SoCs support a USB device cont roller in addit ion t o a host cont roller.

Bus 4 is driven by an On- The- Go ( OTG) cont roller. You can use t his port , for exam ple, t o eit her connect a pen drive t o your syst em or t o t urn your syst em int o a pen drive and connect it t o a host . Unlike buses 1

t o 3, bus 4 uses an int elligent t ransceiver t hat exchanges cont rol inform at ion wit h t he processor over I 2 C. The t ransceiver is wired t o a Mini- AB OTG recept acle. I f t wo em bedded devices support OTG, t hey can direct ly com m unicat e wit hout t he int ervent ion of a host com put er.

Figu r e 1 1 .2 . USB on a n e m be dde d syst e m .

[ View full size im age]

Most of t his chapt er is writ t en from t he perspect ive of a syst em residing at t he host - side of USB. We briefly look at t he device funct ion in t he sect ion "Gadget Drivers." Mainst ream host cont roller drivers ( HCDs) are already available, so in t his chapt er we furt her confine ourselves t o drivers for USB devices ( also called client drivers) .

USB Recept acles and Transceivers USB host s use four- pin A- t ype rect angular recept acles, whereas USB devices connect via four- pin B- t ype square recept acles. I n bot h cases, t he four pins are different ial dat a signals D+ and D- , a volt age line VBUS, and ground. VBUS is used t o supply power from USB host s t o USB devices. VBUS is t hus pulled high on an A connect or but receives power on a B connect or. USB OTG cont rollers connect t o five- pin Mini- AB rect angular recept acles having a sm aller form fact or. Four of t he Mini- AB pins are ident ical t o what we discussed previously; t he fift h is an I D pin used t o det ect whet her t he connect ed peripheral is a host or a device. The sam e t ransceiver chip ( such as TUSB1105 from Texas I nst rum ent s) can be used on USB host s and devices. You m ay t hus choose t o use t he sam e t ransceiver part on buses 1 t hrough 3 in Figure 11.2. OTG requires a special- purpose t ransceiver chip ( such as I SP1301 from Philips Sem iconduct ors) , however.

Bu s Spe e ds USB support s t hree operat ional speeds. The original USB 1.0 specificat ion support s 1.5MBps, referred t o as lowspeed USB. USB 1.1, t he next version of t he specificat ion, handles 12MBps, called full- speed USB. The current level of t he specificat ion is USB 2.0, which support s 480MBps, or high- speed USB. USB 2.0 is backwardcom pat ible wit h t he earlier versions of t he specificat ion. Peripherals such as USB keyboards and m ice are exam ples of low- speed devices, and USB st orage drives are exam ples of full- speed and high- speed devices. Today's PC syst em s are USB 2.0- com pliant and allow all t hree t arget speeds, but som e em bedded cont rollers adhere t o USB 1.1 and support only full- speed and low- speed m odes of operat ion.

H ost Con t r olle r s USB host cont rollers conform t o one of a few st andards:

Un ive r sa l H ost Con t r olle r I n t e r fa ce ( UH CI ) : The UHCI specificat ion was init iat ed by I nt el, so your PC is likely t o have t his cont roller if it 's I nt el- based.

Ope n H ost Con t r olle r I n t e r fa ce ( OH CI ) : The OHCI specificat ion originat ed from com panies such as Com paq and Microsoft . An OHCI - com pat ible cont roller has m ore int elligence built in t o hardware t han UHCI , so an OHCI HCD is relat ively sim pler t han a UHCI HCD.

En h a n ce d H ost Con t r olle r I n t e r fa ce ( EH CI ) : This is t he host cont roller t hat support s high- speed USB 2.0 devices. EHCI cont rollers usually have eit her a UHCI or OHCI com panion cont roller t o handle slower devices.

USB OTG con t r olle r s: They are get t ing increasingly popular in em bedded m icrocont rollers. Wit h OTG support , each com m unicat ing end can act as a dual- role device ( DRD) . By init iat ing a dialog using t he Host Negot iat ion Prot ocol ( HNP) , a DRD can swit ch it self t o host m ode or device m ode based on t he desired funct ionalit y.

I n addit ion t o t hese m ainst ream USB host cont rollers, Linux support s a few m ore cont rollers. An exam ple is t he HCD for t he I SP116x chip. Host cont rollers have a built - in hardware com ponent called t he root hub. The root hub is a virt ual hub t hat sources USB port s. The port s, in t urn, can connect t o ext ernal or int ernal physical hubs and source m ore port s,

yielding a t ree t opology.

Tr a n sfe r Type s Dat a exchange wit h a USB device can be one of four t ypes:

Cont rol t ransfers, used t o carry configurat ion and cont rol inform at ion

Bulk t ransfers t hat ferry large quant it ies of t im e- insensit ive dat a

I nt errupt t ransfers t hat exchange sm all quant it ies of t im e- sensit ive dat a

I sochronous t ransfers for real- t im e dat a at predict able bit rat es

A USB st orage drive, for exam ple, uses cont rol t ransfers t o issue disk access com m ands and bulk t ransfers t o exchange dat a. A keyboard uses int errupt t ransfers t o carry key st rokes wit hin predict able delays. A device t hat needs t o st ream audio dat a in real t im e uses isochronous t ransfers. The responsibilit ies of t he four t ransfer t ypes for USB Bluet oot h devices are discussed in t he sect ion " Device Exam ple: USB Adapt er" in Chapt er 16, " Linux Wit hout Wires."

Addr e ssin g Each addressable unit in a USB device is called an endpoint . The address assigned t o an endpoint is called an endpoint address. Each endpoint address has an associat ed dat a t ransfer t ype. I f an endpoint is responsible for bulk dat a t ransfer, for exam ple, it 's called a bulk endpoint . Endpoint address 0 is used exclusively for device configurat ion. A cont rol pipe is at t ached t o t his endpoint for device enum erat ion ( see t he sect ion "Enum erat ion" ) . An endpoint can be associat ed wit h upst ream or downst ream dat a t ransfer. Dat a arriving upst ream from a device is called an IN t ransfer, whereas dat a flowing downst ream t o a device is an OUT t ransfer. IN and OUT t ransfers own separat e address spaces. So, you can have a bulk IN endpoint and a bulk OUT endpoint answering t o t he sam e address. USB resem bles I 2 C on som e count s and PCI on ot hers as sum m arized in Table 11.1. USB's device addressing is sim ilar t o I 2 C, while it support s hot plugging like PCI . USB device addresses, like st andard I 2 C, do not consum e a port ion of t he CPU's address space. Rat her, t hey reside in a privat e space ranging from 1 t o 127.

Ta ble 1 1 .1 . USB's Sim ila r it ie s w it h I 2 C a n d PCI

USB's sim ilarit ies wit h I 2 C:

USB and I 2 C are m ast er- slave prot ocols.

Device addresses reside in a privat e 7- bit space.

Device- resident m em ory is not m apped t o t he CPU's m em ory or I / O space, so it does not consum e CPU resources. USB's sim ilarit ies wit h PCI :

Devices can be hot plugged.

Device driver archit ect ure resem bles PCI drivers. Bot h classes of drivers own probe()/disconnect()[ 1 ] m et hods and I D t ables ident ifying t he devices t hey support .

Support s high speeds. Slower t han PCI , however. See Table 10.1 in Chapt er 10, " Peripheral Com ponent I nt erconnect ," for t he speeds support ed by different m em bers of t he PCI fam ily.

USB host cont rollers, like PCI cont rollers, usually have built - in DMA engines t hat can m ast er t he bus.

Support s m ult ifunct ion devices. USB support s int erface descript ors per funct ion. Each PCI device funct ion has it s own device I D and configurat ion space.

[ 1]

disconnect() is called remove() in PCI parlance.

Lin u x - USB Su bsyst e m Look at Figure 11.3 t o underst and t he archit ect ure of t he Linux- USB subsyst em . The const it uent pieces of t he subsyst em are as follows:

The USB core. Like t he core layers of driver subsyst em s t hat you saw in previous chapt ers, t he USB core is a code base consist ing of rout ines and st ruct ures available t o HCDs and client drivers. The core also provides a level of indirect ion t hat renders client drivers independent of host cont rollers.

HCDs t hat drive different host cont rollers.

A hub driver for t he root hub ( and physical hubs) and a helper kernel t hread khubd t hat m onit ors all port s connect ed t o t he hub. Det ect ing port st at us changes and configuring hot plugged devices is t im econsum ing and is best accom plished using a helper t hread for reasons you learned in Chapt er 3 , " Kernel Facilit ies." The khubd t hread is asleep by default . The hub driver wakes khubd whenever it det ect s a st at us change on a USB port connect ed t o it .

Device drivers for USB client devices.

The USB filesyst em usbfs t hat let s you drive USB devices from user space. We discuss user m ode USB drivers in Chapt er 19, " Drivers in User Space."

Figu r e 1 1 .3 . Th e Lin u x - USB su bsyst e m .

[ View full size im age]

For end- t o- end operat ion, t he USB subsyst em calls on various ot her kernel layers for assist ance. To support USB m ass st orage devices, for exam ple, t he USB subsyst em works in t andem wit h SCSI drivers, as shown in Figure 11.3. To drive USB- Bluet oot h keyboards, t he st akeholders are fourfold: t he USB subsyst em , t he Bluet oot h layer, t he input subsyst em , and t he t t y layer.

D r ive r D a t a St r u ct u r e s When you writ e a USB client driver, you have t o work wit h several dat a st ruct ures. Let 's look at t he im port ant ones.

Th e usb_device St r u ct u r e Each device driver subsyst em relies on a special- purpose dat a st ruct ure t o int ernally represent a device. The usb_device st ruct ure is t o t he USB subsyst em , what pci_dev is t o t he PCI layer, and what net_device is t o t he net work driver layer. usb_device is defined in include/ linux/ usb.h as follows: struct usb_device { /* ... */ enum usb_device_state state; /* Configured, Not Attached, etc */ enum usb_device_speed speed; /* High/full/low (or error) */ /* ... */ struct usb_device *parent; /* Our hub, unless we're the root */ /* ... */ struct usb_device_descriptor descriptor; /* Descriptor */ struct usb_host_config *config; /* All of the configs */ struct usb_host_config *actconfig; /* The active config */ /* ... */ int maxchild; /* No: of ports if hub */ struct usb_device *children[USB_MAXCHILDREN]; /* Child devices */ /* ... */ };

We use t his st ruct ure when we develop an exam ple driver for a USB t elem et ry card lat er.

USB Re qu e st Block s USB Request Block ( URB) is t he cent erpiece of t he USB dat a t ransfer m echanism . A URB is t o t he USB st ack, what an sk_buff ( discussed in Chapt er 15, " Net work I nt erface Cards" ) is t o t he net working st ack. Let 's t ake a peek inside a URB. The following definit ion is from include/ linux/ usb.h , om it t ing fields not of part icular int erest t o device drivers: Code View: struct urb { struct kref kref; /* ... */ struct usb_device *dev; unsigned int pipe; int status; unsigned int transfer_flags; void *transfer_buffer; dma_addr_t transfer_dma; int transfer_buffer_length;

/* Reference count of the URB */ /* (in) pointer to associated device */ /* (in) pipe information */ /* (return) non-ISO status */ /* (in) URB_SHORT_NOT_OK | ...*/ /* (in) associated data buffer */ /* (in) dma addr for transfer_buffer */ /* (in) data buffer length */

/* ... */ unsigned char *setup_packet; /* ... */ int interval; /* ... */ void *context; usb_complete_t complete; /* ... */

/* (in) setup packet */ /* (modify) transfer interval (INT/ISO) */ /* (in) context for completion */ /* (in) completion routine */

};

There are t hree st eps t o using a URB: creat e, populat e, and subm it . To creat e a URB, use usb_alloc_urb(). This funct ion allocat es and zeros- out URB m em ory, init ializes a kobj ect associat ed wit h t he URB, and init ializes a spinlock t o prot ect t he URB. To populat e a URB, use t he following helper rout ines offered by t he USB core: void usb_fill_[control|int|bulk]_urb( struct urb *urb, /* struct usb_device *usb_dev, /* unsigned int pipe, /* unsigned char *setup_packet, /* void *transfer_buffer, /* int buffer_length, /* usb_complete_t completion_fn, /* void *context, /* int interval); /*

URB pointer */ USB device structure */ Pipe encoding */ For Control URBs only! */ Buffer for I/O */ I/O buffer length */ Callback routine */ For use by completion_fn */ For Interrupt URBs only! */

The sem ant ics of t he previous rout ines will get clearer when we develop t he exam ple driver lat er on. These helper rout ines are available t o cont rol, int errupt , and bulk URBs but not t o isochronous ones. To subm it a URB for dat a t ransfer, use usb_submit_urb(). URB subm ission is asynchronous. The usb_fill_[control|int|bulk]_urb() funct ions list ed previously t ake t he address of a callback funct ion as argum ent . The callback rout ine execut es aft er t he URB subm ission com plet es and accom plishes t hings such as checking subm ission st at us and freeing t he dat a- t ransfer buffer. The USB core also offers wrapper int erfaces t hat provide a façade of synchronous URB subm ission: int usb_[control|interrupt|bulk]_msg(struct usb_device *usb_dev, unsigned int pipe, ...);

usb_bulk_msg (), for exam ple, builds a bulk URB, subm it s it , and blocks unt il t he operat ion com plet es. You don't have t o supply a callback funct ion because a generic com plet ion rout ine serves t hat purpose. You don't need t o explicit ly creat e and populat e t he URB eit her, because usb_bulk_msg() does t hat for you at no addit ional cost . We will use t his int erface in our exam ple driver. usb_free_urb() is used t o free a reference t o a com plet ed URB, whereas usb_unlink_urb() cancels a pending URB operat ion. As m ent ioned in t he sect ion " Sysfs, Kobj ect s, and Device Classes" in Chapt er 4 , " Laying t he Groundwork," a URB cont ains a kref obj ect t o t rack references t o it . usb_submit_urb() increm ent s t he reference count using kref_get(). usb_free_urb() decrem ent s t he reference count using kref_put() and perform s t he free

operat ion only if t here are no rem aining references. A URB is associat ed wit h an abst ract ion called a pipe, which we discuss next .

Pip e s A pipe is an int eger encoding of a com binat ion of t he following:

The endpoint address

The direct ion of dat a t ransfer (IN or OUT)

The t ype of dat a t ransfer ( cont rol, int errupt , bulk, or isochronous)

A pipe is t he address elem ent of each USB dat a t ransfer and is an im port ant field in t he URB st ruct ure. To help populat e t his field, t he USB core provides t he following helper m acros: usb_[rcv|snd][ctrl|int|bulk|isoc]pipe(struct usb_device *usb_dev, __u8 endpointAddress);

where usb_dev is a point er t o t he associat ed usb_device st ruct ure, and endpointAddress is t he assigned endpoint address bet ween 1 and 127. To creat e a bulk pipe in t he OUT direct ion, for exam ple, call usb_sndbulkpipe(). For a cont rol pipe in t he IN direct ion, invoke usb_rcvctrlpipe(). While referring t o a URB, it 's oft en qualified by t he t ransfer t ype of t he associat ed pipe. I f a URB is at t ached t o a bulk pipe, for exam ple, it 's called a bulk URB.

D e scr ipt or St r u ct u r e s The USB specificat ion defines a series of descr ipt or s t o hold inform at ion about a device. The Linux- USB core defines dat a st ruct ures corresponding t o each descript or. Descript ors are of four t ypes:

Device descript ors cont ain general inform at ion such as t he product I D and vendor I D of t he device. usb_device_descriptor is t he st ruct ure corresponding t o device descript ors.

Configurat ion descript ors are used t o describe different configurat ion m odes such as bus- powered and self- powered operat ion. usb_config_descriptor is t he dat a st ruct ure associat ed wit h configurat ion descript ors.

I nt erface descript ors allow USB devices t o support m ult iple funct ions. usb_interface_descriptor defines int erface descript ors.

Endpoint descript ors carry inform at ion associat ed wit h t he final endpoint s of a device. usb_endpoint_descriptor is t he st ruct ure in quest ion.

These descript or form at s are defined in Chapt er 9 of t he USB specificat ion, whereas t he m at ching st ruct ures are

defined in include/ linux/ usb/ ch9.h. List ing 11.1 shows t he hierarchical t opology of t he descript ors and print s all endpoint addresses associat ed wit h a USB device. To t his end, it t raverses t he t ree consist ing of t he four t ypes of descript ors described previously. The following is t he out put generat ed by List ing 11.1 for a USB CD drive: Endpoint Address = 1 Endpoint Address = 82 Endpoint Address = 83

The first address belongs t o a bulk IN endpoint , t he second address is owned by a bulk OUT endpoint , and t he t hird addresses an int errupt IN endpoint . There are m ore dat a st ruct ures associat ed wit h USB client drivers, such as usb_device_id, usb_driver, and usb_class_driver. We will m eet t hem when we do hands- on developm ent in t he sect ion " Device Exam ple: Telem et ry Card."

List in g 1 1 .1 . Pr in t All USB En dpoin t Addr e sse s on a D e vice

[ View full size im age]

En u m e r a t ion The life of a hot plugged USB device st art s wit h a process called enum erat ion by which t he host learns about t he device's capabilit ies and configures it . The hub driver is t he com ponent in t he Linux- USB subsyst em responsible for enum erat ion. Let 's look at t he sequence of st eps t hat achieve device enum erat ion when you plug in a USB pen drive int o a host com put er: 1.

The root hub report s a change in t he port 's current due t o t he device at t achm ent . The hub driver det ect s t his st at us change, called a USB_PORT_STAT_C_CONNECTION in Linux- USB t erm inology, and awakens khubd.

2.

Khubd deciphers t he ident it y of t he USB port subj ect ed t o t he st at us change. I n t his case, it 's t he port where you plugged in t he pen drive.

3.

Next , khubd chooses a device address bet ween 1 and 127 and assigns it t o t he pen drive's bulk endpoint using a cont rol URB at t ached t o endpoint 0.

4.

Khubd uses t he above cont rol URB at t ached t o endpoint 0 t o obt ain t he device descript or from t he pen drive. I t t hen request s t he device's configurat ion descript ors and select s a suit able one. I n t he case of t he pen drive, only a single configurat ion descript or is on offer.

5.

Khubd request s t he USB core t o bind a m at ching client driver t o t he insert ed device.

When enum erat ion is com plet e and t he device is bound t o a driver, khubd invokes t he associat ed client driver's probe() m et hod. I n t his case, khubd calls storage_probe() defined in drivers/ usb/ st orage/ usb.c. From t his point on, t he m ass st orage driver is responsible for norm al device operat ion.

D e vice Ex a m ple : Te le m e t r y Ca r d Now t hat you know t he basics of Linux- USB, it 's t im e t o look at an exam ple device. Consider a syst em equipped wit h a t elem et ry card connect ed t o t he processor via int ernal USB, as shown in bus 2 of Figure 11.2. The card acquires dat a from a rem ot e device and ferries it t o t he processor over USB. An exam ple t elem et ry card is a m edical- grade board t hat m onit ors or program s an im plant ed device. Let 's assum e t hat our exam ple t elem et ry card has t he following endpoint s having t he sem ant ics described in Table 11.2:

A cont rol endpoint at t ached t o an on- card configurat ion regist er

A bulk IN endpoint t hat passes rem ot e t elem et ry inform at ion collect ed by t he card t o t he processor

A bulk OUT endpoint t hat t ransfers dat a in t he reverse direct ion

Ta ble 1 1 .2 . Re gist e r Spa ce in t h e Te le m e t r y Ca r d Re gist e r

Associa t e d En dpoin t

Telem et ry Configurat ion Regist er

Cont rol endpoint 0 ( regist er offset 0xA) .

Telem et ry Dat a- I n Regist er

Bulk IN endpoint . The endpoint address is assigned during device enum erat ion.

Telem et ry Dat a- Out Regist er

Bulk OUT endpoint . The endpoint address is assigned during device enum erat ion.

Let 's build a m inim al driver for t his card part ly based on t he USB skelet on driver, drivers/ usb/ usb- skelet on.c. Because PCMCI A, PCI , and USB devices have sim ilar charact erist ics such as hot plug support , som e driver m et hods and dat a st ruct ures belonging t o t hese subsyst em s resem ble each ot her. This is especially t rue for t he port ions responsible for init ializing and probing. As we progress t hrough t he t elem et ry driver and not ice parallels wit h what we learned for PCI drivers in Chapt er 10, we will pause and t ake not e.

I n it ia lizin g a n d Pr obin g Like PCI and PCMCI A drivers, USB drivers have probe()/disconnect()[ 2] m et hods t o support hot plugging, and a t able t hat cont ains t he ident it y of devices t hey support . A USB device is ident ified by t he usb_device_id st ruct ure defined in include/ linux/ m od_devicet able.h . You m ay recall from t he previous chapt er t hat t he pci_device_id st ruct ure, also defined in t he sam e header file, ident ifies PCI devices.

[ 2]

disconnect() is called remove() in PCI and PCMCI A parlance.

struct usb_device_id { /* ... */ __u16 idVendor;

/* Vendor ID */

__u16 /* ... */ __u8 __u8 __u8 /* ... */ };

idProduct;

/* Device ID */

bDeviceClass; /* Device class */ bDeviceSubClass; /* Device subclass */ bDeviceProtocol; /* Device protocol */

idVendor and idProduct, respect ively, hold t he m anufact urer I D and product I D, whereas bDeviceClass, bDeviceSubClass, and bDeviceProtocol cat egorize t he device based on it s funct ionalit y. This classificat ion, det erm ined by t he USB specificat ion, allows im plem ent at ion of generic client drivers as discussed in t he sect ion "Class Drivers" lat er. List ing 11.2 im plem ent s t he t elem et ry driver's init ializat ion rout ine, usb_tele_init(), which calls on usb_register() t o regist er it s usb_driver st ruct ure wit h t he USB core. As shown in t he list ing, usb_driver t ies t he driver's probe() m et hod, disconnect() m et hod, and usb_device_id t able t oget her. usb_driver is sim ilar t o pci_driver, except t hat t he disconnect() m et hod in t he form er is nam ed remove() in t he lat t er. List in g 1 1 .2 . I n it ia lizin g t h e D r ive r Code View: #define USB_TELE_VENDOR_ID #define USB_TELE_PRODUCT_ID

0xABCD 0xCDEF

/* Manufacturer's Vendor ID */ /* Device's Product ID */

/* USB ID Table specifying the devices that this driver supports */ static struct usb_device_id tele_ids[] = { { USB_DEVICE(USB_TELE_VENDOR_ID, USB_TELE_PRODUCT_ID) }, { } /* Terminate */ }; MODULE_DEVICE_TABLE(usb, tele_ids); /* The usb_driver structure for this driver */ static struct usb_driver tele_driver { .name = "tele", /* Unique name */ .probe = tele_probe, /* See Listing 11.3 */ .disconnect = tele_disconnect, /* See Listing 11.3 */ .id_table = tele_ids, /* See above */ }; /* Module Initialization */ static int __init usb_tele_init(void) { /* Register with the USB core */ result = usb_register(&tele_driver); /* ... */ return 0; } /* Module Exit */ static void __exit usb_tele_exit(void) { /* Unregister from the USB core */

usb_deregister(&tele_driver); return; } module_init(usb_tele_init); module_exit(usb_tele_exit);

The USB_DEVICE() m acro creat es a usb_device_id from t he vendor and product I Ds supplied t o it . This is analogous t o t he PCI_DEVICE() m acro discussed in t he previous chapt er. The MODULE_DEVICE_TABLE() m acro m arks tele_ids in t he m odule im age so t hat t he m odule can be loaded on dem and if t he card is hot plugged. This is again sim ilar t o what we discussed for PCMCI A and PCI devices in t he previous t wo chapt ers. When t he USB core det ect s a device wit h propert ies m at ching t he ones declared in t he usb_device_id t able belonging t o a client driver, it invokes t he probe() m et hod regist ered by t hat driver. When t he device is unplugged or if t he m odule is unloaded, t he USB core invokes t he driver's disconnect() m et hod. List ing 11.3 im plem ent s t he probe() and disconnect() m et hods of t he t elem et ry driver. I t st art s by defining a device- specific st ruct ure, tele_device_t, which cont ains t he following fields:

A point er t o t he associat ed usb_device.

A point er t o t he usb_interface. Revisit List ing 11.1 t o see t his st ruct ure in use.

A cont rol URB ( ctrl_urb) t o com m unicat e wit h t he t elem et ry configurat ion regist er, and a ctrl_req t o form ulat e program m ing request s t o t his regist er. These fields are described in t he next sect ion "Accessing Regist ers."

The card has a bulk IN endpoint t hrough which you can glean t he collect ed t elem et ry inform at ion. Associat ed wit h t his endpoint are t hree fields: bulk_in_addr, which holds t he endpoint address; bulk_in_buf, which st ores received dat a; and bulk_in_len, which cont ains t he size of t he receive dat a buffer.

The card has a bulk OUT endpoint t o facilit at e downst ream dat a t ransfer. tele_device_t has a field called bulk_out_addr t o st ore t he address of t his endpoint . There are fewer dat a st ruct ures in t he OUT direct ion because in t his sim ple case we use a synchronous URB subm ission int erface t hat hides several im plem ent at ion det ails.

Khubd invokes t he card's probe() m et hod, tele_probe(), soon aft er enum erat ion. tele_probe() perform s t hree t asks:

1.

Allocat es m em ory for t he device- specific st ruct ure, tele_device_t.

2.

I nit ializes t he following fields in tele_device_t relat ed t o t he device's bulk endpoint s: bulk_in_buf, bulk_in_len, bulk_in_addr, and bulk_out_addr. For t his, it uses t he dat a collect ed by t he hub driver during enum erat ion. This dat a is available in descript or st ruct ures discussed in t he sect ion "Descript or St ruct ures."

3.

Export s t he charact er device / dev/ t ele t o user space. Applicat ions operat e over / dev/ t ele t o exchange dat a wit h t he t elem et ry card. tele_probe() invokes usb_register_dev() and supplies it t he file_operations t hat form t he underlying pillars of t he / dev/ t ele int erface via t he usb_class_driver st ruct ure.

The address of t he device- specific st ruct ure allocat ed in St ep 1 has t o be saved so t hat ot her m et hods can access it . To achieve t his, t he t elem et ry driver uses a t hreefold st rat egy depending on t he funct ion argum ent s available t o various driver rout ines. To save t his st ruct ure point er bet ween t he probe() and open() invocat ion t hreads, t he driver uses t he device's driver_data field via t he pair of funct ions, usb_set_intfdata() and usb_get_intfdata(). To save t he address of t he st ruct ure point er bet ween t he open() t hread and ot her ent ry point s, open() st ores it in t he / dev/ t ele's file->private_data field. This is because t he kernel supplies t hese char ent ry point s wit h / dev/ t ele's inode point er as argum ent rat her t han t he usb_interface point er. To glean t he address of t he device- specific st ruct ure from URB callback funct ions, t he associat ed subm ission t hreads use t he URB's context field as t he st orage area. Look at List ings 11.3, 11.4, and 11.5 t o see t hese m echanism s in act ion. All USB charact er devices answer t o m aj or num ber 180. I f you enable CONFIG_USB_DYNAMIC_MINORS during kernel configurat ion, t he USB core dynam ically select s a m inor num ber from t he available pool. This behavior is sim ilar t o regist ering m isc drivers aft er specifying MISC_DYNAMIC_MINOR in t he miscdevice st ruct ure ( as discussed in t he sect ion "Misc Drivers" in Chapt er 5 , " Charact er Drivers" ) . I f you choose not t o enable CONFIG_USB_DYNAMIC_MINORS, t he USB subsyst em select s an available m inor num ber st art ing at t he m inor base set in t he usb_class_driver st ruct ure. List in g 1 1 .3 . Pr obin g a n d D iscon n e ct in g Code View: /* Device-specific structure */ typedef struct { struct usb_device *usbdev; struct usb_interface *interface; struct urb *ctrl_urb; struct usb_ctrlrequest unsigned char size_t __u8 __u8 /* ... */

ctrl_req; *bulk_in_buf; bulk_in_len; bulk_in_addr; bulk_out_addr;

/* Device representation */ /* Interface representation*/ /* Control URB for register access */ /* Control request as per the spec */ /* Receive data buffer */ /* Receive buffer size */ /* IN endpoint address */ /* OUT endpoint address */ /* Locks, waitqueues, statistics.. */

} tele_device_t; #define TELE_MINOR_BASE 0xAB

/* Assigned by the Linux-USB subsystem maintainer */

/* Conventional char driver entry points. See Chapter 5, "Character Drivers." */ static struct file_operations tele_fops = { .owner = THIS_MODULE, /* Owner */

.read .write .ioctl .open .release };

= = = = =

tele_read, tele_write, tele_ioctl, tele_open, tele_release,

/* /* /* /* /*

Read method */ Write method */ Ioctl method */ Open method */ Close method */

static struct usb_class_driver tele_class = { .name = "tele", .fops = &tele_fops, /* Connect with /dev/tele */ .minor_base = TELE_MINOR_BASE, /* Minor number start */ }; /* The probe() method is invoked by khubd after device enumeration. The first argument, interface, contains information gleaned during the enumeration process. id is the entry in the driver's usb_device_id table that matches the values read from the telemetry card. tele_probe() is based on skel_probe() defined in drivers/usb/usb-skeleton.c */ static int tele_probe(struct usb_interface *interface, const struct usb_device_id *id) { struct usb_host_interface *iface_desc; struct usb_endpoint_descriptor *endpoint; tele_device_t *tele_device; int retval = -ENOMEM; /* Allocate the device-specific structure */ tele_device = kzalloc(sizeof(tele_device_t), GFP_KERNEL); /* Fill the usb_device and usb_interface */ tele_device->usbdev = usb_get_dev(interface_to_usbdev(interface)); tele_device->interface = interface; /* Set up endpoint information from the data gleaned during device enumeration */ iface_desc = interface->cur_altsetting; for (int i = 0; i < iface_desc->desc.bNumEndpoints; ++i) { endpoint = &iface_desc->endpoint[i].desc; if (!tele_device->bulk_in_addr && usb_endpoint_is_bulk_in(endpoint)) { /* Bulk IN endpoint */ tele_device->bulk_in_len = le16_to_cpu(endpoint->wMaxPacketSize); tele_device->bulk_in_addr = endpoint->bEndpointAddress; tele_device->bulk_in_buf = kmalloc(tele_device->bulk_in_len, GFP_KERNEL); } if (!tele_device->bulk_out_addr && usb_endpoint_is_bulk_out(endpoint)) { /* Bulk OUT endpoint */ tele_device->bulk_out_addr = endpoint->bEndpointAddress; } }

if (!(tele_device->bulk_in_addr && tele_device->bulk_out_addr)) { return retval; } /* Attach the device-specific structure to this interface. We will retrieve it from tele_open() */ usb_set_intfdata(interface, tele_device); /* Register the device */ retval = usb_register_dev(interface, &tele_class); if (retval) { usb_set_intfdata(interface, NULL); return retval; } printk("Telemetry device now attached to /dev/tele\n"); return 0; } /* Disconnect method. Called when the device is unplugged or when the module is unloaded */ static void tele_disconnect(struct usb_interface *interface) { tele_device_t *tele_device; /* ... */ /* Reverse of usb_set_intfdata() invoked from tele_probe() */ tele_device = usb_get_intfdata(interface); /* Zero out interface data */ usb_set_intfdata(interface, NULL); /* Release /dev/tele */ usb_deregister_dev(interface, &tele_class); /* NULL the interface. In the real world, protect this operation using locks */ tele_device->interface = NULL; /* ... */ }

Acce ssin g Re gist e r s The open() m et hod init ializes t he on- card t elem et ry configurat ion regist er when an applicat ion opens / dev / t ele. To set t he cont ent s of t his regist er, tele_open() subm it s a cont rol URB at t ached t o t he default endpoint 0. When you subm it a cont rol URB, you have t o supply an associat ed cont rol request . The st ruct ure t hat sends a cont rol request t o a USB device has t o conform t o Chapt er 9 of t he USB specificat ion and is defined as follows in include/ linux/ usb/ ch9.h : struct usb_ctrlrequest { __u8 bRequestType; __u8 bRequest;

__le16 wValue; __le16 wIndex; __le16 wLength; } __attribute__ ((packed));

Let 's t ake a look at t he com ponent s t hat m ake up a usb_ctrlrequest. The bRequest field ident ifies t he cont rol request . bRequestType qualifies t he request by encoding t he dat a t ransfer direct ion, t he request cat egory, and whet her t he recipient is a device, int erface, endpoint , or som et hing else. bRequest can eit her belong t o a set of st andard values or be vendor- defined. I n our exam ple, t he bRequest for writ ing t o t he t elem et ry configurat ion regist er is a vendor- defined one. wValue holds t he dat a t o be writ t en t o t he regist er, wIndex is t he desired offset int o t he regist er space, and wLength is t he num ber of byt es t o be t ransferred. List ing 11.4 im plem ent s tele_open(). I t s m ain t ask is t o program t he t elem et ry configurat ion regist er wit h appropriat e values. Before browsing t he list ing, revisit t he tele_device_t st ruct ure defined in List ing 11.3 focusing on t wo fields: ctrl_urb and ctrl_req. The form er is a cont rol URB for com m unicat ing wit h t he configurat ion regist er, whereas t he lat t er is t he associat ed usb_ctrlrequest. To program t he t elem et ry configurat ion regist er, tele_open() does t he following: 1.

Allocat es a cont rol URB t o prepare for t he regist er writ e.

2.

Creat es a usb_ctrlrequest and populat es it wit h t he request ident ifier, request t ype, regist er offset , and t he value t o be program m ed.

3.

Creat es a cont rol pipe at t ached t o endpoint 0 of t he t elem et ry card t o carry t he cont rol URB.

4.

Because tele_open() subm it s t he URB asynchronously, it needs t o wait for t he associat ed callback funct ion t o finish before ret urning t o it s caller. To t his end, tele_open() calls on t he kernel's com plet ion API for assist ance using init_completion(). St ep 7 calls t he m at ching wait_for_completion() t hat wait s unt il t he callback funct ion invokes complete(). We discussed t he com plet ion API in t he sect ion "Com plet ion I nt erface" in Chapt er 3 .

5.

I nit ializes fields in t he cont rol URB using usb_fill_control_urb(). This includes t he usb_ctrlrequest populat ed in St ep 2.

6.

Subm it s t he URB t o t he USB core using usb_submit_urb().

7.

Wait s unt il t he callback funct ion signals t hat t he regist er program m ing is com plet e.

8.

Ret urns t he st at us.

List in g 1 1 .4 . I n it ia lize t h e Te le m e t r y Con figu r a t ion Re gist e r Code View: /* Offset of the Telemetry configuration register within the on-card register space */ #define TELEMETRY_CONFIG_REG_OFFSET 0x0A /* Value to program in the configuration register */ #define TELEMETRY_CONFIG_REG_VALUE 0xBC /* The vendor-defined bRequest for programming the configuration register */

#define TELEMETRY_REQUEST_WRITE

0x0D

/* The vendor-defined bRequestType */ #define TELEMETRY_REQUEST_WRITE_REGISTER 0x0E /* Open method */ static int tele_open(struct inode *inode, struct file *file) { struct completion tele_config_done; tele_device_t *tele_device; void *tele_ctrl_context; char *tmp; __le16 tele_config_index = TELEMETRY_CONFIG_REG_OFFSET; unsigned int tele_ctrl_pipe; struct usb_interface *interface; /* Obtain the pointer to the device-specific structure. We saved it using usb_set_intfdata() in tele_probe() */ interface = usb_find_interface(&tele_driver, iminor(inode)); tele_device = usb_get_intfdata(interface); /* Allocate a URB for the control transfer */ tele_device->ctrl_urb = usb_alloc_urb(0, GFP_KERNEL); if (!tele_device->ctrl_urb) { return -EIO; } /* Populate the Control Request */ tele_device->ctrl_req.bRequestType = TELEMETRY_REQUEST_WRITE; tele_device->ctrl_req.bRequest = TELEMETRY_REQUEST_WRITE_REGISTER; tele_device->ctrl_req.wValue = cpu_to_le16(TELEMETRY_CONFIG_REG_VALUE); tele_device->ctrl_req.wIndex = cpu_to_le16p(&tele_config_index); tele_device->ctrl_req.wLength = cpu_to_le16(1); tele_device->ctrl_urb->transfer_buffer_length = 1; tmp = kmalloc(1, GFP_KERNEL); *tmp = TELEMETRY_CONFIG_REG_VALUE; /* Create a control pipe attached to endpoint 0 */ tele_ctrl_pipe = usb_sndctrlpipe(tele_device->usbdev, 0); /* Initialize the completion mechanism */ init_completion(&tele_config_done); /* Set URB context. The context is part of the URB that is passed to the callback function as an argument. In this case, the context is the completion structure, tele_config_done */ tele_ctrl_context = (void *)&tele_config_done; /* Initialize the fields in the control URB */ usb_fill_control_urb(tele_device->ctrl_urb, tele_device->usbdev, tele_ctrl_pipe, (char *) &tele_device->ctrl_req, tmp, 1, tele_ctrl_callback, tele_ctrl_context); /* Submit the URB */

usb_submit_urb(tele_device->ctrl_urb, GFP_ATOMIC); /* Wait until the callback returns indicating that the telemetry configuration register has been successfully initialized */ wait_for_completion(&tele_config_done); /* Release our reference to the URB */ usb_free_urb(urb); kfree(tmp); /* Save the device-specific object to the file's private_data so that you can directly retrieve it from other entry points such as tele_read() and tele_write() */ file->private_data = tele_device; /* Return the URB transfer status */ return(tele_device->ctrl_urb->status); }

/* Callback function */ static void tele_ctrl_callback(struct urb *urb) { complete((struct completion *)urb->context); }

You can render tele_open() sim pler using usb_control_msg(), a blocking version of usb_submit_urb() t hat int ernally hides synchronizat ion and callback det ails for cont rol URBs. We preferred t he asynchronous approach for learning purposes.

D a t a Tr a n sfe r List ing 11.5 im plem ent s t he read() and write() ent ry point s of t he t elem et ry driver. These m et hods perform t he real work when an applicat ion reads or writ es t o / dev / t ele. tele_read() perform s synchronous URB subm ission because t he calling process want s t o block unt il t elem et ry dat a is available. tele_write(), however, uses asynchronous subm ission and ret urns t o t he calling t hread wit hout wait ing for a confirm at ion t hat t he dat a accept ed by t he driver has been successfully t ransferred t o t he device. Because asynchronous t ransfers go hand in hand wit h a callback rout ine, List ing 11.5 im plem ent s tele_write_callback(). This rout ine exam ines urb->status t o decipher t he subm ission st at us. I t also frees t he t ransfer buffer allocat ed by tele_write(). List in g 1 1 .5 . D a t a Ex ch a n ge w it h t h e Te le m e t r y Ca r d Code View: /* Read entry point */ static ssize_t tele_read(struct file *file, char *buffer, size_t count, loff_t *ppos) {

int retval, bytes_read; tele_device_t *tele_device; /* Get the address of tele_device */ tele_device = (tele_device_t *)file->private_data; /* ... */ /* Synchronous read */ retval = usb_bulk_msg(tele_device->usbdev, usb_rcvbulkpipe(tele_device->usbdev, tele_device->bulk_in_addr), tele_device->bulk_in_buf, min(tele_device->bulk_in_len, count), &bytes_read, 5000);

/* usb_device */ /* /* /* /* /*

Pipe */ Read buffer */ Bytes to read */ Bytes read */ Timeout in 5 sec */

/* Copy telemetry data to user space */ if (!retval) { if (copy_to_user(buffer, tele_device->bulk_in_buf, bytes_read)) { return -EFAULT; } else { return bytes_read; } } return retval; } /* Write entry point */ static ssize_t tele_write(struct file *file, const char *buffer, size_t write_count, loff_t *ppos) { char *tele_buf = NULL; struct urb *urb = NULL; tele_device_t *tele_device; /* Get the address of tele_device */ tele_device = (tele_device_t *)file->private_data; /* ... */ /* Allocate a bulk URB */ urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) { return -ENOMEM; } /* Allocate a DMA-consistent transfer buffer and copy in data from user space. On return, tele_buf contains the buffer's CPU address, while urb->transfer_dma contains the DMA address */ tele_buf = usb_buffer_alloc(tele_dev->usbdev, write_count, GFP_KERNEL, &urb->transfer_dma); if (copy_from_user(tele_buf, buffer, write_count)) { usb_buffer_free(tele_device->usbdev, write_count, tele_buf, urb->transfer_dma); usb_free_urb(urb);

return -EFAULT } /* Populate bulk URB fields */ usb_fill_bulk_urb(urb, tele_device->usbdev, usb_sndbulkpipe(tele_device->usbdev, tele_device->bulk_out_addr), tele_buf, write_count, tele_write_callback, tele_device); /* urb->transfer_dma is valid, so preferably utilize that for data transfer */ urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; /* Submit URB asynchronously */ usb_submit_urb(urb, GFP_KERNEL); /* Release URB reference */ usb_free_urb(urb); return(write_count); } /* Write callback */ static void tele_write_callback(struct urb *urb) { tele_device_t *tele_device; /* Get the address of tele_device */ tele_device = (tele_device_t *)urb->context; /* urb->status contains the submission status. It's 0 if successful. Resubmit the URB in case of errors other than -ENOENT, -ECONNRESET, and -ESHUTDOWN */ /* ... */ /* Free the transfer buffer. usb_buffer_free() is the release-counterpart of usb_buffer_alloc() called from tele_write() */ usb_buffer_free(urb->dev, urb->transfer_buffer_length, urb->transfer_buffer, urb->transfer_dma); }

Cla ss D r ive r s The USB specificat ion int roduces t he concept of device classes and describes t he funct ionalit y of each class driver. Exam ples of st andard device classes include m ass st orage, net working, hubs, serial convert ers, audio, video, im aging, m odem s, print ers, and hum an int erface devices ( HI Ds) . Class drivers are generic and let you plug and play a wide array of cards wit hout t he need for developing and inst alling drivers for every single device. The Linux- USB subsyst em includes support for m aj or class drivers. Each USB device has a class and a subclass code. The m ass st orage class ( 0x08) , for exam ple, support s subclasses such as com pact disc ( 0x02) , t ape (0x03) , and solid- st at e st orage ( 0x06) . As you saw previously, device drivers populat e t he usb_device_id st ruct ure wit h t he classes and subclasses t hey support . You can glean a device's class and subclass inform at ion by looking at t he " I : " lines in t he / proc/ bus/ usb/ devices out put . Let 's t ake a look at som e im port ant class drivers.

M a ss St or a ge I n USB parlance, m ass st orage refers t o USB hard disks, pen drives, CD- ROMs, floppy drives, and sim ilar st orage devices. USB m ass st orage devices adhere t o t he Sm all Com put er Syst em I nt erface ( SCSI ) prot ocol t o com m unicat e wit h host syst em s. Block access t o USB st orage devices is hence rout ed t hrough t he kernel's SCSI subsyst em . Figure 11.4 provides you an overview of t he int eract ion bet ween USB st orage and SCSI subsyst em s. As shown in t he figure, t he SCSI subsyst em is archit ect ed int o t hree layers:

1 . Top- level drivers for devices such as disks ( sd.c) and CD- ROMs ( sr.c)

2 . A m iddle- level layer t hat scans t he bus, configures devices, and glues t op- level drivers t o low- level drivers

3 . Low- level SCSI adapt er drivers

Figu r e 1 1 .4 . USB m a ss st or a ge a n d SCSI .

[ View full size im age]

The m ass st orage driver regist ers it self as a virt ual SCSI adapt er. The virt ual adapt er com m unicat es upst ream via SCSI com m ands and downst ream using URBs. A USB disk appears t o higher layers as a SCSI device at t ached t o t his virt ual adapt er. To bet t er underst and t he int eract ions bet ween t he USB and SCSI layers, let 's im plem ent a m odificat ion t o t he USB m ass st orage driver. The usbfs node / proc/ bus/ usb/ devices, cont ains t he propert ies and connect ion det ails of all USB devices at t ached t o t he syst em . The " T: " line in t he / proc/ bus/ usb/ devices out put , for exam ple, cont ains t he bus num ber, t he device's dept h from t he root hub, operat ional speed, and so on. The " P: " line cont ains t he vendor I D, product I D, and revision num ber of t he device. All t he inform at ion available in / proc/ bus/ usb/ devices is m anaged by t he USB subsyst em , but t here is one piece m issing t hat is under t he j urisdict ion of t he SCSI subsyst em . The / dev node nam e associat ed wit h t he USB st orage device (sd[ a- z] [ 1- 9]

for disks and sr[ 0- 9] for CD- ROMs) is not available in / proc/ bus/ usb/ devices. To overcom e t his lim it at ion, let 's add an " N: " line t hat displays t he / dev node nam e associat ed wit h t he device. List ing 11.6 shows t he necessary code changes in t he form of a source pat ch t o t he 2.6.23.1 kernel t ree.

List in g 1 1 .6 . Addin g a D isk 's / de v N a m e t o u sbfs Code View: include/scsi/scsi_host.h: struct Scsi_Host { /* ... */ void *shost_data; + char snam[8]; /* /dev node name for this disk */ /* ... */ };

drivers/usb/storage/usb.h: struct us_data { /* ... */ + char magic[4]; };

include/linux/usb.h: struct usb_interface { /* ... */ + void *private_data; };

drivers/usb/storage/usb.c: static int storage_probe(struct usb_interface *intf, const struct usb_device_id *id) { /* ... */ memset(us, 0, sizeof(struct us_data)); + intf->private_data = (void *) us; + strncpy(us->magic, "disk", 4); mutex_init(&(us->dev_mutex)); /* ... */ }

drivers/scsi/sd.c: static int sd_probe(struct device *dev) { /* ... */ add_disk(gd); + memset(sdp->host->snam,0, sizeof(sdp->host->snam)); + strncpy(sdp->host->snam, gd->disk_name, 3); sdev_printk(KERN_NOTICE, sdp, "Attached scsi %sdisk %s\n", sdp->removable ? "removable " : "", gd->disk_name); /* ... */ } drivers/scsi/sr.c: static int sr_probe(struct device *dev) {

/* ... */ add_disk(disk); + memset(sdev->host->snam,0, sizeof(sdev->host->snam)); + strncpy(sdev->host->snam, cd->cdi.name, 3); sdev_printk(KERN_DEBUG, sdev, "Attached scsi CD-ROM %s\n", cd->cdi.name); /* ... */ } drivers/usb/core/devices.c: /* ... */ #include + #include + #include "../storage/usb.h" static ssize_t usb_device_dump(char __user **buffer, size_t *nbytes, loff_t *skip_bytes, loff_t *file_offset, struct usb_device *usbdev, struct usb_bus *bus, int level, int index, int count) { /* ... */ ssize_t total_written = 0; + struct us_data *us_d; + struct Scsi_Host *s_h; /* ... */ data_end = pages_start + sprintf(pages_start, format_topo, bus->busnum, level, parent_devnum, index, count, usbdev->devnum, speed, usbdev->maxchild); + /* Assume this device supports only one interface */ + us_d = (struct us_data *) + (usbdev->actconfig->interface[0]->private_data); + + if ((us_d) && (!strncmp(us_d->magic, "disk", 4))) { + s_h = (struct Scsi_Host *) container_of((void *)us_d, + struct Scsi_Host, + hostdata); + data_end += sprintf(data_end, "N: "); + data_end += sprintf(data_end, "Device=%.100s",s_h->snam); + if (!strncmp(s_h->snam, "sr", 2)) { + data_end += sprintf(data_end, " (CDROM)\n"); + } else if (!strncmp(s_h->snam, "sd", 2)) { + data_end += sprintf(data_end, " (Disk)\n"); + } + } /* ... */ }

To underst and List ing 11.6, let 's first t race t he code flow, cont inuing from where we left off in t he sect ion "Enum erat ion." I n t hat sect ion, we insert ed a USB pen drive and followed t he execut ion t rain unt il t he invocat ion of storage_probe(), t he probe() m et hod of t he m ass st orage driver. Moving furt her:

1.

storage_probe() regist ers a virt ual SCSI adapt er by calling scsi_add_host(), supplying a privat e dat a st ruct ure called us_data as argum ent . scsi_add_host() ret urns a Scsi_Host st ruct ure for t his virt ual adapt er, wit h space at t he end for us_data.

2.

I t st art s a kernel t hread called usb- st orage t o handle all SCSI com m ands queued t o t he virt ual adapt er.

3.

I t schedules a kernel t hread called usb- st or- scan t hat request s t he SCSI m iddle- level layer t o scan t he bus for at t ached devices.

4.

The device scan init iat ed in St ep 3 discovers t he presence of t he insert ed pen drive and binds t he upperlevel SCSI disk driver ( sd.c) t o t he device. This result s in t he invocat ion of t he SCSI disk driver's probe m et hod, sd_probe().

5.

The sd driver allocat es a / dev/ sd* node t o t he disk. From t his point on, applicat ions use t his int erface t o access t he USB disk. The SCSI subsyst em queues disk com m ands t o t he virt ual adapt er, which t he usbst orage kernel t hread handles using appropriat e URBs.

Now t hat you know t he basics, let 's dissect List ing 11.6, looking at t he dat a st ruct ure addit ions first . The list ing adds a snam field t o t he Scsi_Host st ruct ure t o hold t he associat ed SCSI / dev nam e t hat we are int erest ed in. I t also adds a privat e field t o t he usb_interface st ruct ure t o associat e each USB int erface wit h it s us_data. Because us_data is relevant only for st orage devices, we need t o ensure t he validit y of t he privat e field of a USB int erface before accessing it as us_data. For t his, List ing 11.6 adds a m agic st ring, " disk," t o us_data. The usbfs m odificat ion in List ing 11.6 ( t o drivers/ usb/ core/ devices.c) pulls out t he us_data associat ed wit h each int erface via t he privat e dat a field of it s usb_interface. I t t hen lat ches on t o t he associat ed Scsi_Host using t he container_of() funct ion, because as you saw in St ep 1 previously, usb_data is glued t o t he end of t he corresponding Scsi_Host. As you furt her saw in St ep 5, Scsi_Host cont ains t he / dev node nam es t hat t he sd and sr drivers populat e. Usbfs st it ches t oget her an " N: " line using t his inform at ion. The following is t he / proc/ bus/ usb/ devices out put aft er int egrat ing t he changes in List ing 11.6 and at t aching a PNY USB pen drive, an Addonics CD- ROM drive, and a Seagat e hard disk t o a lapt op via a USB hub. The " N: " lines announce t he ident it y of t he / dev node corresponding t o each device: Code View: bash> cat /proc/bus/usb/devices ... T: Bus=04 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 N: Device=sda(Disk) D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=154b ProdID=0002 Rev= 1.00 S: Manufacturer=PNY S: Product=USB 2.0 FD S: SerialNumber=6E5C07005B4F C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr= 0mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usbstorage E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms T: N: D: P: S:

Bus=04 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 5 Spd=480 MxCh= 0 Device=sr0(CDROM) Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 Vendor=0bf6 ProdID=a002 Rev= 3.00 Manufacturer=Addonics

S: S: C:* I:* E: E: E: T: N: D: P: S: S: S: C:* I:* E: E: ...

Product=USB to IDE Cable SerialNumber=1301011002A9AFA9 #Ifs= 1 Cfg#= 2 Atr=c0 MxPwr= 98mA If#= 0 Alt= 0 #EPs= 3 Cls=08(stor.) Sub=06 Prot=50 Driver=usbstorage Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=125us Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=32ms Bus=04 Lev=02 Prnt=02 Port=02 Cnt=03 Dev#= 4 Spd=480 MxCh= 0 Device=sdb(Disk) Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 Vendor=0bc2 ProdID=0501 Rev= 0.01 Manufacturer=Seagate Product=USB Mass Storage SerialNumber=000000062459 #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usbstorage Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

As you can see, t he SCSI subsyst em has allot t ed sda t o t he pen drive, sr0 t o t he CD- ROM, and sdb t o t he hard disk. User- space applicat ions operat e on t hese nodes t o com m unicat e wit h t he respect ive devices. As you saw in Chapt er 4 , wit h t he arrival of udev, however, you have t he opt ion of creat ing higher- level abst ract ions t o ident ify each device wit hout relying on t he ident it y of t he / dev nam es allocat ed by t he SCSI subsyst em .

USB- Se r ia l USB- t o- serial convert ers bring serial port capabilit ies t o your com put er via USB. You can use a USB- t o- serial convert er, for exam ple, t o get a serial debug console from an em bedded device on a developm ent lapt op t hat has no serial port s. I n Chapt er 6 , " Serial Drivers," you learned t he benefit s of t he kernel's layered serial archit ect ure. Figure 11.5 illust rat es how t he USB- Serial layer fit s int o t he kernel's serial fram ework.

Figu r e 1 1 .5 . Th e USB- Se r ia l la ye r .

[ View full size im age]

A USB- serial driver is sim ilar t o ot her USB client drivers except t hat it avails t he services of a USB- Serial core in addit ion t o t he USB core. The USB- Serial core provides t he following:

A t t y driver t hat insulat es low- level USB- t o- serial convert er drivers from higher serial layers such as line disciplines.

Generic probe() and disconnect() rout ines t hat individual USB- serial drivers can leverage.

Device nodes t o access USB- serial port s from user space. Applicat ions operat e on USB- serial port s via / dev/ t t yUSBX, where X is t he serial port num ber. Term inal em ulat ors such as m inicom and prot ocols such as PPP run unchanged over t hese int erfaces.

A low- level USB- t o- serial convert er driver essent ially does t he following:

1 . Regist ers a usb_serial_driver st ruct ure wit h t he USB- Serial core using usb_serial_register(). The ent ry point s supplied as part of usb_serial_driver form t he crux of t he driver.

2 . Populat es a usb_driver st ruct ure and regist ers it wit h t he USB core using usb_register(). This is sim ilar t o what t he exam ple t elem et ry driver does, except t hat a serial convert er driver can count on t he generic probe() and disconnect() rout ines provided by t he USB- Serial core.

List ing 11.7 cont ains snippet s from t he FTDI driver ( drivers/ usb/ serial/ ft di_sio.c) t hat accom plish t hese t wo regist rat ions for USB- t o- serial convert ers based on FTDI chipset s. List in g 1 1 .7 . A Sn ippe t fr om t h e FTD I D r ive r Code View: /* The usb_driver structure */ static struct usb_driver ftdi_driver = { .name = "ftdi_sio", .probe = usb_serial_probe, .disconnect

=

.id_table

=

.no_dynamic_id =

/* Name */ /* Provided by the USB-Serial core */ usb_serial_disconnect,/* Provided by the USB-Serial core */ id_table_combined, /* List of supported devices built around the FTDI chip */ 1, /* Supported ids cannot be added dynamically */

}; /* The usb_serial_driver structure */ static struct usb_serial_driver ftdi_sio_device = { /* ... */ .num_ports = 1, .probe = ftdi_sio_probe, .port_probe = ftdi_sio_port_probe, .port_remove = ftdi_sio_port_remove, .open = ftdi_open, .close = ftdi_close, .throttle = ftdi_throttle, .unthrottle = ftdi_unthrottle, .write = ftdi_write, .write_room = ftdi_write_room, .chars_in_buffer = ftdi_chars_in_buffer, .read_bulk_callback = ftdi_read_bulk_callback, .write_bulk_callback = ftdi_write_bulk_callback, /* ... */ }; /* Driver Initialization */ static int __init ftdi_init(void) { /* ... */ /* Register with the USB-Serial core */ retval = usb_serial_register(&ftdi_sio_device); /* ... */ /* Register with the USB core */ retval = usb_register(&ftdi_driver); /* ... */ }

H u m a n I n t e r fa ce D e vice s Devices such as keyboards and m ice are called hum an int erface devices ( HI Ds) . Take a look at t he sect ion "USB and Bluet oot h Keyboards" in Chapt er 7 , " I nput Drivers," for a discussion on t he USB HI D class.

Blu e t oot h A USB- Bluet oot h dongle is a quick way t o Bluet oot h- enable your com put er so t hat it can com m unicat e wit h Bluet oot h- equipped devices such as cell phones, m ice, or handhelds. Chapt er 16 discusses t he USB Bluet oot h class.

Ga dge t D r ive r s I n a t ypical usage scenario, an em bedded device connect s t o a PC host over USB. Em bedded com put ers usually belong t o t he device side of USB, unlike PC syst em s t hat funct ion as USB host s. Because Linux runs on bot h em bedded and PC syst em s, it needs support t o run on eit her end of USB. The USB Gadget proj ect brings USB device m ode capabilit y t o em bedded Linux syst em s. Bus 3 of t he em bedded Linux device in Figure 11.2 can, for exam ple, use a gadget driver t o let t he device funct ion as a m ass st orage drive when connect ed t o a host com put er. Before proceeding, let 's briefly look at som e relat ed t erm inology. The USB cont roller at t he device side is variously called a device cont roller , peripheral cont roller, client cont roller , or funct ion cont roller . The t erm s gadget and gadget driver are com m only used rat her t han t he heavily overloaded words dev ice and dev ice driver. USB gadget support is now part of t he m ainline kernel and cont ains t he following:

Drivers for USB device cont rollers int egrat ed int o SoC fam ilies such as I nt el PXA, Texas I nst rum ent s OMAP, and At m el AT91. These drivers addit ionally provide a gadget API t hat gadget drivers can use.

Gadget drivers for device classes such as st orage, net working, and serial convert ers. These drivers answer t o t heir class when t hey receive enum erat ion request s from host - side soft ware. A st orage gadget driver, for exam ple, ident ifies it self as a class 0x08 ( m ass st orage class) device and export s a st orage part it ion t o t he host . You can specify t he associat ed block device node or filenam e via a m odule- insert ion param et er. Because t he export ed region has t o appear t o t he host as a m ass st orage device, t he gadget driver im plem ent s t he SCSI int eract ions required by t he USB m ass st orage prot ocol. Gadget drivers are also available for Et hernet and serial devices.

A skelet al gadget driver, drivers/ usb/ gadget / zero.c, t hat you m ay use t o t est device cont roller drivers.

Gadget drivers use t he services of t he gadget API provided by device cont roller drivers. They populat e a usb_gadget_driver st ruct ure and regist er it wit h t he kernel using usb_gadget_register_driver(). Hardware specifics are hidden inside t he gadget API im plem ent at ion offered by individual device cont roller drivers, so t he gadget drivers t hem selves are hardware independent . Docum ent at ion/ DocBook/ gadget .t m pl provides an overview of t he gadget API . Have a look at ht t p: / / linuxusb.org/ gadget / for m ore on t he gadget proj ect .

D e bu ggin g A USB bus analyzer m agnifies t he goings- on in t he bus and is useful for debugging low- level problem s. I f you can't get hold of an analyzer, you m ight be able t o m ake do wit h t he kernel's soft USB t racer, usbm on. This t ool capt ures t raffic bet ween USB host cont rollers and devices. To collect a t race, read from t he debugfs[ 3] file / sys/ kernel/ debug/ usbm on/ Xt , where X is t he bus num ber t o which your device is connect ed.

[ 3]

An in- m em ory filesyst em t o export kernel debug dat a t o user space.

For exam ple, consider a USB disk connect ed t o a PC. From t he associat ed " T: " line in / proc/ bus/ usb/ devices, you can see t hat t he drive is at t ached t o bus 1: T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 2 Spd=480 MxCh= 0

Ensure t hat you have enabled debugfs ( CONFIG_DEBUG_FS) and usbm on ( CONFIG_USB_MON) support in your kernel. This is a snapshot of usbm on out put while copying a file from t he disk: Code View: bash> mount -t debugfs none_debugs /sys/kernel/debug/ bash> cat /sys/kernel/debug/usbmon/1u ... ee6a5c40 3718782540 S Bi:1:002:1 -115 20480 < ee6a5cc0 3718782567 S Bi:1:002:1 -115 65536 < ee6a5d40 3718782595 S Bi:1:002:1 -115 36864 < ee6a5c40 3718788189 C Bi:1:002:1 0 20480 = 0f846801 118498f\ 15c60500 01680106 5e846801 608498fe 6f280087 68000000 ee6a5cc0 3718800994 C Bi:1:002:1 0 65536 = 118498fe 15c60500\ 01680106 5e846801 608498fe 6f280087 68000000 00884800 ee6a5d40 3718801001 C Bi:1:002:1 0 36864 = 13608498 fe4f4a01\ 00514a01 006f2800 87680000 00008848 00000100 b7f00100 ...

Each out put line st art s wit h t he URB address, followed by an event t im est am p. An S in t he next colum n indicat es URB subm ission, and a C announces a callback. The following field has t he form at URBType:Bus#:DeviceAddress:Endpoint#. I n t he preceding out put , a URBType of Bi st ands for a bulk URB in t he IN direct ion. Aft er t his, usbm on dum ps t he URB st at us, dat a lengt h, a dat a t ag ( = or < in t he preceding out put ) , and t he dat a words ( if t he t ag is =) . The last t hree lines in t he preceding out put are callbacks associat ed wit h bulk URBs subm it t ed in earlier lines. You can m at ch t he callbacks wit h t he relat ed subm issions using t he URB addresses. Docum ent at ion/ usb/ usbm on.t xt det ails usbm on synt ax and cont ains exam ple code t o parse t he out put int o hum an readable form . USB Support USB Verbose Debug Messages during kernel configurat ion, I f you t urn on Device Drivers t he kernel will em it t he cont ent s of all dev_dbg() st at em ent s present in t he USB subsyst em . You can glean device and bus specific inform at ion from t he USB filesyst em ( usbfs) node, / proc/ bus/ usb/ devices. And as we discuss in Chapt er 19, " Drivers in User Space," usbfs also let s you im plem ent USB device drivers in user space. Even when t he final dest inat ion of your USB driver is inside t he kernel, st art ing wit h a user- space driver can ease debugging and t est ing.

The linux- usb- devel m ailing list is t he forum t o discuss quest ions relat ed t o USB device drivers. Visit ht t ps: / / list s.sourceforge.net / list s/ list info/ linux- usb- devel for subscript ion and archive ret rieval inform at ion. Read www.linux- usb.org/ usbt est for ideas on USB t est ing. The hom e page of t he Linux- USB proj ect is www.linux- usb.org. You m ay download t he USB 2.0 specificat ion, OTG supplem ent , and ot her relat ed st andards from www.usb.org/ developers/ docs.

Look in g a t t h e Sou r ce s The USB core layer lives in drivers/ usb/ core/ . This direct ory also cont ains URB m anipulat ion rout ines and t he usbfs im plem ent at ion. The hub driver and khubd are part of drivers/ usb/ core/ hub.c. The drivers/ usb/ host / direct ory cont ains host cont roller device drivers. USB- relat ed header definit ions reside in include/ linux/ usb* .h. The usbm on t racer is in drivers/ usb/ m on/ . Look inside Docum ent at ion/ usb/ for Linux- USB docum ent at ion. USB class drivers st ay in various subdirect ories under drivers/ usb/ . The m ass st orage driver drivers/ usb/ st orage/ , in t andem wit h t he SCSI subsyst em dr iver s/ scsi/ , im plem ent s t he USB m ass st orage prot ocol. The drivers/ input / [ 4] direct ory t ree includes drivers for USB input devices such as keyboards and m ice; drivers/ usb/ serial/ has drivers for USB- t o- serial convert ers; drivers/ usb/ m edia/ support s USB m ult im edia devices; drivers/ net / usb/ [ 5] has drivers for USB Et hernet dongles; and drivers/ usb/ m isc/ cont ains drivers for m iscellaneous USB devices such as LEDs, LCDs, and fingerprint sensors. Look at drivers/ usb/ usb- skelet on.c for a st art ing point driver t em plat e if you can't zero in on a closer m at ch.

[ 4]

Before t he 2.6.22 kernel release, USB input device drivers used t o reside in drivers/ usb/ input / .

[ 5]

Before t he 2.6.22 kernel release, USB net work device drivers used t o reside in drivers/ usb/ net / .

The USB gadget subsyst em is in drivers/ usb/ gadget / . This direct ory cont ains USB device cont roller drivers, and gadget drivers for m ass st orage ( file_st or age.c) , serial convert ers ( serial.c) , and Et hernet net working (et her.c) . Table 11.3 cont ains t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he source t ree. Table 11.4 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 1 1 .3 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

urb

include/ linux/ usb.h

Cent erpiece of t he USB dat a t ransfer m echanism

pipe

include/ linux/ usb.h

Address elem ent of a URB

usb_device_descriptor usb_config_descriptor usb_interface_descriptor usb_endpoint_descriptor

include/ linux/ usb/ ch9.h

Descript ors t hat hold inform at ion about a USB device

usb_device

include/ linux/ usb.h

Represent at ion of a USB device

usb_device_id

include/ linux/ m od_devicet able.h I dent it y of a USB device

usb_driver

include/ linux/ usb.h

Represent at ion of a USB client driver

usb_gadget_driver

include/ linux/ usb_gadget .h

Represent at ion of a USB gadget driver

Ta ble 1 1 .4 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

usb_register()

include/ linux/ usb.h drivers/ usb/ core/ driver.c

Regist ers a usb_driver wit h t he USB core

usb_deregister()

drivers/ usb/ core/ driver.c

Unregist ers a usb_driver from t he USB core

usb_set_intfdata()

include/ linux/ usb.h

At t aches device- specific dat a t o a usb_interface

usb_get_intfdata()

include/ linux/ usb.h

Det aches device- specific dat a from a usb_interface

usb_register_dev()

drivers/ usb/ core/ file.c

Associat es a charact er int erface wit h a USB client driver

usb_deregister_dev()

drivers/ usb/ core/ file.c

Dissociat es a charact er int erface from a USB client driver

usb_alloc_urb()

drivers/ usb/ core/ urb.c

Allocat es a URB

usb_fill_[control|int|bulk]_urb()

include/ linux/ usb.h

Populat es a URB

usb_[control|interrupt|bulk]_msg()

drivers/ usb/ core/ m essage.c Wrappers for synchronous URB subm ission

usb_submit_urb()

drivers/ usb/ core/ urb.c

Subm it s a URB t o t he USB core

usb_free_urb()

drivers/ usb/ core/ urb.c

Frees references t o a com plet ed URB

usb_unlink_urb()

drivers/ usb/ core/ urb.c

Frees references t o a pending URB

usb_[rcv|snd][ctrl|int|bulk|isoc]pipe() include/ linux/ usb.h

Creat es a USB pipe

usb_find_interface()

drivers/ usb/ core/ usb.c

Get s t he usb_interface associat ed wit h a USB client driver

usb_buffer_alloc()

drivers/ usb/ core/ usb.c

Allocat es a consist ent DMA t ransfer buffer

usb_buffer_free()

drivers/ usb/ core/ usb.c

Frees a buffer t hat was allocat ed using usb_buffer_alloc()

usb_serial_register()

drivers/ usb/ serial/ usbserial.c

Regist ers a driver wit h t he USBSerial core

usb_serial_deregister()

drivers/ usb/ serial/ usbserial.c

Unregist ers a driver from t he USBSerial core

usb_gadget_register_driver()

Device cont roller drivers in drivers/ usb/ gadget /

Regist ers a gadget wit h a device cont roller driver

Ch a pt e r 1 2 . Vide o D r ive r s I n Th is Ch a pt e r 356 Display Archit ect ure 359 Linux- Video Subsyst em 361 Display Param et ers 362 The Fram e Buffer API 365 Fram e Buffer Drivers 380 Console Drivers 387 Debugging 388 Looking at t he Sources

Video hardware generat es visual out put for a com put er syst em t o display. I n t his chapt er, let 's find out how t he kernel support s video cont rollers and discover t he advant ages offered by t he fram e buffer abst ract ion. Let 's also learn t o writ e console drivers t hat display m essages em it t ed by t he kernel.

D ispla y Ar ch it e ct u r e Figure 12.1 shows t he display assem bly on a PC- com pat ible syst em . The graphics cont roller t hat is part of t he Nort h Bridge ( see t he sidebar " The Nort h Bridge" ) connect s t o different t ypes of display devices using several int erface st andards ( see t he sidebar "Video Cabling St andards" ) .

Figu r e 1 2 .1 . D ispla y con n e ct ion on a PC syst e m .

Video Graphics Array ( VGA) is t he original display st andard int roduced by I BM, but it 's m ore of a resolut ion specificat ion t oday. VGA refers t o a resolut ion of 640x480, whereas newer st andards such as eXt ended Graphics Array ( XGA) and Super Video Graphics Array ( SVGA) support higher resolut ions of 800x600 and 1024x768. Quart er VGA ( QVGA) panels having a resolut ion of 320x240 are com m on on em bedded devices, such as handhelds and sm art phones. Graphics cont rollers in t he x86 world com pat ible wit h VGA and it s derivat ives offer a charact er- based t ext m ode and a pixel- based graphics m ode. The non- x86 em bedded space is non- VGA, however, and has no concept of a dedicat ed t ext m ode.

The Nort h Bridge I n earlier chapt ers, you learned about peripheral buses such as LPC, I 2 C, PCMCI A, PCI , and USB, all of which are sourced from t he Sout h Bridge on PC- cent ric syst em s. Display archit ect ure, however, t akes us inside t he Nort h Bridge. A Nort h Bridge in t he I nt el- based PC archit ect ure is eit her a Graphics and Mem ory Cont roller Hub ( GMCH) or a Mem ory Cont roller Hub ( MCH) . The form er cont ains a m em ory cont roller, a Front Side Bus ( FSB) cont roller, and a graphics cont roller. The lat t er lacks an int egrat ed graphics cont roller but provides an Accelerat ed Graphics Port ( AGP) channel t o connect ext ernal graphics hardware. Consider, for exam ple, t he I nt el 855 GMCH Nort h Bridge chipset . The FSB cont roller in t he 855 GMCH int erfaces wit h Pent ium M processors. The m em ory cont roller support s Dual Dat a Rat e ( DDR) SDRAM m em ory chips. The int egrat ed graphics cont roller let s you connect t o display devices using analog VGA, LVDS, or DVI ( see t he sidebar " Video Cabling St andards" ) . The 855 GMCH enables you t o sim ult aneously send out put t o t wo displays, so you can, for exam ple, dispat ch sim ilar or separat e inform at ion t o your lapt op's LCD panel and an ext ernal CRT m onit or at t he sam e t im e. Recent Nort h Bridge chipset s, such as t he AMD 690G, include support for HDMI ( see t he following sidebar) in addit ion t o VGA and DVI .

Video Cabling St andards Several int erfacing st andards specify t he connect ion bet ween video cont rollers and display devices. Display devices and t he cabling t echnologies t hey use follow:

An analog display such as a cat hode ray t ube ( CRT) m onit or t hat has a st andard VGA connect or.

A digit al flat - panel display such as a lapt op Thin Film Transist or ( TFT) LCD t hat has a lowvolt age different ial signaling ( LVDS) connect or.

A display m onit or t hat com plies wit h t he Digit al Visual I nt erface ( DVI ) specificat ion. DVI is a st andard developed by t he Digit al Display Working Group ( DDWG) for carrying high- qualit y video. There are t hree DVI subclasses: digit al- only ( DVI - D) , analog- only ( DVI - A) , and digit al- and- analog ( DVI - I ) .

A display m onit or t hat com plies wit h t he High- Definit ion Television ( HDTV) specificat ion using t he High- Definit ion Mult im edia I nt erface ( HDMI ) . HDMI is a m odern digit al audio- video cable st andard t hat support s high dat a rat es. Unlike video- only st andards such as DVI , HDMI can carry bot h pict ure and sound.

Em bedded SoCs usually have an on- chip LCD cont roller, as shown in Figure 12.2. The out put em anat ing from t he LCD cont roller are TTL (Transist or- Transist or Logic) signals t hat pack 18 bit s of flat - panel video dat a, six each for t he t hree prim ary colors, red, green, and blue. Several handhelds and phones use QVGA- t ype int ernal LCD panels t hat direct ly receive t he TTL flat - panel video dat a sourced by LCD cont rollers.

Figu r e 1 2 .2 . D ispla y con n e ct ion on a n e m be dde d syst e m .

The em bedded device, as in Figure 12.3, support s dual display panels: an int ernal LVDS flat - panel LCD and an

ext ernal DVI m onit or. The int ernal TFT LCD t akes an LVDS connect or as input , so an LVDS t ransm it t er chip is used t o convert t he flat - panel signals t o LVDS. An exam ple of an LVDS t ransm it t er chip is DS90C363B from Nat ional Sem iconduct or. The ext ernal DVI m onit or t akes only a DVI connect or, so a DVI t ransm it t er is used t o convert t he 18- bit video signals t o DVI - D. An I 2 C int erface is provided so t hat t he device driver can configure t he DVI t ransm it t er regist ers. An exam ple of a DVI t ransm it t er chip is SiI 164 from Silicon I m age.

Figu r e 1 2 .3 . LVD S a n d D VI con n e ct ion s on a n e m be dde d syst e m .

[ View full size im age]

Ch a pt e r 1 2 . Vide o D r ive r s I n Th is Ch a pt e r 356 Display Archit ect ure 359 Linux- Video Subsyst em 361 Display Param et ers 362 The Fram e Buffer API 365 Fram e Buffer Drivers 380 Console Drivers 387 Debugging 388 Looking at t he Sources

Video hardware generat es visual out put for a com put er syst em t o display. I n t his chapt er, let 's find out how t he kernel support s video cont rollers and discover t he advant ages offered by t he fram e buffer abst ract ion. Let 's also learn t o writ e console drivers t hat display m essages em it t ed by t he kernel.

D ispla y Ar ch it e ct u r e Figure 12.1 shows t he display assem bly on a PC- com pat ible syst em . The graphics cont roller t hat is part of t he Nort h Bridge ( see t he sidebar " The Nort h Bridge" ) connect s t o different t ypes of display devices using several int erface st andards ( see t he sidebar "Video Cabling St andards" ) .

Figu r e 1 2 .1 . D ispla y con n e ct ion on a PC syst e m .

Video Graphics Array ( VGA) is t he original display st andard int roduced by I BM, but it 's m ore of a resolut ion specificat ion t oday. VGA refers t o a resolut ion of 640x480, whereas newer st andards such as eXt ended Graphics Array ( XGA) and Super Video Graphics Array ( SVGA) support higher resolut ions of 800x600 and 1024x768. Quart er VGA ( QVGA) panels having a resolut ion of 320x240 are com m on on em bedded devices, such as handhelds and sm art phones. Graphics cont rollers in t he x86 world com pat ible wit h VGA and it s derivat ives offer a charact er- based t ext m ode and a pixel- based graphics m ode. The non- x86 em bedded space is non- VGA, however, and has no concept of a dedicat ed t ext m ode.

The Nort h Bridge I n earlier chapt ers, you learned about peripheral buses such as LPC, I 2 C, PCMCI A, PCI , and USB, all of which are sourced from t he Sout h Bridge on PC- cent ric syst em s. Display archit ect ure, however, t akes us inside t he Nort h Bridge. A Nort h Bridge in t he I nt el- based PC archit ect ure is eit her a Graphics and Mem ory Cont roller Hub ( GMCH) or a Mem ory Cont roller Hub ( MCH) . The form er cont ains a m em ory cont roller, a Front Side Bus ( FSB) cont roller, and a graphics cont roller. The lat t er lacks an int egrat ed graphics cont roller but provides an Accelerat ed Graphics Port ( AGP) channel t o connect ext ernal graphics hardware. Consider, for exam ple, t he I nt el 855 GMCH Nort h Bridge chipset . The FSB cont roller in t he 855 GMCH int erfaces wit h Pent ium M processors. The m em ory cont roller support s Dual Dat a Rat e ( DDR) SDRAM m em ory chips. The int egrat ed graphics cont roller let s you connect t o display devices using analog VGA, LVDS, or DVI ( see t he sidebar " Video Cabling St andards" ) . The 855 GMCH enables you t o sim ult aneously send out put t o t wo displays, so you can, for exam ple, dispat ch sim ilar or separat e inform at ion t o your lapt op's LCD panel and an ext ernal CRT m onit or at t he sam e t im e. Recent Nort h Bridge chipset s, such as t he AMD 690G, include support for HDMI ( see t he following sidebar) in addit ion t o VGA and DVI .

Video Cabling St andards Several int erfacing st andards specify t he connect ion bet ween video cont rollers and display devices. Display devices and t he cabling t echnologies t hey use follow:

An analog display such as a cat hode ray t ube ( CRT) m onit or t hat has a st andard VGA connect or.

A digit al flat - panel display such as a lapt op Thin Film Transist or ( TFT) LCD t hat has a lowvolt age different ial signaling ( LVDS) connect or.

A display m onit or t hat com plies wit h t he Digit al Visual I nt erface ( DVI ) specificat ion. DVI is a st andard developed by t he Digit al Display Working Group ( DDWG) for carrying high- qualit y video. There are t hree DVI subclasses: digit al- only ( DVI - D) , analog- only ( DVI - A) , and digit al- and- analog ( DVI - I ) .

A display m onit or t hat com plies wit h t he High- Definit ion Television ( HDTV) specificat ion using t he High- Definit ion Mult im edia I nt erface ( HDMI ) . HDMI is a m odern digit al audio- video cable st andard t hat support s high dat a rat es. Unlike video- only st andards such as DVI , HDMI can carry bot h pict ure and sound.

Em bedded SoCs usually have an on- chip LCD cont roller, as shown in Figure 12.2. The out put em anat ing from t he LCD cont roller are TTL (Transist or- Transist or Logic) signals t hat pack 18 bit s of flat - panel video dat a, six each for t he t hree prim ary colors, red, green, and blue. Several handhelds and phones use QVGA- t ype int ernal LCD panels t hat direct ly receive t he TTL flat - panel video dat a sourced by LCD cont rollers.

Figu r e 1 2 .2 . D ispla y con n e ct ion on a n e m be dde d syst e m .

The em bedded device, as in Figure 12.3, support s dual display panels: an int ernal LVDS flat - panel LCD and an

ext ernal DVI m onit or. The int ernal TFT LCD t akes an LVDS connect or as input , so an LVDS t ransm it t er chip is used t o convert t he flat - panel signals t o LVDS. An exam ple of an LVDS t ransm it t er chip is DS90C363B from Nat ional Sem iconduct or. The ext ernal DVI m onit or t akes only a DVI connect or, so a DVI t ransm it t er is used t o convert t he 18- bit video signals t o DVI - D. An I 2 C int erface is provided so t hat t he device driver can configure t he DVI t ransm it t er regist ers. An exam ple of a DVI t ransm it t er chip is SiI 164 from Silicon I m age.

Figu r e 1 2 .3 . LVD S a n d D VI con n e ct ion s on a n e m be dde d syst e m .

[ View full size im age]

Lin u x - Vide o Su bsyst e m The concept of fram e buffers is cent ral t o video on Linux, so let 's first find out what t hat offers. Because video adapt ers can be based on different hardware archit ect ures, t he im plem ent at ion of higher kernel layers and applicat ions m ight need t o vary across video cards. This result s in nonuniform schem es t o handle different video cards. The ensuing nonport abilit y and ext ra code necessit at e great er invest m ent and m aint enance. The fram e buffer concept solves t his problem by describing a general abst ract ion and specifying a program m ing int erface t hat allows applicat ions and higher kernel layers t o be writ t en in a plat form - independent m anner. Figure 12.4 shows you t he fram e buffer advant age.

Figu r e 1 2 .4 . Th e fr a m e bu ffe r a dva n t a ge .

The kernel's fram e buffer int erface t hus allows applicat ions t o be independent of t he vagaries of t he underlying graphics hardware. Applicat ions run unchanged over diverse t ypes of video hardware if t hey and t he display drivers conform t o t he fram e buffer int erface. As you will soon find out , t he com m on fram e buffer program m ing int erface also brings hardware independence t o kernel layers, such as t he fram e buffer console driver.

Today, several applicat ions, such as web browsers and m ovie players, work direct ly over t he fram e buffer int erface. Such applicat ions can do graphics wit hout help from a windowing syst em . The X Windows server ( Xfbdev) is capable of working over t he fram e buffer int erface, as shown in Figure 12.5.

Figu r e 1 2 .5 . Lin u x - Vide o su bsyst e m .

[ View full size im age]

The Linux- Video subsyst em shown in Figure 12.5 is a collect ion of low- level display drivers, m iddle- level fram e buffer and console layers, a high- level virt ual t erm inal driver, user m ode drivers part of X Windows, and ut ilit ies t o configure display param et ers. Let 's t race t he figure t op down:

The X Windows GUI has t wo opt ions for operat ing over video cards. I t can use eit her a suit able built - in user- space driver for t he card or work over t he fram e buffer subsyst em .

Text m ode consoles funct ion over t he virt ual t erm inal charact er driver. Virt ual t erm inals, int roduced in t he sect ion "TTY Drivers" in Chapt er 6 , " Serial Drivers," are full- screen t ext - based t erm inals t hat you get when you logon in t ext m ode. Like X Windows, t ext consoles have t wo operat ional choices. They can eit her work over a card- specific console driver, or use t he generic fram e buffer console driver ( fbcon) if t he kernel support s a low- level fram e buffer driver for t he card in quest ion.

D ispla y Pa r a m e t e r s Som et im es, configuring t he propert ies associat ed wit h your display panel m ight be t he only driver changes t hat you need t o m ake t o enable video on your device, so let 's st art learning about video drivers by looking at com m on display param et ers. We will assum e t hat t he associat ed driver conform s t o t he fram e buffer int erface, and use t he fbset ut ilit y t o obt ain display charact erist ics: bash> fbset mode "1024x768-60" # D: 65.003 MHz, H: 48.365 kHz, V: 60.006 Hz geometry 1024 768 1024 768 8 timings 15384 168 16 30 2 136 6 hsync high vsync high rgba 8/0,8/0,8/0,0/0 endmode

The D: value in t he out put st ands for t he dot clock , which is t he speed at which t he video hardware draws pixels on t he display. The value of 65.003MHz in t he preceding out put m eans t hat it 'll t ake ( 1/ 65.003* 1000000) or about 15,384 picoseconds for t he video cont roller t o draw a single pixel. This durat ion is called t he pixclock and is shown as t he first num eric param et er in t he line st art ing wit h timings. The num bers against " geom et ry" announce t hat t he visible and virt ual resolut ions are 1024x768 ( SVGA) and t hat t he bit s required t o st ore inform at ion pert aining t o a pixel is 8. The H: value specifies t he horizont al scan rat e, which is t he num ber of horizont al display lines scanned by t he video hardware in one second. This is t he inverse of t he pixclock t im es t he X- resolut ion. The V: value is t he rat e at which t he ent ire display is refreshed. This is t he inverse of t he pixclock t im es t he visible X- resolut ion t im es t he visible Y- resolut ion, which is around 60Hz in t his exam ple. I n ot her words, t he LCD is refreshed 60 t im es in a second. Video cont rollers issue a horizont al sync ( HSYNC) pulse at t he end of each line and a vert ical sync ( VSYNC) pulse aft er each display fram e. The durat ions of HSYNC ( in t erm s of pixels) and VSYNC ( in t erm s of pixel lines) are shown as t he last t wo param et ers in t he line st art ing wit h "timings." The larger your display, t he bigger t he likely values of HSYNC and VSYNC. The four num bers before t he HSYNC durat ion in t he timings line announce t he lengt h of t he right display m argin ( or horizont al front porch) , left m argin ( or horizont al back porch) , lower m argin ( or vert ical front porch) , and upper m argin ( or vert ical back porch) , respect ively. Docum ent at ion/ fb/ fram ebuffer.t xt and t he m an page of fb.m odes pict orially show t hese param et ers.

To t ie t hese param et ers t oget her, let 's calculat e t he pixclock value for a given refresh rat e, which is 60.006Hz in our exam ple: dotclock = (X-resolution + left margin + right margin + HSYNC length) * (Y-resolution + upper margin + lower margin + VSYNC length) * refresh rate = (1024 + 168 + 16 + 136) * (768 + 30 + 2 + 6) * 60.006 = 65.003 MHz pixclock = 1/dotclock = 15384 picoseconds (which matches with the fbset output above)

Th e Fr a m e Bu ffe r API Let 's next wet our feet in t he fram e buffer API . The fram e buffer core layer export s device nodes t o user space so t hat applicat ions can access each support ed video device. / dev/ fbX is t he node associat ed wit h fram e buffer device X. The following are t he m ain dat a st ruct ures t hat int erest users of t he fram e buffer API . I nside t he kernel, t hey are defined in include/ linux/ fb.h, whereas in user land, t heir definit ions reside in / usr/ include/ linux/ fb.h:

1 . Variable inform at ion pert aining t o t he video card t hat you saw in t he fbset out put in t he previous sect ion is held in struct fb_var_screeninfo. This st ruct ure cont ains fields such as t he X- resolut ion, Y- resolut ion, bit s required t o hold a pixel, pixclock, HSYNC durat ion, VSYNC durat ion, and m argin lengt hs. These values are program m able by t he user: struct fb_var_screeninfo { __u32 xres; /* Visible resolution in the X axis */ __u32 yres; /* Visible resolution in the Y axis */ /* ... */ __u32 bits_per_pixel; /* Number of bits required to hold a pixel */ /* ... */ __u32 pixclock; /* Pixel clock in picoseconds */ __u32 left_margin; /* Time from sync to picture */ __u32 right_margin; /* Time from picture to sync */ /* ... */ __u32 hsync_len; /* Length of horizontal sync */ __u32 vsync_len; /* Length of vertical sync */ /* ... */ }; 2 . Fixed inform at ion about t he video hardware, such as t he st art address and size of fram e buffer m em ory, is held in struct fb_fix_screeninfo. These values cannot be alt ered by t he user: struct fb_fix_screeninfo { char id[16]; /* Identification string */ unsigned long smem_start; /* Start of frame buffer memory */ __u32 smem_len; /* Length of frame buffer memory */ /* ... */ }; 3 . The fb_cmap st ruct ure specifies t he color m ap, which is used t o convey t he user's definit ion of colors t o t he underlying video hardware. You can use t his st ruct ure t o define t he RGB ( Red, Green, Blue) rat io t hat you desire for different colors: struct fb_cmap { __u32 start; __u32 len; __u16 *red; __u16 *green; __u16 *blue; __u16 *transp; };

/* /* /* /* /* /*

First entry */ Number of entries */ Red values */ Green values */ Blue values */ Transparency. Discussed later on */

List ing 12.1 is a sim ple applicat ion t hat works over t he fram e buffer API . The program clears t he screen by operat ing on / dev/ fb0, t he fram e buffer device node corresponding t o t he display. I t first deciphers t he visible resolut ions and t he bit s per pixel in a hardware- independent m anner using t he fram e buffer API , FBIOGET_VSCREENINFO. This int erface com m and gleans t he display's variable param et ers by operat ing on t he fb_var_screeninfo st ruct ure. The program t hen goes on t o mmap() t he fram e buffer m em ory and clears each const it uent pixel bit .

List in g 1 2 .1 . Cle a r t h e D ispla y in a H a r dw a r e - I n de pe n de n t M a n n e r Code View: #include #include #include #include #include struct fb_var_screeninfo vinfo; int main(int argc, char *argv[]) { int fbfd, fbsize, i; unsigned char *fbbuf; /* Open video memory */ if ((fbfd = open("/dev/fb0", O_RDWR)) < 0) { exit(1); } /* Get variable display parameters */ if (ioctl(fbfd, FBIOGET_VSCREENINFO, &vinfo)) { printf("Bad vscreeninfo ioctl\n"); exit(2); } /* Size of frame buffer = (X-resolution * Y-resolution * bytes per pixel) */ fbsize = vinfo.xres*vinfo.yres*(vinfo.bits_per_pixel/8); /* Map video memory */ if ((fbbuf = mmap(0, fbsize, PROT_READ|PROT_WRITE, MAP_SHARED, fbfd, 0)) == (void *) -1){ exit(3); } /* Clear the screen */ for (i=0; idev); /* ... */ /* Obtain the associated resource defined while registering the corresponding platform_device */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); /* Get the kernel's sanction for using the I/O memory chunk starting from LCD_CONTROLLER_BASE and having a size of LCD_CONTROLLER_SIZE bytes */ res = request_mem_region(res->start, res->end - res->start + 1, pdev->name); /* Fill the fb_info structure with fixed (info->fix) and variable (info->var) values such as frame buffer length, xres, yres, bits_per_pixel, fbops, cmap, etc */ initialize_fb_info(info, pdev); /* Not expanded */ info->fbops = &myfb_ops; fb_alloc_cmap(&info->cmap, 16, 0); /* DMA-map the frame buffer memory coherently. info->screen_base holds the CPU address of the mapped buffer, info->fix.smem_start carries the associated hardware address */ info->screen_base = dma_alloc_coherent(0, info->fix.smem_len, (dma_addr_t *)&info->fix.smem_start, GFP_DMA | GFP_KERNEL); /* Set the information in info->var to the appropriate LCD controller registers */ myfb_set_par(info); /* Register with the frame buffer core */ register_framebuffer(info); return 0; } /* Platform driver's remove() routine */ static int myfb_remove(struct platform_device *pdev) { struct fb_info *info = platform_get_drvdata(pdev); struct resource *res;

/* Disable screen refresh, turn off DMA,.. */ myfb_disable_controller(info); /* Unregister frame buffer driver */ unregister_framebuffer(info); /* Deallocate color map */ fb_dealloc_cmap(&info->cmap); kfree(info->pseudo_palette); /* Reverse of framebuffer_alloc() */ framebuffer_release(info); /* Release memory region */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); release_mem_region(res->start, res->end - res->start + 1); platform_set_drvdata(pdev, NULL); return 0; } /* The platform driver structure */ static struct platform_driver myfb_driver = { .probe = myfb_probe, .remove = myfb_remove, .driver = { .name = "myfb", }, }; /* Module Initialization */ int __init myfb_init(void) { platform_device_add(&myfb_device); return platform_driver_register(&myfb_driver); } /* Module Exit */ void __exit myfb_exit(void) { platform_driver_unregister(&myfb_driver); platform_device_unregister(&myfb_device); } module_init(myfb_init); module_exit(myfb_exit);

Con sole D r ive r s A console is a device t hat displays printk() m essages generat ed by t he kernel. I f you look at Figure 12.5, you can see t hat console drivers lie in t wo t iers: a t op level const it ut ing drivers such as t he virt ual t erm inal driver, t he print er console driver, and t he exam ple USB_UART console driver ( discussed soon) , and bot t om - level drivers t hat are responsible for advanced operat ions. Consequent ly, t here are t wo m ain int erface definit ion st ruct ures used by console drivers. Top- level console drivers revolve around struct console, which defines basic operat ions such as setup() and write(). Bot t om - level drivers cent er on struct consw t hat specifies advanced operat ions such as set t ing cursor propert ies, console swit ching, blanking, resizing, and set t ing palet t e inform at ion. These st ruct ures are defined in include/ linux/ console.h as follows:

1 . struct console { char name[8]; void (*write)(struct console *, const char *, unsigned); int (*read)(struct console *, char *, unsigned); /* ... */ void (*unblank)(void); int (*setup)(struct console *, char *); /* ... */ };

2 . struct consw { struct module *owner; const char *(*con_startup)(void); void (*con_init)(struct vc_data *, int); void (*con_deinit)(struct vc_data *); void (*con_clear)(struct vc_data *, int, int, int, int); void (*con_putc)(struct vc_data *, int, int, int); void (*con_putcs)(struct vc_data *, const unsigned short *, int, int, int); void (*con_cursor)(struct vc_data *, int); int (*con_scroll)(struct vc_data *, int, int, int, int); /* ... */ }; As you m ight have guessed by looking at Figure 12.5, m ost console devices need bot h levels of drivers working in t andem . The vt driver is t he t op- level console driver in m any sit uat ions. On PC- com pat ible syst em s, t he VGA console driver (vgacon) is usually t he bot t om - level console driver; whereas on em bedded devices, t he fram e buffer console driver (fbcon) is oft en t he bot t om - level driver. Because of t he indirect ion offered by t he fram e buffer abst ract ion, fbcon, unlike ot her bot t om - level console drivers, is hardware- independent . Let 's briefly look at t he archit ect ure of bot h levels of console drivers:

The t op- level driver populat es a struct console wit h prescribed ent ry point s and regist ers it wit h t he kernel using register_console(). Unregist ering is accom plished using unregister_console(). This is t he driver t hat int eract s wit h printk(). The ent ry point s belonging t o t his driver call on t he services of t he associat ed bot t om - level console driver.

The bot t om - level console driver populat es a struct consw wit h specified ent ry point s and regist ers it wit h t he kernel using register_con_driver(). Unregist ering is done using unregister_con_driver(). When t he syst em support s m ult iple console drivers, t he driver m ight inst ead invoke take_over_console() t o regist er it self and t ake over t he exist ing console. give_up_console() accom plishes t he reverse. For convent ional displays, bot t om - level drivers int eract wit h t he t op- level vt console driver and t he vc_screen charact er driver t hat allows access t o virt ual console m em ory.

Som e sim ple consoles, such as line print ers and t he USB_UART discussed next , need only a t op- level console driver. The fbcon driver in t he 2.6 kernel also support s console rot at ion. Display panels on PDAs and cell phones are usually m ount ed in port rait orient at ion, whereas aut om ot ive dashboards and I P phones are exam ples of syst em s where t he display panel is likely t o be in landscape m ode. Som et im es, due t o econom ics or ot her fact ors, an em bedded device m ay require a landscape LCD t o be m ount ed in port rait m ode or vice versa. Console rot at ion support com es handy in such sit uat ions. Because fbcon is hardware- independent , t he console rot at ion im plem ent at ion is also generic. To enable console rot at ion, enable CONFIG_FRAMEBUFFER_CONSOLE_ROTATION during kernel configurat ion and add fbcon=rotate:X t o t he kernel com m and line, where X is 0 for norm al orient at ion, 1 for 90- degree rot at ion, 2 for 180- degree rot at ion, and 3 for 270- degree rot at ion.

D e vice Ex a m ple : Ce ll Ph on e Re visit e d To learn how t o writ e console drivers, let 's revisit t he Linux cell phone t hat we used in Chapt er 6 . Our t ask in t his sect ion is t o develop a console driver t hat operat es over t he USB_UARTs in t he cell phone. For convenience, Figure 12.7 reproduces t he cell phone from Figure 6.5 in Chapt er 6 . Let 's writ e a console driver t hat get s printk() m essages out of t he door via a USB_UART. The m essages are picked up by a PC host and displayed t o t he user via a t erm inal em ulat or session.

Figu r e 1 2 .7 . Con sole ove r USB_UART.

[ View full size im age]

List ing 12.3 develops t he console driver t hat works over t he USB_UARTs. The usb_uart_port[] st ruct ure and a

few definit ions used by t he USB_UART driver in Chapt er 6 are included in t his list ing, t oo, t o creat e a com plet e driver. Com m ent s associat ed wit h t he list ing explain t he driver's operat ion. Figure 12.7 shows t he posit ion of our exam ple USB_UART console driver wit hin t he Linux- Video subsyst em . As you can see, t he USB_UART is a sim ple device t hat needs only a t op- level console driver. List in g 1 2 .3 . Con sole ove r USB_UART Code View: #include #include #include #define USB_UART_PORTS

/* The cell phone has 2 USB_UART ports */ /* Each USB_UART has a 3-byte register set consisting of UU_STATUS_REGISTER at offset 0, UU_READ_DATA_REGISTER at offset 1, and UU_WRITE_DATA_REGISTER at offset 2, as shown in Table One of Chapter 6, "Serial Drivers" */ #define USB_UART1_BASE 0xe8000000 /* Memory base for USB_UART1 */ #define USB_UART2_BASE 0xe9000000 /* Memory base for USB_UART1 */ #define USB_UART_REGISTER_SPACE 0x3 /* Semantics of bits in the status register */ #define USB_UART_TX_FULL 0x20 #define USB_UART_RX_EMPTY 0x10 #define USB_UART_STATUS 0x0F #define #define #define #define

USB_UART1_IRQ USB_UART2_IRQ USB_UART_CLK_FREQ USB_UART_FIFO_SIZE

/* Parameters static struct { .mapbase .iotype .irq .uartclk .fifosize .flags .line }, { .mapbase .iotype .irq .uartclk .fifosize .flags .line } };

2

3 4 16000000 32

of each supported USB_UART port */ uart_port usb_uart_port[] = { = = = = = = =

(unsigned int)USB_UART1_BASE, UPIO_MEM, /* Memory mapped */ USB_UART1_IRQ, /* IRQ */ USB_UART_CLK_FREQ, /* Clock HZ */ USB_UART_FIFO_SIZE, /* Size of the FIFO */ UPF_BOOT_AUTOCONF, /* UART port flag */ 0, /* UART Line number */

= = = = = = =

(unsigned int)USB_UART2_BASE, UPIO_MEM, /* Memory mapped */ USB_UART2_IRQ, /* IRQ */ USB_UART_CLK_FREQ, /* CLock HZ */ USB_UART_FIFO_SIZE, /* Size of the FIFO */ UPF_BOOT_AUTOCONF, /* UART port flag */ 1, /* UART Line number */

/* Write a character to the USB_UART port */ static void usb_uart_putc(struct uart_port *port, unsigned char c) {

/* Wait until there is space in the TX FIFO of the USB_UART. Sense this by looking at the USB_UART_TX_FULL bit in the status register */ while (__raw_readb(port->membase) & USB_UART_TX_FULL); /* Write the character to the data port*/ __raw_writeb(c, (port->membase+1)); } /* Console write */ static void usb_uart_console_write(struct console *co, const char *s, u_int count) { int i; /* Write each character */ for (i = 0; i < count; i++, s++) { usb_uart_putc(&usb_uart_port[co->index], *s); } } /* Get communication parameters */ static void __init usb_uart_console_get_options(struct uart_port *port, int *baud, int *parity, int *bits) { /* Read the current settings (possibly set by a bootloader) or return default values for parity, number of data bits, and baud rate */ *parity = 'n'; *bits = 8; *baud = 115200; } /* Setup console communication parameters */ static int __init usb_uart_console_setup(struct console *co, char *options) { struct uart_port *port; int baud, bits, parity, flow; /* Validate port number and get a handle to the appropriate structure */ if (co->index == -1 || co->index >= USB_UART_PORTS) { co->index = 0; } port = &usb_uart_port[co->index]; /* Use functions offered by the serial layer to parse options */ if (options) { uart_parse_options(options, &baud, &parity, &bits, &flow); } else { usb_uart_console_get_options(port, &baud, &parity, &bits); } return uart_set_options(port, co, baud, parity, bits, flow); } /* Populate the console structure */

static struct console usb_uart_console = { .name = "ttyUU", /* Console name */ .write = usb_uart_console_write, /* How to printk to the console */ .device = uart_console_device, /* Provided by the serial core */ .setup = usb_uart_console_setup, /* How to setup the console */ .flags = CON_PRINTBUFFER, /* Default flag */ .index = -1, /* Init to invalid value */ };

/* Console Initialization */ static int __init usb_uart_console_init(void) { /* ... */ /* Register this console */ register_console(&usb_uart_console); return 0; } console_initcall(usb_uart_console_init); /* Mark console init */

Aft er t his driver has been built as part of t he kernel, you can act ivat e it by appending console=ttyUUX ( where X is 0 or 1) t o t he kernel com m and line.

Boot Logo A popular feat ure offered by t he fram e buffer subsyst em is t he boot logo. To display a logo, enable CONFIG_LOGO during kernel configurat ion and select an available logo. You m ay also add a cust om logo im age in t he drivers/ video/ logo/ direct ory. CLUT224 is a com m only used boot logo im age form at t hat support s 224 colors. The working of t his form at is sim ilar t o pseudo palet t es described in t he sect ion "Color Modes." A CLUT224 im age is a C file cont aining t wo st ruct ures:

A CLUT (Color Look Up Table) , which is a charact er array of 224 RGB t uples ( t hus having a size of 224* 3 byt es) . Each 3- byt e CLUT elem ent is a com binat ion of red, green, and blue colors.

A dat a array whose each byt e is an index int o t he CLUT. The indices st art at 32 and ext end unt il 255 ( t hus support ing 224 colors) . I ndex 32 refers t o t he first elem ent in t he CLUT. The logo m anipulat ion code ( in drivers/ video/ fbm em .c) creat es fram e buffer pixel dat a from t he CLUT t uple corresponding t o each index in t he dat a array. I m age display is accom plished using t he low- level fram e buffer driver's fb_imageblit() m et hod, as indicat ed in t he sect ion " Accelerat ed Met hods."

Ot her support ed logo form at s are t he 16- color vga16 and t he black- and- whit e m ono. Script s are available in t he t op- level script s/ direct ory t o convert st andard Port able Pixel Map ( PPM) files t o t he support ed logo form at s.

I f t he fram e buffer device is also t he console, boot m essages scroll under t he logo. You m ay prefer t o disable console m essages on product ion- level syst em s ( by adding console=/dev/null t o t he kernel com m and line) and display a cust om er- supplied CLUT224 " splash screen" im age as t he boot logo.

D e bu ggin g The virt ual fram e buffer driver, enabled by set t ing CONFIG_FB_VIRTUAL in t he configurat ion m enu, operat es over a pseudo graphics adapt er. You can use t his driver's assist ance t o debug t he fram e buffer subsyst em . Som e fram e buffer drivers, such as int elfb, offer an addit ional configurat ion opt ion t hat you m ay enable t o generat e driver- specific debug inform at ion. To discuss issues relat ed t o fram e buffer drivers, subscribe t o t he linux- fbdev- devel m ailing list , ht t ps: / / list s.sourceforge.net / list s/ list info/ linux- fbdev- devel/ . Debugging console drivers is not an easy j ob because you can't call printk() from inside t he driver. I f you have a spare console device such as a serial port , you can im plem ent a UART/ t t y form fact or of your console driver first ( as we did in Chapt er 6 for t he USB_UART device used in t his chapt er) and debug t hat driver by operat ing on / dev/ t t y and print ing m essages t o t he spare console. You can t hen repackage t he debugged code regions in t he form of a console driver.

Look in g a t t h e Sou r ce s The fram e buffer core layer and low- level fram e buffer drivers reside in t he drivers/ video/ direct ory. Generic fram e buffer st ruct ures are defined in include/ linux/ fb.h, whereas chipset - specific headers st ay inside include/ video/ . The fbm em driver, drivers/ video/ fbm em .c, creat es t he / dev/ fbX charact er devices and is t he front end for handling fram e buffer ioct l com m ands issued by user applicat ions. The int elfb driver, drivers/ video/ int elfb/ * , is t he low- level fram e buffer driver for several I nt el graphics cont rollers such as t he one int egrat ed wit h t he 855 GME Nort h Bridge. The radeonfb driver, drivers/ video/ at y/ * , is t he fram e buffer driver for Radeon Mobilit y AGP graphics hardware from ATI t echnologies. The source files, drivers/ video/ * fb.c, are all fram e buffer drivers for graphics cont rollers, including t hose int egrat ed int o several SoCs. You can use drivers/ video/ skelet onfb.c as t he st art ing point if you are writ ing a cust om low- level fram e buffer driver. Look at Docum ent at ion/ fb/ * for m ore docum ent at ion on t he fram e buffer layer. The hom e page of t he Linux fram e buffer proj ect is www.linux- fbdev.org. This websit e cont ains HOWTOs, links t o fram e buffer drivers and ut ilit ies, and point ers t o relat ed web pages. Console drivers, bot h fram e buffer- based and ot herwise, live inside drivers/ video/ console/ . To find out how printk() logs kernel m essages t o an int ernal buffer and calls console drivers, look at kernel/ print k.c. Table 12.2 cont ains t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he source t ree. Table 12.3 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er wit h t he locat ion of t heir definit ions.

Ta ble 1 2 .2 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

fb_info

include/ linux/ fb.h

Cent ral dat a st ruct ure used by low- level fram e buffer drivers

fb_ops

include/ linux/ fb.h

Cont ains addresses of all ent ry point s provided by low- level fram e buffer drivers

fb_var_screeninfo

include/ linux/ fb.h

Cont ains variable inform at ion pert aining t o video hardware such as t he Xresolut ion, Y- resolut ion, and HYSNC/VSYNC durat ions

fb_fix_screeninfo

include/ linux/ fb.h

Fixed inform at ion about video hardware such as t he st art address of t he fram e buffer

fb_cmap

include/ linux/ fb.h

The RGB color m ap for a fram e buffer dev ice

console

include/ linux/ console.h

Represent at ion of a t op- level console driver

consw

include/ linux/ console.h

Represent at ion of a bot t om - level console driver

Ta ble 1 2 .3 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

register_framebuffer()

drivers/ video/ fbm em .c

Regist ers a low- level fram e buffer device.

unregister_framebuffer() drivers/ video/ fbm em .c

Releases a low- level fram e buffer device.

framebuffer_alloc()

drivers/ video/ fbsysfs.c

Allocat es m em ory for an fb_info st ruct ure.

framebuffer_release()

drivers/ video/ fbsysfs.c

Reverse of framebuffer_alloc().

fb_alloc_cmap()

drivers/ video/ fbcm ap.c

Allocat es color m ap.

fb_dealloc_cmap()

drivers/ video/ fbcm ap.c

Frees color m ap.

dma_alloc_coherent()

include/ asm Allocat es and m aps a coherent DMA generic/ dm a- m apping.h buffer. See pci_alloc_consistent() in Chapt er 10.

dma_free_coherent()

include/ asm Frees a coherent DMA buffer. See generic/ dm a- m apping.h pci_free_consistent() in Chapt er 10.

register_console()

kernel/ print k.c

Regist ers a t op- level console driver.

unregister_console()

kernel/ print k.c

Unregist ers a t op- level console driver.

register_con_driver() take_over_console()

drivers/ char/ vt .c

Regist ers/ binds a bot t om - level console driver.

unregister_con_driver() give_up_console()

drivers/ char/ vt .c

Unregist ers/ unbinds a bot t om - level console driver.

Ch a pt e r 1 3 . Au dio D r ive r s I n Th is Ch a pt e r 392 Audio Archit ect ure 394 Linux- Sound Subsyst em 396 Device Exam ple: MP3 Player 412 Debugging 412 Looking at t he Sources

Audio hardware provides com put er syst em s t he capabilit y t o generat e and capt ure sound. Audio is an int egral com ponent in bot h t he PC and t he em bedded space, for chat t ing on a lapt op, m aking a call from a cell phone, list ening t o an MP3 player, st ream ing m ult im edia from a set - t op box, or announcing inst ruct ions on a m edical- grade syst em . I f you run Linux on any of t hese devices, you need t he services offered by t he Linux- Sound subsyst em . I n t his chapt er, let 's find out how t he kernel support s audio cont rollers and codecs. Let 's learn t he archit ect ure of t he Linux- Sound subsyst em and t he program m ing m odel t hat it export s.

Au dio Ar ch it e ct u r e Figure 13.1 shows audio connect ion on a PC- com pat ible syst em . The audio cont roller on t he Sout h Bridge, t oget her wit h an ext ernal codec, int erfaces wit h analog audio circuit ry.

Figu r e 1 3 .1 . Au dio in t h e PC e n vir on m e n t .

[ View full size im age]

An audio codec convert s digit al audio dat a t o analog sound signals for playing t hrough speakers and perform s t he reverse operat ion for recording t hrough a m icrophone. Ot her com m on audio input s and out put s t hat int erface wit h a codec include headset s, earphones, handset s, hands- free, line- in, and line- out . A codec also offers m ixer funct ionalit y t hat connect s it t o a com binat ion of t hese audio input s and out put s, and cont rols t he volum e gain of associat ed audio signals. [ 1]

[ 1] This definit ion of a m ixer is from a soft ware perspect ive. Sound m ixing or dat a m ixing refers t o t he capabilit y of som e codecs t o m ix m ult iple sound st ream s and generat e a single st ream . This is needed, for exam ple, if you want t o superim pose an announcem ent while a voice com m unicat ion is in progress on an I P phone. The alsa- lib library, discussed in t he lat t er part of t his chapt er, support s a plug- in feat ure called dm ix t hat perform s dat a m ixing in soft ware if your codec does not have t he capabilit y t o perform t his operat ion in hardware.

Digit al audio dat a is obt ained by sam pling analog audio signals at specific bit rat es using a t echnique called pulse code m odulat ion ( PCM) . CD qualit y is, for exam ple, sound sam pled at 44.1KHz, using 16 bit s t o hold each sam ple. A codec is responsible for recording audio by sam pling at support ed PCM bit rat es and for playing audio originally sam pled at different PCM bit rat es. A sound card m ay support one or m ore codecs. Each codec m ay, in t urn, support one or m ore audio subst ream s in m ono or st ereo. The Audio Codec'97 ( AC'97) and t he I nt er- I C Sound ( I 2 S) bus are exam ples of indust ry st andard int erfaces t hat connect audio cont rollers t o codecs:

The I nt el AC'97 specificat ion, downloadable from ht t p: / / download.int el.com / , specifies t he sem ant ics and locat ions of audio regist ers. Configurat ion regist ers are part of t he audio cont roller, while t he I / O regist er space is sit uat ed inside t he codec. Request s t o operat e on I / O regist ers are forwarded by t he audio cont roller t o t he codec over t he AC'97 link. The regist er t hat cont rols line- in volum e, for exam ple, lives at offset 0x10 wit hin t he AC'97 I / O space. The PC syst em in Figure 13.1 uses AC'97 t o com m unicat e wit h an ext ernal codec.

The I 2 S specificat ion, downloadable from www.nxp.com / acrobat _download/ various/ I 2SBUS.pdf, is a codec

int erface st andard developed by Philips. The em bedded device shown in Figure 13.2 uses I 2 S t o send audio dat a t o t he codec. Program m ing t he codec's I / O regist ers is done via t he I 2 C bus.

Figu r e 1 3 .2 . Au dio con n e ct ion on a n e m be dde d syst e m .

AC'97 has lim it at ions pert aining t o t he num ber of support ed channels and bit rat es. Recent Sout h Bridge chipset s from I nt el feat ure a new t echnology called High Definit ion ( HD) Audio t hat offers higher- qualit y, surround sound, and m ult ist ream ing capabilit ies.

Ch a pt e r 1 3 . Au dio D r ive r s I n Th is Ch a pt e r 392 Audio Archit ect ure 394 Linux- Sound Subsyst em 396 Device Exam ple: MP3 Player 412 Debugging 412 Looking at t he Sources

Audio hardware provides com put er syst em s t he capabilit y t o generat e and capt ure sound. Audio is an int egral com ponent in bot h t he PC and t he em bedded space, for chat t ing on a lapt op, m aking a call from a cell phone, list ening t o an MP3 player, st ream ing m ult im edia from a set - t op box, or announcing inst ruct ions on a m edical- grade syst em . I f you run Linux on any of t hese devices, you need t he services offered by t he Linux- Sound subsyst em . I n t his chapt er, let 's find out how t he kernel support s audio cont rollers and codecs. Let 's learn t he archit ect ure of t he Linux- Sound subsyst em and t he program m ing m odel t hat it export s.

Au dio Ar ch it e ct u r e Figure 13.1 shows audio connect ion on a PC- com pat ible syst em . The audio cont roller on t he Sout h Bridge, t oget her wit h an ext ernal codec, int erfaces wit h analog audio circuit ry.

Figu r e 1 3 .1 . Au dio in t h e PC e n vir on m e n t .

[ View full size im age]

An audio codec convert s digit al audio dat a t o analog sound signals for playing t hrough speakers and perform s t he reverse operat ion for recording t hrough a m icrophone. Ot her com m on audio input s and out put s t hat int erface wit h a codec include headset s, earphones, handset s, hands- free, line- in, and line- out . A codec also offers m ixer funct ionalit y t hat connect s it t o a com binat ion of t hese audio input s and out put s, and cont rols t he volum e gain of associat ed audio signals. [ 1]

[ 1] This definit ion of a m ixer is from a soft ware perspect ive. Sound m ixing or dat a m ixing refers t o t he capabilit y of som e codecs t o m ix m ult iple sound st ream s and generat e a single st ream . This is needed, for exam ple, if you want t o superim pose an announcem ent while a voice com m unicat ion is in progress on an I P phone. The alsa- lib library, discussed in t he lat t er part of t his chapt er, support s a plug- in feat ure called dm ix t hat perform s dat a m ixing in soft ware if your codec does not have t he capabilit y t o perform t his operat ion in hardware.

Digit al audio dat a is obt ained by sam pling analog audio signals at specific bit rat es using a t echnique called pulse code m odulat ion ( PCM) . CD qualit y is, for exam ple, sound sam pled at 44.1KHz, using 16 bit s t o hold each sam ple. A codec is responsible for recording audio by sam pling at support ed PCM bit rat es and for playing audio originally sam pled at different PCM bit rat es. A sound card m ay support one or m ore codecs. Each codec m ay, in t urn, support one or m ore audio subst ream s in m ono or st ereo. The Audio Codec'97 ( AC'97) and t he I nt er- I C Sound ( I 2 S) bus are exam ples of indust ry st andard int erfaces t hat connect audio cont rollers t o codecs:

The I nt el AC'97 specificat ion, downloadable from ht t p: / / download.int el.com / , specifies t he sem ant ics and locat ions of audio regist ers. Configurat ion regist ers are part of t he audio cont roller, while t he I / O regist er space is sit uat ed inside t he codec. Request s t o operat e on I / O regist ers are forwarded by t he audio cont roller t o t he codec over t he AC'97 link. The regist er t hat cont rols line- in volum e, for exam ple, lives at offset 0x10 wit hin t he AC'97 I / O space. The PC syst em in Figure 13.1 uses AC'97 t o com m unicat e wit h an ext ernal codec.

The I 2 S specificat ion, downloadable from www.nxp.com / acrobat _download/ various/ I 2SBUS.pdf, is a codec

int erface st andard developed by Philips. The em bedded device shown in Figure 13.2 uses I 2 S t o send audio dat a t o t he codec. Program m ing t he codec's I / O regist ers is done via t he I 2 C bus.

Figu r e 1 3 .2 . Au dio con n e ct ion on a n e m be dde d syst e m .

AC'97 has lim it at ions pert aining t o t he num ber of support ed channels and bit rat es. Recent Sout h Bridge chipset s from I nt el feat ure a new t echnology called High Definit ion ( HD) Audio t hat offers higher- qualit y, surround sound, and m ult ist ream ing capabilit ies.

Lin u x - Sou n d Su bsyst e m Advanced Linux Sound Archit ect ure ( ALSA) is t he sound subsyst em of choice in t he 2.6 kernel. Open Sound Syst em ( OSS) , t he sound layer in t he 2.4 kernel, is now obsolet e and depreciat ed. To help t he t ransit ion from OSS t o ALSA, t he lat t er provides OSS em ulat ion t hat allows applicat ions conform ing t o t he OSS API t o run unchanged over ALSA. Linux- Sound fram eworks such as ALSA and OSS render audio applicat ions independent of t he underlying hardware, j ust as codec st andards such as AC'97 and I 2 S do away wit h t he need of writ ing separat e audio drivers for each sound card. Take a look at Figure 13.3 t o underst and t he archit ect ure of t he Linux- Sound subsyst em . The const it uent pieces of t he subsyst em are as follows:

The sound core, which is a code base consist ing of rout ines and st ruct ures available t o ot her com ponent s of t he Linux- Sound layer. Like t he core layers belonging t o ot her driver subsyst em s, t he sound core provides a level of indirect ion t hat renders each com ponent in t he sound subsyst em independent of t he ot hers. The core also plays an im port ant role in export ing t he ALSA API t o higher applicat ions. The following / dev/ snd/ * device nodes shown in Figure 13.3 are creat ed and m anaged by t he ALSA core: / dev/ snd/ cont rolC0 is a cont rol node ( t hat applicat ions use for cont rolling volum e gain and such) , / dev/ snd/ pcm C0D0p is a playback device ( p at t he end of t he device nam e st ands for playback) , and / dev/ snd/ pcm C0D0c is a recording device ( c at t he end of t he device nam e st ands for capt ure) . I n t hese device nam es, t he int eger following C is t he card num ber, and t hat aft er D is t he device num ber. An ALSA driver for a card t hat has a voice codec for t elephony and a st ereo codec for m usic m ight export / dev/ snd/ pcm C0D0p t o read audio st ream s dest ined for t he form er and / dev/ snd/ pcm C0D1p t o channel m usic bound for t he lat t er.

Audio cont roller drivers specific t o cont roller hardware. To drive t he audio cont roller present in t he I nt el I CH Sout h Bridge chipset s, for exam ple, use t he snd_int el8x0 driver.

Audio codec int erfaces t hat assist com m unicat ion bet ween cont rollers and codecs. For AC'97 codecs, use t he snd_ac97_codec and ac97_bus m odules.

An OSS em ulat ion layer t hat act s as a conduit bet ween OSS- aware applicat ions and t he ALSA- enabled kernel. This layer export s / dev nodes t hat m irror what t he OSS layer offered in t he 2.4 kernels. These nodes, such as / dev/ dsp, / dev/ adsp, and / dev/ m ixer, allow OSS applicat ions t o run unchanged over ALSA. The OSS / dev/ dsp node m aps t o t he ALSA nodes / dev/ snd/ pcm C0D0* , / dev/ adsp corresponds t o / dev/ snd/ pcm C0D1* , and / dev/ m ixer associat es wit h / dev/ snd/ cont rolC0.

Procfs and sysfs int erface im plem ent at ions for accessing inform at ion via / proc/ asound/ and / sys/ class/ sound/ .

The user- space ALSA library, alsa- lib, which provides t he libasound.so obj ect . This library eases t he j ob of t he ALSA applicat ion program m er by offering several canned rout ines t o access ALSA drivers.

The alsa- ut ils package t hat includes ut ilit ies such as alsam ixer, am ixer , alsact l, and aplay. Use alsam ixer or am ixer t o change volum e levels of audio signals such as line- in, line- out , or m icrophone, and alsact l t o cont rol set t ings for ALSA drivers. To play audio over ALSA, use aplay.

Figu r e 1 3 .3 . Lin u x - Sou n d ( ALSA) su bsyst e m .

[ View full size im age]

To obt ain a bet t er underst anding of t he archit ect ure of t he Linux- Sound layer, let 's look at t he ALSA driver is used t o at t ach com m ent s) : m odules running on a lapt op in t andem wit h Figure 13.3 ( Code View: bash> lsmod|grep snd

snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm

33148 92000 3104 40512 16640 73316

0 1 1 0 1 3

snd_timer snd

22148 50820

1 6

soundcore snd_page_alloc

8960 10344

1 2

Audio Controller Driver Audio Codec Interface Audio Codec Bus OSS Emulation snd_pcm_oss OSS Volume Control snd_intel8x0,snd_ac97_codec,snd_pcm_oss Core layer snd_pcm Core layer snd_intel8x0,snd_ac97_codec,snd_pcm_oss, snd_mixer_oss,snd_pcm,snd_timer Core layer snd Core layer snd_intel8x0,snd_pcm Core layer snd_intel8x0 snd_ac97_codec

D e vice Ex a m ple : M P3 Pla ye r Figure 13.4 shows audio operat ion on an exam ple Linux Bluet oot h MP3 player built around an em bedded SoC. You can program t he Linux cell phone ( t hat we used in Chapt er 6 , " Serial Drivers," and Chapt er 12, " Video Drivers" ) t o download songs from t he I nt ernet at night when phone rat es are presum ably cheaper and upload it t o t he MP3 player's Com pact Flash ( CF) disk via Bluet oot h so t hat you can list en t o t he songs next day during office com m ut e.

Figu r e 1 3 .4 . Au dio on a Lin u x M P3 pla ye r .

[ View full size im age]

Our t ask is t o develop t he audio soft ware for t his device. An applicat ion on t he player reads songs from t he CF disk and decodes it int o syst em m em ory. A kernel ALSA driver gat hers t he m usic dat a from syst em m em ory and dispat ches it t o t ransm it buffers t hat are part of t he SoC's audio cont roller. This PCM dat a is forwarded t o t he codec, which plays t he m usic t hrough t he device's speaker. As in t he case of t he navigat ion syst em discussed in t he preceding chapt er, we will assum e t hat Linux support s t his SoC, and t hat all archit ect ure- dependent services such as DMA are support ed by t he kernel. The audio soft ware for t he MP3 player t hus consist s of t wo part s:

1 . A user program t hat decodes MP3 files reads from t he CF disk and convert s it int o raw PCM. To writ e a nat ive ALSA decoder applicat ion, you can leverage t he helper rout ines offered by t he alsa- lib library. The sect ion "ALSA Program m ing" looks at how ALSA applicat ions int eract wit h ALSA drivers. You also have t he opt ion of cust om izing public dom ain MP3 players such as m adplay (ht t p: / / sourceforge.net / proj ect s/ m ad/ ) t o suit t his device.

2 . A low- level kernel ALSA audio driver. The following sect ion is devot ed t o writ ing t his driver.

One possible hardware im plem ent at ion of t he device shown in Figure 13.4 is by using a PowerPC 405LP SoC and a Texas I nst rum ent s TLV320 audio codec. The CPU core in t hat case is t he 405 processor and t he on- chip audio cont roller is t he Codec Serial I nt erface ( CSI ) . SoCs com m only have a highperform ance int ernal local bus t hat connect s t o cont rollers, such as DRAM and video, and a separat e onchip peripheral bus t o int erface wit h low- speed peripherals such as serial port s, I 2 C, and GPI O. I n t he case of t he 405LP, t he form er is called t he Processor Local Bus ( PLB) and t he lat t er is known as t he Onchip Peripheral Bus ( OPB) . The PCMCI A/ CF cont roller hangs off t he PLB, whereas t he audio cont roller int erface connect s t o t he OPB.

An audio driver is built out of t hree m ain ingredient s:

1 . Rout ines t hat handle playback

2 . Rout ines t hat handle capt ure

3 . Mixer cont rol funct ions

Our driver im plem ent s playback, but does not support recording because t he MP3 player in t he exam ple has no m icrophone. The driver also sim plifies t he m ixer funct ion. Rat her t han offering t he full com plim ent of volum e cont rols, such as speaker, earphone, and line- out , it allows only a single generic volum e cont rol. The regist er layout of t he MP3 player's audio hardware shown in Table 13.1 m irrors t hese assum pt ions and sim plificat ions, and does not conform t o st andards such as AC'97 alluded t o earlier. So, t he codec has a SAMPLING_RATE_REGISTER t o configure t he playback ( digit al- t o- analog) sam pling rat e but no regist ers t o set t he capt ure ( analog- t o- digit al) rat e. The VOLUME_REGISTER configures a single global volum e.

Ta ble 1 3 .1 . Re gist e r La you t of t h e Au dio H a r dw a r e in Figu r e 1 3 .4 Re gist e r N a m e

D e scr ipt ion

VOLUME_REGISTER

Cont rols t he codec's global volum e.

SAMPLING_RATE_REGISTER

Set s t he codec's sam pling rat e for digit al- t o- analog conversion.

CLOCK_INPUT_REGISTER

Configures t he codec's clock source, divisors, and so on.

CONTROL_REGISTER

Enables int errupt s, configures int errupt cause ( such as com plet ion of a buffer t ransfer) , reset s hardware, enables/ disables bus operat ion, and so on.

STATUS_REGISTER

St at us of codec audio event s.

DMA_ADDRESS_REGISTER

The exam ple hardware support s a single DMA buffer descript or. Real- world cards m ay support m ult iple descript ors and m ay have addit ional regist ers t o hold

Re gist e r N a m e

D e scr ipt ion descript ors and m ay have addit ional regist ers t o hold param et ers such as t he descript or t hat is current ly being processed, t he posit ion of t he current sam ple inside t he buffer, and so on. DMA is perform ed t o t he buffers in t he audio cont roller, so t his regist er resides in t he cont roller's m em ory space.

DMA_SIZE_REGISTER

Holds t he size of t he DMA t ransfer t o/ from t he SoC. This regist er also resides inside t he audio cont roller.

List ing 13.1 is a skelet al ALSA audio driver for t he MP3 player and liberally em ploys pseudo code ( wit hin com m ent s) t o cut out ext raneous det ail. ALSA is a sophist icat ed fram ework, and conform ing audio drivers are usually several t housand lines long. List ing 13.1 get s you st art ed only on your audio driver explorat ions. Cont inue your learning by falling back t o t he m ight y Linux- Sound sources inside t he t op- level sound/ direct ory.

D r ive r M e t h ods a n d St r u ct u r e s Our exam ple driver is im plem ent ed as a plat form driver. Let 's t ake a look at t he st eps perform ed by t he plat form driver's probe() m et hod, mycard_audio_probe(). We will digress a bit under each st ep t o explain relat ed concept s and im port ant dat a st ruct ures t hat we encount er, and t his will t ake us t o ot her part s of t he driver and help t ie t hings t oget her. mycard_audio_probe()does t he following: 1.

Creat es an inst ance of a sound card by invoking snd_card_new():

struct snd_card *card = snd_card_new(-1, id[dev->id], THIS_MODULE, 0); The first argum ent t o snd_card_new() is t he card index ( t hat ident ifies t his card am ong m ult iple sound cards in t he syst em ) , t he second argum ent is t he I D t hat 'll be st ored in t he id field of t he ret urned snd_card st ruct ure, t he t hird argum ent is t he owner m odule, and t he last argum ent is t he size of a privat e dat a field t hat 'll be m ade available via t he private_data field of t he ret urned snd_card st ruct ure ( usually t o st ore chip- specific dat a such as int errupt levels and I / O addresses) . snd_card represent s t he creat ed sound card and is defined as follows in include/ sound/ core.h :

struct snd_card { int number; /* char id[16]; /* /* ... */ struct module *module; /* void *private_data; /* /* ... */ struct list_head controls; /* struct device *dev; /* /* ... */ };

Card index */ Card ID */ Owner module */ Private data */

All controls for this card */ Device assigned to this card*/

The remove() count erpart of t he probe m et hod, mycard_audio_remove(), releases t he snd_card from t he ALSA fram ework using snd_card_free(). 2.

Creat es a PCM playback inst ance and associat es it wit h t he card creat ed in St ep 1, using snd_pcm_new():

int snd_pcm_new(struct snd_card *card, char *id, int device, int playback_count, int capture_count, struct snd_pcm **pcm); The argum ent s are, respect ively, t he sound card inst ance creat ed in St ep 1, an ident ifier st ring, t he device index, t he num ber of support ed playback st ream s, t he num ber of support ed capt ure st ream s ( 0 in our exam ple) , and a point er t o st ore t he allocat ed PCM inst ance. The allocat ed PCM inst ance is defined as follows in include/ sound/ pcm .h:

Code View: struct snd_pcm { struct snd_card *card; /* Associated snd_card */ /* ... */ struct snd_pcm_str streams[2]; /* Playback and capture streams of this PCM component. Each stream may support substreams if your h/w supports it */ /* ... */ struct device *dev; /* Associated hardware device */ };

The snd_device_new() rout ine lies at t he core of snd_pcm_new() and ot her sim ilar com ponent inst ant iat ion funct ions. snd_device_new() t ies a com ponent and a set of operat ions wit h t he associat ed snd_card ( see St ep 3) . 3.

Connect s playback operat ions wit h t he PCM inst ance creat ed in St ep 2, by calling snd_pcm_set_ops(). The snd_pcm_ops st ruct ure specifies t hese operat ions for t ransferring PCM audio t o t he codec. List ing 13.1 accom plishes t his as follows:

Code View: /* Operators for the PCM playback stream */ static struct snd_pcm_ops mycard_playback_ops = { .open = mycard_pb_open, /* Open */ .close = mycard_pb_close, /* Close */ .ioctl = snd_pcm_lib_ioctl, /* Use to handle special commands, else specify the generic ioctl handler snd_pcm_lib_ioctl()*/ .hw_params = mycard_hw_params, /* Called when higher layers set hardware parameters such as audio format. DMA buffer allocation is also done from here */ .hw_free = mycard_hw_free, /* Free resources allocated in mycard_hw_params() */ .prepare = mycard_pb_prepare, /* Prepare to transfer the audio stream. Set audio format such as S16_LE (explained soon), enable interrupts,.. */ .trigger = mycard_pb_trigger, /* Called when the PCM engine starts, stops, or pauses. The second argument specifies why it was called. This

function cannot go to sleep */ }; /* Connect the operations with the PCM instance */ snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &mycard_playback_ops);

I n List ing 13.1, mycard_pb_prepare() configures t he sam pling rat e int o t he SAMPLING_RATE_REGISTER, clock source int o t he CLOCKING_INPUT_REGISTER, and t ransm it com plet e int errupt enablem ent int o t he CONTROL_REGISTER. The trigger() m et hod, mycard_pb_trigger(), m aps an audio buffer populat ed by t he ALSA fram ework on- t he- fly using dma_map_single(). ( We discussed st ream ing DMA in Chapt er 10, " Peripheral Com ponent I nt erconnect ." ) The m apped DMA buffer address is program m ed int o t he DMA_ADDRESS_REGISTER. This regist er is part of t he audio cont roller in t he SoC, unlike t he earlier regist ers t hat reside inside t he codec. The audio cont roller forwards t he DMA'ed dat a t o t he codec for playback. Anot her relat ed obj ect is t he snd_pcm_hardware st ruct ure, which announces t he PCM com ponent 's hardware capabilit ies. For our exam ple device, t his is defined in List ing 13.1 as follows:

Code View: /* Hardware capabilities of the PCM playback stream */ static struct snd_pcm_hardware mycard_playback_stereo = { .info = (SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_PAUSE | SNDRV_PCM_INFO_RESUME); /* mmap() is supported. The stream has pause/resume capabilities */ .formats = SNDRV_PCM_FMTBIT_S16_LE,/* Signed 16 bits per channel, little endian */ .rates = SNDRV_PCM_RATE_8000_48000,/* DAC Sampling rate range */ .rate_min = 8000, /* Minimum sampling rate */ .rate_max = 48000, /* Maximum sampling rate */ .channels_min = 2, /* Supports a left and a right channel */ .channels_max = 2, /* Supports a left and a right channel */ .buffer_bytes_max = 32768, /* Max buffer size */ };

This obj ect is t ied wit h t he associat ed snd_pcm from t he open() operat or, mycard_playback_open(), using t he PCM runt im e inst ance. Each open PCM st ream has a runt im e obj ect called snd_pcm_runtime t hat cont ains all inform at ion needed t o m anage t hat st ream . This is a gigant ic st ruct ure of soft ware and hardware configurat ions defined in include/ sound/ pcm .h and cont ains snd_pcm_hardware as one of it s com ponent fields. 4.

Preallocat es buffers using snd_pcm_lib_preallocate_pages_for_all(). DMA buffers are subsequent ly obt ained from t his preallocat ed area by mycard_hw_params() using snd_pcm_lib_malloc_pages() and st ored in t he PCM runt im e inst ance alluded t o in St ep 3. mycard_pb_trigger() DMA- m aps t his buffer while st art ing a PCM operat ion and unm aps it while st opping t he PCM operat ion.

5.

Associat es a m ixer cont rol elem ent wit h t he sound card using snd_ctl_add() for global volum e cont rol:

snd_ctl_add(card, snd_ctl_new1(&mycard_playback_vol, &myctl_private)); snd_ctl_new1() t akes an snd_kcontrol_new st ruct ure as it s first argum ent and ret urns a point er t o an snd_kcontrol st ruct ure. List ing 13.1 defines t his as follows:

static struct snd_kcontrol_new mycard_playback_vol = { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, /* Ctrl element is of type MIXER */ .name = "MP3 volume", /* Name */ .index = 0, /* Codec No: 0 */ .info = mycard_pb_vol_info, /* Volume info */ .get = mycard_pb_vol_get, /* Get volume */ .put = mycard_pb_vol_put, /* Set volume */ }; The snd_kcontrol st ruct ure describes a cont rol elem ent . Our driver uses it as a knob for general volum e cont rol. snd_ctl_add() regist ers an snd_kcontrol elem ent wit h t he ALSA fram ework. The const it uent cont rol m et hods are invoked when user applicat ions such as alsam ixer are execut ed. I n List ing 13.1, t he snd_kcontrol put() m et hod, mycard_playback_volume_put(), writ es request ed volum e set t ings t o t he codec's VOLUME_REGISTER. 6.

And finally, regist ers t he sound card wit h t he ALSA fram ework:

snd_card_register(card);

codec_write_reg() ( used, but left unim plem ent ed in List ing 13.1) writ es values t o codec regist ers by com m unicat ing over t he bus t hat connect s t he audio cont roller in t he SoC t o t he ext ernal codec. I f t he underlying bus prot ocol is I 2 C or SPI , for exam ple, codec_write_reg() uses t he int erface funct ions discussed in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol." I f you want t o creat e a / proc int erface in your driver for dum ping regist ers during debug or t o export a param et er during norm al operat ion, use t he services of snd_card_proc_new() and friends. List ing 13.1 does not use / proc int erface files. I f you build and load t he driver m odule in List ing 13.1, you will see t wo new device nodes appearing on t he MP3 player: / dev/ snd/ pcm C0D0p and / dev/ snd/ cont rolC0. The form er is t he int erface for audio playback, and t he lat t er is t he int erface for m ixer cont rol. The MP3 decoder applicat ion, wit h t he help of alsa- lib, st ream s m usic by operat ing over t hese device nodes. List in g 1 3 .1 . ALSA D r ive r for t h e Lin u x M P3 Pla ye r Code View: include #include #include #include #include #include #include /* Playback rates supported by the codec */ static unsigned int mycard_rates[] = { 8000, 48000, }; /* Hardware constraints for the playback channel */ static struct snd_pcm_hw_constraint_list mycard_playback_rates = { .count = ARRAY_SIZE(mycard_rates), .list = mycard_rates,

.mask = 0, }; static struct platform_device *mycard_device; static char *id[SNDRV_CARDS] = SNDRV_DEFAULT_STR; /* Hardware capabilities of the PCM stream */ static struct snd_pcm_hardware mycard_playback_stereo = { .info = (SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_BLOCK_TRANSFER), .formats = SNDRV_PCM_FMTBIT_S16_LE, /* 16 bits per channel, little endian */ .rates = SNDRV_PCM_RATE_8000_48000, /* DAC Sampling rate range */ .rate_min = 8000, /* Minimum sampling rate */ .rate_max = 48000, /* Maximum sampling rate */ .channels_min = 2, /* Supports a left and a right channel */ .channels_max = 2, /* Supports a left and a right channel */ .buffer_bytes_max = 32768, /* Maximum buffer size */ }; /* Open the device in playback mode */ static int mycard_pb_open(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; /* Initialize driver structures */ /* ... */ /* Initialize codec registers */ /* ... */ /* Associate the hardware capabilities of this PCM component */ runtime->hw = mycard_playback_stereo; /* Inform the ALSA framework about the constraints that the codec has. For example, in this case, it supports PCM sampling rates of 8000Hz and 48000Hz only */ snd_pcm_hw_constraint_list(runtime, 0, SNDRV_PCM_HW_PARAM_RATE, &mycard_playback_rates); return 0; } /* Close */ static int mycard_pb_close(struct snd_pcm_substream *substream) { /* Disable the codec, stop DMA, free data structures */ /* ... */ return 0; } /* Write to codec registers by communicating over the bus that connects the SoC to the codec */ void codec_write_reg(uint codec_register, uint value) { /* ... */ } /* Prepare to transfer an audio stream to the codec */ static int mycard_pb_prepare(struct snd_pcm_substream *substream)

{ /* Enable Transmit DMA complete interrupt by writing to CONTROL_REGISTER using codec_write_reg() */ /* Set the sampling rate by writing to SAMPLING_RATE_REGISTER */ /* Configure clock source and enable clocking by writing to CLOCK_INPUT_REGISTER */ /* Allocate DMA descriptors for audio transfer */ return 0; } /* Audio trigger/stop/.. */ static int mycard_pb_trigger(struct snd_pcm_substream *substream, int cmd) { switch (cmd) { case SNDRV_PCM_TRIGGER_START: /* Map the audio substream's runtime audio buffer (which is an offset into runtime->dma_area) using dma_map_single(), populate the resulting address to the audio controller's DMA_ADDRESS_REGISTER, and perform DMA */ /* ... */ break; case SNDRV_PCM_TRIGGER_STOP: /* Shut the stream. Unmap DMA buffer using dma_unmap_single() */ /* ... */ break; default: return -EINVAL; break; } return 0; } /* Allocate DMA buffers using memory preallocated for DMA from the probe() method. dma_[map|unmap]_single() operate on this area later on */ static int mycard_hw_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *hw_params) { /* Use preallocated memory from mycard_audio_probe() to satisfy this memory request */ return snd_pcm_lib_malloc_pages(substream, params_buffer_bytes(hw_params)); } /* Reverse of mycard_hw_params() */ static int mycard_hw_free(struct snd_pcm_substream *substream) { return snd_pcm_lib_free_pages(substream); }

/* Volume info */ static int mycard_pb_vol_info(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_info *uinfo) { uinfo->type = SNDRV_CTL_ELEM_TYPE_INTEGER; /* Integer type */ uinfo->count = 1; /* Number of values */ uinfo->value.integer.min = 0; /* Minimum volume gain */ uinfo->value.integer.max = 10; /* Maximum volume gain */ uinfo->value.integer.step = 1; /* In steps of 1 */ return 0; } /* Playback volume knob */ static int mycard_pb_vol_put(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *uvalue) { int global_volume = uvalue->value.integer.value[0]; /* Write global_volume to VOLUME_REGISTER using codec_write_reg() */ /* ... */ /* If the volume changed from the current value, return 1. If there is an error, return negative code. Else return 0 */ } /* Get playback volume */ static int mycard_pb_vol_get(struct snd_kcontrol *kcontrol, struct snd_ctl_elem_value *uvalue) { /* Read global_volume from VOLUME_REGISTER and return it via uvalue->integer.value[0] */ /* ... */ return 0; } /* Entry points for the playback mixer */ static struct snd_kcontrol_new mycard_playback_vol = { .iface = SNDRV_CTL_ELEM_IFACE_MIXER, /* Control is of type MIXER */ .name = "MP3 Volume", /* Name */ .index = 0, /* Codec No: 0 */ .info = mycard_pb_vol_info, /* Volume info */ .get = mycard_pb_vol_get, /* Get volume */ .put = mycard_pb_vol_put, /* Set volume */ }; /* Operators for the PCM playback stream */ static struct snd_pcm_ops mycard_playback_ops = { .open = mycard_playback_open, /* Open */ .close = mycard_playback_close, /* Close */ .ioctl = snd_pcm_lib_ioctl, /* Generic ioctl handler */ .hw_params = mycard_hw_params, /* Hardware parameters */ .hw_free = mycard_hw_free, /* Free h/w params */ .prepare = mycard_playback_prepare, /* Prepare to transfer audio stream */ .trigger = mycard_playback_trigger, /* Called when the PCM engine

starts/stops/pauses */ }; /* Platform driver probe() method */ static int __init mycard_audio_probe(struct platform_device *dev) { struct snd_card *card; struct snd_pcm *pcm; int myctl_private; /* Instantiate an snd_card structure */ card = snd_card_new(-1, id[dev->id], THIS_MODULE, 0); /* Create a new PCM instance with 1 playback substream and 0 capture streams */ snd_pcm_new(card, "mycard_pcm", 0, 1, 0, &pcm); /* Set up our initial DMA buffers */ snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_CONTINUOUS, snd_dma_continuous_data (GFP_KERNEL), 256*1024, 256*1024); /* Connect playback operations with the PCM instance */ snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &mycard_playback_ops); /* Associate a mixer control element with this card */ snd_ctl_add(card, snd_ctl_new1(&mycard_playback_vol, &myctl_private)); strcpy(card->driver, "mycard"); /* Register the sound card */ snd_card_register(card); /* Store card for access from other methods */ platform_set_drvdata(dev, card); return 0; } /* Platform driver remove() method */ static int mycard_audio_remove(struct platform_device *dev) { snd_card_free(platform_get_drvdata(dev)); platform_set_drvdata(dev, NULL); return 0; } /* Platform driver definition */ static struct platform_driver mycard_audio_driver = { .probe = mycard_audio_probe, /* Probe method */ .remove = mycard_audio_remove, /* Remove method */ .driver = { .name = "mycard_ALSA",

}, }; /* Driver Initialization */ static int __init mycard_audio_init(void) { /* Register the platform driver and device */ platform_driver_register(&mycard_audio_driver); mycard_device = platform_device_register_simple("mycard_ALSA", -1, NULL, 0); return 0; } /* Driver Exit */ static void __exit mycard_audio_exit(void) { platform_device_unregister(mycard_device); platform_driver_unregister(&mycard_audio_driver); } module_init(mycard_audio_init); module_exit(mycard_audio_exit); MODULE_LICENSE("GPL");

ALSA Pr ogr a m m in g To underst and how t he user space alsa- lib library int eract s wit h kernel space ALSA drivers, let 's writ e a sim ple applicat ion t hat set s t he volum e gain of t he MP3 player. We will m ap t he alsa- lib services used by t he applicat ion t o t he m ixer cont rol m et hods defined in List ing 13.1. Let 's begin by loading t he driver and exam ining t he m ixer's capabilit ies: bash> amixer contents ... numid=3,iface=MIXER,name="MP3 Volume" ; type=INTEGER,... ...

I n t he volum e- cont rol applicat ion, first allocat e space for t he alsa- lib obj ect s necessary t o perform t he volum econt rol operat ion: #include snd_ctl_elem_value_t *nav_control; snd_ctl_elem_id_t *nav_id; snd_ctl_elem_info_t *nav_info; snd_ctl_elem_value_alloca(&nav_control); snd_ctl_elem_id_alloca(&nav_id); snd_ctl_elem_info_alloca(&nav_info);

Next , set t he int erface t ype t o SND_CTL_ELEM_IFACE_MIXER as specified in t he mycard_playback_vol st ruct ure in List ing 13.1: snd_ctl_elem_id_set_interface(nav_id, SND_CTL_ELEM_IFACE_MIXER);

Now set t he numid for t he MP3 volum e obt ained from t he am ixer out put above: snd_ctl_elem_id_set_numid(nav_id, 3); /* num_id=3 */

Open t he m ixer node, / dev/ snd/ cont rolC0. The t hird argum ent t o snd_ctl_open() specifies t he card num ber in t he node nam e: snd_ctl_open(&nav_handle, card, 0); /* Connect data structures */ snd_ctl_elem_info_set_id(nav_info, nav_id); snd_ctl_elem_info(nav_handle, nav_info);

Elicit t he type field in t he snd_ctl_elem_info st ruct ure defined in mycard_pb_vol_info() in List ing 13.1 as follows: if (snd_ctl_elem_info_get_type(nav_info) != SND_CTL_ELEM_TYPE_INTEGER) { printk("Mismatch in control type\n"); }

Get t he support ed codec volum e range by com m unicat ing wit h t he mycard_pb_vol_info() driver m et hod: long desired_volume = 5; long min_volume = snd_ctl_elem_info_get_min(nav_info); long max_volume = snd_ctl_elem_info_get_max(nav_info); /* Ensure that the desired_volume is within min_volume and max_volume */ /* ... */

As per t he definit ion of mycard_pb_vol_info() in List ing 13.1, t he m inim um and m axim um values ret urned by t he above alsa- lib helper rout ines are 0 and 10, respect ively. Finally, set t he desired volum e and writ e it t o t he codec: snd_ctl_elem_value_set_integer(nav_control, 0, desired_volume); snd_ctl_elem_write(nav_handle, nav_control);

The call t o snd_ctl_elem_write() result s in t he invocat ion of mycard_pb_vol_put(), which writ es t he desired volum e gain t o t he codec's VOLUME_REGISTER.

MP3 Decoding Com plexit y The MP3 decoder applicat ion running on t he player, as shown in Figure 13.4, requires a supply rat e of MP3 fram es from t he CF disk t hat can sust ain t he com m on MP3 sam pling rat e of 128KBps. This is usually not a problem for m ost low- MI Ps devices, but in case it is, consider buffering each song in m em ory before decoding it . ( MP3 fram es at 128KBps roughly consum e 1MB per m inut e of m usic.) MP3 decoding is light weight and can usually be accom plished on- t he- fly, but MP3 encoding is heavy- dut y and cannot be achieved in real t im e wit hout hardware assist . Voice codecs such as G.711 and G.729 used in Voice over I P ( VoI P) environm ent s can, however, encode and decode audio dat a in real t im e.

D e bu ggin g You m ay t urn on opt ions under Device Drivers Sound Advanced Linux Sound Archit ect ure in t he kernel configurat ion m enu t o include ALSA debug code (CONFIG_SND_DEBUG) , verbose printk() m essages (CONFIG_SND_VERBOSE_PRINTK) , and verbose procfs cont ent ( CONFIG_SND_VERBOSE_PROCFS) . Procfs inform at ion pert aining t o ALSA drivers resides in / proc/ asound/ . Look inside / sys/ class/ sound/ for t he device m odel inform at ion associat ed wit h each sound- class device. I f you t hink you have found a bug in an ALSA driver, post it t o t he alsa- devel m ailing list ( ht t p: / / m ailm an.alsaproj ect .org/ m ailm an/ list info/ alsa- devel) . The linux- audio- dev m ailing list (ht t p: / / m usic.colum bia.edu/ m ailm an/ list info/ linux- audio- dev/ ) , also called t he Linux Audio Developers ( LAD) list , discusses quest ions relat ed t o t he Linux- sound archit ect ure and audio applicat ions.

Look in g a t t h e Sou r ce s The sound core, audio buses, archit ect ures, and t he obsolet e OSS suit e all have t heir own separat e subdirect ories under sound/ . For t he AC'97 int erface im plem ent at ion, look inside sound/ pci/ ac97/ . For an exam ple I 2 S- based audio driver, look at sound/ soc/ at 91/ at 91- ssc.c, t he audio driver for At m el's AT91- series ARM- based em bedded SoCs. Use sound/ drivers/ dum m y.c as a st art ing point for developing your cust om ALSA driver if you cannot find a closer m at ch. Docum ent at ion/ sound/ * cont ains inform at ion on ALSA and OSS drivers. Docum ent at ion/ sound/ alsa/ DocBook/ cont ains a DocBook on writ ing ALSA drivers. An ALSA configurat ion guide is available in Docum ent at ion/ sound/ alsa/ ALSA- Configurat ion.t xt . The Sound- HOWTO, downloadable from ht t p: / / t ldp.org/ HOWTO/ Sound- HOWTO/ , answers several frequent ly asked quest ions pert aining t o Linux support for audio devices. Madplay is a soft ware MP3 decoder and player t hat is bot h ALSA- and OSS- aware. You can look at it s sources for t ips on user- space audio program m ing. Two no- frills OSS t ools for basic playback and recording are rawplay and rawrec, whose sources are downloadable from ht t p: / / rawrec.sourceforge.net / . You can find t he hom e page of t he Linux- ALSA proj ect at www.alsa- proj ect .org. Here, you will find t he lat est news on ALSA drivers, det ails on t he ALSA program m ing API , and inform at ion on subscribing t o relat ed m ailing list s. Sources of alsa- ut ils and alsa- lib, downloadable from t his page, can aid you while developing ALSA- aware applicat ions. Table 13.2 cont ains t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he source t ree. Table 13.3 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 1 3 .2 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

snd_card

include/ sound/ core.h

Represent at ion of a sound card

snd_pcm

include/ sound/ pcm .h

An inst ance of a PCM obj ect

snd_pcm_ops

include/ sound/ pcm .h

Used t o connect operat ions wit h a PCM obj ect

snd_pcm_substream

include/ sound/ pcm .h

I nform at ion about t he current audio st ream

snd_pcm_runtime

include/ sound/ pcm .h

Runt im e det ails of t he audio st ream

snd_kcontrol_new

include/ sound/ cont rol.h Represent at ion of an ALSA cont rol elem ent

Ta ble 1 3 .3 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

snd_card_new()

sound/ core/ init .c

I nst ant iat es an snd_card st ruct ure

snd_card_free()

sound/ core/ init .c

Frees an inst ant iat ed snd_card

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

snd_card_register()

sound/ core/ init .c

Regist ers a sound card wit h t he ALSA fram ework

snd_pcm_lib_preallocate_pages_for_all() sound/ core/ pcm _m em ory.c Preallocat es buffers for a sound card snd_pcm_lib_malloc_pages()

sound/ core/ pcm _m em ory.c Allocat es DMA buffers for a sound card

snd_pcm_new()

sound/ core/ pcm .c

Creat es an inst ance of a PCM obj ect

snd_pcm_set_ops()

sound/ core/ pcm _lib.c

Connect s playback or capt ure operat ions wit h a PCM obj ect

snd_ctl_add()

sound/ core/ cont rol.c

Associat es a m ixer cont rol elem ent wit h a sound card

snd_ctl_new1()

sound/ core/ cont rol.c

Allocat es an snd_kcontrol st ruct ure and init ializes it wit h supplied cont rol operat ions

snd_card_proc_new()

sound/ core/ info.c

Creat es a / proc ent ry and assigns it t o a card inst ance

Ch a pt e r 1 4 . Block D r ive r s I n Th is Ch a pt e r 416 St orage Technologies 421 Linux Block I / O Layer 422 I / O Schedulers 423 Block Driver Dat a St ruct ures and Met hods 426 Device Exam ple: Sim ple St orage Cont r oller 434 Advanced Topics 436 Debugging 437 Looking at t he Sources

Block devices are st orage m edia capable of random access. Unlike charact er devices, block devices can hold filesyst em dat a. I n t his chapt er, let 's find out how Linux support s st orage buses and devices.

St or a ge Te ch n ologie s Let 's st art by t aking a t our of t he popular st orage t echnologies found in t oday's com put er syst em s. We'll also associat e t hese t echnologies wit h t he corresponding device driver subsyst em s in t he kernel source t ree. I nt egrat ed Drive Elect ronics ( I DE) is t he com m on st orage int erface t echnology used in t he PC environm ent . ATA ( short for Advanced Technology At t achm ent ) is t he official nam e for t he relat ed specificat ions. The I DE/ ATA

st andard began wit h ATA- 1; t he lat est version is ATA- 7 and support s bandwidt hs of up t o 133MBps. I nt ervening versions of t he specificat ion are ATA- 2, which int roduced logical block addressing ( LBA) ; ATA- 3, which enabled SMART- capable disks ( discussed lat er) ; ATA- 4, which brought support for Ult ra DMA and t he associat ed 33MBps t hroughput ; ATA- 5, which increased m axim um t ransfer speeds t o 66MBps; and ATA- 6, which provided for 100MBps dat a rat es. St orage devices such as CD- ROMs and t apes connect t o t he st andard I DE cable using a special prot ocol called t he ATA Packet I nt erface ( ATAPI ) . [ 1] ATAPI was int roduced along wit h ATA- 4.

[ 1]

The ATAPI prot ocol is closer t o SCSI t han t o I DE.

The floppy disk cont roller in PC syst em s has t radit ionally been part of t he Super I / O chipset about which we learned in Chapt er 6 , " Serial Drivers." These int ernal drives, however, have given way t o fast er ext ernal USB floppy drives in t oday's PC environm ent . Figure 14.1 shows an ATA- 7 disk drive connect ed t o an I DE host adapt er t hat 's part of t he Sout h Bridge chipset on a PC syst em . Also shown connect ed are an ATAPI CD- ROM drive and a floppy drive.

Figu r e 1 4 .1 . St or a ge m e dia in a PC syst e m .

I DE/ ATA is a parallel bus t echnology ( som et im es called Parallel ATA or PATA) and cannot scale t o high speeds, as you learned while discussing PCI e in Chapt er 10, " Peripheral Com ponent I nt erconnect ." Serial ATA ( SATA) is a m odern serial bus evolut ion of PATA t hat support s t ransfer speeds in t he realm of 300MBps and beyond. I n addit ion t o offering higher t hroughput t han PATA, SATA brings capabilit ies such as hot swapping. SATA t echnology is st eadily replacing PATA. See t he sidebar "libATA" t o learn about t he new ATA subsyst em in t he kernel t hat support s bot h SATA and PATA.

libATA libATA is t he new ATA subsyst em in t he Linux kernel. I t consist s of a set of ATA library rout ines and a collect ion of low- level drivers t hat use t hem . libATA support s bot h SATA and PATA. SATA drivers in libATA have been around for som e t im e under dr iver s/ scsi/ , but PATA drivers and t he new drivers/ at a/ direct ory t hat now houses all libATA sources were int roduced wit h t he 2.6.19 kernel release. I f your syst em is enabled wit h SATA st orage, you need t he services of libATA in t andem wit h t he SCSI subsyst em . libATA support for PATA is st ill experim ent al, and by default , PATA drivers cont inue t o use t he legacy I DE drivers t hat live in dr iver s/ ide/ . Assum e t hat your syst em is SATA- enabled via an I nt el I CH7 Sout h Bridge chipset . You need t he following libATA com ponent s t o access your disk:

1 . Th e libATA cor e — To enable t his, set CONFIG_ATA during kernel configurat ion. For a list of library funct ions offered by t he core, grep for EXPORT_SYMBOL_GPL inside t he drivers/ at a/ direct ory.

2 . Adva n ce d H ost Con t r olle r I n t e r fa ce ( AH CI ) su ppor t — AHCI specifies t he regist er int erface support ed by SATA host adapt ers and is enabled by choosing CONFIG_AHCI at configurat ion t im e.

3 . Th e h ost con t r olle r a da pt e r dr ive r — For t he I CH7, enable CONFIG_ATA_PIIX.

Addit ionally, you need t he m id- level and upper- level SCSI drivers ( CONFIG_SCSI and friends) . Aft er you have loaded all t hese kernel com ponent s, your SATA disk part it ions appear t o t he syst em as / dev/ sd* , j ust like SCSI or USB m ass st orage part it ions. The hom e page of t he libATA proj ect is ht t p: / / linux- at a.org/ . A DocBook is available as part of t he kernel source t ree in Docum ent at ion/ DocBook/ libat a.t m pl. A libATA developer's guide is available at www.kernel.org/ pub/ linux/ kernel/ people/ j garzik/ libat a.pdf.

Sm all Com put er Syst em I nt erface ( SCSI ) is t he st orage t echnology of choice in servers and high- end workst at ions. SCSI is som ewhat fast er t han SATA and support s speeds of t he order of 320MBps. SCSI has t radit ionally been a parallel int erface st andard, but , like ATA, has recent ly shift ed t o serial operat ion wit h t he advent of a bus t echnology called Serial At t ached SCSI ( SAS) . The kernel's SCSI subsyst em is archit ect ed int o t hree layers: t op- level drivers for m edia such as disks, CDROMs, and t apes; a m iddle- level layer t hat scans t he SCSI bus and configures devices; and low- level host adapt er drivers. We learned about t hese layers in t he sect ion "Mass St orage" in Chapt er 11, " Universal Serial Bus." Refer back t o Figure 11.4 in t hat chapt er t o see how t he different com ponent s of t he SCSI subsyst em int eract wit h each ot her. [ 2] USB m ass st orage drives use flash m em ory int ernally but com m unicat e wit h host syst em s using t he SCSI prot ocol.

[ 2] SCSI support is discussed in ot her part s of t his book, t oo. The sect ion " User Mode SCSI " in Chapt er 19, " Drivers in User Space," discusses t he SCSI Generic ( sg) int erface t hat let s you direct ly dispat ch com m ands from user space t o SCSI devices. The sect ion "iSCSI " in Chapt er 20, " More Devices and Drivers," briefly looks at t he iSCSI prot ocol, which allows t he t ransport of SCSI packet s t o a rem ot e block device over a TCP/ I P net work.

Redundant array of inexpensive disks ( RAI D) is a t echnology built in t o som e SCSI and SATA cont rollers t o achieve redundancy and reliabilit y. Various RAI D levels have been defined. RAI D- 1, for exam ple, specifies disk m irroring, where dat a is duplicat ed on separat e disks. Linux drivers are available for several RAI D- capable disk drives. The kernel also offers a m ult idisk ( m d) driver t hat im plem ent s m ost RAI D levels in soft ware. Miniat ure st orage is t he nam e of t he gam e in t he em bedded consum er elect ronics space. Transfer speeds in t his dom ain are m uch lower t han t hat offered by t he t echnologies discussed t hus far. Secure Digit al ( SD) cards and t heir sm aller form - fact or derivat ives ( m iniSD and m icroSD) are popular st orage m edia [ 3] in devices such as cam eras, cell phones, and m usic players. Cards com plying wit h version 1.01 of t he SD card specificat ion support t ransfer speeds of up t o 10MBps. SD st orage has evolved from an older, slower, but com pat ible t echnology called Mult iMediaCar d ( MMC) t hat support s dat a rat es of 2.5MBps. The kernel cont ains an SD/ MMC subsyst em in drivers/ m m c/ .

[ 3]

See t he sidebar "WiFi over SDI O" in Chapt er 16, " Linux Wit hout Wires," t o learn about nonst orage t echnologies available in SD form fact or.

The sect ion " PCMCI A St orage" in Chapt er 9 , " PCMCI A and Com pact Flash," looked at different PCMCI A/ CF flavors of st orage cards and t heir corresponding kernel drivers. PCMCI A m em ory cards such as m icrodrives support t rue I DE operat ion, whereas t hose t hat int ernally use solid- st at e m em ory em ulat e I DE and export an I DE program m ing m odel t o t he kernel. I n bot h t hese cases, t he kernel's I DE subsyst em can be used t o enable t he card. Table 14.1 sum m arizes im port ant st orage t echnologies and t he locat ion of t he associat ed device drivers in t he kernel source t ree.

Ta ble 1 4 .1 . St or a ge Te ch n ologie s a n d Associa t e d D e vice D r ive r s St or a ge Te ch n ology

D e scr ipt ion

Sou r ce File

I DE/ ATA

St orage int erface t echnology in t he PC environm ent . Support s dat a rat es of 133MBps for ATA- 7.

drivers/ ide/ ide- disk.c, driver/ ide/ ide- io.c, drivers/ ide/ ide- probe.c or drivers/ at a/ ( Experim ent al)

ATAPI

St orage devices such as CD- ROMs and t apes connect t o t he st andard I DE cable using t he ATAPI prot ocol.

drivers/ ide/ ide- cd.c or drivers/ at a/ ( Experim ent al)

Floppy ( int ernal)

The floppy cont roller resides in t he Super drivers/ block/ floppy.c I / O chip on t he LPC bus in PC- com pat ible syst em s. Support s t ransfer rat es of t he order of 150KBps.

SATA

Serial evolut ion of I DE/ ATA. Support s speeds of 300MBps and beyond.

SCSI

St orage t echnology popular in t he server dr iver s/ scsi/ environm ent . Support s t ransfer rat es of 320MBps for Ult ra320 SCSI .

drivers/ at a/ , dr iver s/ scsi/

St or a ge Te ch n ology

D e scr ipt ion

Sou r ce File

USB Mass St orage

This refers t o USB hard disks, pen drives, drivers/ usb/ st orage/ , CD- ROMs, and floppy drives. Look at t he dr iver s/ scsi/ sect ion "Mass St orage" in Chapt er 11. USB 2.0 devices can com m unicat e at speeds of up t o 60MBps.

RAI D: Hardware RAI D

This is a capabilit y built int o high- end SCSI / SATA disk cont rollers t o achieve redundancy and reliabilit y.

dr iver s/ scsi/ , drivers/ at a/

Soft ware RAI D

On Linux, t he m ult idisk ( m d) driver im plem ent s several RAI D levels in soft ware.

drivers/ m d/

SD/ m iniSD/ m icroSD

Sm all form - fact or st orage m edia popular drivers/ m m c/ in consum er elect ronic devices such as cam eras and cell phones. Support s t ransfer rat es of up t o 10MBps.

MMC

Older rem ovable st orage st andard t hat 's drivers/ m m c/ com pat ible wit h SD cards. Support s dat a rat es of 2.5MBps.

PCMCI A/ CF st orage cards

PCMCI A/ CF form fact or of m iniat ure I DE drives, or solid- st at e m em ory cards t hat em ulat e I DE. See t he sect ion "PCMCI A St orage" in Chapt er 9 .

drivers/ ide/ legacy/ ide- cs.c or drivers/ at a/ pat a_pcm cia.c ( experim ent al)

Block device em ulat ion over flash m em ory

Em ulat es a hard disk over flash m em ory. drivers/ m t d/ m t dblock.c, See t he sect ion " Block Device Em ulat ion" drivers/ m t d/ m t d_blkdevs.c in Chapt er 17, " Mem ory Technology Devices."

Virt ual block devices on Linux: RAM disk

I m plem ent s support t o use a RAM region drivers/ block/ rd.c as a block device.

Loopback device

I m plem ent s support t o use a regular file as a block device.

drivers/ block/ loop.c

Ch a pt e r 1 4 . Block D r ive r s I n Th is Ch a pt e r 416 St orage Technologies 421 Linux Block I / O Layer 422 I / O Schedulers 423 Block Driver Dat a St ruct ures and Met hods 426 Device Exam ple: Sim ple St orage Cont r oller 434 Advanced Topics 436 Debugging 437 Looking at t he Sources

Block devices are st orage m edia capable of random access. Unlike charact er devices, block devices can hold filesyst em dat a. I n t his chapt er, let 's find out how Linux support s st orage buses and devices.

St or a ge Te ch n ologie s Let 's st art by t aking a t our of t he popular st orage t echnologies found in t oday's com put er syst em s. We'll also associat e t hese t echnologies wit h t he corresponding device driver subsyst em s in t he kernel source t ree. I nt egrat ed Drive Elect ronics ( I DE) is t he com m on st orage int erface t echnology used in t he PC environm ent . ATA ( short for Advanced Technology At t achm ent ) is t he official nam e for t he relat ed specificat ions. The I DE/ ATA

st andard began wit h ATA- 1; t he lat est version is ATA- 7 and support s bandwidt hs of up t o 133MBps. I nt ervening versions of t he specificat ion are ATA- 2, which int roduced logical block addressing ( LBA) ; ATA- 3, which enabled SMART- capable disks ( discussed lat er) ; ATA- 4, which brought support for Ult ra DMA and t he associat ed 33MBps t hroughput ; ATA- 5, which increased m axim um t ransfer speeds t o 66MBps; and ATA- 6, which provided for 100MBps dat a rat es. St orage devices such as CD- ROMs and t apes connect t o t he st andard I DE cable using a special prot ocol called t he ATA Packet I nt erface ( ATAPI ) . [ 1] ATAPI was int roduced along wit h ATA- 4.

[ 1]

The ATAPI prot ocol is closer t o SCSI t han t o I DE.

The floppy disk cont roller in PC syst em s has t radit ionally been part of t he Super I / O chipset about which we learned in Chapt er 6 , " Serial Drivers." These int ernal drives, however, have given way t o fast er ext ernal USB floppy drives in t oday's PC environm ent . Figure 14.1 shows an ATA- 7 disk drive connect ed t o an I DE host adapt er t hat 's part of t he Sout h Bridge chipset on a PC syst em . Also shown connect ed are an ATAPI CD- ROM drive and a floppy drive.

Figu r e 1 4 .1 . St or a ge m e dia in a PC syst e m .

I DE/ ATA is a parallel bus t echnology ( som et im es called Parallel ATA or PATA) and cannot scale t o high speeds, as you learned while discussing PCI e in Chapt er 10, " Peripheral Com ponent I nt erconnect ." Serial ATA ( SATA) is a m odern serial bus evolut ion of PATA t hat support s t ransfer speeds in t he realm of 300MBps and beyond. I n addit ion t o offering higher t hroughput t han PATA, SATA brings capabilit ies such as hot swapping. SATA t echnology is st eadily replacing PATA. See t he sidebar "libATA" t o learn about t he new ATA subsyst em in t he kernel t hat support s bot h SATA and PATA.

libATA libATA is t he new ATA subsyst em in t he Linux kernel. I t consist s of a set of ATA library rout ines and a collect ion of low- level drivers t hat use t hem . libATA support s bot h SATA and PATA. SATA drivers in libATA have been around for som e t im e under dr iver s/ scsi/ , but PATA drivers and t he new drivers/ at a/ direct ory t hat now houses all libATA sources were int roduced wit h t he 2.6.19 kernel release. I f your syst em is enabled wit h SATA st orage, you need t he services of libATA in t andem wit h t he SCSI subsyst em . libATA support for PATA is st ill experim ent al, and by default , PATA drivers cont inue t o use t he legacy I DE drivers t hat live in dr iver s/ ide/ . Assum e t hat your syst em is SATA- enabled via an I nt el I CH7 Sout h Bridge chipset . You need t he following libATA com ponent s t o access your disk:

1 . Th e libATA cor e — To enable t his, set CONFIG_ATA during kernel configurat ion. For a list of library funct ions offered by t he core, grep for EXPORT_SYMBOL_GPL inside t he drivers/ at a/ direct ory.

2 . Adva n ce d H ost Con t r olle r I n t e r fa ce ( AH CI ) su ppor t — AHCI specifies t he regist er int erface support ed by SATA host adapt ers and is enabled by choosing CONFIG_AHCI at configurat ion t im e.

3 . Th e h ost con t r olle r a da pt e r dr ive r — For t he I CH7, enable CONFIG_ATA_PIIX.

Addit ionally, you need t he m id- level and upper- level SCSI drivers ( CONFIG_SCSI and friends) . Aft er you have loaded all t hese kernel com ponent s, your SATA disk part it ions appear t o t he syst em as / dev/ sd* , j ust like SCSI or USB m ass st orage part it ions. The hom e page of t he libATA proj ect is ht t p: / / linux- at a.org/ . A DocBook is available as part of t he kernel source t ree in Docum ent at ion/ DocBook/ libat a.t m pl. A libATA developer's guide is available at www.kernel.org/ pub/ linux/ kernel/ people/ j garzik/ libat a.pdf.

Sm all Com put er Syst em I nt erface ( SCSI ) is t he st orage t echnology of choice in servers and high- end workst at ions. SCSI is som ewhat fast er t han SATA and support s speeds of t he order of 320MBps. SCSI has t radit ionally been a parallel int erface st andard, but , like ATA, has recent ly shift ed t o serial operat ion wit h t he advent of a bus t echnology called Serial At t ached SCSI ( SAS) . The kernel's SCSI subsyst em is archit ect ed int o t hree layers: t op- level drivers for m edia such as disks, CDROMs, and t apes; a m iddle- level layer t hat scans t he SCSI bus and configures devices; and low- level host adapt er drivers. We learned about t hese layers in t he sect ion "Mass St orage" in Chapt er 11, " Universal Serial Bus." Refer back t o Figure 11.4 in t hat chapt er t o see how t he different com ponent s of t he SCSI subsyst em int eract wit h each ot her. [ 2] USB m ass st orage drives use flash m em ory int ernally but com m unicat e wit h host syst em s using t he SCSI prot ocol.

[ 2] SCSI support is discussed in ot her part s of t his book, t oo. The sect ion " User Mode SCSI " in Chapt er 19, " Drivers in User Space," discusses t he SCSI Generic ( sg) int erface t hat let s you direct ly dispat ch com m ands from user space t o SCSI devices. The sect ion "iSCSI " in Chapt er 20, " More Devices and Drivers," briefly looks at t he iSCSI prot ocol, which allows t he t ransport of SCSI packet s t o a rem ot e block device over a TCP/ I P net work.

Redundant array of inexpensive disks ( RAI D) is a t echnology built in t o som e SCSI and SATA cont rollers t o achieve redundancy and reliabilit y. Various RAI D levels have been defined. RAI D- 1, for exam ple, specifies disk m irroring, where dat a is duplicat ed on separat e disks. Linux drivers are available for several RAI D- capable disk drives. The kernel also offers a m ult idisk ( m d) driver t hat im plem ent s m ost RAI D levels in soft ware. Miniat ure st orage is t he nam e of t he gam e in t he em bedded consum er elect ronics space. Transfer speeds in t his dom ain are m uch lower t han t hat offered by t he t echnologies discussed t hus far. Secure Digit al ( SD) cards and t heir sm aller form - fact or derivat ives ( m iniSD and m icroSD) are popular st orage m edia [ 3] in devices such as cam eras, cell phones, and m usic players. Cards com plying wit h version 1.01 of t he SD card specificat ion support t ransfer speeds of up t o 10MBps. SD st orage has evolved from an older, slower, but com pat ible t echnology called Mult iMediaCar d ( MMC) t hat support s dat a rat es of 2.5MBps. The kernel cont ains an SD/ MMC subsyst em in drivers/ m m c/ .

[ 3]

See t he sidebar "WiFi over SDI O" in Chapt er 16, " Linux Wit hout Wires," t o learn about nonst orage t echnologies available in SD form fact or.

The sect ion " PCMCI A St orage" in Chapt er 9 , " PCMCI A and Com pact Flash," looked at different PCMCI A/ CF flavors of st orage cards and t heir corresponding kernel drivers. PCMCI A m em ory cards such as m icrodrives support t rue I DE operat ion, whereas t hose t hat int ernally use solid- st at e m em ory em ulat e I DE and export an I DE program m ing m odel t o t he kernel. I n bot h t hese cases, t he kernel's I DE subsyst em can be used t o enable t he card. Table 14.1 sum m arizes im port ant st orage t echnologies and t he locat ion of t he associat ed device drivers in t he kernel source t ree.

Ta ble 1 4 .1 . St or a ge Te ch n ologie s a n d Associa t e d D e vice D r ive r s St or a ge Te ch n ology

D e scr ipt ion

Sou r ce File

I DE/ ATA

St orage int erface t echnology in t he PC environm ent . Support s dat a rat es of 133MBps for ATA- 7.

drivers/ ide/ ide- disk.c, driver/ ide/ ide- io.c, drivers/ ide/ ide- probe.c or drivers/ at a/ ( Experim ent al)

ATAPI

St orage devices such as CD- ROMs and t apes connect t o t he st andard I DE cable using t he ATAPI prot ocol.

drivers/ ide/ ide- cd.c or drivers/ at a/ ( Experim ent al)

Floppy ( int ernal)

The floppy cont roller resides in t he Super drivers/ block/ floppy.c I / O chip on t he LPC bus in PC- com pat ible syst em s. Support s t ransfer rat es of t he order of 150KBps.

SATA

Serial evolut ion of I DE/ ATA. Support s speeds of 300MBps and beyond.

SCSI

St orage t echnology popular in t he server dr iver s/ scsi/ environm ent . Support s t ransfer rat es of 320MBps for Ult ra320 SCSI .

drivers/ at a/ , dr iver s/ scsi/

St or a ge Te ch n ology

D e scr ipt ion

Sou r ce File

USB Mass St orage

This refers t o USB hard disks, pen drives, drivers/ usb/ st orage/ , CD- ROMs, and floppy drives. Look at t he dr iver s/ scsi/ sect ion "Mass St orage" in Chapt er 11. USB 2.0 devices can com m unicat e at speeds of up t o 60MBps.

RAI D: Hardware RAI D

This is a capabilit y built int o high- end SCSI / SATA disk cont rollers t o achieve redundancy and reliabilit y.

dr iver s/ scsi/ , drivers/ at a/

Soft ware RAI D

On Linux, t he m ult idisk ( m d) driver im plem ent s several RAI D levels in soft ware.

drivers/ m d/

SD/ m iniSD/ m icroSD

Sm all form - fact or st orage m edia popular drivers/ m m c/ in consum er elect ronic devices such as cam eras and cell phones. Support s t ransfer rat es of up t o 10MBps.

MMC

Older rem ovable st orage st andard t hat 's drivers/ m m c/ com pat ible wit h SD cards. Support s dat a rat es of 2.5MBps.

PCMCI A/ CF st orage cards

PCMCI A/ CF form fact or of m iniat ure I DE drives, or solid- st at e m em ory cards t hat em ulat e I DE. See t he sect ion "PCMCI A St orage" in Chapt er 9 .

drivers/ ide/ legacy/ ide- cs.c or drivers/ at a/ pat a_pcm cia.c ( experim ent al)

Block device em ulat ion over flash m em ory

Em ulat es a hard disk over flash m em ory. drivers/ m t d/ m t dblock.c, See t he sect ion " Block Device Em ulat ion" drivers/ m t d/ m t d_blkdevs.c in Chapt er 17, " Mem ory Technology Devices."

Virt ual block devices on Linux: RAM disk

I m plem ent s support t o use a RAM region drivers/ block/ rd.c as a block device.

Loopback device

I m plem ent s support t o use a regular file as a block device.

drivers/ block/ loop.c

Lin u x Block I / O La ye r The block I / O layer was considerably overhauled bet ween t he 2.4 and 2.6 kernel releases. The m ot ivat ion for t he redesign was t hat t he block layer, m ore t han ot her kernel subsyst em s, has t he pot ent ial t o im pact overall syst em perform ance. Let 's t ake a look at Figure 14.2 t o learn t he workings of t he Linux block I / O layer. The st orage m edia cont ains files residing in a filesyst em , such as EXT3 or Reiserfs. User applicat ions invoke I / O syst em calls t o access t hese files. The result ing filesyst em operat ions pass t hrough t he generic Virt ual File Syst em ( VFS) layer before ent ering t he individual filesyst em driver. The buffer cache speeds up filesyst em access t o block devices by caching disk blocks. I f a block is found in t he buffer cache, t he t im e required t o access t he disk t o read t he block is saved. Dat a dest ined for each block device is lined up in a request queue. The filesyst em driver populat es t he request queue belonging t o t he desired block device, whereas t he block driver receives and consum es request s from t he corresponding queue. I n bet ween, I / O schedulers m anipulat e t he request queue so as t o m inim ize disk access lat encies and m axim ize t hroughput .

Figu r e 1 4 .2 . Block I / O on Lin u x .

Let 's next exam ine t he different I / O schedulers available on Linux.

I / O Sch e du le r s Block devices suffer seek t im es, t he lat ency t o m ove t he disk head from it s exist ing posit ion t o t he disk sect or of int erest . The m ain goal of an I / O scheduler is t o increase syst em t hroughput by m inim izing t hese seek t im es. To achieve t his, I / O schedulers m aint ain t he request queue in sort ed order according t o t he disk sect ors associat ed wit h t he const it uent request s. New request s are insert ed int o t he queue such t hat t his order is m aint ained. I f an exist ing request in t he queue is associat ed wit h an adj acent disk sect or, t he new request is m erged wit h it . Because of t hese propert ies, I / O schedulers bear an operat ional resem blance t o elevat ors—t hey schedule request s in a single direct ion unt il t he last request er in t he line is serviced. The I / O scheduler in 2.4 kernels im plem ent ed a st raight forward version of t his algorit hm and was called t he Linus elevat or. This t urned out t o be inadequat e under real- world condit ions, however, and was replaced in t he 2.6 kernel by a suit e of four schedulers: Deadline, Ant icipat ory, Com plet e Fair Queuing, and Noop. The scheduler used by default is Ant icipat ory, but t his can be changed during kernel configurat ion or by changing t he value of / sys/ block/ [ disk] / queue/ scheduler. ( Replace [ disk] wit h hda if you are using an I DE disk, for exam ple.) Table 14.2 briefly describes Linux I / O schedulers.

Ta ble 1 4 .2 . Lin u x I / O Sch e du le r s I / O Sch e du le r

D e scr ipt ion

Sou r ce File

Linus elevat or

St raight forward im plem ent at ion of t he st andard m erge- and- sort I / O scheduling algorit hm .

drivers/ block/ elevat or.c ( in t he 2.4 kernel t ree)

Deadline

I n addit ion t o what t he Linus elevat or block/ deadlinedoes, t he Deadline scheduler associat es iosched.c ( in t he 2.6 expirat ion t im es wit h each request in kernel t ree) order t o ensure t hat a burst of request s t o t he sam e disk region do not st arve request s t o regions t hat are fart her away. Moreover, read operat ions are grant ed m ore priorit y t han writ e operat ions because user processes usually block unt il t heir read request s com plet e. The Deadline scheduler t hus ensures t hat each I / O request is serviced wit hin a t im e lim it , which is im port ant for som e dat abase loads.

Ant icipat ory

Sim ilar t o t he Deadline scheduler, except t hat aft er servicing read request s, t he Ant icipat ory scheduler wait s for a predet erm ined am ount of t im e ant icipat ing furt her request s. This scheduling t echnique is suit ed for workst at ion/ int eract ive loads.

Com plet e Fair Queuing ( CFQ)

Sim ilar t o t he Linus elevat or, except t hat block/ cfq- iosched.c ( in t he CFQ scheduler m aint ains one request t he 2.6 kernel t ree) queue per originat ing process, rat her t han one generic queue. This ensures t hat each process ( or process group) get s a fair port ion of t he I / O and prevent s one process from st arving ot hers.

Noop

The Noop scheduler doesn't spend t im e

block/ as- iosched.c ( in t he 2.6 kernel t ree)

block/ noop- iosched.c

I / O Sch e du le r Noop

D e scr ipt ion Sou r ce File The Noop scheduler doesn't spend t im e block/ noop- iosched.c t raversing t he request queue searching ( in t he 2.6 kernel t ree) for opt im al insert ion point s. I nst ead, it sim ply adds new request s t o t he t ail of t he request queue. This scheduler is t hus ideal for sem iconduct or st orage m edia t hat have no m oving part s and, hence, no seek lat encies. An exam ple is a Disk- OnModule ( DOM) , which int ernally uses flash m em ory.

At a concept ual level, I / O scheduling resem bles process scheduling. Whereas I / O scheduling provides an illusion t o processes t hat t hey own t he disk, process scheduling gives processes t he illusion t hat t hey own t he CPU. Bot h I / O and process schedulers on Linux have undergone ext ensive changes in recent t im es. Process scheduling is discussed in Chapt er 19.

Block D r ive r D a t a St r u ct u r e s a n d M e t h ods Let 's now shift focus t o t he m ain t opic of t his chapt er, block device drivers. I n t his sect ion, we t ake a look at t he im port ant dat a st ruct ures and driver m et hods t hat you are likely t o encount er while im plem ent ing a block device driver. We use t hese st ruct ures and m et hods in t he next sect ion when we im plem ent a block driver for a fict it ious st orage cont roller. The following are t he m ain block driver dat a st ruct ures:

1 . The kernel represent s a disk using t he gendisk ( short for generic disk) st ruct ure defined in include/ linux/ genhd.h : struct gendisk { int major; int first_minor; int minors;

/* Device major number */ /* Starting minor number */ /* Maximum number of minors. You have one minor number per disk partition */ /* Disk name */

char disk_name[32]; /* ... */ struct block_device_operations *fops; /* Block device operations. Described soon. */ struct request_queue *queue; /* The request queue associated with this disk. Discussed next. */ /* ... */ }; 2 . The I / O request queue associat ed wit h each block driver is described using t he request_queue st ruct ure defined in include/ linux/ blkdev.h. This is a big st ruct ure, but it s only const it uent field t hat you m ight use is t he request st ruct ure, which is described next .

3 . Each request in a request_queue is represent ed using a request st ruct ure defined in include/ linux/ blkdev.h: struct request { /* ... */ struct request_queue *q; /* ... */ sector_t sector;

/* The container request queue */ /* Sector from which data access is requested */

/* ... */ unsigned long nr_sectors; /* Number of sectors left to submit */ /* ... */ struct bio *bio; /* The associated bio. Discussed soon. */ /* ... */ char *buffer; /* The buffer for data transfer */

/* ... */ struct request *next_rq;

/* Next request in the queue */

}; 4 . block_device_operations is t he block driver count erpart of t he file_operations st ruct ure used by charact er drivers. I t cont ains t he following ent ry point s associat ed wit h a block driver:

St andard m et hods such as open(), release(), and ioctl()

Specialized m et hods such as media_changed() and revalidate_disk() t hat support rem ovable block devices block_device_operations is defined as follows in include/ linux/ fs.h: struct block_device_operations { int (*open) (struct inode *, struct file *); /* Open */ int (*release) (struct inode *, struct file *);/* Close */ int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); /* I/O Control */ /* ... */ int (*media_changed) (struct gendisk *); /* Check if media is available or ejected */ int (*revalidate_disk) (struct gendisk *); /* Gear up for newly inserted media */ /* ... */ }; 5 . When we looked at t he request st ruct ure, we saw t hat it was associat ed wit h a bio. A bio st ruct ure is a low- level descript ion of block I / O operat ions at page- level granularit y. I t 's defined in include/ linux/ bio.h as follows: struct bio { sector_t struct bio /* .. */ unsigned long

bi_sector; *bi_next; bi_rw;

/* ... */ struct bio_vec *bi_io_vec; unsigned short unsigned short

bi_vcnt; bi_idx;

/* Sector from which data access is requested */ /* List of bio nodes */ /* Bottom bits of bi_rw contain the data-transfer direction */ /* Pointer to an array of bio_vec structures */ /* Size of the bio_vec array */ /* Index of the current bio_vec in the array */

/* ... */ }; Block dat a is int ernally represent ed as an I / O vect or using an array of bio_vec st ruct ures. Each elem ent of t he bio_vec array is m ade up of a (page, page_offset, length) t uple t hat describes a segm ent of t he I / O block. Maint aining I / O request s as a vect or of pages brings several advant ages, including a leaner im plem ent at ion and efficient scat t er/ gat her.

Before ending t his sect ion, let 's briefly look at block driver ent ry point s. Block drivers are broadly built using t hree t ypes of m et hods:

The usual init ializat ion and exit m et hods.

Met hods t hat are part of t he block_device_operations described previously.

A request m et hod. Block drivers, unlike char devices, do not support read()/write() m et hods for dat a t ransfer. I nst ead, t hey perform disk access using a special rout ine called t he request m et hod.

The block core layer offers a set of library rout ines t hat driver m et hods can leverage. The sam ple driver in t he next sect ion calls on t he services of several of t hese library rout ines.

D e vice Ex a m ple : Sim ple St or a ge Con t r olle r Consider t he em bedded device shown in Figure 14.3. The SoC cont ains a built - in st orage cont roller t hat com m unicat es wit h a block device. The archit ect ure is sim ilar t o SD/ MMC m edia, but our sam ple st orage cont roller is described by t he elem ent ary regist er set list ed in Table 14.3. The SECTOR_NUMBER_REGISTER specifies t he sect or from which dat a access is request ed. [ 4] The SECTOR_COUNT_REGISTER cont ains t he num ber of sect ors t o be t ransferred. Dat a is m oved via t he DATA_REGISTER. The COMMAND_REGISTER program s t he act ion t hat t he st orage cont roller has t o t ake ( for exam ple, whet her t o read from t he m edia or writ e t o it ) . The STATUS_REGISTER cont ains bit s t hat signal whet her t he cont roller is busy perform ing an operat ion.

[ 4] The st orage m edia in our sam ple device has a flat sect or- space geom et ry. I DE cont rollers, on t he ot her hand, support a cylinder head sector ( CHS) geom et ry specified by a device head regist er, a low cylinder regist er, and a high cylinder regist er, in addit ion t o t he sect or num ber regist er.

Figu r e 1 4 .3 . St or a ge on a n e m be dde d de vice .

Ta ble 1 4 .3 . Re gist e r La you t of t h e St or a ge Con t r olle r Re gist e r N a m e

D e scr ipt ion of Con t e n t s

SECTOR_NUMBER_REGISTER

The sect or on which t he next disk operat ion is t o be perform ed.

SECTOR_COUNT_REGISTER

Num ber of sect ors t o be read or writ t en.

COMMAND_REGISTER

The act ion t o be t aken ( for exam ple, read or writ e) .

STATUS_REGISTER

Result s of operat ions, int errupt st at us, and error flags.

DATA_REGISTER

I n t he read pat h, t he st orage cont roller fet ches dat a from t he disk t o int ernal buffers. The driver accesses t he int ernal buffer via t his regist er. I n t he writ e pat h, dat a writ t en by t he driver t o t his regist er is t ransferred t o t he int ernal buffer, from where t he cont roller copies it t o disk.

Let 's call t he st orage cont roller m yblkdev. This sim ple device is neit her int errupt driven nor support s DMA. We'll also assum e t hat t he m edia is not rem ovable. Our t ask is t o writ e a block driver for m yblkdev. Our driver is m inim al, albeit com plet e. I t does not handle power m anagem ent and is not part icularly perform ance- sensit ive.

I n it ia liza t ion List ing 14.1 cont ains t he driver init ializat ion m et hod, myblkdev_init (), which perform s t he following st eps: 1.

Regist ers t he block device using register_blkdev(). This block library rout ine assigns an unused m aj or num ber t o m yblkdev and adds an ent ry for t he device in / proc/ devices.

2.

Associat es a request m et hod wit h t he block device. I t does t his by supplying t he address of myblkdev_request() t o blk_init_queue(). The call t o blk_init_queue() ret urns t he request_queue for m yblkdev. Refer back t o Figure 14.2 t o see how t he request_queue sit s relat ive t o t he driver. The second argum ent t o blk_init_queue(), myblkdev_lock, is a spinlock t o prot ect t he request_queue from concurrent access.

3.

Hardware perform s disk t ransact ions in unit s of sect ors, whereas soft ware subsyst em s, such as filesyst em s, deal wit h dat a in t erm s of blocks. The com m on sect or size is 512 byt es; t he usual block size is 4096 byt es. You need t o inform t he block layer about t he sect or size support ed by your st orage hardware and t he m axim um num ber of sect ors t hat your driver can receive per request . myblkdev_init() accom plishes t hese by invoking blk_queue_hardsect_size() and blk_queue_max_sectors(), r espect ively.

4.

Allocat es a gendisk corresponding t o m yblkdev using alloc_disk() and populat es it . One im port ant gendisk field t hat myblkdev_init() supplies is t he address of t he driver's block_device_operations. Anot her param et er t hat myblkdev_init() fills in is t he st orage capacit y of m yblkdev in unit s of sect ors. This is accom plished by calling set_capacity(). Each gendisk also cont ains a flag t hat signals t he propert ies of t he underlying st orage hardware. I f t he drive is rem ovable, for exam ple, t he gendisk's flag field should be m arked GENHD_FL_REMOVABLE.

5.

Associat es t he gendisk prepared in St ep 4 wit h t he request_queue obt ained in St ep 2. Also, connect s t he gendisk wit h t he device's m aj or/ m inor num bers and a nam e.

6.

Adds t he disk t o t he block I / O layer by invoking add_disk(). When t his is done, t he driver has t o be ready t o receive request s. So, t his is usually t he last st ep of t he init ializat ion sequence.

The block device is now available t o t he syst em as / dev/ m yblkdev. I f t he device support s m ult iple disk part it ions, t hey appear as / dev/ m yblkdevX, where X is t he part it ion num ber. List in g 1 4 .1 . I n it ia lizin g t h e D r ive r Code View: #include #include static struct gendisk *myblkdisk; /* Representation of a disk */ static struct request_queue *myblkdev_queue; /* Associated request queue */ int myblkdev_major = 0; /* Ask the block subsystem to choose a major number */ static DEFINE_SPINLOCK(myblkdev_lock);/* Spinlock that protects myblkdev_queue from concurrent access */

int myblkdisk_size

= 256*1024;

/* Disk size in kilobytes. For a PC hard disk, one way to glean this is via the BIOS */ /* Hardware sector size */

int myblkdev_sect_size = 512; /* Initialization */ static int __init myblkdev_init(void) { /* Register this block driver with the kernel */ if ((myblkdev_major = register_blkdev(myblkdev_major, "myblkdev")) fops = &myblkdev_fops; /* Set the capacity of the storage media in terms of number of sectors */ set_capacity(myblkdisk, myblkdisk_size*2); myblkdisk->queue = myblkdev_queue; myblkdisk->major = myblkdev_major; myblkdisk->first_minor = 0; sprintf(myblkdisk->disk_name, "myblkdev"); /* Add the gendisk to the block I/O subsystem */ add_disk(myblkdisk); return 0; } /* Exit */ static void __exit myblkdev_exit(void) { /* Invalidate partitioning information and perform cleanup */ del_gendisk(myblkdisk); /* Drop references to the gendisk so that it can be freed */ put_disk(myblkdisk); /* Dissociate the driver from the request_queue. Internally calls elevator_exit() */ blk_cleanup_queue(myblkdev_queue);

/* Unregister the block device */ unregister_blkdev(myblkdev_major, "myblkdev"); } module_init(myblkdev_init); module_exit(myblkdev_exit); MODULE_LICENSE("GPL");

Block D e vice Ope r a t ion s Let 's next t ake a look at t he m ain m et hods cont ained in a block driver's block_device_operations. A block driver's open() m et hod is called during operat ions such as m ount ing a filesyst em residing on t he m edia or perform ing a filesyst em check ( fsck) . Many of t he t asks accom plished during open() are hardwaredependent . The CD- ROM driver, for exam ple, locks t he drive door. The SCSI driver checks whet her t he device has set a writ e- prot ect t ab, and, if so, fails if a writ e- enabled open is request ed. I f t he device is rem ovable, open() invokes t he service rout ine check_disk_change() t o check whet her t he m edia has changed. I f your driver needs t o support specific com m ands, im plem ent support for it using t he ioctl() m et hod. A floppy driver, for exam ple, support s a com m and t o ej ect t he m edia. The media_changed() m et hod checks whet her t he st orage m edia has changed, so t his is not relevant for fixed devices such as m yblkdev. The SCSI disk driver's media_changed() m et hod, for exam ple, det ect s whet her an insert ed USB pen drive has changed. The sole block device operat ion support ed by m yblkdev is t he ioctl() m et hod, myblkdev_ioctl(). The block layer it self handles generic ioct ls and invokes t he driver's ioctl() m et hod only t o handle device- specific com m ands. I n List ing 14.2, myblkdev_ioctl() im plem ent s t he GET_DEVICE_ID com m and t hat elicit s a device I D from t he cont roller. The com m and is issued via t he COMMAND_REGISTER, and t he I D dat a is obt ained from t he DATA_REGISTER. List in g 1 4 .2 . Block D e vice Ope r a t ion s

Code View: #define GET_DEVICE_ID 0xAA00

/* Ioctl command definition */

/* The ioctl operation */ static int myblkdev_ioctl (struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) { unsigned char status; switch (cmd) { case GET_DEVICE_ID: outb(GET_IDENTITY_CMD, COMMAND_REGISTER); /* Wait as long as the controller is busy */ while ((status = inb(STATUS_REGISTER)) & BUSY_STATUS); /* Obtain ID and return it to user space */ return put_user(inb(DATA_REGISTER), (long __user *)arg); default: return -EINVAL; } } /* Block device operations */ static struct block_device_operations myblkdev_fops = { .owner = THIS_MODULE, /* Owner of this structure */ .ioctl = myblkdev_ioctl, /* The following operations are not implemented for our example storage controller: open(), release(), unlocked_ioctl(), compat_ioctl(), direct_access(), getgeo(), revalidate_disk(), and media_changed() */ };

D isk Acce ss As m ent ioned previously, block drivers perform disk access operat ions using a request() m et hod. The block I / O subsyst em invokes a driver's request() m et hod whenever it desires t o process request s wait ing in t he driver's request_queue. The request()m et hod does not run in t he cont ext of t he user process request ing t he dat a t ransfer, however. The address of t he associat ed request_queue is passed as an argum ent t o t he request() m et hod. As you saw earlier, t he kernel holds a request lock before calling t he request() m et hod. This is t o prot ect t he associat ed request queue from concurrent access. Because of t his, if your request() m et hod has t o call any funct ions t hat m ay go t o sleep, it has t o drop t he lock before doing so and reacquire it before ret urning. List ing 14.3 cont ains our driver's request m et hod, myblkdev_request(). This funct ion uses t he services of elv_next_request() t o obt ain t he next request from t he request_queue. I f t he queue cont ains no m ore pending request s, elv_next_request() ret urns NULL. elv_next_request() is nam ed so because, as you learned previously, I / O scheduling algorit hm s are variat ions of t he basic m odus operandi adopt ed by elevat ors t o service request s. Aft er handling a request , t he driver asks t he block layer t o end I / O on t hat request by calling end_request(). You can specify success or an error code using t he second argum ent t o end_request(). Request s collect ed from t he request_queue cont ain t he st art ing sect or from which dat a access is request ed (req->sector in List ing 14.3, t he num ber of sect ors on which I / O needs t o be perform ed (req->nr_sectors) , t he buffer t hat cont ains t he dat a t o be t ransferred ( req->buffer) , and t he direct ion of dat a m ovem ent

(rq_data_dir(req)) . As shown in List ing 14.3, myblkdev_request() perform s t he required regist er program m ing wit h t he help of t hese param et ers. List in g 1 4 .3 . Th e Re qu e st Fu n ct ion Code View: #define READ_SECTOR_CMD #define WRITE_SECTOR_CMD #define GET_IDENTITY_CMD

1 2 3

#define BUSY_STATUS

0x10

#define #define #define #define #define

0x20000000 0x20000001 0x20000002 0x20000003 0x20000004

SECTOR_NUMBER_REGISTER SECTOR_COUNT_REGISTER COMMAND_REGISTER STATUS_REGISTER DATA_REGISTER

/* Request method */ static void myblkdev_request(struct request_queue *rq) { struct request *req; unsigned char status; int i, good = 0; /* Loop through the requests waiting in line */ while ((req = elv_next_request(rq)) != NULL) { /* Program the start sector and the number of sectors */ outb(req->sector, SECTOR_NUMBER_REGISTER); outb(req->nr_sectors, SECTOR_COUNT_REGISTER); /* We are interested only in filesystem requests. A SCSI command is another possible type of request. For the full list, look at the enumeration of rq_cmd_type_bits in include/linux/blkdev.h */ if (blk_fs_request(req)) { switch(rq_data_dir(req)) { case READ: /* Issue Read Sector Command */ outb(READ_SECTOR_CMD, COMMAND_REGISTER); /* Traverse all requested sectors, byte by byte */ for (i = 0; i < 512*req->nr_sectors; i++) { /* Wait until the disk is ready. Busy duration should be in the order of microseconds. Sitting in a tight loop for simplicity; more intelligence required in the real world */ while ((status = inb(STATUS_REGISTER)) & BUSY_STATUS); /* Read data from disk to the buffer associated with the request */ req->buffer[i] = inb(DATA_REGISTER); } good = 1; break; case WRITE: /* Issue Write Sector Command */ outb(WRITE_SECTOR_CMD, COMMAND_REGISTER);

/* Traverse all requested sectors, byte by byte */ for (i = 0; i < 512*req->nr_sectors; i++) { /* Wait until the disk is ready. Busy duration should be in the order of microseconds. Sitting in a tight loop for simplicity; more intelligence required in the real world */ while ((status = inb(STATUS_REGISTER)) & BUSY_STATUS); /* Write data to disk from the buffer associated with the request */ outb(req->buffer[i], DATA_REGISTER); } good = 1; break; } } end_request(req, good); } }

Adva n ce d Topics Unlike our sam ple st orage driver t hat t ransfers dat a byt e by byt e, perform ance- sensit ive block drivers rely on DMA for dat a t ransfer. Consider, for exam ple, t he request() m et hod of t he disk array driver for Com paq SMART2 cont rollers drivers/ block/ - cpqarray.c reproduced here from t he 2.6.23.1 kernel sources: Code View: static do_ida_request(struct request_queue *q) { struct request *creq; struct scatterlist tmp_sg[SG_MAX]; cmdlist_t *c; ctrl_info_t *h = q->queuedata; int seg; /* ... */ creq = elv_next_request(q); /* ... */ c->rq = creq; seg = blk_rq_map_sg(q, creq, tmp_sg); /* ... */ for (i=0; ireq.sg[i].size = tmp_sg[i].length; c->req.sg[i].addr = (__u32) pci_map_page(h->pci_dev, tmp_sg[i].page, tmp_sg[i].offset, tmp_sg[i].length, dir); } /* ... */ }

DMA operat ions work at bio level. As you saw earlier, I / O request s are m ade up of bios, each of which cont ains an array of bio_vecs, which in t urn hold inform at ion about t he const it uent m em ory pages. Assum ing t hat bio point s t o t he bio st ruct ure associat ed wit h an I / O request , bio->bi_sector cont ains t he st art ing sect or from which dat a access is request ed, bio_cur_sectors(bio) ret urns t he num ber of sect ors on which I / O is t o be perform ed, and bio_data_dir(bio) provides t he direct ion of dat a t ransfer. The addresses of t he physical pages associat ed wit h t he dat a buffer are described by t he array of bio_vecs point ed t o by bio->bi_io_vec. To it erat e over each bio in a request , you can use t he rq_for_each_bio()m acro. To furt her loop t hrough each page segm ent in a bio, use bio_for_each_segment(). I n t he preceding code snippet , blk_rq_map_sg() int ernally invokes rq_for_each_bio() and bio_for_each_segment()t o loop t hrough all pages const it ut ing t he request and builds a scat t er/ gat her list , tmp_sg. St ream ing DMA m appings for each page in t he creat ed scat t er/ gat her list is perform ed by pci_map_page(). Unlike our sam ple driver t hat busy- wait s for request ed operat ions t o finish, t he cpqarray driver im plem ent s an int errupt handler, do_ida_intr(), t o receive alert s from t he hardware upon com plet ion of com m ands. Som e drivers, such as t he ram disk driver (drivers/ block/ rd.c) and t he loopback driver (drivers/ block/ loop.c) ,

work over virt ual block devices t hat do not benefit from t he opt im izing sort and m erge operat ions on t he request queue. Such drivers ent irely bypass t he request queue and direct ly obt ain bios from t he block layer using a make_request() funct ion. So, inst ead of regist ering a request queue handler using blk_init_queue(), drivers/ block/ rd.c supplies a make_request() rout ine using blk_queue_make_request() as follows: static int __init rd_init(void) { /* ... */ blk_queue_make_request(rd_queue[i], &rd_make_request); /* ... */ } static int rd_make_request(struct request_queue *q, struct bio *bio) { /* ... */ }

D e bu ggin g The hdparm ut ilit y elicit s various PATA/ SATA disk param et ers from t he underlying kernel driver. To benchm ark disk read speeds on a SATA drive, for exam ple, do t his: bash> hdparm -T -t /dev/sda /dev/sda: Timing cached reads: 2564 MB in 2.00 seconds = 1283.57 MB/sec Timing buffered disk reads: 132 MB in 3.03 seconds = 43.61 MB/sec

For t he full capabilit ies of hdparm , read t he m an pages. Self- Monit oring, Analysis, and Report ing Technology ( SMART) is a syst em built in t o m any m odern ATA and SCSI disks t o m onit or failures and perform self- t est s. A user- space daem on nam ed sm ar t d collect s t he inform at ion gat hered by SMART- capable disks wit h t he help of t he underlying device driver. Look at t he m an pages of sm art d, sm art ct l, and sm art d.conf t o learn how t o obt ain healt h st at us from SMART- enabled disks. I f your dist ribut ion doesn't prepackage hdparm and SMART t ools, you m ay download t hem from ht t p: / / sourceforge.net / proj ect s/ hdparm / and ht t p: / / sourceforge.net / proj ect s/ sm art m ont ools/ , respect ively. Files under / pr oc/ ide/ cont ain inform at ion about I DE disk drives on your syst em . To obt ain t he geom et ry of t he first I DE disk, for exam ple, look at t he cont ent s of / proc/ ide/ ide0/ hda/ geom et ry . I nform at ion pert aining t o SCSI devices is available under / proc/ scsi/ . You can gat her disk part it ion inform at ion from / proc/ part it ions. The sysfs direct ory of int erest for I DE devices is / sys/ bus/ ide/ and for SCSI is / sys/ bus/ scsi/ . I n addit ion, each block device act ive on t he syst em owns a subdirect ory under / sys/ block/ , which cont ains associat ed request queue param et ers, const it uent part it ion det ails, and st at e inform at ion. Som e kernel configurat ion opt ions are available t hat t rigger t he em ission of debug out put from t he block subsyst em . CONFIG_BLK_DEV_IO_TRACE provides t he abilit y t o t race t he block layer. CONFIG_SCSI_CONSTANTS and CONFIG_SCSI_LOGGING t urn on SCSI error report ing and logging, respect ively. The linux- ide m ailing list is t he forum t o discuss quest ions relat ed t o t he Linux- I DE subsyst em . Subscribe t o t he linux- scsi m ailing list and browse t hrough it s archives for discussions pert aining t o t he Linux- SCSI subsyst em .

Look in g a t t h e Sou r ce s Table 14.1 cont ains t he locat ion of kernel driver sources for various st orage t echnologies. Take a look at Docum ent at ion/ ide.t xt , Docum ent at ion/ scsi/ * , and Docum ent at ion/ cdrom / for inform at ion about associat ed st orage drivers. The t op- level block/ direct ory cont ains I / O scheduling algorit hm s and t he block core layer. Table 14.2 list s t he source files in t his direct ory t hat im plem ent various I / O schedulers. Look at Docum ent at ion/ block/ for relat ed docum ent at ion. Table 14.4 cont ains t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he source t ree. Table 14.5 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er, along wit h t he locat ion of t heir definit ions.

Ta ble 1 4 .4 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

gendisk

include/ linux/ genhd.h

Represent at ion of a disk.

request_queue

include/ linux/ blkdev.h

The I / O request queue associat ed wit h a gendisk.

request

include/ linux/ blkdev.h

Each request in a request_queue is described using t his st ruct ure.

block_device_operations

include/ linux/ fs.h

Block device driver m et hods.

bio

include/ linux/ bio.h

Low- level descript ion of block I / O operat ions.

Ta ble 1 4 .5 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

register_blkdev()

block/ genhd.c

Regist ers a block driver wit h t he kernel

unregister_blkdev()

block/ genhd.c

Unregist ers a block driver from t he kernel

alloc_disk()

block/ genhd.c

Allocat es a gendisk

add_disk()

block/ genhd.c

Adds a populat ed gendisk t o t he kernel block layer

del_gendisk()

fs/ part it ions/ check.c

Frees a gendisk

blk_init_queue()

block/ ll_rw_blk.c

Allocat es a request_queue and regist ers a request() funct ion t o process t he request s in t he queue

blk_cleanup_queue()

block/ ll_rw_blk.c

Reverse of blk_init_queue()

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

blk_queue_make_request()

block/ ll_rw_blk.c

Regist ers a make_request() funct ion, which bypasses t he request queue and direct ly obt ains request s from t he block layer

rq_for_each_bio()

include/ linux/ blkdev.h

I t erat es over each bio in a r equest

bio_for_each_segment()

include/ linux/ bio.h

Loops t hrough each page segm ent in a bio

blk_rq_map_sg()

block/ ll_rw_blk.c

I t erat es t hrough t he bio segm ent s const it ut ing a request and builds a scat t er/ gat her list

blk_queue_max_sectors()

block/ ll_rw_blk.c

Set s t he m axim um sect ors for a request in t he associat ed request queue

blk_queue_hardsect_size()

block/ ll_rw_blk.c

Sect or size support ed by t he st orage hardware.

set_capacity()

include/ linux/ genhd.h

Set s t he capacit y of t he st orage m edia in t erm s of num ber of sect ors

blk_fs_request()

include/ linux/ blkdev.h

Checks whet her a request obt ained from t he request queue is a filesyst em request

elv_next_request()

block/ elevat or.c

Obt ains t he next ent ry from t he request queue

end_request()

block/ ll_rw_blk.c

Ends I / O on a request

Ch a pt e r 1 5 . N e t w or k I n t e r fa ce Ca r ds I n Th is Ch a pt e r 440 Driver Dat a St ruct ures 448 Talking wit h Prot ocol Layers 450 Buffer Managem ent and Concurrency Cont rol 451 Device Exam ple: Et hernet NI C 457 I SA Net work Drivers 458 Asynchronous Transfer Mode 459 Net work Throughput 461 Looking at t he Sources

Connect ivit y im part s int elligence. You rarely com e across a com put er syst em t oday t hat does not support som e form of net working. I n t his chapt er, let 's focus on device drivers for net work int erface cards ( NI Cs) t hat carry I nt ernet Prot ocol ( I P) t raffic on a local area net work ( LAN) . Most of t he chapt er is bus agnost ic, but wherever bus specifics are necessary, it assum es PCI . To give you a flavor of ot her net work t echnologies, we also t ouch on Asynchronous Transfer Mode ( ATM) . We end t he chapt er by pondering on perform ance and t hroughput . NI C drivers are different from ot her driver classes in t hat t hey do not rely on / dev or / sys t o com m unicat e wit h user space. Rat her, applicat ions int eract wit h a NI C driver via a net work int erface ( for exam ple, et h0 for t he first Et hernet int erface) t hat abst ract s an underlying prot ocol st ack.

D r ive r D a t a St r u ct u r e s When you writ e a device driver for a NI C, you have t o operat e on t hree classes of dat a st ruct ures:

1 . St ruct ures t hat form t he building blocks of t he net work prot ocol st ack. The socket buffer or struct sk_buff defined in include/ linux/ sk_buff.h is t he key st ruct ure used by t he kernel's TCP/ I P st ack.

2 . St ruct ures t hat define t he int erface bet ween t he NI C driver and t he prot ocol st ack. struct net_device defined in include/ linux/ net device.h is t he core st ruct ure t hat const it ut es t his int erface.

3 . St ruct ures relat ed t o t he I / O bus. PCI and it s derivat ives are com m on buses used by t oday's NI Cs.

We t ake a det ailed look at socket buffers and t he net_device int erface in t he next t wo sect ions. We covered PCI dat a st ruct ures in Chapt er 10, " Peripheral Com ponent I nt erconnect ," so we won't revisit t hem here.

Sock e t Bu ffe r s sk_buffs provide efficient buffer handling and flow- cont rol m echanism s t o Linux net working layers. Like DMA descript ors t hat cont ain m et adat a on DMA buffers, sk_buffs hold cont rol inform at ion describing at t ached m em ory buffers t hat carry net work packet s ( see Figure 15.1) . sk_buffs are enorm ous st ruct ures having dozens of elem ent s, but in t his chapt er we confine ourselves t o t hose t hat int erest t he net work device driver writ er. An sk_buff links it self t o it s associat ed packet buffer using five m ain fields:

head, which point s t o t he st art of t he packet

data, which point s t o t he st art of packet payload

tail, which point s t o t he end of packet payload

end, which point s t o t he end of t he packet

len, t he am ount of dat a t hat t he packet cont ains

Figu r e 1 5 .1 . sk_buff ope r a t ion s.

Assum e skb point s t o an sk_buff, skb->head, skb->data, skb->tail, and skb->end slide over t he associat ed packet buffer as t he packet t raverses t he prot ocol st ack in eit her direct ion. skb->data, for exam ple, point s t o t he header of t he prot ocol t hat is current ly processing t he packet . When a packet reaches t he I P layer via t he receive pat h, skb->data point s t o t he I P header; when t he packet passes on t o TCP, however, skb->data m oves t o t he st art of t he TCP header. And as t he packet drives t hrough various prot ocols adding or discarding header dat a, skb->len get s updat ed, t oo. sk_buffs also cont ain point ers ot her t han t he four m aj or ones previously m ent ioned. skb->nh, for exam ple, rem em bers t he posit ion of t he net work prot ocol header irrespect ive of t he current posit ion of skb->data. To illust rat e how a NI C driver works wit h sk_buffs, Figure 15.1 shows dat a t ransit ions on t he receive dat a pat h. For convenience of illust rat ion, t he figure sim plist ically assum es t hat t he operat ions shown are execut ed in sequence. However, for operat ional efficiency in t he real world, t he first t wo st eps (dev_alloc_skb() and

skb_reserve()) are perform ed while init ially preallocat ing a ring of receive buffers; t he t hird st ep is accom plished by t he NI C hardware as it direct ly DMA's t he received packet int o a preallocat ed sk_buff; and t he final t wo st eps (skb_put() and netif_rx()) are execut ed from t he receive int errupt handler. To creat e an sk_buff t o hold a received packet , Figure 15.1 uses dev_alloc_skb(). This is an int errupt - safe rout ine t hat allocat es m em ory for an sk_buff and associat es it wit h a packet payload buffer. dev_kfree_skb() accom plishes t he reverse of dev_alloc_skb(). Figure 15.1 next calls skb_reserve() t o add a 2- byt e padding bet ween t he st art of t he packet buffer and t he beginning of t he payload. This st art s t he I P header at a perform ance- friendly 16- byt e boundary because t he preceding Et hernet headers are 14 byt es long. The rest of t he code st at em ent s in Figure 15.1 fill t he payload buffer wit h t he received packet and m ove skb->data, skb>tail, and skb->len t o reflect t his operat ion. There are m ore sk_buff access rout ines relevant t o som e NI C drivers. skb_clone(), for exam ple, creat es a copy of a supplied skb_buff wit hout copying t he cont ent s of t he associat ed packet buffer. Look inside net / core/ skbuff.c for t he full list of sk_buff library funct ions.

Th e N e t D e vice I n t e r fa ce NI C drivers use a st andard int erface t o int eract wit h t he TCP/ I P st ack. The net_device st ruct ure, which is even m ore gigant ic t han t he sk_buff st ruct ure, defines t his com m unicat ion int erface. To prepare ourselves for exploring t he innards of t he net_device st ruct ure, let 's first follow t he st eps t raced by a NI C driver during init ializat ion. Refer t o init_mycard() in List ing 15.1 as we m ove along:

The driver allocat es a net_device st ruct ure using alloc_netdev(). More com m only, it uses a suit able wrapper around alloc_netdev(). An Et hernet NI C driver, for exam ple, calls alloc_etherdev(). A WiFi driver ( discussed in t he next chapt er) invokes alloc_ieee80211(), and an I rDa driver calls upon alloc_irdadev(). All t hese funct ions t ake t he size of a privat e dat a area as argum ent and creat e t his area in addit ion t o t he net_device it self: struct net_device *netdev; struct priv_struct *mycard_priv; netdev = alloc_etherdev(sizeof(struct priv_struct)); mycard_priv = netdev->priv; /* Private area created by alloc_etherdev() */ Next , t he driver populat es various fields in t he net_device t hat it allocat ed and regist ers t he populat ed net_device wit h t he net work layer using register_netdev(netdev). The driver reads t he NI C's Media Access Cont rol ( MAC) address from an accom panying EEPROM and configures Wake- On- LAN ( WOL) if required. Et hernet cont rollers usually have a com panion nonvolat ile EEPROM t o hold inform at ion such as t heir MAC address and WOL pat t ern, as shown in Figure 15.2. The form er is a unique 48- bit address t hat is globally assigned. The lat t er is a m agic sequence; if found in received dat a, it rouses t he NI C if it 's in suspend m ode. I f t he NI C needs on- card firm ware t o operat e, t he driver downloads it using request_firmware(), as discussed in t he sect ion "Microcode Download" in Chapt er 4 , " Laying t he Groundwork."

Let 's now look at t he m et hods t hat define t he net_device int erface. We cat egorize t hem under six heads for sim plicit y. Wherever relevant , t his sect ion point s you t o t he exam ple NI C driver developed in List ing 15.1 of t he sect ion "Device Exam ple: Et hernet NI C."

Act iva t ion

The net_device int erface requires convent ional m et hods such as open(), close(), and ioctl(). The kernel opens an int erface when you act ivat e it using a t ool such as ifconfig: bash> ifconfig eth0 up

open() set s up receive and t ransm it DMA descript ors and ot her driver dat a st ruct ures. I t also regist ers t he NI C's int errupt handler by calling request_irq(). The net_device st ruct ure is passed as t he devid argum ent t o request_irq() so t hat t he int errupt handler get s direct access t o t he associat ed net_device. ( See mycard_open() and mycard_interrupt() in List ing 15.1 t o find out how t his is done.) The kernel calls close() when you pull down an act ive net work int erface. This accom plishes t he reverse of open().

D a t a Tr a n sfe r Dat a t ransfer m et hods form t he crux of t he net_device int erface. I n t he t ransm it pat h, t he driver supplies a m et hod called hard_start_xmit, which t he prot ocol layer invokes t o pass packet s down for onward t ransm ission: Code View: netdev->hard_start_xmit = &mycard_xmit_frame; /* Transmit Method. See Listing 15.1 */

Unt il recent ly, net work drivers didn't provide a net_device m et hod for collect ing received dat a. I nst ead, t hey asynchronously int errupt ed t he prot ocol layer wit h packet payload. This old int erface has, however, given way t o a New API ( NAPI ) t hat is a m ixt ure of an int errupt - driven driver push and a poll- driver prot ocol pull. A NAPI aware driver t hus needs t o supply a poll() m et hod and an associat ed weight t hat cont rols polling fairness: netdev->poll = &mycard_poll; /* Poll Method. See Listing 15.1 */ netdev->weight = 64;

We elaborat e on dat a- t ransfer m et hods in t he sect ion "Talking wit h Prot ocol Layers."

W a t ch dog The net_device int erface provides a hook t o ret urn an unresponsive NI C t o operat ional st at e. I f t he prot ocol layer senses no t ransm issions for a predet erm ined am ount of t im e, it assum es t hat t he NI C has hung and invokes a driver- supplied recovery m et hod t o reset t he card. The driver set s t he wat chdog t im eout t hrough netdev->watchdog_timeo and regist ers t he address of t he recovery funct ion via netdev->tx_timeout: netdev->tx_timeout = &mycard_timeout; /* Method to reset the NIC */ netdev->watchdog_timeo = 8*HZ; /* Reset if no response detected for 8 seconds */

Because t he recovery m et hod execut es in t im er- int errupt cont ext , it usually schedules a t ask out side of t hat cont ext t o reset t he NI C.

St a t ist ics To enable user land t o collect net work st at ist ics, t he NI C driver populat es a net_device_stats st ruct ure and

provides a get_stats() m et hod t o ret rieve it . Essent ially t he driver does t he following:

1 . Updat es different t ypes of st at ist ics from relevant ent ry point s: #include struct net_device_stats mycard_stats; static irqreturn_t mycard_interrupt(int irq, void *dev_id) { /* ... */ if (packet_received_without_errors) { mycard_stats.rx_packets++; /* One more received packet */ } /* ... */ } 2 . I m plem ent s t he get_stats() m et hod t o ret rieve t he st at ist ics: static struct net_device_stats *mycard_get_stats(struct net_device *netdev) { /* House keeping */ /* ... */ return(&mycard_stats); } 3 . Supplies t he ret rieve m et hod t o higher layers: netdev->get_stats = &mycard_get_stats; /* ... */ register_netdev(netdev); To collect st at ist ics from your NI C, t rigger invocat ion of mycard_get_stats() by execut ing an appropriat e user m ode com m and. For exam ple, t o find t he num ber of packet s received t hrough t he et h0 int erface, do t his: bash> cat /sys/class/net/eth0/statistics/rx_packets 124664

WiFi drivers need t o t rack several param et ers not relevant t o convent ional NI Cs, so t hey im plem ent a st at ist ic collect ion m et hod called get_wireless_stats() in addit ion t o get_stats(). The m echanism for regist ering get_wireless_stats() for t he benefit of WiFi- aware user space ut ilit ies is discussed in t he sect ion " WiFi" in t he next chapt er.

Con figu r a t ion NI C drivers need t o support user space t ools t hat are responsible for set t ing and get t ing device param et ers. Et ht ool configures param et ers for Et hernet NI Cs. To support et ht ool, t he underlying NI C driver does t he following:

1 . Populat es an ethtool_ops st ruct ure, defined in include/ linux/ et ht ool.h wit h prescribed ent ry point s: #include /* Ethtool_ops methods */ struct ethtool_ops mycard_ethtool_ops = { /* ... */ .get_eeprom = mycard_get_eeprom, /* Dump EEPROM contents */ /* ... */ }; 2 . I m plem ent s t he m et hods t hat are part of ethtool_ops: static int mycard_get_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom, uint8_t *bytes) { /* Access the accompanying EEPROM and pull out data */ /* ... */ } 3 . Export s t he address of it s ethtool_ops: netdev->ethtool_ops = &mycard_ethtool_ops; /* ... */ register_netdev(netdev); Aft er t hese are done, et ht ool can operat e over your Et hernet NI C. To dum p EEPROM cont ent s using et ht ool, do t his: bash> ethtool -e eth0 Offset Values ----------0x0000 00 0d 60 79 32 0a 00 0b ff ff 10 20 ff ff ff ff ...

Et ht ool com es packaged wit h som e dist ribut ions; but if you don't have it , download it from ht t p: / / sourceforge.net / proj ect s/ gkernel/ . Refer t o t he m an page for it s full capabilit ies. There are m ore configurat ion- relat ed m et hods t hat a NI C driver provides t o higher layers. An exam ple is t he m et hod t o change t he MTU size of t he net work int erface. To support t his, supply t he relevant m et hod t o net_device: netdev->change_mtu = &mycard_change_mtu; /* ... */ register_netdev(netdev);

The kernel invokes mycard_change_mtu() when you execut e a suit able user com m and t o alt er t he MTU of your card: bash> echo 1500 > /sys/class/net/eth0/mtu

Bu s Spe cific Next com e bus- specific det ails such as t he st art address and size of t he NI C's on- card m em ory. For a PCI NI C driver, t his configurat ion will look like t his: netdev->mem_start = pci_resource_start(pdev, 0); netdev->mem_end = netdev->mem_start + pci_resource_len(pdev, 0);

We discussed PCI resource funct ions in Chapt er 10.

Ch a pt e r 1 5 . N e t w or k I n t e r fa ce Ca r ds I n Th is Ch a pt e r 440 Driver Dat a St ruct ures 448 Talking wit h Prot ocol Layers 450 Buffer Managem ent and Concurrency Cont rol 451 Device Exam ple: Et hernet NI C 457 I SA Net work Drivers 458 Asynchronous Transfer Mode 459 Net work Throughput 461 Looking at t he Sources

Connect ivit y im part s int elligence. You rarely com e across a com put er syst em t oday t hat does not support som e form of net working. I n t his chapt er, let 's focus on device drivers for net work int erface cards ( NI Cs) t hat carry I nt ernet Prot ocol ( I P) t raffic on a local area net work ( LAN) . Most of t he chapt er is bus agnost ic, but wherever bus specifics are necessary, it assum es PCI . To give you a flavor of ot her net work t echnologies, we also t ouch on Asynchronous Transfer Mode ( ATM) . We end t he chapt er by pondering on perform ance and t hroughput . NI C drivers are different from ot her driver classes in t hat t hey do not rely on / dev or / sys t o com m unicat e wit h user space. Rat her, applicat ions int eract wit h a NI C driver via a net work int erface ( for exam ple, et h0 for t he first Et hernet int erface) t hat abst ract s an underlying prot ocol st ack.

D r ive r D a t a St r u ct u r e s When you writ e a device driver for a NI C, you have t o operat e on t hree classes of dat a st ruct ures:

1 . St ruct ures t hat form t he building blocks of t he net work prot ocol st ack. The socket buffer or struct sk_buff defined in include/ linux/ sk_buff.h is t he key st ruct ure used by t he kernel's TCP/ I P st ack.

2 . St ruct ures t hat define t he int erface bet ween t he NI C driver and t he prot ocol st ack. struct net_device defined in include/ linux/ net device.h is t he core st ruct ure t hat const it ut es t his int erface.

3 . St ruct ures relat ed t o t he I / O bus. PCI and it s derivat ives are com m on buses used by t oday's NI Cs.

We t ake a det ailed look at socket buffers and t he net_device int erface in t he next t wo sect ions. We covered PCI dat a st ruct ures in Chapt er 10, " Peripheral Com ponent I nt erconnect ," so we won't revisit t hem here.

Sock e t Bu ffe r s sk_buffs provide efficient buffer handling and flow- cont rol m echanism s t o Linux net working layers. Like DMA descript ors t hat cont ain m et adat a on DMA buffers, sk_buffs hold cont rol inform at ion describing at t ached m em ory buffers t hat carry net work packet s ( see Figure 15.1) . sk_buffs are enorm ous st ruct ures having dozens of elem ent s, but in t his chapt er we confine ourselves t o t hose t hat int erest t he net work device driver writ er. An sk_buff links it self t o it s associat ed packet buffer using five m ain fields:

head, which point s t o t he st art of t he packet

data, which point s t o t he st art of packet payload

tail, which point s t o t he end of packet payload

end, which point s t o t he end of t he packet

len, t he am ount of dat a t hat t he packet cont ains

Figu r e 1 5 .1 . sk_buff ope r a t ion s.

Assum e skb point s t o an sk_buff, skb->head, skb->data, skb->tail, and skb->end slide over t he associat ed packet buffer as t he packet t raverses t he prot ocol st ack in eit her direct ion. skb->data, for exam ple, point s t o t he header of t he prot ocol t hat is current ly processing t he packet . When a packet reaches t he I P layer via t he receive pat h, skb->data point s t o t he I P header; when t he packet passes on t o TCP, however, skb->data m oves t o t he st art of t he TCP header. And as t he packet drives t hrough various prot ocols adding or discarding header dat a, skb->len get s updat ed, t oo. sk_buffs also cont ain point ers ot her t han t he four m aj or ones previously m ent ioned. skb->nh, for exam ple, rem em bers t he posit ion of t he net work prot ocol header irrespect ive of t he current posit ion of skb->data. To illust rat e how a NI C driver works wit h sk_buffs, Figure 15.1 shows dat a t ransit ions on t he receive dat a pat h. For convenience of illust rat ion, t he figure sim plist ically assum es t hat t he operat ions shown are execut ed in sequence. However, for operat ional efficiency in t he real world, t he first t wo st eps (dev_alloc_skb() and

skb_reserve()) are perform ed while init ially preallocat ing a ring of receive buffers; t he t hird st ep is accom plished by t he NI C hardware as it direct ly DMA's t he received packet int o a preallocat ed sk_buff; and t he final t wo st eps (skb_put() and netif_rx()) are execut ed from t he receive int errupt handler. To creat e an sk_buff t o hold a received packet , Figure 15.1 uses dev_alloc_skb(). This is an int errupt - safe rout ine t hat allocat es m em ory for an sk_buff and associat es it wit h a packet payload buffer. dev_kfree_skb() accom plishes t he reverse of dev_alloc_skb(). Figure 15.1 next calls skb_reserve() t o add a 2- byt e padding bet ween t he st art of t he packet buffer and t he beginning of t he payload. This st art s t he I P header at a perform ance- friendly 16- byt e boundary because t he preceding Et hernet headers are 14 byt es long. The rest of t he code st at em ent s in Figure 15.1 fill t he payload buffer wit h t he received packet and m ove skb->data, skb>tail, and skb->len t o reflect t his operat ion. There are m ore sk_buff access rout ines relevant t o som e NI C drivers. skb_clone(), for exam ple, creat es a copy of a supplied skb_buff wit hout copying t he cont ent s of t he associat ed packet buffer. Look inside net / core/ skbuff.c for t he full list of sk_buff library funct ions.

Th e N e t D e vice I n t e r fa ce NI C drivers use a st andard int erface t o int eract wit h t he TCP/ I P st ack. The net_device st ruct ure, which is even m ore gigant ic t han t he sk_buff st ruct ure, defines t his com m unicat ion int erface. To prepare ourselves for exploring t he innards of t he net_device st ruct ure, let 's first follow t he st eps t raced by a NI C driver during init ializat ion. Refer t o init_mycard() in List ing 15.1 as we m ove along:

The driver allocat es a net_device st ruct ure using alloc_netdev(). More com m only, it uses a suit able wrapper around alloc_netdev(). An Et hernet NI C driver, for exam ple, calls alloc_etherdev(). A WiFi driver ( discussed in t he next chapt er) invokes alloc_ieee80211(), and an I rDa driver calls upon alloc_irdadev(). All t hese funct ions t ake t he size of a privat e dat a area as argum ent and creat e t his area in addit ion t o t he net_device it self: struct net_device *netdev; struct priv_struct *mycard_priv; netdev = alloc_etherdev(sizeof(struct priv_struct)); mycard_priv = netdev->priv; /* Private area created by alloc_etherdev() */ Next , t he driver populat es various fields in t he net_device t hat it allocat ed and regist ers t he populat ed net_device wit h t he net work layer using register_netdev(netdev). The driver reads t he NI C's Media Access Cont rol ( MAC) address from an accom panying EEPROM and configures Wake- On- LAN ( WOL) if required. Et hernet cont rollers usually have a com panion nonvolat ile EEPROM t o hold inform at ion such as t heir MAC address and WOL pat t ern, as shown in Figure 15.2. The form er is a unique 48- bit address t hat is globally assigned. The lat t er is a m agic sequence; if found in received dat a, it rouses t he NI C if it 's in suspend m ode. I f t he NI C needs on- card firm ware t o operat e, t he driver downloads it using request_firmware(), as discussed in t he sect ion "Microcode Download" in Chapt er 4 , " Laying t he Groundwork."

Let 's now look at t he m et hods t hat define t he net_device int erface. We cat egorize t hem under six heads for sim plicit y. Wherever relevant , t his sect ion point s you t o t he exam ple NI C driver developed in List ing 15.1 of t he sect ion "Device Exam ple: Et hernet NI C."

Act iva t ion

The net_device int erface requires convent ional m et hods such as open(), close(), and ioctl(). The kernel opens an int erface when you act ivat e it using a t ool such as ifconfig: bash> ifconfig eth0 up

open() set s up receive and t ransm it DMA descript ors and ot her driver dat a st ruct ures. I t also regist ers t he NI C's int errupt handler by calling request_irq(). The net_device st ruct ure is passed as t he devid argum ent t o request_irq() so t hat t he int errupt handler get s direct access t o t he associat ed net_device. ( See mycard_open() and mycard_interrupt() in List ing 15.1 t o find out how t his is done.) The kernel calls close() when you pull down an act ive net work int erface. This accom plishes t he reverse of open().

D a t a Tr a n sfe r Dat a t ransfer m et hods form t he crux of t he net_device int erface. I n t he t ransm it pat h, t he driver supplies a m et hod called hard_start_xmit, which t he prot ocol layer invokes t o pass packet s down for onward t ransm ission: Code View: netdev->hard_start_xmit = &mycard_xmit_frame; /* Transmit Method. See Listing 15.1 */

Unt il recent ly, net work drivers didn't provide a net_device m et hod for collect ing received dat a. I nst ead, t hey asynchronously int errupt ed t he prot ocol layer wit h packet payload. This old int erface has, however, given way t o a New API ( NAPI ) t hat is a m ixt ure of an int errupt - driven driver push and a poll- driver prot ocol pull. A NAPI aware driver t hus needs t o supply a poll() m et hod and an associat ed weight t hat cont rols polling fairness: netdev->poll = &mycard_poll; /* Poll Method. See Listing 15.1 */ netdev->weight = 64;

We elaborat e on dat a- t ransfer m et hods in t he sect ion "Talking wit h Prot ocol Layers."

W a t ch dog The net_device int erface provides a hook t o ret urn an unresponsive NI C t o operat ional st at e. I f t he prot ocol layer senses no t ransm issions for a predet erm ined am ount of t im e, it assum es t hat t he NI C has hung and invokes a driver- supplied recovery m et hod t o reset t he card. The driver set s t he wat chdog t im eout t hrough netdev->watchdog_timeo and regist ers t he address of t he recovery funct ion via netdev->tx_timeout: netdev->tx_timeout = &mycard_timeout; /* Method to reset the NIC */ netdev->watchdog_timeo = 8*HZ; /* Reset if no response detected for 8 seconds */

Because t he recovery m et hod execut es in t im er- int errupt cont ext , it usually schedules a t ask out side of t hat cont ext t o reset t he NI C.

St a t ist ics To enable user land t o collect net work st at ist ics, t he NI C driver populat es a net_device_stats st ruct ure and

provides a get_stats() m et hod t o ret rieve it . Essent ially t he driver does t he following:

1 . Updat es different t ypes of st at ist ics from relevant ent ry point s: #include struct net_device_stats mycard_stats; static irqreturn_t mycard_interrupt(int irq, void *dev_id) { /* ... */ if (packet_received_without_errors) { mycard_stats.rx_packets++; /* One more received packet */ } /* ... */ } 2 . I m plem ent s t he get_stats() m et hod t o ret rieve t he st at ist ics: static struct net_device_stats *mycard_get_stats(struct net_device *netdev) { /* House keeping */ /* ... */ return(&mycard_stats); } 3 . Supplies t he ret rieve m et hod t o higher layers: netdev->get_stats = &mycard_get_stats; /* ... */ register_netdev(netdev); To collect st at ist ics from your NI C, t rigger invocat ion of mycard_get_stats() by execut ing an appropriat e user m ode com m and. For exam ple, t o find t he num ber of packet s received t hrough t he et h0 int erface, do t his: bash> cat /sys/class/net/eth0/statistics/rx_packets 124664

WiFi drivers need t o t rack several param et ers not relevant t o convent ional NI Cs, so t hey im plem ent a st at ist ic collect ion m et hod called get_wireless_stats() in addit ion t o get_stats(). The m echanism for regist ering get_wireless_stats() for t he benefit of WiFi- aware user space ut ilit ies is discussed in t he sect ion " WiFi" in t he next chapt er.

Con figu r a t ion NI C drivers need t o support user space t ools t hat are responsible for set t ing and get t ing device param et ers. Et ht ool configures param et ers for Et hernet NI Cs. To support et ht ool, t he underlying NI C driver does t he following:

1 . Populat es an ethtool_ops st ruct ure, defined in include/ linux/ et ht ool.h wit h prescribed ent ry point s: #include /* Ethtool_ops methods */ struct ethtool_ops mycard_ethtool_ops = { /* ... */ .get_eeprom = mycard_get_eeprom, /* Dump EEPROM contents */ /* ... */ }; 2 . I m plem ent s t he m et hods t hat are part of ethtool_ops: static int mycard_get_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom, uint8_t *bytes) { /* Access the accompanying EEPROM and pull out data */ /* ... */ } 3 . Export s t he address of it s ethtool_ops: netdev->ethtool_ops = &mycard_ethtool_ops; /* ... */ register_netdev(netdev); Aft er t hese are done, et ht ool can operat e over your Et hernet NI C. To dum p EEPROM cont ent s using et ht ool, do t his: bash> ethtool -e eth0 Offset Values ----------0x0000 00 0d 60 79 32 0a 00 0b ff ff 10 20 ff ff ff ff ...

Et ht ool com es packaged wit h som e dist ribut ions; but if you don't have it , download it from ht t p: / / sourceforge.net / proj ect s/ gkernel/ . Refer t o t he m an page for it s full capabilit ies. There are m ore configurat ion- relat ed m et hods t hat a NI C driver provides t o higher layers. An exam ple is t he m et hod t o change t he MTU size of t he net work int erface. To support t his, supply t he relevant m et hod t o net_device: netdev->change_mtu = &mycard_change_mtu; /* ... */ register_netdev(netdev);

The kernel invokes mycard_change_mtu() when you execut e a suit able user com m and t o alt er t he MTU of your card: bash> echo 1500 > /sys/class/net/eth0/mtu

Bu s Spe cific Next com e bus- specific det ails such as t he st art address and size of t he NI C's on- card m em ory. For a PCI NI C driver, t his configurat ion will look like t his: netdev->mem_start = pci_resource_start(pdev, 0); netdev->mem_end = netdev->mem_start + pci_resource_len(pdev, 0);

We discussed PCI resource funct ions in Chapt er 10.

Ta lk in g w it h Pr ot ocol La ye r s I n t he preceding sect ion, you discovered t he driver m et hods dem anded by t he net_device int erface. Let 's now t ake a closer look at how net work dat a flows over t his int erface.

Re ce ive Pa t h You learned in Chapt er 4 t hat soft irqs are bot t om half m echanism s used by perform ance- sensit ive subsyst em s. NI C drivers use NET_RX_SOFTIRQ t o offload t he work of post ing received dat a packet s t o prot ocol layers. The driver achieves t his by calling netif_rx() from it s receive int errupt handler: netif_rx(skb);

/* struct sk_buff *skb */

NAPI , alluded t o earlier, im proves t his convent ional int errupt - driven receive algorit hm t o lower dem ands on CPU ut ilizat ion. When net work load is heavy, t he syst em m ight get bogged down by t he large num ber of int errupt s t hat it t akes. NAPI 's st rat egy is t o use a polled m ode when net work act ivit y is heavy but fall back t o int errupt m ode when t he t raffic get s light . NAPI - aware drivers swit ch bet ween int errupt and polled m odes based on net work load. This is done as follows:

1 . I n int errupt m ode, t he int errupt handler post s received packet s t o prot ocol layers by scheduling NET_RX_SOFTIRQ. I t t hen disables NI C int errupt s and swit ches t o polled m ode by adding t he device t o a poll list : if (netif_rx_schedule_prep(netdev)) /* Housekeeping */ { /* Disable NIC interrupt */ disable_nic_interrupt(); /* Post the packet to the protocol layer and add the device to the poll list */ __netif_rx_schedule(netdev); } 2 . The driver provides a poll() m et hod via it s net_device st ruct ure.

3 . I n t he polled m ode, t he driver's poll() m et hod processes packet s in t he ingress queue. When t he queue becom es em pt y, t he driver re- enables int errupt s and swit ches back t o int errupt m ode by calling netif_rx_complete().

Look at mycard_interrupt(), init_mycard(), and mycard_poll() in List ing 15.1 t o see NAPI in act ion.

Tr a n sm it Pa t h For dat a t ransm ission, t he int eract ion bet ween prot ocol layers and t he NI C driver is st raight forward. The prot ocol st ack invokes t he driver's hard_start_xmit() m et hod wit h t he out going sk_buff as argum ent . The driver get s t he packet out of t he door by DMA- ing packet dat a t o t he NI C. DMA and t he m anagem ent of relat ed dat a st ruct ures for PCI NI C drivers were discussed in Chapt er 10. The driver program s t he NI C t o int errupt t he processor aft er it finishes t ransm it t ing a predet erm ined num ber of packet s. Only when a t ransm it - com plet e int errupt occurs signaling com plet ion of a t ransm it operat ion can t he

driver reclaim or free resources such as DMA descript ors, DMA buffers, and sk_buffs associat ed wit h t he t ransm it t ed packet .

Flow Con t r ol The driver conveys it s readiness or reluct ance t o accept prot ocol dat a by, respect ively, calling netif_start_queue() and netif_stop_queue(). During device open(), t he NI C driver calls netif_start_queue() t o ask t he prot ocol layer t o st art adding t ransm it packet s t o t he egress queue. During norm al operat ion, however, t he driver m ight require egress queuing t o st op on occasion. Exam ples include t he t im e window when t he driver is replenishing dat a st ruct ures, or when it 's closing t he device. Throt t ling t he downst ream flow is accom plished by calling netif_stop_queue(). To request t he net working st ack t o rest art egress queuing, say when t here are sufficient free buffers, t he NI C driver invokes netif_wake_queue(). To check t he current flow- cont rol st at e, t oss a call t o netif_queue_stopped().

Bu ffe r M a n a ge m e n t a n d Con cu r r e n cy Con t r ol A high- perform ance NI C driver is a com plex piece of soft ware requiring int ricat e dat a st ruct ure m anagem ent . As discussed in t he sect ion "Dat a Transfer " in Chapt er 10, a NI C driver m aint ains linked list s ( or " rings" ) of t ransm it and receive DMA descript ors, and im plem ent s free and in- use pools for buffer m anagem ent . The driver t ypically im plem ent s a m ult ipronged st rat egy t o m aint ain buffer levels: preallocat e a ring of DMA descript ors and associat ed sk_buffs during device open, replenish free pools by allocat ing new m em ory if available buffers dip below a predet erm ined wat erm ark, and reclaim used buffers int o t he free pool when t he NI C generat es t ransm it - com plet e and receive int errupt s. Each elem ent in t he NI C driver's receive ring, for exam ple, is populat ed as follows: /* Allocate an sk_buff and the associated data buffer. See Figure 15.1 */ skb = dev_alloc_skb(MAX_NIC_PACKET_SIZE); /* Align the data pointer */ skb_reserve(skb, NET_IP_ALIGN); /* DMA map for NIC access. The following invocation assumes a PCI NIC. pdev is a pointer to the associated pci_dev structure */ pci_map_single(pdev, skb->data, MAX_NIC_PACKET_SIZE, PCI_DMA_FROMDEVICE); /* Create a descriptor containing this sk_buff and add it to the RX ring */ /* ... */

During recept ion, t he NI C direct ly DMA's dat a t o an sk_buff in t he preceding preallocat ed ring and int errupt s t he processor. The receive int errupt handler, in t urn, passes t he packet t o higher prot ocol layers. Developing ring dat a st ruct ures will m ake t his discussion as well as t he exam ple driver in t he next sect ion loaded, so refer t o t he sources of t he I nt el PRO/ 1000 driver in t he drivers/ net / e1000/ direct ory for a com plet e illust rat ion. Concurrent access prot ect ion goes hand- in- hand wit h m anaging such com plex dat a st ruct ures in t he face of m ult iple execut ion t hreads such as t ransm it , receive, t ransm it - com plet e int errupt s, receive int errupt s, and NAPI polling. We discussed several concurrency cont rol t echniques in Chapt er 2 , " A Peek I nside t he Kernel."

D e vice Ex a m ple : Et h e r n e t N I C Now t hat you have t he background, it 's t im e t o writ e a NI C driver by gluing t he pieces discussed so far. List ing 15.1 im plem ent s a skelet al Et hernet NI C driver. I t only im plem ent s t he m ain net_device m et hods. For help in developing t he rest of t he m et hods, refer t o t he e1000 driver m ent ioned earlier. List ing 15.1 is generally independent of t he underlying I / O bus but is slight ly t ilt ed t o PCI . I f you are writ ing a PCI NI C driver, you have t o blend List ing 15.1 wit h t he exam ple PCI driver im plem ent ed in Chapt er 10. List in g 1 5 .1 . An Et h e r n e t N I C D r ive r Code View: #include #include #include #include struct net_device_stats mycard_stats;

/* Statistics */

/* Fill ethtool_ops methods from a suitable place in the driver */ struct ethtool_ops mycard_ethtool_ops = { /* ... */ .get_eeprom = mycard_get_eeprom, /* Dump EEPROM contents */ /* ... */ }; /* Initialize/probe the card. For PCI cards, this is invoked from (or is itself) the probe() method. In that case, the function is declared as: static struct net_device *init_mycard(struct pci_dev *pdev, const struct pci_device_id *id) */ static struct net_device * init_mycard() { struct net_device *netdev; struct priv_struct mycard_priv; /* ... */ netdev = alloc_etherdev(sizeof(struct priv_struct)); /* Common methods */ netdev->open = &mycard_open; netdev->stop = &mycard_close; netdev->do_ioctl = &mycard_ioctl; /* Data transfer */ netdev->hard_start_xmit = &mycard_xmit_frame; /* Transmit */ netdev->poll = &mycard_poll; /* Receive - NAPI netdev->weight = 64; /* Fairness */ /* Watchdog */ netdev->tx_timeout = &mycard_timeout; netdev->watchdog_timeo = 8*HZ;

*/

/* Recovery function */ /* 8-second timeout */

/* Statistics and configuration */ netdev->get_stats = &mycard_get_stats; /* Statistics support */ netdev->ethtool_ops = &mycard_ethtool_ops; /* Ethtool support */

netdev->set_mac_address = &mycard_set_mac; /* Change MAC */ netdev->change_mtu = &mycard_change_mtu; /* Alter MTU */ strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);

/* Name (for PCI) */

/* Bus-specific parameters. For a PCI NIC, it looks as follows */ netdev->mem_start = pci_resource_start(pdev, 0); netdev->mem_end = netdev->mem_start + pci_resource_len(pdev, 0); /* Register the interface */ register_netdev(netdev); /* ... */ /* Get MAC address from attached EEPROM */ /* ... */ /* Download microcode if needed */ /* ... */ }

/* The interrupt handler */ static irqreturn_t mycard_interrupt(int irq, void *dev_id) { struct net_device *netdev = dev_id; struct sk_buff *skb; unsigned int length; /* ... */ if (receive_interrupt) { /* We were interrupted due to packet reception. At this point, the NIC has already DMA'ed received data to an sk_buff that was pre-allocated and mapped during device open. Obtain the address of the sk_buff depending on your data structure design and assign it to 'skb'. 'length' is similarly obtained from the NIC by reading the descriptor used to DMA data from the card. Now, skb->data contains the receive data. */ /* ... */ /* For PCI cards, perform a pci_unmap_single() on the received buffer in order to allow the CPU to access it */ /* ... */ /* Allow the data go to the tail of the packet by moving skb->tail down by length bytes and increasing skb->len correspondingly */ skb_put(skb, length) /* Pass the packet to the TCP/IP stack */ #if !defined (USE_NAPI) /* Do it the old way */ netif_rx(skb); #else /* Do it the NAPI way */ if (netif_rx_schedule_prep(netdev))) { /* Disable NIC interrupt. Implementation not shown. */ disable_nic_interrupt();

/* Post the packet to the protocol layer and add the device to the poll list */ __netif_rx_schedule(netdev); } #endif } else if (tx_complete_interrupt) { /* Transmit Complete Interrupt */ /* ... */ /* Unmap and free transmit resources such as DMA descriptors and buffers. Free sk_buffs or reclaim them into a free pool */ /* ... */ } } /* Driver open */ static int mycard_open(struct net_device *netdev) { /* ... */ /* Request irq */ request_irq(irq, mycard_interrupt, IRQF_SHARED, netdev->name, dev); /* Fill transmit and receive rings */ /* See the section, "Buffer Management and Concurrency Control" */ /* ... */ /* Provide free descriptor addresses to the card */ /* ... */ /* Convey your readiness to accept data from the networking stack */ netif_start_queue(netdev); /* ... */ } /* Driver close */ static int mycard_close(struct net_device *netdev) { /* ... */ /* Ask the networking stack to stop sending down data */ netif_stop_queue(netdev); /* ... */ } /* Called when the device is unplugged or when the module is released. For PCI cards, this is invoked from (or is itself) the remove() method. In that case, the function is declared as: static void __devexit mycard_remove(struct pci_dev *pdev) */ static void __devexit mycard_remove()

{ struct net_device *netdev; /* ... */ /* For a PCI card, obtain the associated netdev as follows, assuming that the probe() method performed a corresponding pci_set_drvdata(pdev, netdev) after allocating the netdev */ netdev = pci_get_drvdata(pdev); /* unregister_netdev(netdev);

/* Reverse of register_netdev() */

/* ... */ free_netdev(netdev);

/* Reverse of alloc_netdev() */

/* ... */ }

/* Suspend method. For PCI devices, this is part of the pci_driver structure discussed in Chapter 10 */ static int mycard_suspend(struct pci_dev *pdev, pm_message_t state) { /* ... */ netif_device_detach(netdev); /* ... */ }

/* Resume method. For PCI devices, this is part of the pci_driver structure discussed in Chapter 10 */ static int mycard_resume(struct pci_dev *pdev) { /* ... */ netif_device_attach(netdev); /* ... */ }

/* Get statistics */ static struct net_device_stats * mycard_get_stats(struct net_device *netdev) { /* House keeping */ /* ... */ return(&mycard_stats); } /* Dump EEPROM contents. This is an ethtool_ops operation */ static int mycard_get_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom, uint8_t *bytes) { /* Read data from the accompanying EEPROM */ /* ... */ }

/* Poll method */ static int mycard_poll(struct net_device *netdev, int *budget) { /* Post packets to the protocol layer using netif_receive_skb() */ /* ... */ if (no_more_ingress_packets()){ /* Remove the device from the polled list */ netif_rx_complete(netdev); /* Fall back to interrupt mode. Implementation not shown */ enable_nic_interrupt(); return 0; } } /* Transmit method */ static int mycard_xmit_frame(struct sk_buff *skb, struct net_device *netdev) { /* DMA the transmit packet from the associated sk_buff to card memory */ /* ... */ /* Manage buffers */ /* ... */ }

Et hernet PHY Et hernet cont rollers im plem ent t he MAC layer and have t o be used in t andem wit h a Physical layer ( PHY) t ransceiver. The form er corresponds t o t he dat alink layer of t he Open Syst em s I nt erconnect ( OSI ) m odel, while t he lat t er im plem ent s t he physical layer. Several SoCs have built - in MACs t hat connect t o ext ernal PHYs. The Media I ndependent I nt erface ( MI I ) is a st andard int erface t hat connect s a Fast Et hernet MAC t o a PHY. The Et hernet device driver com m unicat es wit h t he PHY over MI I t o configure param et ers such as PHY I D, line speed, duplex m ode, and aut o negot iat ion. Look at include/ linux/ m ii.h for MI I regist er definit ions.

I SA N e t w or k D r ive r s Let 's now t ake a peek at an I SA NI C. The CS8900 is a 10Mbps Et hernet cont roller chip from Cryst al Sem iconduct or ( now Cirrus Logic) . This chip is com m only used t o Et hernet - enable em bedded devices, especially for debug purposes. Figure 15.2 shows a connect ion diagram surrounding a CS8900. Depending on t he processor on your board and t he chip- select used t o drive t he chip, t he CS8900 regist ers m ap t o different regions in t he CPU's I / O address space. The device driver for t his cont roller is an I SA- t ype driver ( look at t he sect ion "I SA and MCA" in Chapt er 20, " More Devices and Drivers" ) t hat probes candidat e address regions t o det ect t he cont roller's presence. The I SA probe m et hod elicit s t he cont roller's I / O base address by looking for a signat ure such as t he chip I D.

Figu r e 1 5 .2 . Con n e ct ion dia gr a m su r r ou n din g a CS8 9 0 0 Et h e r n e t con t r olle r .

[ View full size im age]

Look at drivers/ net / cs89x0.c for t he source code of t he CS8900 driver. cs89x0_probe1() probes I / O address ranges t o sense a CS8900. I t t hen reads t he current configurat ion of t he chip. During t his st ep, it accesses t he CS8900's com panion EEPROM and gleans t he cont roller's MAC address. Like t he driver in List ing 15.1, cs89x0.c is also built using netif_*() and skb_*() int erface rout ines. Som e plat form s t hat use t he CS8900 allow DMA. I SA devices, unlike PCI cards, do not have DMA m ast ering capabilit ies, so t hey need an ext ernal DMA cont roller t o t ransfer dat a.

Asyn ch r on ou s Tr a n sfe r M ode ATM is a high- speed, connect ion- orient ed, back- bone t echnology. ATM guarant ees high qualit y of service ( QoS) and low lat encies, so it 's used for carrying audio and video t raffic in addit ion t o dat a. Here's a quick sum m ary of t he ATM prot ocol: ATM com m unicat ion t akes place in unit s of 53- byt e cells. Each cell begins wit h a 5- byt e header t hat carries a virt ual pat h ident ifier ( VPI ) and a virt ual circuit ident ifier ( VCI ) . ATM connect ions are eit her swit ched virt ual circuit s ( SVCs) or perm anent virt ual circuit s ( PVCs) . During SVC est ablishm ent , VPI / VCI pairs are configured in int ervening ATM swit ches t o rout e incom ing cells t o appropriat e egress port s. For PVCs, t he VPI / VCI pairs are perm anent ly configured in t he ATM swit ches and not set up and t orn down for each connect ion. There are t hree ways you can run TCP/ I P over ATM, all of which are support ed by Linux- ATM:

1 . Classical I P over ATM ( CLI P) as specified in RFC[ 1 ] 1577.

[ 1]

Request For Com m ent s ( RFC) are docum ent s t hat specify net working st andards.

2 . Em ulat ing a LAN over an ATM net work. This is called LAN Em ulat ion ( LANE) .

3 . Mult i Prot ocol over ATM ( MPoA) . This is a rout ing t echnique t hat im proves perform ance.

Linux- ATM is an experim ent al collect ion of kernel drivers, user space ut ilit ies, and daem ons. You will find ATM drivers and prot ocols under drivers/ at m / and net / at m / , respect ively, in t he source t ree. ht t p: / / linuxat m .sourceforge.net / host s user- space program s required t o use Linux- ATM. Linux also incorporat es an ATM socket API consist ing of SVC socket s (AF_ATMSVC) and PVC socket s ( AF_ATMPVC) . A prot ocol called Mult iprot ocol Label Swit ching ( MPLS) is replacing ATM. The Linux- MPLS proj ect , host ed at ht t p: / / m pls- linux.sourceforge.net / , is not yet part of t he m ainline kernel. We look at som e ATM- relat ed t hroughput issues in t he next sect ion.

N e t w or k Th r ou gh pu t Several t ools are available t o benchm ark net work perform ance. Net perf, available for free from www.net perf.org , can set up com plex TCP/ UDP connect ion scenarios. You can use script s t o cont rol charact erist ics such as prot ocol param et ers, num ber of sim ult aneous sessions, and size of dat a blocks. Benchm arking is accom plished by com paring t he result ing t hroughput wit h t he m axim um pract ical bandwidt h t hat t he net working t echnology yields. For exam ple, a 155Mbps ATM adapt er produces a m axim um I P t hroughput of 135Mbps, t aking int o account t he ATM cell header size, overheads due t o t he ATM Adapt at ion Layer ( AAL) , and t he occasional m aint enance cells sent by t he physical Synchronous Opt ical Net working ( SONET) layer. To obt ain opt im al t hroughput , you have t o design your NI C driver for high perform ance. I n addit ion, you need an in- dept h underst anding of t he net work prot ocol t hat your driver ferries.

D r ive r Pe r for m a n ce Let 's t ake a look at som e driver design issues t hat can affect t he horsepower of your NI C:

Minim izing t he num ber of inst ruct ions in t he m ain dat a pat h is a key crit erion while designing drivers for fast NI Cs. Consider a 1Gbps Et hernet adapt er wit h 1MB of on- board m em ory. At line rat e, t he card m em ory can hold up t o 8 m illiseconds of received dat a. This direct ly t ranslat es t o t he m axim um allowable inst ruct ion pat h lengt h. Wit hin t his pat h lengt h, incom ing packet s have t o be reassem bled, DMAed t o m em ory, processed by t he driver, prot ect ed from concurrent access, and delivered t o higher layer pr ot ocols.

During program m ed I / O ( PI O) , dat a t ravels all t he way from t he device t o t he CPU, before it get s writ t en t o m em ory. Moreover, t he CPU get s int errupt ed whenever t he device needs t o t ransfer dat a, and t his cont ribut es t o lat encies and cont ext swit ch delays. DMAs do not suffer from t hese bot t lenecks, but can t urn out t o be m ore expensive t han PI Os if t he dat a t o be t ransferred is less t han a t hreshold. This is because sm all DMAs have high relat ive overheads for building descript ors and flushing corresponding processor cache lines for dat a coherency. A perform ance- sensit ive device driver m ight use PI O for sm all packet s and DMA for larger ones, aft er experim ent ally det erm ining t he t hreshold.

For PCI net work cards having DMA m ast ering capabilit y, you have t o det erm ine t he opt im al DMA burst size, which is t he t im e for which t he card cont rols t he bus at one st ret ch. I f t he card burst s for a long durat ion, it m ay hog t he bus and prevent t he processor from keeping up wit h dat a DMA- ed previously. PCI drivers program t he burst size via a regist er in t he PCI configurat ion space. Norm ally t he NI C's burst size is program m ed t o be t he sam e as t he cache line size of t he processor, which is t he num ber of byt es t hat t he processor reads from syst em m em ory each t im e t here is a cache m iss. I n pract ice, however, you m ight need t o connect a bus analyzer t o det erm ine t he beneficial burst durat ion because fact ors such as t he presence of a split bus ( m ult iple bus t ypes like I SA and PCI ) on your syst em can influence t he opt im al value.

Many high- speed NI Cs offer t he capabilit y t o offload t he CPU- int ensive com put at ion of TCP checksum s from t he prot ocol st ack. Som e support DMA scat t er- gat her t hat we visit ed in Chapt er 10. The driver needs t o leverage t hese capabilit ies t o achieve t he m axim um pract ical bandwidt h t hat t he underlying net work yields.

Som et im es, a driver opt im izat ion m ight creat e unexpect ed speed bum ps if it 's not sensit ive t o t he im plem ent at ion det ails of higher prot ocols. Consider an NFS- m ount ed filesyst em on a com put er equipped

wit h a high- speed NI C. Assum e t hat t he NI C driver t akes only occasional t ransm it com plet e int errupt s t o m inim ize lat encies, but t hat t he NFS server im plem ent at ion uses freeing of it s t ransm it buffers as a flowcont rol m echanism . Because t he driver frees NFS t ransm it buffers only during t he sparsely generat ed t ransm it com plet e int errupt s, file copies over NFS crawl, even as I nt ernet downloads zip along yielding m axim um t hroughput .

Pr ot ocol Pe r for m a n ce Let 's now dig int o som e prot ocol- specific charact erist ics t hat can boost or hurt net work t hroughput :

TCP window size can im pact t hroughput . The window size provides a m easure of t he am ount of dat a t hat can be t ransm it t ed before receiving an acknowledgm ent . For fast NI Cs, a sm all window size m ight result in TCP sit t ing idle, wait ing for acknowledgm ent s of packet s already t ransm it t ed. Even wit h a large window size, a sm all num ber of lost TCP packet s can affect perform ance because lost fram es can use up t he window at line speeds. I n t he case of UDP, t he window size is not relevant because it does not support acknowledgm ent s. However, a sm all packet loss can spiral int o a big rat e drop due t o t he absence of flowcont rol m echanism s.

As t he block size of applicat ion dat a writ t en t o TCP socket s increases, t he num ber of buffers copied from user space t o kernel space decreases. This lowers t he dem and on processor ut ilizat ion and is good for perform ance. I f t he block size crosses t he MTU corresponding t o t he net work prot ocol, however, processor cycles get wast ed on fragm ent at ion. The desirable block size is t hus t he out going int erface MTU, or t he largest packet t hat can be sent wit hout fragm ent at ion t hrough an I P pat h if Pat h MTU discovery m echanism s are in operat ion. While running I P over ATM, for exam ple, because t he ATM adapt at ion layer has a 64K MTU, t here is virt ually no upper bound on block size. ( RFC 1626 default s t his t o 9180.) I f you are running I P over ATM LANE, however, t he block size should m irror t he MTU size of t he respect ive LAN t echnology being em ulat ed. I t should t hus be 1500 for st andard Et hernet , 8000 for j um bo Gigabit Et hernet , and 18K for 16Mbps Token Ring.

Look in g a t t h e Sou r ce s The drivers/ net / direct ory cont ains sources of various NI C drivers. Look inside drivers/ net / e1000/ for an exam ple NI C driver. You will find net work prot ocol im plem ent at ions in t he net / direct ory. sk_buff access rout ines are in net / core/ skbuff.c. Library rout ines t hat aid t he im plem ent at ion of your driver's net_device int erface st ay in net / core/ dev.c and include/ linux/ net device.h.

TUN/ TAP Driver The TUN/ TAP device driver drivers/ net / t un.c, used for prot ocol t unneling, is an exam ple of a com binat ion of a virt ual net work driver and a pseudo char driver. The pseudo char device (/ dev/ net / t un) act s as t he underlying hardware for t he virt ual net work int erface ( t unX) , so inst ead of t ransm it t ing fram es t o a physical net work, t he TUN net work driver sends it t o an applicat ion t hat is reading from / dev/ net / t un. Sim ilarly, inst ead of receiving dat a from a physical net work, t he TUN driver accept s it from an applicat ion writ ing t o / dev/ net / t un. Look at Docum ent at ion/ net working/ t unt ap.t xt for m ore explanat ions and usage scenarios. Since bot h net work and char port ions of t he driver do not have t o deal wit h t he com plexit ies of hardware int eract ion, it serves as a very readable, albeit sim plist ic, driver exam ple.

Files under / sys/ class/ net / let you operat e on NI C driver param et ers. Use t he nodes under / proc/ sys/ net / t o configure prot ocol- specific variables. To set t he m axim um TCP t ransm it window size, for exam ple, echo t he desired value t o / proc/ sys/ net / core/ wm em _m ax. The / proc/ net / direct ory has a collect ion of syst em - specific net work inform at ion. Exam ine / proc/ net / dev for st at ist ics on all NI Cs on your syst em and look at / proc/ net / arp for t he ARP t able. Table 15.1 cont ains t he m ain dat a st ruct ures used in t his chapt er and t heir locat ion in t he source t ree. Table 15.2 list s t he m ain kernel program m ing int erfaces t hat you used in t his chapt er along wit h t he locat ion of t heir definit ions.

Ta ble 1 5 .1 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

sk_buff

include/ linux/ skbuff.h

sk_buffs provide efficient buffer handling and flow- cont rol m echanism s t o Linux net working layers.

net_device

include/ linux/ net device.h I nt erface bet ween NI C drivers and t he TCP/ I P st ack.

net_device_stats

include/ linux/ net device.h St at ist ics pert aining t o a net work device.

ethtool_ops

include/ linux/ et ht ool.h

Ent ry point s t o t ie a NI C driver t o t he et ht ool ut ilit y.

Ta ble 1 5 .2 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

alloc_netdev()

net / core/ dev.c

Allocat es a net_device

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

alloc_etherdev() alloc_ieee80211() alloc_irdadev()

net / et hernet / et h.c

Wrappers t o alloc_netdev()

net / ieee80211/ ieee80211_m odule.c net / irda/ irda_device.c

free_netdev()

net / core/ dev.c

Reverse of alloc_netdev()

register_netdev()

net / core/ dev.c

Regist ers a net_device

unregister_netdev()

net / core/ dev.c

Unregist ers a net_device

dev_alloc_skb()

include/ linux/ skbuff.h

Allocat es m em ory for an sk_buff and associat es it wit h a packet payload buffer

dev_kfree_skb()

include/ linux/ skbuff.h net / core/ skbuff.c

Reverse of dev_alloc_skb()

skb_reserve()

include/ linux/ skbuff.h

Adds a padding bet ween t he st art of a packet buffer and t he beginning of payload

skb_clone()

net / core/ skbuff.c

Creat es a copy of a supplied sk_buff wit hout copying t he cont ent s of t he associat ed packet buffer

skb_put()

include/ linux/ skbuff.h

Allows packet dat a t o go t o t he t ail of t he packet

netif_rx()

net / core/ dev.c

Passes a net work packet t o t he TCP/ I P st ack

netif_rx_schedule_prep() __netif_rx_schedule()

include/ linux/ net device.h net / core/ dev.c

Passes a net work packet t o t he TCP/ I P st ack ( NAPI )

netif_receive_skb()

net / core/ dev.c

Post s packet t o t he prot ocol layer from t he poll() m et hod ( NAPI )

netif_rx_complete()

include/ linux/ net device.h

Rem oves a device from polled list ( NAPI )

netif_device_detach()

net / core/ dev.c

Det aches t he device ( com m only called during power suspend)

netif_device_attach()

net / core/ dev.c

At t aches t he device ( com m only called during power resum e)

netif_start_queue()

include/ linux/ net device.h

Conveys readiness t o accept dat a from t he net working st ack

netif_stop_queue()

include/ linux/ net device.h

Asks t he net working st ack t o st op sending down dat a

netif_wake_queue()

include/ linux/ net device.h

Rest art s egress queuing

netif_queue_stopped()

include/ linux/ net device.h

Checks flow- cont rol st at e

Ch a pt e r 1 6 . Lin u x W it h ou t W ir e s I n Th is Ch a pt e r

Bluet oot h

467

I nfrared

478

WiFi

489

Cellular Net working

496

Current Trends

500

Several sm all- foot print devices are powered by t he dual com binat ion of a wireless t echnology and Linux. Bluet oot h, I nfrared, WiFi, and cellular net working are est ablished wireless t echnologies t hat have healt hy Linux support . Bluet oot h elim inat es cables, inj ect s int elligence int o dum b devices, and opens a flood gat e of novel applicat ions. I nfrared is a low- cost , low- range, m edium - rat e, wireless t echnology t hat can net work lapt ops, connect handhelds, or dispat ch a docum ent t o a print er. WiFi is t he wireless equivalent of an Et hernet LAN. Cellular net working using General Packet Radio Service ( GPRS) or code division m ult iple access ( CDMA) keeps you I nt ernet - enabled on t he go, as long as your wanderings are confined t o service provider coverage area. Because t hese wireless t echnologies are widely available in popular form fact ors, you are likely t o end up, sooner rat her t han lat er, wit h a card t hat does not work on Linux right away. Before you st art working on enabling an unsupport ed card, you need t o know in det ail how t he kernel im plem ent s support for t he corresponding t echnology. I n t his chapt er, let 's learn how Linux enables Bluet oot h, I nfrared, WiFi, and cellular net working.

Wireless Trade- Offs Bluet oot h, I nfrared, WiFi, and GPRS serve different niches. The t rade- offs can be gauged in t erm s of speed, range, cost , power consum pt ion, ease of hardware/ soft ware co- design, and PCB real est at e usage. Table 16.1 gives you an idea of t hese param et ers, but you will have t o cont end wit h several variables when you m easure t he num bers on t he ground. The speeds list ed are t he t heoret ical m axim um s. The power consum pt ions indicat ed are relat ive, but in t he real world t hey also depend on t he vendor's im plem ent at ion t echniques, t he t echnology subclass, and t he operat ing m ode. Cost econom ics depend on t he chip form fact or and whet her t he chip cont ains built - in m icrocode t hat im plem ent s som e of t he prot ocol layers. The board real est at e consum ed depends not j ust on t he chipset , but also on t ransceivers, ant ennae, and whet her you build using off- t he- shelf ( OTS) m odules.

Bluet oot h 720Kbps 10m t o 100m ** ** ** ** I nfrared Dat a 4Mbps ( Fast I R) Up t o 1 m et er wit hin a 30- degree cone * * * * WiFi 54Mbps 150 m et ers ( indoors) **** *** *** *** GPRS 170Kbps Service provider coverage *** **** * *** Not e: The last four colum ns give relat ive m easurem ent ( depending on t he num ber of * sym bols) rat her t han absolut e v alues.

Ta ble 1 6 .1 . W ir e le ss Tr a de - Offs Spe e d

Ra n ge

Pow e r

Cost

Co- Design Effor t

Boa r d Re a l Est a t e

Som e sect ions in t his chapt er focus m ore on " syst em program m ing" t han device drivers. This is because t he corresponding regions of t he prot ocol st ack ( for exam ple, Bluet oot h RFCOMM and I nfrared net working) are already present in t he kernel and you are m ore likely t o perform associat ed user m ode cust om izat ions t han develop prot ocol cont ent or device drivers.

Blu e t oot h

Bluet oot h is a short - range cable- replacem ent t echnology t hat carries bot h dat a and voice. I t support s speeds of up t o 723Kbps ( asym m et ric) and 432Kbps ( sym m et ric) . Class 3 Bluet oot h devices have a range of 10 m et ers, and Class 1 t ransm it t ers can com m unicat e up t o 100 m et ers. Bluet oot h is designed t o do away wit h wires t hat const rict and clut t er your environm ent . I t can, for exam ple, t urn your wrist wat ch int o a front - end for a bulky Global Posit ioning Syst em ( GPS) hidden inside your backpack. Or it can, for inst ance, let you navigat e a present at ion via your handheld. Again, Bluet oot h can be t he answer if you want your lapt op t o be a hub t hat can I nt ernet - enable your Bluet oot h- aware MP3 player. I f your wrist wat ch, handheld, lapt op, or MP3 player is running Linux, knowledge of t he innards of t he Linux Bluet oot h st ack will help you ext ract m axim um m ileage out of your device. As per t he Bluet oot h specificat ion, t he prot ocol st ack consist s of t he layers shown in Figure 16.1 . The radio, link cont roller, and link m anager roughly correspond t o t he physical, dat a link, and net work layers in t he Open Syst em s I nt erconnect ( OSI ) st andard reference m odel. The Host Cont rol I nt erface ( HCI ) is t he prot ocol t hat carries dat a t o/ from t he hardware and, hence, m aps t o t he t ransport layer. The Bluet oot h Logical Link Cont rol and Adapt at ion Prot ocol ( L2CAP) falls in t he session layer. Serial port em ulat ion using Radio Frequency Com m unicat ion ( RFCOMM) , Et hernet em ulat ion using Bluet oot h Net work Encapsulat ion Prot ocol ( BNEP) , and t he Service Discovery Prot ocol ( SDP) are part of t he feat ure- rich present at ion layer. At t he t op of t he st ack reside various applicat ion environm ent s called pr ofiles. The radio, link cont roller, and link m anager are usually part of Bluet oot h hardware, so operat ing syst em support st art s at t he HCI layer.

Figu r e 1 6 .1 . Th e Blu e t oot h st a ck .

A com m on m et hod of int erfacing Bluet oot h hardware wit h a m icrocont roller is by connect ing t he chipset 's dat a lines t o t he cont roller's UART pins. Figure 13.4 of Chapt er 13 , " Audio Drivers," shows a Bluet oot h chip on an MP3 player com m unicat ing wit h t he processor via a UART. USB is anot her oft - used vehicle for com m unicat ing wit h Bluet oot h chipset s. Figure 11.2 of Chapt er 11 , " Universal Serial Bus," shows a Bluet oot h chip on an em bedded device int erfacing wit h t he processor over USB. I rrespect ive of whet her you use UART or USB ( we will look at bot h kinds of devices lat er) , t he packet form at used t o t ransport Bluet oot h dat a is HCI .

Blu e Z The BlueZ Bluet oot h im plem ent at ion is part of t he m ainline kernel and is t he official Linux Bluet oot h st ack.

Figure 16.2 shows how BlueZ m aps Bluet oot h prot ocol layers t o kernel m odules, kernel t hreads, user space daem ons, configurat ion t ools, ut ilit ies, and libraries. The m ain BlueZ com ponent s are explained here:

1 . bluet oot h.ko cont ains t he core BlueZ infrast ruct ure. All ot her BlueZ m odules ut ilize it s services. I t 's also responsible for export ing t he Bluet oot h fam ily of socket s ( AF_BLUETOOTH ) t o user space and for populat ing relat ed sysfs ent ries.

2 . For t ransport ing Bluet oot h HCI packet s over UART, t he corresponding BlueZ HCI im plem ent at ion is hci_uart .ko. For USB t ransport , it 's hci_usb.ko .

3 . l2cap.ko im plem ent s t he L2CAP adapt at ion layer t hat is responsible for segm ent at ion and reassem bly. I t also m ult iplexes bet ween different higher- layer prot ocols.

4 . To run TCP/ I P applicat ions over Bluet oot h, you have t o em ulat e Et hernet port s over L2CAP using BNEP. This is accom plished by bnep.ko. To service BNEP connect ions, BlueZ spawns a kernel t hread called kbnepd .

5 . To run serial port applicat ions such as t erm inal em ulat ors over Bluet oot h, you need t o em ulat e serial port s over L2CAP. This is accom plished by rfcom m .ko. RFCOMM also funct ions as t he pillar t hat support s net working over PPP. To service incom ing RFCOMM connect ions, rfcom m .ko spawns a kernel t hread called krfcom m d. To set up and m aint ain connect ions t o individual RFCOMM channels on t arget devices, use t he rfcom m ut ilit y.

6 . The Hum an I nt erface Devices ( HI D) layer is im plem ent ed via hidp.ko . The user m ode daem on, hidd , let s BlueZ handle input devices such as Bluet oot h m ice.

7 . Audio is handled via t he Synchronous Connect ion Orient ed ( SCO) layer im plem ent ed by sco. k o .

Figu r e 1 6 .2 . Blu e t oot h pr ot ocol la ye r s m a ppe d t o Blu e Z k e r n e l m odu le s.

[ View full size im age]

Let 's now t race t he kernel code flow for t wo exam ple Bluet oot h devices: a Com pact Flash ( CF) card and a USB adapt er.

D e vice Ex a m ple : CF Ca r d The Sharp Bluet oot h Com pact Flash card is built using a Silicon Wave chipset and uses a serial t ransport t o carry HCI packet s. There are t hree different ways by which HCI packet s can be t ransport ed serially:

1 . H4 ( UART) , which is used by t he Sharp CF card. H4 is t he st andard m et hod t o t ransfer Bluet oot h dat a over UARTs as defined by t he Bluet oot h specificat ion. Look at drivers/ bluet oot h/ hci_h4.c for t he BlueZ im plem ent at ion.

2 . H3 ( RS232) . Devices using H3 are hard t o find. BlueZ has no support for H3.

3 . BlueCore Serial Prot ocol ( BCSP) , which is a propriet ary prot ocol from Cam bridge Silicon Radio ( CSR) t hat support s error checking and ret ransm ission. BCSP is used on non- USB devices based on CSR BlueCore chips including PCMCI A and CF cards. The BlueZ BCSP im plem ent at ion lives in drivers/ bluet oot h/ hci_bcsp.c .

The read dat a pat h for t he Sharp Bluet oot h card is shown in Figure 16.3 . The first point of cont act bet ween t he card and t he kernel is at t he UART driver. As you saw in Figure 9.5 of Chapt er 9 , " PCMCI A and Com pact Flash," t he serial Card Services driver, drivers/ serial/ serial_cs.c , allows t he rest of t he operat ing syst em t o see t he Sharp card as if it were a serial device. The serial driver passes on t he received HCI packet s t o BlueZ. BlueZ im plem ent s HCI processing in t he form of a kernel line discipline. As you learned in Chapt er 6 , " Serial Drivers," line disciplines reside above t he serial driver and shape it s behavior. The HCI line discipline invokes associat ed prot ocol rout ines ( H4 in t his case) for assist ance in dat a processing. From t hen on, L2CAP and higher BlueZ layers t ake charge.

Figu r e 1 6 .3 . Re a d da t a pa t h fr om a Sh a r p Blu e t oot h CF ca r d.

[ View full size im age]

D e vice Ex a m ple : USB Ada pt e r Let 's now look at a device t hat uses USB t o t ransport HCI packet s. The Belkin Bluet oot h USB adapt er is one such gadget . I n t his case, t he Linux USB layer ( drivers/ usb/ * ) , t he HCI USB t ransport driver (drivers/ bluet oot h/ hci_usb.c ) , and t he BlueZ prot ocol st ack (net / bluet oot h/ * ) are t he m ain players t hat get t he dat a rolling. Let 's see how t hese t hree kernel layers int eract . As you learned in Chapt er 11 , USB devices exchange dat a using one or m ore of four pipes. For Bluet oot h USB devices, each pipe is responsible for carrying a part icular t ype of dat a:

1 . Cont rol pipes are used t o t ransport HCI com m ands.

2 . I nt errupt pipes are responsible for carrying HCI event s.

3 . Bulk pipes t ransfer asynchronous connect ionless ( ACL) Bluet oot h dat a.

4 . I sochronous pipes carry SCO audio dat a.

You also saw in Chapt er 11 t hat when a USB device is plugged int o a syst em , t he host cont roller driver enum erat es it using a cont rol pipe and assigns endpoint addresses bet ween 1 and 127. The configurat ion descript or read by t he USB subsyst em during enum erat ion cont ains inform at ion about t he device, such as it s class , subclass , and protocol . The Bluet oot h specificat ion defines t he (class , subclass , protocol) codes of Bluet oot h USB devices as ( 0xE, 0x01, 0x01) . The HCI USB t ransport driver ( hci_usb ) regist ers t hese values wit h t he USB core during init ializat ion. When t he Belkin USB adapt er is plugged in, t he USB core reads t he ( class , subclass , protocol) inform at ion from t he device configurat ion descript or. Because t his inform at ion m at ches t he values regist ered by hci_usb , t his driver get s at t ached t o t he Belkin USB adapt er. hci_usb reads Bluet oot h dat a from t he four USB pipes described previously and passes it on t o t he BlueZ prot ocol st ack. Linux applicat ions now run seam lessly over t his device, as shown in Figure 16.2 .

RFCOM M RFCOMM em ulat es serial port s over Bluet oot h. Applicat ions such as t erm inal em ulat ors and prot ocols such as PPP can run unchanged over t he virt ual serial int erfaces creat ed by RFCOMM.

D e vice Ex a m ple : Pill D ispe n se r To t ake an exam ple, assum e t hat you have a Bluet oot h- aware pill dispenser. When you pop a pill out of t he dispenser, it sends a m essage over a Bluet oot h RFCOMM channel. A Linux cell phone, such as t he one shown in Figure 6.5 of Chapt er 6 , reads t his alert using a sim ple applicat ion t hat est ablishes an RFCOMM connect ion t o t he pill dispenser. The phone t hen dispat ches t his inform at ion t o t he healt h- care provider's server on t he I nt ernet via it s GPRS int erface. A skelet al applicat ion on t he Linux cell phone t hat reads dat a arriving from t he pill dispenser using t he BlueZ socket API is shown in List ing 16.1 . The list ing assum es t hat you are fam iliar wit h t he basics of socket program m ing. List in g 1 6 .1 . Com m u n ica t in g w it h a Pill D ispe n se r ove r RFCOM M Code View: #include #include

/* For struct sockaddr_rc */

void sense_dispenser() { int pillfd; struct sockaddr_rc pill_rfcomm; char buffer[1024]; /* ... */ /* Create a Bluetooth RFCOMM socket */ if ((pillfd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM)) < 0) {

printf("Bad Bluetooth RFCOMM socket"); exit(1); } /* Connect to the pill pill_rfcomm.rc_family pill_rfcomm.rc_bdaddr pill_rfcomm.rc_channel

dispenser */ = AF_BLUETOOTH; = PILL_DISPENSER_BLUETOOTH_ADDR; = PILL_DISPENSER_RFCOMM_CHANNEL;

if (connect(pillfd, (struct sockaddr *)&pill_rfcomm, sizeof(pill_rfcomm))) { printf("Cannot connect to Pill Dispenser\n"); exit(1); } printf("Connection established to Pill Dispenser\n"); /* Poll until data is ready */ select(pillfd, &fds, NULL, NULL, &timeout); /* Data is available on this RFCOMM channel */ if (FD_ISSET(pillfd, fds)) { /* Read pill removal alerts from the dispenser */ read(pillfd, buffer, sizeof(buffer)); /* Take suitable action; e.g., send a message to the health care provider's server on the Internet via the GPRS interface */ /* ... */ } /* ... */ }

N e t w or k in g Trace down t he code pat h from t he t elnet / ft p/ ssh box in Figure 16.2 t o see how net working is archit ect ed over BlueZ Bluet oot h. As you can see, t here are t wo different ways t o net work over Bluet oot h:

1 . By running TCP/ I P direct ly over BNEP. The result ing net work is called a personal area net work ( PAN) .

2 . By running TCP/ I P over PPP over RFCOMM. This is called dialup net working ( DUN) .

The kernel im plem ent at ion of Bluet oot h net working is unlikely t o int erest t he device driver writ er and is not explored. Table 16.2 shows t he st eps required t o net work t wo lapt ops using PAN, however. Net working wit h DUN resem bles t his and is not exam ined. The lapt ops are respect ively Bluet oot h- enabled using t he Sharp CF card and t he Belkin USB adapt er discussed earlier. You can slip t he CF card int o t he first lapt op's PCMCI A slot using a passive CF- t o- PCMCI A adapt er. Look at Figure 16.2 in t andem wit h Table 16.2 t o underst and t he m appings t o corresponding BlueZ com ponent s. Table 16.2 uses bash-sharp> and bash-belkin> as t he respect ive shell prom pt s of t he t wo lapt ops.

On t he lapt op wit h t he Sharp Bluet oot h CF card

1 . St art t he HCI and service discovery daem ons: bash-sharp> hcid bash-sharp> sdpd Because t his device possesses a UART int erface, you have t o at t ach t he BlueZ st ack t o t he appropriat e serial port . I n t his case, assum e t hat ser ial_cs has allot t ed / dev/ t t yS3 t o t he card: bash-sharp> hciattach ttyS3 any 2 . Verify t hat t he HCI int erface is up: bash-sharp> hciconfig -a hci0: Type: UART BD Address: 08:00:1F:10:3B:13 ACL MTU: 60:20 UP RUNNING PSCAN ISCAN ... Manufacturer: Silicon Wave (11)

SCO MTU: 31:1

3 . Verify t hat basic BlueZ m odules are loaded: bash-sharp> lsmod Module hci_uart l2cap bluetooth ...

Size 16728 26144 47684

Used by 3 2 6 hci_uart,l2cap

4 . I nsert t he BlueZ m odule t hat im plem ent s net work encapsulat ion: bash-sharp> modprobe bnep 5 . List en for incom ing PAN connect ions: [ 1 ] bash-sharp> pand –s On t he lapt op wit h t he Belkin USB Bluet oot h adapt er

1 . St art daem ons, such as hcid and sdpd, and insert necessary kernel m odules, such as bluet oot h.ko and l2cap.ko .

2 . Because t his is a USB device, you don't need t o invoke hciat t ach, but m ake sure t hat t he hci_usb.ko m odule is insert ed.

3 . Verify t hat t he HCI int erface is up: Code View: bash-belkin> hciconfig -a hci0: Type: USB BD Address: 00:02:72:B0:33:AB ACL MTU: 192:8 UP RUNNING PSCAN ISCAN

SCO MTU: 64:8

... Manufacturer: Cambridge Silicon Radio (10)

4 . Search and discover devices in t he neighborhood: bash-belkin> hcitool -i hci0 scan --flush Scanning.... 08:00:1F:10:3B:13 bash-sharp 5 . Est ablish a PAN wit h t he first lapt op. You can get it s Bluet oot h address ( 08: 00: 1F: 10: 3B: 13) from it s hciconfig out put : bash-belkin> pand -c 08:00:1F:10:3B:13

I f you now look at t he ifconfig out put on t he t wo lapt ops, you will find t hat a new int erface nam ed bnep0 has m ade an appearance at bot h ends. Assign I P addresses t o bot h int erfaces and get ready t o t elnet and FTP!

Ta ble 1 6 .2 . N et w ork ing Tw o La pt ops Usin g Bluet oot h PAN

[ 1] A useful com m and- line opt ion t o pand is --persist , which aut om at ically at t em pt s t o reconnect when a connect ion drops. Dig int o t he m an pages for m ore invocat ion opt ions.

H u m a n I n t e r fa ce D e vice s Look at sect ions " USB and Bluet oot h Keyboards " and " USB and Bluet oot h Mice " in Chapt er 7 , " I nput Drivers," for a discussion on Bluet oot h hum an int erface devices.

Au dio Let 's t ake t he exam ple of an HBH- 30 Sony Ericsson Bluet oot h headset t o underst and Bluet oot h SCO audio. Before t he headset can st art com m unicat ing wit h a Linux device, t he Bluet oot h link layer on t he lat t er has t o discover t he form er. For t his, put t he headset in discover m ode by pressing t he but t on earm arked for device discovery. I n addit ion, you have t o configure BlueZ wit h t he headset 's personal ident ificat ion num ber ( PI N) by adding it t o / et c/ bluet oot h/ pin. An applicat ion on t he Linux device t hat uses BlueZ SCO API s can now send audio dat a t o t he headset . The audio dat a should be in a form at t hat t he headset underst ands. The HBH- 30 uses t he A- law PCM ( pulse code m odulat ion ) form at . There are public dom ain ut ilit ies for convert ing audio int o various PCM form at s. Bluet oot h chipset s com m only have PCM int erface pins in addit ion t o t he HCI t ransport int erface. I f a device support s, for inst ance, bot h Bluet oot h and Global Syst em for Mobile Com m unicat ion ( GSM) , t he PCM lines from t he GSM chipset m ay be direct ly wired t o t he Bluet oot h chip's PCM audio lines. You m ight t hen have t o configure t he Bluet oot h chip t o receive and send SCO audio packet s over it s HCI int erface inst ead of it s PCM int erface.

D e bu ggin g There are t wo BlueZ t ools useful for debugging:

1 . hcidum p t aps HCI packet s flowing back and fort h, and parses t hem int o hum an- readable form . Here's an exam ple dum p while a device inquiry is in progress: bash> hcidump -i hci0 HCIDump - HCI packet analyzer ver 1.11 device: hci0 snap_len: 1028 filter: 0xffffffff HCI Command: Inquiry (0x01|0x0001) plen 5 HCI Event: Command Status (0x0f) plen 4 HCI Event: Inquiry Result (0x02) plen 15 ... HCI Event: Inquiry Complete (0x01) plen 1 < HCI Command: Remote Name Request (0x01|0x0019) plen 10 ... 2 . The virt ual HCI driver ( hci_vhci.ko ) , as shown in Figure 16.2 , em ulat es a Bluet oot h int erface if you do not have act ual hardware.

Look in g a t t h e Sou r ce s Look inside drivers/ bluet oot h/ for BlueZ low- level drivers. Explore net / bluet oot h/ for insight s int o t he BlueZ prot ocol im plem ent at ion. Bluet oot h applicat ions fall under different profiles based on how t hey behave. For exam ple, t he cordless t elephony profile specifies how a Bluet oot h device can im plem ent a cordless phone. We discussed profiles for PAN and serial access, but t here are m any m ore profiles out t here such as fax profile, General Obj ect Exchange Profile ( GOEP) and SI M Access Profile ( SAP) . The bluez- ut ils package, downloadable from www.bluez.org , provides support for several Bluet oot h profiles. The official Bluet oot h websit e is www.bluet oot h.org . I t cont ains Bluet oot h specificat ion docum ent s and inform at ion about t he Bluet oot h Special I nt erest Group ( SI G) . Affix is an alt ernat e Bluet oot h st ack on Linux. You can download Affix from ht t p: / / affix.sourceforge.net / .

Ch a pt e r 1 6 . Lin u x W it h ou t W ir e s I n Th is Ch a pt e r

Bluet oot h

467

I nfrared

478

WiFi

489

Cellular Net working

496

Current Trends

500

Several sm all- foot print devices are powered by t he dual com binat ion of a wireless t echnology and Linux. Bluet oot h, I nfrared, WiFi, and cellular net working are est ablished wireless t echnologies t hat have healt hy Linux support . Bluet oot h elim inat es cables, inj ect s int elligence int o dum b devices, and opens a flood gat e of novel applicat ions. I nfrared is a low- cost , low- range, m edium - rat e, wireless t echnology t hat can net work lapt ops, connect handhelds, or dispat ch a docum ent t o a print er. WiFi is t he wireless equivalent of an Et hernet LAN. Cellular net working using General Packet Radio Service ( GPRS) or code division m ult iple access ( CDMA) keeps you I nt ernet - enabled on t he go, as long as your wanderings are confined t o service provider coverage area. Because t hese wireless t echnologies are widely available in popular form fact ors, you are likely t o end up, sooner rat her t han lat er, wit h a card t hat does not work on Linux right away. Before you st art working on enabling an unsupport ed card, you need t o know in det ail how t he kernel im plem ent s support for t he corresponding t echnology. I n t his chapt er, let 's learn how Linux enables Bluet oot h, I nfrared, WiFi, and cellular net working.

Wireless Trade- Offs Bluet oot h, I nfrared, WiFi, and GPRS serve different niches. The t rade- offs can be gauged in t erm s of speed, range, cost , power consum pt ion, ease of hardware/ soft ware co- design, and PCB real est at e usage. Table 16.1 gives you an idea of t hese param et ers, but you will have t o cont end wit h several variables when you m easure t he num bers on t he ground. The speeds list ed are t he t heoret ical m axim um s. The power consum pt ions indicat ed are relat ive, but in t he real world t hey also depend on t he vendor's im plem ent at ion t echniques, t he t echnology subclass, and t he operat ing m ode. Cost econom ics depend on t he chip form fact or and whet her t he chip cont ains built - in m icrocode t hat im plem ent s som e of t he prot ocol layers. The board real est at e consum ed depends not j ust on t he chipset , but also on t ransceivers, ant ennae, and whet her you build using off- t he- shelf ( OTS) m odules.

Bluet oot h 720Kbps 10m t o 100m ** ** ** ** I nfrared Dat a 4Mbps ( Fast I R) Up t o 1 m et er wit hin a 30- degree cone * * * * WiFi 54Mbps 150 m et ers ( indoors) **** *** *** *** GPRS 170Kbps Service provider coverage *** **** * *** Not e: The last four colum ns give relat ive m easurem ent ( depending on t he num ber of * sym bols) rat her t han absolut e v alues.

Ta ble 1 6 .1 . W ir e le ss Tr a de - Offs Spe e d

Ra n ge

Pow e r

Cost

Co- Design Effor t

Boa r d Re a l Est a t e

Som e sect ions in t his chapt er focus m ore on " syst em program m ing" t han device drivers. This is because t he corresponding regions of t he prot ocol st ack ( for exam ple, Bluet oot h RFCOMM and I nfrared net working) are already present in t he kernel and you are m ore likely t o perform associat ed user m ode cust om izat ions t han develop prot ocol cont ent or device drivers.

Blu e t oot h

Bluet oot h is a short - range cable- replacem ent t echnology t hat carries bot h dat a and voice. I t support s speeds of up t o 723Kbps ( asym m et ric) and 432Kbps ( sym m et ric) . Class 3 Bluet oot h devices have a range of 10 m et ers, and Class 1 t ransm it t ers can com m unicat e up t o 100 m et ers. Bluet oot h is designed t o do away wit h wires t hat const rict and clut t er your environm ent . I t can, for exam ple, t urn your wrist wat ch int o a front - end for a bulky Global Posit ioning Syst em ( GPS) hidden inside your backpack. Or it can, for inst ance, let you navigat e a present at ion via your handheld. Again, Bluet oot h can be t he answer if you want your lapt op t o be a hub t hat can I nt ernet - enable your Bluet oot h- aware MP3 player. I f your wrist wat ch, handheld, lapt op, or MP3 player is running Linux, knowledge of t he innards of t he Linux Bluet oot h st ack will help you ext ract m axim um m ileage out of your device. As per t he Bluet oot h specificat ion, t he prot ocol st ack consist s of t he layers shown in Figure 16.1 . The radio, link cont roller, and link m anager roughly correspond t o t he physical, dat a link, and net work layers in t he Open Syst em s I nt erconnect ( OSI ) st andard reference m odel. The Host Cont rol I nt erface ( HCI ) is t he prot ocol t hat carries dat a t o/ from t he hardware and, hence, m aps t o t he t ransport layer. The Bluet oot h Logical Link Cont rol and Adapt at ion Prot ocol ( L2CAP) falls in t he session layer. Serial port em ulat ion using Radio Frequency Com m unicat ion ( RFCOMM) , Et hernet em ulat ion using Bluet oot h Net work Encapsulat ion Prot ocol ( BNEP) , and t he Service Discovery Prot ocol ( SDP) are part of t he feat ure- rich present at ion layer. At t he t op of t he st ack reside various applicat ion environm ent s called pr ofiles. The radio, link cont roller, and link m anager are usually part of Bluet oot h hardware, so operat ing syst em support st art s at t he HCI layer.

Figu r e 1 6 .1 . Th e Blu e t oot h st a ck .

A com m on m et hod of int erfacing Bluet oot h hardware wit h a m icrocont roller is by connect ing t he chipset 's dat a lines t o t he cont roller's UART pins. Figure 13.4 of Chapt er 13 , " Audio Drivers," shows a Bluet oot h chip on an MP3 player com m unicat ing wit h t he processor via a UART. USB is anot her oft - used vehicle for com m unicat ing wit h Bluet oot h chipset s. Figure 11.2 of Chapt er 11 , " Universal Serial Bus," shows a Bluet oot h chip on an em bedded device int erfacing wit h t he processor over USB. I rrespect ive of whet her you use UART or USB ( we will look at bot h kinds of devices lat er) , t he packet form at used t o t ransport Bluet oot h dat a is HCI .

Blu e Z The BlueZ Bluet oot h im plem ent at ion is part of t he m ainline kernel and is t he official Linux Bluet oot h st ack.

Figure 16.2 shows how BlueZ m aps Bluet oot h prot ocol layers t o kernel m odules, kernel t hreads, user space daem ons, configurat ion t ools, ut ilit ies, and libraries. The m ain BlueZ com ponent s are explained here:

1 . bluet oot h.ko cont ains t he core BlueZ infrast ruct ure. All ot her BlueZ m odules ut ilize it s services. I t 's also responsible for export ing t he Bluet oot h fam ily of socket s ( AF_BLUETOOTH ) t o user space and for populat ing relat ed sysfs ent ries.

2 . For t ransport ing Bluet oot h HCI packet s over UART, t he corresponding BlueZ HCI im plem ent at ion is hci_uart .ko. For USB t ransport , it 's hci_usb.ko .

3 . l2cap.ko im plem ent s t he L2CAP adapt at ion layer t hat is responsible for segm ent at ion and reassem bly. I t also m ult iplexes bet ween different higher- layer prot ocols.

4 . To run TCP/ I P applicat ions over Bluet oot h, you have t o em ulat e Et hernet port s over L2CAP using BNEP. This is accom plished by bnep.ko. To service BNEP connect ions, BlueZ spawns a kernel t hread called kbnepd .

5 . To run serial port applicat ions such as t erm inal em ulat ors over Bluet oot h, you need t o em ulat e serial port s over L2CAP. This is accom plished by rfcom m .ko. RFCOMM also funct ions as t he pillar t hat support s net working over PPP. To service incom ing RFCOMM connect ions, rfcom m .ko spawns a kernel t hread called krfcom m d. To set up and m aint ain connect ions t o individual RFCOMM channels on t arget devices, use t he rfcom m ut ilit y.

6 . The Hum an I nt erface Devices ( HI D) layer is im plem ent ed via hidp.ko . The user m ode daem on, hidd , let s BlueZ handle input devices such as Bluet oot h m ice.

7 . Audio is handled via t he Synchronous Connect ion Orient ed ( SCO) layer im plem ent ed by sco. k o .

Figu r e 1 6 .2 . Blu e t oot h pr ot ocol la ye r s m a ppe d t o Blu e Z k e r n e l m odu le s.

[ View full size im age]

Let 's now t race t he kernel code flow for t wo exam ple Bluet oot h devices: a Com pact Flash ( CF) card and a USB adapt er.

D e vice Ex a m ple : CF Ca r d The Sharp Bluet oot h Com pact Flash card is built using a Silicon Wave chipset and uses a serial t ransport t o carry HCI packet s. There are t hree different ways by which HCI packet s can be t ransport ed serially:

1 . H4 ( UART) , which is used by t he Sharp CF card. H4 is t he st andard m et hod t o t ransfer Bluet oot h dat a over UARTs as defined by t he Bluet oot h specificat ion. Look at drivers/ bluet oot h/ hci_h4.c for t he BlueZ im plem ent at ion.

2 . H3 ( RS232) . Devices using H3 are hard t o find. BlueZ has no support for H3.

3 . BlueCore Serial Prot ocol ( BCSP) , which is a propriet ary prot ocol from Cam bridge Silicon Radio ( CSR) t hat support s error checking and ret ransm ission. BCSP is used on non- USB devices based on CSR BlueCore chips including PCMCI A and CF cards. The BlueZ BCSP im plem ent at ion lives in drivers/ bluet oot h/ hci_bcsp.c .

The read dat a pat h for t he Sharp Bluet oot h card is shown in Figure 16.3 . The first point of cont act bet ween t he card and t he kernel is at t he UART driver. As you saw in Figure 9.5 of Chapt er 9 , " PCMCI A and Com pact Flash," t he serial Card Services driver, drivers/ serial/ serial_cs.c , allows t he rest of t he operat ing syst em t o see t he Sharp card as if it were a serial device. The serial driver passes on t he received HCI packet s t o BlueZ. BlueZ im plem ent s HCI processing in t he form of a kernel line discipline. As you learned in Chapt er 6 , " Serial Drivers," line disciplines reside above t he serial driver and shape it s behavior. The HCI line discipline invokes associat ed prot ocol rout ines ( H4 in t his case) for assist ance in dat a processing. From t hen on, L2CAP and higher BlueZ layers t ake charge.

Figu r e 1 6 .3 . Re a d da t a pa t h fr om a Sh a r p Blu e t oot h CF ca r d.

[ View full size im age]

D e vice Ex a m ple : USB Ada pt e r Let 's now look at a device t hat uses USB t o t ransport HCI packet s. The Belkin Bluet oot h USB adapt er is one such gadget . I n t his case, t he Linux USB layer ( drivers/ usb/ * ) , t he HCI USB t ransport driver (drivers/ bluet oot h/ hci_usb.c ) , and t he BlueZ prot ocol st ack (net / bluet oot h/ * ) are t he m ain players t hat get t he dat a rolling. Let 's see how t hese t hree kernel layers int eract . As you learned in Chapt er 11 , USB devices exchange dat a using one or m ore of four pipes. For Bluet oot h USB devices, each pipe is responsible for carrying a part icular t ype of dat a:

1 . Cont rol pipes are used t o t ransport HCI com m ands.

2 . I nt errupt pipes are responsible for carrying HCI event s.

3 . Bulk pipes t ransfer asynchronous connect ionless ( ACL) Bluet oot h dat a.

4 . I sochronous pipes carry SCO audio dat a.

You also saw in Chapt er 11 t hat when a USB device is plugged int o a syst em , t he host cont roller driver enum erat es it using a cont rol pipe and assigns endpoint addresses bet ween 1 and 127. The configurat ion descript or read by t he USB subsyst em during enum erat ion cont ains inform at ion about t he device, such as it s class , subclass , and protocol . The Bluet oot h specificat ion defines t he (class , subclass , protocol) codes of Bluet oot h USB devices as ( 0xE, 0x01, 0x01) . The HCI USB t ransport driver ( hci_usb ) regist ers t hese values wit h t he USB core during init ializat ion. When t he Belkin USB adapt er is plugged in, t he USB core reads t he ( class , subclass , protocol) inform at ion from t he device configurat ion descript or. Because t his inform at ion m at ches t he values regist ered by hci_usb , t his driver get s at t ached t o t he Belkin USB adapt er. hci_usb reads Bluet oot h dat a from t he four USB pipes described previously and passes it on t o t he BlueZ prot ocol st ack. Linux applicat ions now run seam lessly over t his device, as shown in Figure 16.2 .

RFCOM M RFCOMM em ulat es serial port s over Bluet oot h. Applicat ions such as t erm inal em ulat ors and prot ocols such as PPP can run unchanged over t he virt ual serial int erfaces creat ed by RFCOMM.

D e vice Ex a m ple : Pill D ispe n se r To t ake an exam ple, assum e t hat you have a Bluet oot h- aware pill dispenser. When you pop a pill out of t he dispenser, it sends a m essage over a Bluet oot h RFCOMM channel. A Linux cell phone, such as t he one shown in Figure 6.5 of Chapt er 6 , reads t his alert using a sim ple applicat ion t hat est ablishes an RFCOMM connect ion t o t he pill dispenser. The phone t hen dispat ches t his inform at ion t o t he healt h- care provider's server on t he I nt ernet via it s GPRS int erface. A skelet al applicat ion on t he Linux cell phone t hat reads dat a arriving from t he pill dispenser using t he BlueZ socket API is shown in List ing 16.1 . The list ing assum es t hat you are fam iliar wit h t he basics of socket program m ing. List in g 1 6 .1 . Com m u n ica t in g w it h a Pill D ispe n se r ove r RFCOM M Code View: #include #include

/* For struct sockaddr_rc */

void sense_dispenser() { int pillfd; struct sockaddr_rc pill_rfcomm; char buffer[1024]; /* ... */ /* Create a Bluetooth RFCOMM socket */ if ((pillfd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM)) < 0) {

printf("Bad Bluetooth RFCOMM socket"); exit(1); } /* Connect to the pill pill_rfcomm.rc_family pill_rfcomm.rc_bdaddr pill_rfcomm.rc_channel

dispenser */ = AF_BLUETOOTH; = PILL_DISPENSER_BLUETOOTH_ADDR; = PILL_DISPENSER_RFCOMM_CHANNEL;

if (connect(pillfd, (struct sockaddr *)&pill_rfcomm, sizeof(pill_rfcomm))) { printf("Cannot connect to Pill Dispenser\n"); exit(1); } printf("Connection established to Pill Dispenser\n"); /* Poll until data is ready */ select(pillfd, &fds, NULL, NULL, &timeout); /* Data is available on this RFCOMM channel */ if (FD_ISSET(pillfd, fds)) { /* Read pill removal alerts from the dispenser */ read(pillfd, buffer, sizeof(buffer)); /* Take suitable action; e.g., send a message to the health care provider's server on the Internet via the GPRS interface */ /* ... */ } /* ... */ }

N e t w or k in g Trace down t he code pat h from t he t elnet / ft p/ ssh box in Figure 16.2 t o see how net working is archit ect ed over BlueZ Bluet oot h. As you can see, t here are t wo different ways t o net work over Bluet oot h:

1 . By running TCP/ I P direct ly over BNEP. The result ing net work is called a personal area net work ( PAN) .

2 . By running TCP/ I P over PPP over RFCOMM. This is called dialup net working ( DUN) .

The kernel im plem ent at ion of Bluet oot h net working is unlikely t o int erest t he device driver writ er and is not explored. Table 16.2 shows t he st eps required t o net work t wo lapt ops using PAN, however. Net working wit h DUN resem bles t his and is not exam ined. The lapt ops are respect ively Bluet oot h- enabled using t he Sharp CF card and t he Belkin USB adapt er discussed earlier. You can slip t he CF card int o t he first lapt op's PCMCI A slot using a passive CF- t o- PCMCI A adapt er. Look at Figure 16.2 in t andem wit h Table 16.2 t o underst and t he m appings t o corresponding BlueZ com ponent s. Table 16.2 uses bash-sharp> and bash-belkin> as t he respect ive shell prom pt s of t he t wo lapt ops.

On t he lapt op wit h t he Sharp Bluet oot h CF card

1 . St art t he HCI and service discovery daem ons: bash-sharp> hcid bash-sharp> sdpd Because t his device possesses a UART int erface, you have t o at t ach t he BlueZ st ack t o t he appropriat e serial port . I n t his case, assum e t hat ser ial_cs has allot t ed / dev/ t t yS3 t o t he card: bash-sharp> hciattach ttyS3 any 2 . Verify t hat t he HCI int erface is up: bash-sharp> hciconfig -a hci0: Type: UART BD Address: 08:00:1F:10:3B:13 ACL MTU: 60:20 UP RUNNING PSCAN ISCAN ... Manufacturer: Silicon Wave (11)

SCO MTU: 31:1

3 . Verify t hat basic BlueZ m odules are loaded: bash-sharp> lsmod Module hci_uart l2cap bluetooth ...

Size 16728 26144 47684

Used by 3 2 6 hci_uart,l2cap

4 . I nsert t he BlueZ m odule t hat im plem ent s net work encapsulat ion: bash-sharp> modprobe bnep 5 . List en for incom ing PAN connect ions: [ 1 ] bash-sharp> pand –s On t he lapt op wit h t he Belkin USB Bluet oot h adapt er

1 . St art daem ons, such as hcid and sdpd, and insert necessary kernel m odules, such as bluet oot h.ko and l2cap.ko .

2 . Because t his is a USB device, you don't need t o invoke hciat t ach, but m ake sure t hat t he hci_usb.ko m odule is insert ed.

3 . Verify t hat t he HCI int erface is up: Code View: bash-belkin> hciconfig -a hci0: Type: USB BD Address: 00:02:72:B0:33:AB ACL MTU: 192:8 UP RUNNING PSCAN ISCAN

SCO MTU: 64:8

... Manufacturer: Cambridge Silicon Radio (10)

4 . Search and discover devices in t he neighborhood: bash-belkin> hcitool -i hci0 scan --flush Scanning.... 08:00:1F:10:3B:13 bash-sharp 5 . Est ablish a PAN wit h t he first lapt op. You can get it s Bluet oot h address ( 08: 00: 1F: 10: 3B: 13) from it s hciconfig out put : bash-belkin> pand -c 08:00:1F:10:3B:13

I f you now look at t he ifconfig out put on t he t wo lapt ops, you will find t hat a new int erface nam ed bnep0 has m ade an appearance at bot h ends. Assign I P addresses t o bot h int erfaces and get ready t o t elnet and FTP!

Ta ble 1 6 .2 . N et w ork ing Tw o La pt ops Usin g Bluet oot h PAN

[ 1] A useful com m and- line opt ion t o pand is --persist , which aut om at ically at t em pt s t o reconnect when a connect ion drops. Dig int o t he m an pages for m ore invocat ion opt ions.

H u m a n I n t e r fa ce D e vice s Look at sect ions " USB and Bluet oot h Keyboards " and " USB and Bluet oot h Mice " in Chapt er 7 , " I nput Drivers," for a discussion on Bluet oot h hum an int erface devices.

Au dio Let 's t ake t he exam ple of an HBH- 30 Sony Ericsson Bluet oot h headset t o underst and Bluet oot h SCO audio. Before t he headset can st art com m unicat ing wit h a Linux device, t he Bluet oot h link layer on t he lat t er has t o discover t he form er. For t his, put t he headset in discover m ode by pressing t he but t on earm arked for device discovery. I n addit ion, you have t o configure BlueZ wit h t he headset 's personal ident ificat ion num ber ( PI N) by adding it t o / et c/ bluet oot h/ pin. An applicat ion on t he Linux device t hat uses BlueZ SCO API s can now send audio dat a t o t he headset . The audio dat a should be in a form at t hat t he headset underst ands. The HBH- 30 uses t he A- law PCM ( pulse code m odulat ion ) form at . There are public dom ain ut ilit ies for convert ing audio int o various PCM form at s. Bluet oot h chipset s com m only have PCM int erface pins in addit ion t o t he HCI t ransport int erface. I f a device support s, for inst ance, bot h Bluet oot h and Global Syst em for Mobile Com m unicat ion ( GSM) , t he PCM lines from t he GSM chipset m ay be direct ly wired t o t he Bluet oot h chip's PCM audio lines. You m ight t hen have t o configure t he Bluet oot h chip t o receive and send SCO audio packet s over it s HCI int erface inst ead of it s PCM int erface.

D e bu ggin g There are t wo BlueZ t ools useful for debugging:

1 . hcidum p t aps HCI packet s flowing back and fort h, and parses t hem int o hum an- readable form . Here's an exam ple dum p while a device inquiry is in progress: bash> hcidump -i hci0 HCIDump - HCI packet analyzer ver 1.11 device: hci0 snap_len: 1028 filter: 0xffffffff HCI Command: Inquiry (0x01|0x0001) plen 5 HCI Event: Command Status (0x0f) plen 4 HCI Event: Inquiry Result (0x02) plen 15 ... HCI Event: Inquiry Complete (0x01) plen 1 < HCI Command: Remote Name Request (0x01|0x0019) plen 10 ... 2 . The virt ual HCI driver ( hci_vhci.ko ) , as shown in Figure 16.2 , em ulat es a Bluet oot h int erface if you do not have act ual hardware.

Look in g a t t h e Sou r ce s Look inside drivers/ bluet oot h/ for BlueZ low- level drivers. Explore net / bluet oot h/ for insight s int o t he BlueZ prot ocol im plem ent at ion. Bluet oot h applicat ions fall under different profiles based on how t hey behave. For exam ple, t he cordless t elephony profile specifies how a Bluet oot h device can im plem ent a cordless phone. We discussed profiles for PAN and serial access, but t here are m any m ore profiles out t here such as fax profile, General Obj ect Exchange Profile ( GOEP) and SI M Access Profile ( SAP) . The bluez- ut ils package, downloadable from www.bluez.org , provides support for several Bluet oot h profiles. The official Bluet oot h websit e is www.bluet oot h.org . I t cont ains Bluet oot h specificat ion docum ent s and inform at ion about t he Bluet oot h Special I nt erest Group ( SI G) . Affix is an alt ernat e Bluet oot h st ack on Linux. You can download Affix from ht t p: / / affix.sourceforge.net / .

I n fr a r e d I nfrared ( I R) rays are opt ical waves lying bet ween t he visible and t he m icrowave regions of t he elect rom agnet ic spect rum . One use of I R is in point - t o- point dat a com m unicat ion. Using I R, you can exchange visit ing cards bet ween PDAs, net work t wo lapt ops, or dispat ch a docum ent t o a print er. I R has a range of up t o 1 m et er wit hin a 30- degree cone, spreading from –15 t o + 15 degrees. There are t wo popular flavors of I R com m unicat ion: St andard I R ( SI R) , which support s speeds of up t o 115.20 Kbaud; and Fast I R ( FI R) , which has a bandwidt h of 4Mbps. Figure 16.4 shows I R connect ion on a lapt op. UART1 in t he Super I / O chipset is I R- enabled, so an I R t ransceiver is direct ly connect ed t o it . Lapt ops having no I R support in t heir Super I / O chip m ay rely on an ext ernal I R dongle ( see t he sect ion " Device Exam ple: I R Dongle" ) sim ilar t o t he one connect ed t o UART0. Figure 16.5 shows I R connect ion on an em bedded SoC having a built - in I R dongle connect ed t o a syst em UART.

Figu r e 1 6 .4 . I r D A on a la pt op.

Figu r e 1 6 .5 . I r D A on a n e m be dde d de vice ( for e x a m ple , EP7 2 1 1 ) .

Linux support s I R com m unicat ion on t wo planes:

1 . I nt elligent dat a- t ransfer via prot ocols specified by t he I nfrared Dat a Associat ion ( I rDA) . This is im plem ent ed by t he Linux- I rDA proj ect .

2 . Cont rolling applicat ions via a rem ot e cont rol. This is im plem ent ed by t he Linux I nfrared Rem ot e Cont rol ( LI RC) proj ect .

This sect ion prim arily explores Linux- I rDA but t akes a quick look at LI RC before wrapping up.

Lin u x - I r D A The Linux- I rDA proj ect ( ht t p: / / irda.sourceforge.net / ) brings I rDA capabilit ies t o t he kernel. To get an idea of how Linux- I rDA com ponent s relat e vis- à- vis t he I rDA st ack and possible hardware configurat ions, let 's crisscross t hrough Figure 16.6:

1 . Device drivers const it ut e t he bot t om layer. SI R chipset s t hat are 16550- com pat ible can reuse t he nat ive Linux serial driver aft er shaping it s behavior using t he I rDA line discipline, I rTTY. An alt ernat ive t o t his com bo is t he I rPort driver. FI R chipset s have t heir own special drivers.

2 . Next com es t he core prot ocol st ack. This consist s of t he I R Link Access Prot ocol ( I rLAP) , I R Link Managem ent Prot ocol ( I rLMP) , Tiny Transport Prot ocol ( TinyTP) , and t he I rDA socket ( I rSock) int erface. I rLAP provides a reliable t ransport as well as t he st at e m achine t o discover neighboring devices. I rLMP is a m ult iplexer over I rLAP. TinyTP provides segm ent at ion, reassem bly, and flow cont rol. I rSock offers a socket int erface over I rLMP and TinyTP.

3.

3 . Higher regions of t he st ack m arry I rDA t o dat a- t ransfer applicat ions. I rLAN and I rNET enable net working, while I rCom m allows serial com m unicat ion.

4 . You also need t he applicat ions t hat ult im at ely m ake or break t he t echnology. An exam ple is openobex (ht t p: / / openobex.sourceforge.net / ) , which im plem ent s t he OBj ect EXchange ( OBEX) prot ocol used t o exchange obj ect s such as docum ent s and visit ing cards. To configure Linux- I rDA, you need t he irda- ut ils package t hat com es bundled wit h m any dist ribut ions. This provides t ools such as irat t ach, irdadum p, and irdaping.

Figu r e 1 6 .6 . Com m u n ica t in g ove r Lin u x - I r D A.

[ View full size im age]

D e vice Ex a m ple : Su pe r I / O Ch ip To get a first t ast e of Linux- I rDA, let 's get t wo lapt ops t alking t o each ot her over I R. Each lapt op is I R- enabled via Nat ional Sem iconduct or's NSC PC87382 Super I / O chip. [ 2] UART1 in Figure 16.4 shows t he connect ion scenario. The PC87382 chip can work in bot h SI R and FI R m odes. We will look at each in t urn.

[ 2]

Super I / O chipset s t ypically support several peripherals besides I rDA, such as serial port s, parallel port s, Musical I nst rum ent Digit al I nt erface ( MI DI ) , and floppy cont rollers.

SI R chips offer a UART int erface t o t he host com put er. For com m unicat ing in SI R m ode, at t ach t he associat ed UART port ( / dev/ t t yS1 in t his exam ple) of each lapt op t o t he I rDA st ack:

bash> irattach /dev/ttyS1 -s

Verify t hat I rDA kernel m odules ( irda.ko, sir_dev.ko, and irt t y_sir.ko) are loaded and t hat t he ir da_sir _w q kernel t hread is running. The irda0 int erface should also have m ade an appearance in t he ifconfig out put . The -s opt ion t o irattach t riggers a search for I R act ivit y in t he neighborhood. I f you slide t he lapt ops such t hat t heir I R t ransceivers lie wit hin t he range cone, t hey will be able t o spot each ot her: bash> cat /proc/net/irda/discovery nickname: localhost, hint: 0x4400, saddr: 0x55529048, daddr: 0x8fefb350

The ot her lapt op m akes a sim ilar announcem ent , but wit h t he source and dest inat ion addresses ( saddr and daddr) reversed. You m ay set t he desired com m unicat ion speed using st t y on t t yS1. To set t he baud rat e t o 19200, do t his: bash> stty speed 19200 < /dev/ttyS1

The easiest way t o cull I R act ivit y from t he air is by using t he debug t ool, irdadum p. Here's a sam ple dum p obt ained during t he preceding connect ion est ablishm ent , which shows t he negot iat ed param et ers: Code View: bash> irdadump -i irda0 ... 22:05:07.831424 snrm:cmd ca=fe pf=1 6fb7ff33 > 2c0ce8b6 new-ca=40 LAP QoS: Baud Rate=19200bps Max Turn Time=500ms Data Size=2048B Window Size=7 Add BOFS=0 Min Turn Time=5000us Link Disc=12s (32) 22:05:07.987043 ua:rsp ca=40 pf=1 6fb7ff33 < 2c0ce8b6 LAP QoS: Baud Rate=19200bps Max Turn Time=500ms Data Size=2048B Window Size=7 Add BOFS=0 Min Turn Time=5000us Link Disc=12s (31) ...

You can also obt ain debug inform at ion out of t he I rDA st ack by cont rolling t he verbosit y level in / proc/ sys/ net / irda/ debug. To set t he lapt ops in FI R m ode, dissociat e t t yS1 from t he nat ive serial driver and inst ead at t ach it t o t he NSC FI R driver, nsc- ircc.ko: bash> setserial /dev/ttyS1 uart none bash> modprobe nsc-ircc dongle_id=0x09 bash> irattach irda0 -s

dongle_id depends on your I R hardware and can be found from your hardware docum ent at ion. As you did for SI R, t ake a look at / proc/ net / irda/ discovery t o see whet her t hings are okay t hus far. Som et im es, FI R com m unicat ion hangs at higher speeds. I f irdadum p shows a com m unicat ion freeze, eit her put on your kernel hacking hat and fix t he code, or t ry lowering t he negot iat ed speed by t weaking / proc/ sys/ net / irda/ m ax_baud_rat e. Not e t hat unlike t he Bluet oot h physical layer t hat can est ablish one- t o- m any connect ions, I R can support only a single connect ion per physical device at a t im e.

D e vice Ex a m ple : I R D on gle Dongles are I R devices t hat plug int o serial or USB port s. Som e m icrocont rollers ( such as Cirrus Logic's EP7211 shown in Figure 16.5) t hat have on- chip I R cont rollers wired t o t heir UARTs are also considered dongles. Dongle drivers are a set of cont rol m et hods responsible for operat ions such as changing t he com m unicat ion speed. They have four ent ry point s: open(), reset(), change_speed(), and close(). These ent ry point s are defined as part of a dongle_driver st ruct ure and are invoked from t he cont ext of t he I rDA kernel t hread, ir da_sir _w q. Dongle driver m et hods are allowed t o block because t hey are invoked from process cont ext wit h no locks held. The I rDA core offers t hree helper funct ions t o dongle drivers: sirdev_raw_write() and sirdev_raw_read() t o exchange cont rol dat a wit h t he associat ed UART, and sirdev_set_dtr_rts() t o wiggle m odem cont rol lines connect ed t o t he UART. Because you're probably m ore likely t o add kernel support for dongles t han m odify ot her part s of Linux- I rDA, let 's im plem ent an exam ple dongle driver. Assum e t hat you're enabling a yet - unsupport ed sim ple serial I R dongle t hat com m unicat es only at 19200 or 57600 baud. Assum e also t hat when t he user want s t o t oggle t he baud rat e bet ween t hese t wo values, you have t o hold t he UART's Request - t o- Send ( RTS) pin low for 50 m icroseconds and pull it back high for 25 m icroseconds. List ing 16.2 im plem ent s a dongle driver for t his device. List in g 1 6 .2 . An Ex a m ple D on gle D r ive r Code View: #include #include #include "sir-dev.h" /* Assume that this sample driver lives in drivers/net/irda/ */ /* Open Method. This is invoked when an irattach is issued on the associated UART */ static int mydongle_open(struct sir_dev *dev) { struct qos_info *qos = &dev->qos; /* Power the dongle by setting modem control lines, DTR/RTS. */ sirdev_set_dtr_rts(dev, TRUE, TRUE); /* Speeds that mydongle can accept */ qos->baud_rate.bits &= IR_19200|IR_57600; irda_qos_bits_to_value(qos); /* Set QoS */ return 0; } /* Change baud rate */ static int mydongle_change_speed(struct sir_dev *dev, unsigned speed) { if ((speed == 19200) || (speed = 57600)){ /* Toggle the speed by pulsing RTS low for 50 us and back high for 25 us */ sirdev_set_dtr_rts(dev, TRUE, FALSE); udelay(50); sirdev_set_dtr_rts(dev, TRUE, TRUE); udelay(25); return 0; } else { return -EINVAL;

} } /* Reset */ static int mydongle_reset(struct sir_dev *dev) { /* Reset the dongle as per the spec, for example, by pulling DTR low for 50 us */ sirdev_set_dtr_rts(dev, FALSE, TRUE); udelay(50); sirdev_set_dtr_rts(dev, TRUE, TRUE); dev->speed = 19200; /* Reset speed is 19200 baud */ return 0; } /* Close */ static int mydongle_close(struct sir_dev *dev) { /* Power off the dongle as per the spec, for example, by pulling DTR and RTS low.. */ sirdev_set_dtr_rts(dev, FALSE, FALSE); return 0; } /* Dongle Driver Methods */ static struct dongle_driver mydongle = { .owner = THIS_MODULE, .type = MY_DONGLE, /* Add this to the enumeration in include/linux/irda.h */ .open = mydongle_open, /* Open */ .reset = mydongle_reset, /* Reset */ .set_speed = mydongle_change_speed, /* Change Speed */ .close = mydongle_close, /* Close */ }; /* Initialize */ static int __init mydongle_init(void) { /* Register the entry points */ return irda_register_dongle(&mydongle); } /* Release */ static void __exit mydongle_cleanup(void) { /* Unregister entry points */ irda_unregister_dongle(&mydongle); } module_init(mydongle_init); module_exit(mydongle_cleanup);

For real- life exam ples, look at drivers/ net / irda/ t ekram .c and drivers/ net / irda/ ep7211_ir.c. Now t hat you have t he physical layer running, let 's vent ure t o look at I rDA prot ocols.

I r Com m I rCom m em ulat es serial port s. Applicat ions such as t erm inal em ulat ors and prot ocols such as PPP can run unchanged over t he virt ual serial int erfaces creat ed by I rCom m . I rCom m is im plem ent ed by t wo relat ed m odules, ircom m .ko and ircom m _t t y.ko. The form er provides core prot ocol support , while t he lat t er creat es and m anages t he em ulat ed serial port nodes / dev/ ircom m X.

N e t w or k in g There are t hree ways t o get TCP/ I P applicat ions running over I rDA:

1 . Asynchronous PPP over I rCom m

2 . Synchronous PPP over I rNET

3 . Et hernet em ulat ion wit h I rLAN

Net working over I rCom m is equivalent t o running asynchronous PPP over a serial port , so t here is not hing out of t he ordinary in t his scenario. Asynchronous PPP needs t o m ark t he st art and end of fram es using t echniques such as byt e st uffing, but if PPP is running over dat a links such as Et hernet , it need not be burdened wit h t he overhead of a fram ing prot ocol. This is called synchronous PPP and is used t o configure net working over I rNET. [ 3] Passage t hrough t he PPP layer provides feat ures such as on- dem and I P address configurat ion, com pression, and aut hent icat ion.

[ 3]

For a scholarly discussion on net working over I rNET, read www.hpl.hp.com / personal/ Jean_Tourrilhes/ Papers/ I rNET.Dem and.ht m l.

To st art I rNET, insert irnet .ko. This also creat es t he charact er device node / dev/ irnet , which is a cont rol channel over which you can at t ach t he PPP daem on: bash> pppd /dev/irnet 9600 noauth a.b.c.d:a.b.c.e

This yields t he pppX net work int erfaces at eit her ends wit h t he respect ive I P addresses set t o a.b.c.d and a.b.c.e. The int erfaces can now beam TCP/ I P packet s. I rLAN provides raw Et hernet em ulat ion over I rDA. To net work your lapt ops using I rLAN, do t he following at bot h ends:

I nsert irlan.ko. This creat es t he net work int erface, irlanX, where X is t he assigned int erface num ber.

Configure t he irlanX int erfaces. To set t he I P address, do t his:

bash> ifconfig irlanX a.b.c.d Or aut om at e it by adding t he following line t o / et c/ sysconfig/ net work- script s/ - ifcfg- irlan0 : [ 4]

[ 4]

The locat ion of t his file is dist ribut ion- dependent .

DEVICE=irlanX IPADDR=a.b.c.d You can now t elnet bet ween t he lapt ops over t he irlanX int erfaces.

I r D A Sock e t s To develop cust om applicat ions over I rDA, use t he I rSock int erface. To creat e a socket over TinyTP, do t his: int fd = socket(AF_IRDA, SOCK_STREAM, 0);

For a dat agram socket over I rLMP, do t his: int fd = socket(AF_IRDA, SOCK_DGRAM, 0);

Look at t he irsocket s/ direct ory in t he irda- ut ils package for code exam ples.

Lin u x I n fr a r e d Re m ot e Con t r ol The goal of t he LI RC proj ect is t o let you cont rol your Linux com put er via a rem ot e. For exam ple, you can use LI RC t o cont rol applicat ions t hat play MP3 m usic or DVD m ovies via but t ons on your rem ot e. LI RC is archit ect ed int o

1 . A base LI RC m odule called lir c_dev.

2 . A hardware- specific physical layer driver. I R hardware t hat int erface via serial port s use lir c_ser ial. To allow lirc_serial t o do it s j ob wit hout int erference from t he kernel serial driver, dissociat e t he lat t er as you did earlier for FI R: bash> setserial /dev/ttySX uart none You m ay have t o replace lirc_serial wit h a m ore suit able low- level LI RC driver depending on your I R device.

3 . A user m ode daem on called lir cd t hat runs over t he low- level LI RC driver. Lircd decodes signals arriving from t he rem ot e and is t he cent erpiece of LI RC. Support for m any rem ot es are im plem ent ed in t he form of user- space drivers t hat are part of lircd. Lircd export s a UNI X- dom ain socket int erface / dev/ lircd t o higher applicat ions. Connect ing t o lircd via / dev/ lircd is t he key t o writ ing LI RC- aware applicat ions.

4 . An LI RC m ouse daem on called lircm d t hat runs on t op of lircd. Lircm d convert s m essages from lircd t o m ouse event s. These event s can be read from a nam ed pipe / dev/ lircm and input t o program s such as gpm or X Windows.

5 . Tools such as irrecord and irsend. The form er records signals received from your rem ot e and helps you generat e I R configurat ion files for a new rem ot e. The lat t er st ream s I R com m ands from your Linux m achine.

Visit t he LI RC hom e page host ed at www.lirc.org t o download all t hese and t o obt ain insight s on it s design and usage.

I R Char Drivers I f your em bedded device requires only sim ple I nfrared receive capabilit ies, it m ight be using a m iniat urized I R receiver ( such as t he TSOP1730 chip from Vishay Sem iconduct ors) . An exam ple applicat ion device is an I R locat or inst alled in hospit al room s t o read dat a em it t ed by I R badges worn by nurses. I n t his scenario, t he I rDA st ack is not relevant because of t he absence of I rDA prot ocol int eract ions. I t m ay also be an overkill t o port LI RC t o t he locat or if it 's using a lean propriet ary prot ocol t o parse received dat a. An easy solut ion m ight be t o im plem ent a t iny readonly char or m isc driver t hat export s raw I R dat a t o a suit able applicat ion via / dev or / sys int erfaces.

Look in g a t t h e Sou r ce s Look inside drivers/ net / irda/ for I rDA low- level drivers, net / ir da/ for t he prot ocol im plem ent at ion, and include/ net / irda/ for t he header files. Experim ent wit h proc/ sys/ net / irda/ * t o t une t he I rDA st ack and explore / proc/ net / irda/ * for st at e inform at ion pert aining t o different I rDA layers. Table 16.3 cont ains t he m ain dat a st ruct ures used in t his sect ion and t heir locat ion in t he source t ree. Table 16.4 list s t he m ain kernel program m ing int erfaces t hat you used in t his sect ion along wit h t he locat ion of t heir definit ions.

Ta ble 1 6 .3 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion

D e scr ipt ion

dongle_driver

drivers/ net / irda/ sir- dev.h

Dongle driver ent ry point s

sir_dev

drivers/ net / irda/ sir- dev.h

Represent at ion of an SI R device

qos_info

include/ net / irda/ qos.h

Qualit y- of- Service inform at ion

Ta ble 1 6 .4 . Su m m a r y of Ke r n e l Pr ogr a m m in g I n t e r fa ce s Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

irda_register_dongle()

drivers/ net / irda/ sir_dongle.c Regist ers a dongle driver

irda_unregister_dongle()

drivers/ net / irda/ sir_dongle.c Unregist ers a dongle driver

sirdev_set_dtr_rts()

drivers/ net / irda/ sir_dev.c

Wiggles m odem cont rol lines on t he serial port at t ached t o t he I R dev ice

sirdev_raw_write()

drivers/ net / irda/ sir_dev.c

Writ es t o t he serial port at t ached t o t he I R device

Ke r n e l I n t e r fa ce

Loca t ion

D e scr ipt ion

sirdev_raw_read()

drivers/ net / irda/ sir_dev.c

Reads from t he serial port at t ached t o t he I R device

W iFi WiFi, or wireless local- area net work ( WLAN) , is an alt ernat ive t o wired LAN and is generally used wit hin a cam pus. The I EEE 802.11a WLAN st andard uses t he 5GHz I SM (I ndust rial, Scient ific, Medical) band and support s speeds of up t o 54Mbps. The 802.11b and t he 802.11g st andards use t he 2.4GHz band and support speeds of 11Mbps and 54Mbps, respect ively. WLAN resem bles wired Et hernet in t hat bot h are assigned MAC addresses from t he sam e address pool and bot h appear t o t he operat ing syst em as regular net work int erfaces. For exam ple, Address Resolut ion Prot ocol ( ARP) t ables cont ain WLAN MAC addresses alongside Et hernet MAC addresses. WLAN and wired Et hernet differ significant ly at t he link layer, however:

The 802.11 WLAN st andard uses collision avoidance ( CSMA/ CA) rat her t han collision det ect ion ( CSMA/ CD) used by wired Et hernet .

WLAN fram es, unlike Et hernet fram es, are acknowledged.

Due t o securit y issues inherent in wireless net working, WLAN uses an encrypt ion m echanism called Wired Equivalent Privacy ( WEP) t o provide a level of securit y equivalent t o wired Et hernet . WEP com bines a 40bit or a 104- bit key wit h a random 24- bit init ializat ion vect or t o encrypt and decrypt dat a.

WLAN support s t wo com m unicat ion m odes:

1 . Ad- hoc m ode, where a sm all group of nearby st at ions direct ly com m unicat e wit hout using an access point .

2 . I nfrast ruct ure m ode, where dat a exchanges pass via an access point . Access point s periodically broadcast a service set ident ifier ( SSI D or ESSI D) t hat ident ifies one WLAN net work from anot her.

Let 's find out how Linux support s WLAN.

Con figu r a t ion The Wireless Ext ensions proj ect defines a generic Linux API t o configure WLAN device drivers in a deviceindependent m anner. I t also provides a set of com m on t ools t o set and access inform at ion from WLAN drivers. I ndividual drivers im plem ent support for Wireless Ext ensions t o connect t hem selves wit h t he com m on int erface and, hence, wit h t he t ools. Wit h Wireless Ext ensions, t here are prim arily t hree ways t o t alk t o WLAN drivers:

1 . St andard operat ions using t he iwconfig ut ilit y. To glue your driver t o iwconfig, you need t o im plem ent prescribed funct ions corresponding t o com m ands t hat set param et ers such as ESSI D and WEP keys.

2.

2 . Special- purpose operat ions using iwpriv. To use iwpriv over your driver, define privat e ioct ls relevant t o your hardware and im plem ent t he corresponding handler funct ions.

3 . WiFi- specific st at ist ics t hrough / proc/ net / w ireless. For t his, im plem ent t he get_wireless_stats() m et hod in your driver. This is in addit ion t o t he get_stats() m et hod im plem ent ed by NI C drivers for generic st at ist ics collect ion as described in t he sect ion "St at ist ics" in Chapt er 15, " Net work I nt erface Cards."

WLAN drivers t ie t hese t hree pieces of inform at ion inside a st ruct ure called iw_handler_def, defined in include/ net / iw_handler.h. The address of t his st ruct ure is supplied t o t he kernel via t he device's net_device st ruct ure ( discussed in Chapt er 15) during init ializat ion. List ing 16.3 shows a skelet al WLAN driver im plem ent ing support for Wireless Ext ensions. The com m ent s in t he list ing explain t he associat ed code. List in g 1 6 .3 . Su ppor t in g W ir e le ss Ex t e n sion s Code View: #include #include /* Populate the iw_handler_def structure with the location and number of standard and private handlers, argument details of private handlers, and location of get_wireless_stats() */ static struct iw_handler_def mywifi_handler_def = { .standard = mywifi_std_handlers, .num_standard = sizeof(mywifi_std_handlers) / sizeof(iw_handler), .private = (iw_handler *) mywifi_pvt_handlers, .num_private = sizeof(mywifi_pvt_handlers) / sizeof(iw_handler), .private_args = (struct iw_priv_args *)mywifi_pvt_args, .num_private_args = sizeof(mywifi_pvt_args) / sizeof(struct iw_priv_args), .get_wireless_stats = mywifi_stats, };

/* Handlers corresponding to iwconfig */ static iw_handler mywifi_std_handlers[] = { NULL, /* SIOCSIWCOMMIT */ mywifi_get_name, /* SIOCGIWNAME */ NULL, /* SIOCSIWNWID */ NULL, /* SIOCGIWNWID */ mywifi_set_freq, /* SIOCSIWFREQ */ mywifi_get_freq, /* SIOCGIWFREQ */ mywifi_set_mode, /* SIOCSIWMODE */ mywifi_get_mode, /* SIOCGIWMODE */ /* ... */ }; #define MYWIFI_MYPARAMETER SIOCIWFIRSTPRIV /* Handlers corresponding to iwpriv */ static iw_handler mywifi_pvt_handlers[] = { mywifi_set_myparameter, /* ... */ }; /* Argument description of private handlers */

static const struct iw_priv_args mywifi_pvt_args[] = { { MYWIFI_MYPARAMATER, IW_PRIV_TYPE_INT | IW_PRIV_SIZE_FIXED | 1, 0, "myparam"}, } struct iw_statistics mywifi_stats; /* WLAN Statistics */ /* Method to set operational frequency supplied via mywifi_std_handlers. Similarly implement the rest of the methods */ mywifi_set_freq() { /* Set frequency as specified in the data sheet */ /* ... */ } /* Called when you read /proc/net/wireless */ static struct iw_statistics * mywifi_stats(struct net_device *dev) { /* Fill the fields in mywifi_stats */ /* ... */ return(&mywifi_stats); }

/*Device initialization. For PCI-based cards, this is called from the probe() method. Revisit init_mycard() in Listing 15.1 in Chapter 15 for a full discussion */ static int init_mywifi_card() { struct net_device *netdev; /* Allocate WiFi network device. Internally calls alloc_etherdev() */ netdev = alloc_ieee80211(sizeof(struct mywifi_priv)); /* ... */ /* Register Wireless Extensions support */ netdev->wireless_handlers = &mywifi_handler_def; /* ... */ register_netdev(netdev); }

Wit h Wireless Ext ensions support com piled in, you can use iwconfig t o configure t he ESSI D and t he WEP key, peek at support ed privat e com m ands, and dum p net work st at ist ics: bash> iwconfig eth1 essid blue key 1234-5678-9012-3456-7890-1234-56 bash> iwconfig eth1 eth1 IEEE 802.11b ESSID:"blue" Nickname:"ipw2100" Mode:Managed Frequency:2.437 GHz Access Point: 00:40:96:5E:07:2E ...

Encryption key:1234-5678-9012-3456-7890-1234-56 Security mode:open ... bash> dhcpcd eth1 bash> ifconfig eth1 Link encap:Ethernet Hwaddr 00:13:E8:02:EE:18 inet addr:192.168.0.41 Bcasr:192.168.0.255 Mask:255.255.255.0 ... bash> iwpriv eth1 eth1 Available private ioctls: myparam (8BE2): set 2 int & get 0 bash> cat /proc/net/wireless Inter-| sta-| Quality | Discarded packets face | tus |link level noise|nwid crypt frag eth1: 0004 100. 207. 0. 0 0 0

retry 2

| Missed | WE misc| beacon | 19 1 0

Local iwconfig param et ers such as t he ESSI D and WEP key should m at ch t he configurat ion at t he access point . There is anot her proj ect called cfg80211 having sim ilar goals as Wireless Ext ensions. This has been m erged int o t he m ainline kernel st art ing wit h t he 2.6.22 kernel release.

D e vice D r ive r s There are hundreds of WLAN original equipm ent m anufact urers ( OEMs) in t he m arket , and cards com e in several form fact ors such as PCI , Mini PCI , CardBus, PCMCI A, Com pact Flash, USB, and SDI O ( see t he sidebar "WiFi over SDI O" ) . However, t he num ber of cont roller chips t hat lie at t he heart of t hese devices, and hence t he num ber of Linux device drivers, are relat ively less in num ber. The I nt ersil Prism chipset , Lucent Herm es chipset , At heros chipset , and I nt el Pro/ Wireless are am ong t he popular WLAN cont rollers. The following are exam ple devices built using t hese cont rollers:

I n t e r sil Pr ism 2 W LAN Com pa ct Fla sh Ca r d— The Orinoco WLAN driver, part of t he kernel source t ree, support s bot h Prism - based and Herm es- based cards. Look at orinoco.c and herm es.c in drivers/ net / wireless/ for t he sources. orinoco_cs provides PCMCI A/ CF Card Services support .

Th e Cisco Air on e t Ca r dBu s a da pt e r — This card uses an At heros chipset . The Madwifi proj ect (ht t p: / / m adwifi.org/ ) offers a Linux driver t hat works on hardware built using At heros cont rollers. The Madwifi source base is not part of t he kernel source t ree prim arily due t o licensing issues. One of t he m odules of t he Madwifi driver called Hardware Access Layer ( HAL) is closed source. This is because t he At heros chip is capable of operat ing at frequencies t hat are out side perm issible I SM bands and can work at various power levels. The U.S. Federal Com m unicat ions Com m ission ( FCC) m andat es t hat such set t ings should not be easily changeable by users. Part of HAL is dist ribut ed as binary- only t o com ply wit h FCC regulat ions. This binary- only port ion is independent of t he kernel version.

I n t e l Pr o/ W ir e le ss M in i PCI ( a n d PCI e M in i) ca r ds e m be dde d on m a n y la pt ops— The kernel source t ree cont ains drivers for t hese cards. The drivers for t he 2100 and 2200 BG series cards are drivers/ net / wireless/ ipw2100.c and drivers/ net / wireless/ ipw2200.c, respect ively. These devices need oncard firm ware t o work. You can download t he firm ware from ht t p: / / ipw2100.sourceforge.net / or ht t p: / / ipw2200.sourceforge.net / depending on whet her you have a 2100 or a 2200. The sect ion "Microcode Download" in Chapt er 4 , " Laying t he Groundwork," described t he st eps needed t o download

firm ware on t o t hese cards. I nt el's dist ribut ion t erm s for t he firm ware are rest rict ive.

W LAN USB de vice s— The At m el USB WLAN driver ( ht t p: / / at m elwlandriver.sourceforge.net / ) support s USB WLAN devices built using At m el chipset s.

The WLAN driver's t ask is t o let your card appear as a norm al net work int erface. Driver im plem ent at ions are generally split int o t he following part s:

1 . Th e in t e r fa ce t h a t com m u n ica t e s w it h t h e Lin u x n e t w or k in g st a ck — We discussed t his in det ail in t he sect ion " The Net Device I nt erface" in Chapt er 15. You can use List ing 15.1 in t hat chapt er as a t em plat e t o im plem ent t his port ion of your WLAN driver.

2 . For m fa ct or – spe cific code — I f your card is a PCI card, it has t o be archit ect ed t o conform t o t he kernel PCI subsyst em as described in Chapt er 10, " Peripheral Com ponent I nt erconnect ." Sim ilarly, PCMCI A and USB cards have t o t ie in wit h t heir respect ive core layers.

3 . Ch ipse t spe cific pa r t — This is t he cornerst one of t he WLAN driver and is based on regist er specificat ions in t he chip's dat a sheet . Many com panies do not release adequat e docum ent at ion for writ ing open source device drivers, however, so t his port ion of som e Linux WLAN drivers is at least part ly based on reverseengineering.

4 . Su ppor t for W ir e le ss Ex t e n sion s— List ing 16.3, shown earlier, im plem ent s an exam ple.

Hardware- independent port ions of t he 802.11 st ack are reusable across drivers, so t hey are im plem ent ed as a collect ion of com m on library funct ions in t he net / ieee80211/ direct ory. ieee80211 is t he core prot ocol m odule, but if you want t o configure WEP keys via t he iwconfig com m and, you have t o load ieee80211_crypt and ieee80211_crypt _w ep, t oo. To generat e debugging out put from t he 802.11 st ack, enable CONFIG_IEEE80211_DEBUG while configuring your kernel. You can use / proc/ net / ieee80211/ debug_level as a knob t o fine- t une t he t ype of debug m essages t hat you want t o see. St art ing wit h t he 2.6.22 release, t he kernel has an alt ernat e 802.11 st ack ( net / m ac80211/ ) donat ed by a com pany called Devicescape. WiFi device drivers m ay m igrat e t o t his new st ack in t he fut ure.

WiFi over SDI O Like PCMCI A cards whose funct ionalit y has ext ended from st orage t o various ot her t echnologies, SD cards are no longer confined t o t he consum er elect ronics m em ory space. The Secure Digit al I nput / Out put ( SDI O) st andard brings t echnologies such as WiFi, Bluet oot h, and GPS t o t he SD realm . The Linux- SDI O proj ect host ed at ht t p: / / sourceforge.net / proj ect s/ sdio- linux/ offers drivers for several SDI O cards. Go t o www.sdcard.org t o browse t he SD Card Associat ion's websit e. The lat est st andards adopt ed by t he associat ion are m icroSD and m iniSD, which are m iniat ure form fact or versions of t he SD card.

Look in g a t t h e Sou r ce s

WiFi device drivers live in drivers/ net / wireless/ . Look inside net / wireless/ for t he im plem ent at ions of Wireless Ext ensions and t he new cfg80211 configurat ion int erface. The t wo Linux 802.11 st acks live under net / ieee80211/ and net / m ac80211/ , respect ively.

Ce llu la r N e t w or k in g Global Syst em for Mobile Com m unicat ions ( GSM) is a prom inent digit al cellular st andard. GSM net works are called 2G or second- generat ion net works. GPRS represent s t he evolut ion from 2G t o 2.5G. Unlike 2G net works, 2.5G net works are " always on." Com pared t o GSM's 9.6Kbps t hroughput , GPRS support s t heoret ical speeds of up t o 170Kbps. 2.5G GPRS has given way t o 3G net works based on t echnologies such as CDMA t hat offer higher speeds. I n t his sect ion, let 's look at GPRS and CDMA.

GPRS Because GPRS chips are cellular m odem s, t hey present a UART int erface t o t he syst em and usually don't require specialized Linux drivers. Here's how Linux support s com m on GPRS hardware:

1 . For a syst em wit h built - in GPRS support , say, a board having a Siem ens MC- 45 m odule wired t o t he m icrocont roller's UART channel, t he convent ional Linux serial driver can drive t he link.

2 . For a PCMCI A/ CF GPRS device such as an Opt ions GPRS card, ser ial_cs, t he generic serial Card Services driver allows t he rest of t he operat ing syst em t o see t he card as a serial device. The first unused serial device ( / dev/ t t ySX) get s allot t ed t o t he card. Look at Figure 9.5 in Chapt er 9 , for an illust rat ion.

3 . For USB GPRS m odem s, a USB- t o- serial convert er t ypically convert s t he USB port t o a virt ual serial port . The usbserial driver let s t he rest of t he syst em see t he USB m odem as a serial device ( / dev/ t t yUSBX) . The sect ion "USB- Serial" in Chapt er 11 discussed USB- t o- serial convert ers.

The above driver descript ions also hold for driving Global Posit ioning Syst em ( GPS) receivers and net working over GSM. Aft er t he serial link is up, you m ay est ablish a net work connect ion via AT com m ands, a st andard language t o t alk t o m odem s. Cellular devices support an ext ended AT com m and set . The exact com m and sequence depends on t he part icular cellular t echnology in use. Consider for exam ple, t he AT st ring t o connect over GPRS. Before ent ering dat a m ode and connect ing t o an ext ernal net work via a gat eway GPRS support node ( GGSN) , a GPRS device m ust define a cont ext using an AT com m and. Here's an exam ple cont ext st ring: 'AT+CGDCONT=1,"IP","internet1.voicestream.com","0.0.0.0",0,0'

where 1 st ands for a cont ext num ber, IP is t he packet t ype, internet1.-voicestream.com is an access point nam e ( APN) specific t o t he service provider, and 0.0.0.0 asks t he service provider t o choose t he I P address. The last t wo param et ers pert ain t o dat a and header com pression. A usernam e and password are usually not needed. As you saw in Chapt er 9 , PPP is used as t he vehicle t o carry TCP/ I P payload over GPRS. A com m on synt ax for invoking t he PPP daem on, pppd, is t his: bash> pppd ttySX call connection-script

where ttySX is t he serial port over which PPP runs, and connection-script is a file in / et c/ ppp/ peers/ [ 5] t hat cont ains t he AT com m and sequence t o est ablish t he link. Aft er est ablishing connect ion and com plet ing aut hent icat ion, PPP st art s a Net work Cont rol Prot ocol ( NCP) such as I nt ernet Prot ocol Cont rol Prot ocol ( I PCP) . When I PCP successfully negot iat es I P addresses, PPP st art s t alking wit h t he TCP/ I P st ack.

[ 5]

The pat h nam e m ight vary depending on t he dist ribut ion you use.

Here is an exam ple PPP connect ion script ( / et c/ ppp/ peer/ gprs- seq) for connect ing t o a GPRS service provider at 57600 baud. For t he sem ant ics of all const it uent lines in t he script , refer t o t he m an pages of pppd: 57600 connect "/usr/sbin/chat -s -v "" AT+CGDCONT=1,"IP", "internet2.voicestream.com","0.0.0.0",0,0 OK AT+CGDATA="PPP",1" crtscts noipdefault modem usepeerdns defaultroute connect-delay 3000

CD M A For perform ance reasons, m any CDMA PC Cards have an int ernal USB cont roller t hrough which a CDMA m odem is connect ed. When such cards are insert ed, t he syst em sees one or m ore new PCI - t o- USB bridges on t he PCI bus. Let 's t ake t he exam ple of a Huawei CDMA CardBus card. Look at t he addit ional ent ries in t he lspci out put aft er insert ing t his card int o t he CardBus slot of a lapt op: Code View: bash> lspci ... 07:00:0 USB 07:00:1 USB 07:00:2 USB

-v Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI]) Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI]) Controller: NEC Corporation USB 2.0 (rev 04) (prog-if 20 [EHCI])

These are st andard OHCI and EHCI cont rollers, so t he host cont roller drivers on Linux seam lessly t alk t o t hem . I f a CDMA card, however, uses a host cont roller unsupport ed by t he kernel, you will have t he unenviable t ask of writ ing a new USB host cont roller driver. Let 's t ake a closer look at t he new USB buses in t he above lspci out put and see whet her we can find any devices connect ed t o t hem : Code View: bash> cat /proc/bus/usb/devices T: Bus=07 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh= 2 B: Alloc= 0/800 us ( 0%), #Int= 0, #Iso= 0 D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1 ... T: Bus=06 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 1 B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0 D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 ... T: Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 1 B: Alloc= 0/900 us ( 0%), #Int= 1, #Iso= 0

D: ... T: D: P: S: S: C:* I: E: E: E: I: E: E: ...

Ver= 1.10 Cls=09(hub

) Sub=00 Prot=00 MxPS=64 #Cfgs=

1

Bus=05 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 3 Spd=12 MxCh= 0 Ver= 1.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=16 #Cfgs= 1 Vendor=12d1 ProdID=1001 Rev= 0.00 Manufacturer=Huawei Technologies Product=Huawei Mobile #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=pl2303 Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=128ms Ad=8a(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms Ad=0b(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=pl2303 Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms Ad=06(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms

The t op t hree ent ries ( bus7, bus6, and bus5) correspond t o t he t hree host cont rollers present in t he CDMA card. The last ent ry shows t hat a full- speed ( 12Mbps) USB device is connect ed t o bus 5. This device has a vendorID of 0x12d1 and a productID of 0x1001. As is evident from t he preceding out put , t he USB core has bound t his device t o t he pl2303 driver. I f you look at t he source file of t he PL2303 Prolific USB- t o- serial adapt er driver (drivers/ usb/ serial/ pl2303.c) , you will find t he following m em ber in t he usb_device_id t able: static struct usb_device_id id_table [] = { /* ... */ {USB_DEVICE(HUAWEI_VENDOR_ID, HUAWEI_PRODUCT_ID)}, /* ... */ };

A quick peek at pl2303.h living in t he sam e direct ory confirm s t hat HUAWEI_VENDOR_ID and HUAWEI_PRODUCT_ID m at ch t he values t hat you j ust gleaned from / proc/ bus/ usb/ devices. The pl2303 driver present s a serial int erface, / dev/ t t yUSB0, over t he det ect ed USB- t o- serial convert er. You can send AT com m ands t o t he CDMA m odem over t his int erface. At t ach pppd over t his device and connect t o t he net . You are now a 3G surfer!

Cu r r e n t Tr e n ds At one end of t oday's on- t he- m ove connect ivit y spect rum , t here are st andards t hat allow coupling bet ween cellular net works and WiFi t o provide cheaper net working solut ions. At t he ot her end, t echnologies such as Bluet oot h and I nfrared are being int egrat ed int o GPRS cell phones t o bridge consum er elect ronics devices wit h t he I nt ernet . Figure 16.7 shows a sam ple scenario.

Figu r e 1 6 .7 . Cou plin g be t w e e n w ir e le ss t e ch n ologie s.

[ View full size im age]

I n t andem wit h t he coupling of exist ing st andards and t echnologies, t here is a st eady st ream of new com m unicat ion st andards arriving in t he wireless space. Zigbee (www.zigbee.org) adopt s t he new 802.15.4 st andard for wireless net working in t he em bedded space t hat is charact erized by low range, speed, energy consum pt ion, and code foot print . I t prim arily t arget s hom e and indust rial aut om at ion. Of t he wireless prot ocols discussed in t his chapt er, Zigbee is closest t o Bluet oot h but is considered com plem ent ary rat her t han com pet it ive wit h it . WiMax (Worldwide int eroperabilit y for Microwave access) , based on t he I EEE 802.16 st andard, is a m et ropolit an- area net work ( MAN) flavor of WiFi t hat has a range of several kilom et ers. I t support s fixed connect ivit y for hom es and offices, and a m obile version for net working on t he go. WiMax is a cost - effect ive way t o solve t he last - m ile connect ivit y problem ( which is analogous t o t he t ask of reaching your hom e from t he nearest m et ro rail st at ion) and creat e broadband clouds t hat span large areas. The WiMax forum is host ed at www.wim axforum .org.

MI MO (Mult iple I n Mult iple Out ) is a new m ult iple- ant enna t echnology ut ilized by WiFi and WiMax product s t o enhance t heir speed, range, and connect ivit y. Working groups are developing new st andards t hat fall under t he am bit of fourt h- generat ion or 4G net working. 4G will signal t he convergence of several com m unicat ion t echnologies. Som e of t he new com m unicat ion t echnologies are t ransparent t o t he operat ing syst em and work unchanged wit h exist ing drivers and prot ocol st acks. Ot hers such as Zigbee need new drivers and prot ocol st acks but do not have accept ed open source im plem ent at ions yet . Linux m irrors t he st at e of t he art , so look out for support for t hese new st andards in fut ure kernel releases.

Ch a pt e r 1 7 . M e m or y Te ch n ology D e vice s I n Th is Ch a pt e r 504 What 's Flash Mem ory? 505 Linux- MTD Subsyst em 506 Map Drivers 511 NOR Chip Drivers 513 NAND Chip Drivers 516 User Modules 518 MTD- Ut ils 519 Configuring MTD 520 eXecut e I n Place 520 The Firm ware Hub 524 Debugging 524 Looking at t he Sources

When you push t he power swit ch on your handheld, it 's m ore t han likely t hat it boot s from flash m em ory. When you click som e but t ons t o save dat a on your cell phone, in all probabilit y, your dat a st art s life in flash m em ory. Today, Linux has penet rat ed t he em bedded space and is no longer confined t o deskt ops and servers. Linux avat ars m anifest in PDAs, m usic players, set - t op boxes, and even m edical- grade devices. The Mem ory Technology Devices ( MTD) subsyst em of t he kernel is responsible for int erfacing your syst em wit h various flavors of flash m em ory found in t hese devices. I n t his chapt er, let 's use t he exam ple of a Linux handheld t o learn about MTD.

W h a t 's Fla sh M e m or y? Flash m em ory is rewrit able st orage t hat does not need power supply t o hold inform at ion. Flash m em ory banks are usually organized int o sect or s. Unlike convent ional st orage, writ es t o flash addresses have t o be preceded by an erase of t he corresponding locat ions. Moreover, erases of port ions of flash can be perform ed only at t he granularit y of individual sect ors. Because of t hese const raint s, flash m em ory is best used wit h device drivers and filesyst em s t hat are t ailored t o suit t hem . On Linux, such specially designed drivers and filesyst em s are provided by t he MTD subsyst em . Flash m em ory chips generally com e in t wo flavors: NOR and NAND. NOR is t he variet y used t o st ore firm ware im ages on em bedded devices, whereas NAND is used for large, dense, cheap, but im perfect [ 1] st orage as required by solid- st at e m ass st orage m edia such as USB pen drives and Disk- On- Modules ( DOMs) . NOR flash chips are connect ed t o t he processor via address and dat a lines like norm al RAM, but NAND flash chips are int erfaced using I / O and cont rol lines. So, code resident on NOR flash can be execut ed in place, but t hat st ored on NAND flash has t o be copied t o RAM before execut ion.

[ 1]

I t 's norm al t o have bad blocks scat t ered across NAND flash regions as you will learn in t he sect ion, "NAND Chip Drivers."

Ch a pt e r 1 7 . M e m or y Te ch n ology D e vice s I n Th is Ch a pt e r 504 What 's Flash Mem ory? 505 Linux- MTD Subsyst em 506 Map Drivers 511 NOR Chip Drivers 513 NAND Chip Drivers 516 User Modules 518 MTD- Ut ils 519 Configuring MTD 520 eXecut e I n Place 520 The Firm ware Hub 524 Debugging 524 Looking at t he Sources

When you push t he power swit ch on your handheld, it 's m ore t han likely t hat it boot s from flash m em ory. When you click som e but t ons t o save dat a on your cell phone, in all probabilit y, your dat a st art s life in flash m em ory. Today, Linux has penet rat ed t he em bedded space and is no longer confined t o deskt ops and servers. Linux avat ars m anifest in PDAs, m usic players, set - t op boxes, and even m edical- grade devices. The Mem ory Technology Devices ( MTD) subsyst em of t he kernel is responsible for int erfacing your syst em wit h various flavors of flash m em ory found in t hese devices. I n t his chapt er, let 's use t he exam ple of a Linux handheld t o learn about MTD.

W h a t 's Fla sh M e m or y? Flash m em ory is rewrit able st orage t hat does not need power supply t o hold inform at ion. Flash m em ory banks are usually organized int o sect or s. Unlike convent ional st orage, writ es t o flash addresses have t o be preceded by an erase of t he corresponding locat ions. Moreover, erases of port ions of flash can be perform ed only at t he granularit y of individual sect ors. Because of t hese const raint s, flash m em ory is best used wit h device drivers and filesyst em s t hat are t ailored t o suit t hem . On Linux, such specially designed drivers and filesyst em s are provided by t he MTD subsyst em . Flash m em ory chips generally com e in t wo flavors: NOR and NAND. NOR is t he variet y used t o st ore firm ware im ages on em bedded devices, whereas NAND is used for large, dense, cheap, but im perfect [ 1] st orage as required by solid- st at e m ass st orage m edia such as USB pen drives and Disk- On- Modules ( DOMs) . NOR flash chips are connect ed t o t he processor via address and dat a lines like norm al RAM, but NAND flash chips are int erfaced using I / O and cont rol lines. So, code resident on NOR flash can be execut ed in place, but t hat st ored on NAND flash has t o be copied t o RAM before execut ion.

[ 1]

I t 's norm al t o have bad blocks scat t ered across NAND flash regions as you will learn in t he sect ion, "NAND Chip Drivers."

Lin u x - M TD Su bsyst e m The kernel's MTD subsyst em shown in Figure 17.1 provides support for flash and sim ilar nonvolat ile solid- st at e st orage. I t consist s of t he following:

The MTD core, which is an infrast ruct ure consist ing of library rout ines and dat a st ruct ures used by t he rest of t he MTD subsyst em

Map drivers t hat decide what t he processor ought t o do when it receives request s for accessing t he flash

NOR Chip drivers t hat know about com m ands required t o t alk t o NOR flash chips

NAND Chip drivers t hat im plem ent low- level support for NAND flash cont rollers

User Modules, t he layer t hat int eract s wit h user- space program s

I ndividual device drivers for som e special flash chips

Figu r e 1 7 .1 . Th e Lin u x - M TD su bsyst e m .

[ View full size im age]

M a p D r ive r s To MTD- enable your device, your first t ask is t o t ell MTD how t o access t he flash device. For t his, you have t o m ap your flash m em ory range for CPU access and provide m et hods t o operat e on t he flash. The next t ask is t o inform MTD about t he different st orage part it ions residing on your flash. Unlike hard disks on PC- com pat ible syst em s, flash- based st orage does not cont ain a st andard part it ion t able on t he m edia. Because of t his, diskpart it ioning t ools such as fdisk and cfdisk [ 2] cannot be used t o part it ion flash devices. I nst ead, part it ioning inform at ion has t o be im plem ent ed as part of kernel code. [ 3] These t asks are accom plished wit h t he help of an MTD m ap driver .

[ 2]

Fdisk and cfdisk are used t o m anipulat e t he part it ion t able residing in t he first hard disk sect or on PC syst em s.

[ 3]

You m ay also pass part it ioning inform at ion t o MTD via t he kernel com m and line argum ent mtdpart=, if you enable CONFIG_MTD_CMDLINE_PARTS during kernel configurat ion. Look at drivers/ m t d/ cm dlinepart .c for t he usage synt ax.

To bet t er underst and t he funct ion of m ap drivers, let 's look at an exam ple.

D e vice Ex a m ple : H a n dh e ld Consider t he Linux handheld shown in Figure 17.2. The flash has a size of 32MB and is m apped t o 0xC0000000 in t he processor's address space. I t cont ains t hree part it ions, one each for t he boot loader, t he kernel, and t he root filesyst em . The boot loader part it ion st art s from t he t op of t he flash, t he kernel part it ion begins at offset MY_KERNEL_START, and t he root filesyst em st art s at offset MY_FS_START. [ 4] The boot loader and t he kernel reside on read- only part it ions t o avoid unexpect ed dam age, while t he filesyst em part it ion is flagged read- writ e.

[ 4]

Som e devices have addit ional part it ions for boot loader param et ers, ext ra filesyst em s, and recovery kernels.

Figu r e 1 7 .2 . Fla sh M e m or y on a sa m ple Lin u x h a n dh e ld.

[ View full size im age]

Let 's first creat e t he flash m ap and t hen proceed wit h t he driver init ializat ion. The m ap driver has t o t ranslat e

t he flash layout shown in t he figure t o an mtd_partition st ruct ure. List ing 17.1 cont ains t he mtd_partition definit ion corresponding t o Figure 17.2. Not e t hat t he mask_flags field holds t he perm issions t o be m asked, so MTD_WRITEABLE im plies a read- only part it ion. List in g 1 7 .1 . Cr e a t in g a n M TD Pa r t it ion M a p Code View: #define FLASH_START 0x00000000 #define MY_KERNEL_START 0x00080000 /* 512K for bootloader */ #define MY_FS_START 0x00280000 /* 2MB for kernel */ #define FLASH_END 0x02000000 /* 32MB */ static struct mtd_partition pda_partitions[] = { { .name = "pda_btldr", /* This string is used by /proc/mtd to identify the bootloader partition */ .size: = (MY_KERNEL_START-FLASH_START), .offset = FLASH_START, /* Start from top of flash */ .mask_flags = MTD_WRITEABLE /* Read-only partition */ }, { .name = "pda_krnl", /* Kernel partition */ .size: = (MY_FS_START-MY_KERNEL_START), .offset = MTDPART_OFS_APPEND, /* Start immediately after the bootloader partition */ .mask_flags = MTD_WRITEABLE /* Read-only partition */ }, { .name: = "pda_fs", /* Filesystem partition */ .size: = MTDPART_SIZ_FULL, /* Use up the rest of the flash */ .offset = MTDPART_OFS_NEXTBLK,/* Align this partition with the erase size */ } };

List ing 17.1 uses MTDPART_OFS_APPEND t o st art a part it ion adj acent t o t he previous one. The st art addresses of writ eable part it ions, however, need t o be aligned wit h t he erase/ sect or size of t he flash chip. To achieve t his, t he filesyst em part it ion uses MTD_OFS_NEXTBLK rat her t han MTD_OFS_APPEND. Now t hat you have populat ed t he mtd_partition st ruct ure, let 's proceed and com plet e a basic m ap driver for t he exam ple handheld. List ing 17.2 regist ers t he m ap driver wit h t he MTD core. I t 's im plem ent ed as a plat form driver, assum ing t hat your archit ect ure- specific code regist ers an associat ed plat form device having t he sam e nam e. Rewind t o t he sect ion "Device Exam ple: Cell Phone" in Chapt er 6 , " Serial Drivers," for a discussion on plat form devices and plat form drivers. The platform_device is defined by t he associat ed archit ect ure- specific code as follows: struct resource pda_flash_resource = { /* Used by Listing 17.3 */ .start = 0xC0000000, /* Physical start of the flash in Figure 17.2 */ .end = 0xC0000000+0x02000000-1, /* Physical end of flash */ .flags = IORESOURCE_MEM, /* Memory resource */ }; struct platform_device pda_platform_device = {

.name = "pda", /* Platform device name */ .id = 0, /* Instance number */ /* ... */ .resource = &pda_flash_resource, /* See above */ }; platform_device_register(&pda_platform_device); List in g 1 7 .2 . Re gist e r in g t h e M a p D r ive r static struct .driver = { .name }, .probe .remove .suspend .resume };

platform_driver pda_map_driver = { =

"pda",

/* ID */

= = = =

pda_mtd_probe, NULL, NULL, NULL,

/* /* /* /*

Probe */ Release */ Power management */ Power management */

/* Driver/module Initialization */ static int __init pda_mtd_init(void) { return platform_driver_register(&pda_map_driver); } /* Module Exit */ static int __init pda_mtd_exit(void) { return platform_driver_uregister(&pda_map_driver); }

Because t he kernel finds t hat t he nam e of t he plat form driver regist ered in List ing 17.2 m at ches wit h t hat of an already- regist ered plat form device, it invokes t he probe m et hod, pda_mtd_probe(), shown in List ing 17.3. This rout ine

Reserves t he flash m em ory address range using request_mem_region(), and obt ains CPU access t o t hat m em ory using ioremap_nocache(). You learned how t o do t his in Chapt er 10, " Peripheral Com ponent I nt erconnect ."

Populat es a map_info st ruct ure ( discussed next ) wit h inform at ion such as t he st art address and size of flash m em ory. The inform at ion in t his st ruct ure is used while perform ing t he probing in t he next st ep.

Probes t he flash via a suit able MTD chip driver ( discussed in t he next sect ion) . Only t he chip driver knows how t o query t he chip and elicit t he com m and- set required t o access it . The chip layer t ries different perm ut at ions of bus widt hs and int erleaves while querying. I n Figure 17.2, t wo 16- bit flash banks are connect ed in parallel t o fill t he 32- bit processor bus widt h, so you have a t wo- way int erleave.

Regist ers t he mtd_partition st ruct ure t hat you populat ed earlier, wit h t he MTD core.

Before looking at List ing 17.3, let 's m eet t he map_info st ruct ure. I t cont ains t he address, size, and widt h of t he flash m em ory and rout ines t o access it :

struct map_info { char * name; /* Name */ unsigned long size; /* Flash size */ int bankwidth; /* In bytes */ /* ... */ /* You need to implement custom routines for the following methods only if you have special needs. Else populate them with builtin methods using simple_map_init() as done in Listing 17.3 */ map_word (*read)(struct map_info *, unsigned long); void (*write)(struct map_info *, const map_word, unsigned long); /* ... */ };

While we are in t he t opic of accessing flash chips, let 's briefly revisit m em ory barriers t hat we discussed in Chapt er 4 , " Laying t he Groundwork." An inst ruct ion reordering t hat appears sem ant ically unchanged t o t he com piler ( or t he processor) m ay not be so in realit y, so t he ordering of dat a operat ions on flash m em ory is best left alone. You don't want t o, for exam ple, end up erasing a flash sect or aft er writ ing t o it , inst ead of doing t he reverse. Also, t he sam e flash chips, and hence t heir device drivers, are used on diverse em bedded processors having different inst ruct ion reordering algorit hm s. For t hese reasons, MTD drivers are not able users of hardware m em ory barriers. simple_map_write(), a generic rout ine available t o m ap drivers for use as t he write() m et hod in t he map_info st ruct ure previously list ed, insert s a call t o mb() before ret urning. This ensures t hat t he processor does not reorder flash reads or writ es across t he barrier. List in g 1 7 .3 . M a p D r ive r Pr obe M e t h od Code View: #include #include #include static int pda_mtd_probe(struct platform_device *pdev) { struct map_info *pda_map; struct mtd_info *pda_mtd; struct resource *res = pdev->resource; /* Populate pda_map with information obtained from the associated platform device */ pda_map->virt = ioremap_nocache(res->start, (res->end – res->start + 1)); pda_map->name = pdev->dev.bus_id; pda_map->phys = res->start; pda_map->size = res->end – res->start + 1; pda_map->bankwidth = 2; /* Two 16-bit banks sitting on a 32-bit bus */ simple_map_init(&pda_map); /* Fill in default access methods */ /* Probe via the CFI chip driver */ pda_mtd = do_map_probe("cfi_probe", &pda_map); /* Register the mtd_partition structure */ add_mtd_partitions(pda_mtd, pda_partitions, 3); /* Three Partitions */ /* ... */ }

Don't worry if t he CFI probing done in List ing 17.3 seem s esot eric. I t 's discussed in t he next sect ion when we look at NOR chip drivers. MTD now knows how your flash device is organized and how t o access it . When you boot t he kernel wit h your m ap driver com piled in, user- space applicat ions can respect ively see your boot loader, kernel, and filesyst em part it ions as / dev/ m t d/ 0 , / dev/ m t d/ 1 , and / dev/ m t d/ 2 . So, t o t est drive a new kernel im age on t he handheld, you can do t his: bash> dd if=zImage.new of=/dev/mtd/1

Flash Part it ioning from Boot loaders The Redboot boot loader m aint ains a part it ion t able t hat holds flash layout , so if you are using Redboot on your em bedded device, you can configure your flash part it ions in t he boot loader inst ead of writ ing an MTD m ap driver. To ask MTD t o parse flash m apping inform at ion from Redboot 's part it ion t able, t urn on CONFIG_MTD_REDBOOT_PARTS during kernel configurat ion.

N OR Ch ip D r ive r s As you m ight have not iced, t he NOR flash chip used by t he handheld in Figure 17.2 is labeled CFI - com pliant . CFI st ands for Com m on Flash I nt erface, a specificat ion designed t o do away wit h t he need for developing separat e drivers t o support chips from different vendors. Soft ware can query CFI - com pliant flash chips and aut om at ically det ect block sizes, t im ing param et ers, and t he com m and- set t o be used for com m unicat ion. Drivers t hat im plem ent specificat ions such as CFI and JEDEC are called chip drivers. According t o t he CFI specificat ion, soft ware m ust writ e 0x98 t o locat ion 0x55 wit hin flash m em ory t o init iat e a query. Look at List ing 17.4 t o see how MTD im plem ent s CFI query.

List in g 1 7 .4 . Qu e r yin g CFI - com plia n t Fla sh /* Snippet from cfi_probe_chip() (2.6.23.1 kernel) defined in drivers/mtd/chips/cfi_probe.c, with comments added */ /* cfi is a pointer to struct cfi_private defined in include/linux/mtd/cfi.h */ /* ... */ /* Ask the device to enter query mode by sending 0x98 to offset 0x55 */ cfi_send_gen_cmd(0x98, 0x55, base, map, cfi, cfi->device_type, NULL); /* If the device did not return the ASCII characters 'Q', 'R' and 'Y', the chip is not CFI-compliant */ if (!qry_present(map, base, cfi)) { xip_enable(base, map, cfi); return 0; } /* Elicit chip parameters and the command-set, and populate the cfi structure */ if (!cfi->numchips) { return cfi_chip_setup(map, cfi); } /* ... */

The CFI specificat ion defines various com m and- set s t hat com pliant chips can im plem ent . Som e of t he com m on ones are as follows:

Com m and- set 0001, support ed by I nt el and Sharp flash chips

Com m and- set 0002, im plem ent ed on AMD and Fuj it su flash chips

Com m and- set 0020, used on ST flash chips

MTD support s t hese com m and- set s as kernel m odules. You can enable t he one support ed by your flash chip via t he kernel configurat ion m enu.

N AN D Ch ip D r ive r s NAND t echnology users such as USB pen drives, DOMs, Com pact Flash m em ory, and SD/ MMC cards em ulat e st andard st orage int erfaces such as SCSI or I DE over NAND flash, so you don't need t o develop NAND drivers t o com m unicat e wit h t hem . [ 5] On- board NAND flash chips need special drivers, however, and are t he t opic of t his sect ion.

[ 5] Unless you are writ ing drivers for t he st orage m edia it self. I f you are em bedding Linux on a device t hat will export part of it s NAND part it ion t o t he out side world as a USB m ass st orage device, you do have t o cont end wit h NAND drivers.

As you learned previously in t his chapt er, NAND flash chips, unlike t heir NOR count erpart s, are not connect ed t o t he CPU via dat a and address lines. They int erface t o t he CPU t hrough special elect ronics called a NAND flash cont roller t hat is part of m any em bedded processors. To read dat a from NAND flash, t he CPU issues an appropriat e read com m and t o t he NAND cont roller. The cont roller t ransfers dat a from t he request ed flash locat ion t o an int ernal RAM m em ory, also part of t he cont roller. The dat a t ransfer is done in unit s of t he flash chip's page size ( for exam ple, 2KB) . I n general, t he denser t he flash chip, t he larger is it s page size. Not e t hat t he page size is different from t he flash chip's block size, which is t he m inim um erasable flash m em ory unit ( for exam ple, 16KB) . Aft er t he t ransfer operat ion com plet es, t he CPU reads t he request ed NAND cont ent s from t he int ernal RAM. Writ es t o NAND flash are done sim ilarly, except t hat t he cont roller t ransfers dat a from t he int ernal RAM t o flash. The connect ion diagram of NAND flash m em ory on an em bedded device is shown in Figure 17.3.

Figu r e 1 7 .3 . N AN D fla sh con n e ct ion .

Because of t his unconvent ional m ode of addressing, you need special drivers t o work wit h NAND st orage. MTD provides such drivers t o m anage NAND- resident dat a. I f you are using a support ed chip, you have t o enable only t he appropriat e low- level MTD NAND driver. I f you are writ ing a NAND flash driver, however, you need t o explore t wo dat asheet s: t he NAND flash cont roller and t he NAND flash chip. NAND flash chips do not support aut om at ic configurat ion using prot ocols such as CFI . You have t o m anually inform MTD about t he propert ies of your NAND chip by adding an ent ry t o t he nand_flash_ids[] t able defined

in drivers/ m t d/ nand/ nand_ids.c. Each ent ry in t he t able consist s of an ident ifier nam e, t he device I D, page size, erase block size, chip size, and opt ions such as t he bus widt h. There is anot her charact erist ic t hat goes hand in hand wit h NAND m em ory. NAND flash chips, unlike NOR chips, are not fault less. I t 's norm al t o have som e problem bit s and bad blocks scat t ered across NAND flash regions. To handle t his, NAND devices associat e a spare area wit h each flash page ( for exam ple, 64 byt es of spare area for each 2KB dat a page) . The spare area cont ains out - of- band ( OOB) inform at ion t o help perform bad block m anagem ent and error correct ion. The OOB area includes error correct ing codes ( ECCs) t o im plem ent error correct ion and det ect ion. ECC algorit hm s correct single- bit errors and det ect m ult ibit errors. The nand_ecclayout st ruct ure defined in include/ m t d/ m t d- abi.h specifies t he layout of t he OOB spare area: struct nand_ecclayout { uint 32_t eccbytes; uint32_t eccpos[64]; uint32_t oobavail; struct nand_oobfree oobfree[MTD_MAX_OOBFREE_ENTRIES]; };

I n t his st ruct ure, eccbytes holds t he num ber of OOB byt es t hat st ore ECC dat a, and eccpos is an array of offset s int o t he OOB area t hat cont ains t he ECC dat a. oobfree records t he unused byt es in t he OOB area available t o flash filesyst em s for st oring flags such as clean m arkers t hat signal successful com plet ion of erase operat ions. I ndividual NAND drivers init ialize t heir nand_ecclayout according t o t he chip's propert ies. Figure 17.4 illust rat es t he layout of a NAND flash chip having a page size of 2KB. The OOB sem ant ics used by t he figure is t he default for 2KB page- sized chips as defined in t he generic NAND driver, drivers/ m t d/ nand/ nand_base.c.

Figu r e 1 7 .4 . La you t of a N AN D fla sh ch ip.

[ View full size im age]

Oft en, t he NAND cont roller perform s error correct ion and det ect ion in hardware by operat ing on t he ECC fields in t he OOB area. I f your NAND cont roller does not support error m anagem ent , however, you will need t o get MTD t o do t hat for you in soft ware. The MTD nand_ecc driver ( drivers/ m t d/ nand/ nand_ecc.c) im plem ent s soft ware ECC. Figure 17.4 also shows OOB m em ory byt es t hat cont ain bad block m arkers. These m arkers are used t o flag fault y flash blocks and are usually present in t he OOB region belonging t o t he first page of each block. The posit ion of t he m arker inside t he OOB area depends on t he propert ies of t he chip. Bad block m arkers are eit her set at t he fact ory during m anufact ure, or by soft ware when it det ect s wear in a block. MTD im plem ent s bad block m anagem ent in drivers/ m t d/ nand/ nand_bbt .c. The mtd_partition st ruct ure used in List ing 17.1 for t he NOR flash in Figure 17.2 works for NAND m em ory, t oo. Aft er you MTD- enable your NAND flash, you can access t he const it uent part it ions using st andard device nodes such as / dev/ m t d/ X and / dev/ m t dblock/ X. I f you have a m ix of NOR and NAND m em ories on your hardware, X can be eit her a NOR or a NAND part it ion. I f you have a t ot al of m ore t han 32 flash part it ions, accordingly change t he value of MAX_MTD_DEVICES in include/ linux/ m t d/ m t d.h . To effect ively m ake use of NAND st orage, you need t o use a filesyst em t uned for NAND access, such as JFFS2 or YAFFS2, in t andem wit h t he low- level NAND driver. We discuss t hese filesyst em s in t he next sect ion.

Use r M odu le s Aft er you have added a m ap driver and chosen t he right chip driver, you're all set t o let higher layers use t he flash. User- space applicat ions t hat perform file I / O need t o view t he flash device as if it were a disk, whereas program s t hat desire t o accom plish raw I / O access t he flash as if it were a charact er device. The MTD layer t hat achieves t hese and m ore is called User Modules, as shown in Figure 17.1. Let 's look at t he com ponent s const it ut ing t his layer.

Block D e vice Em u la t ion The MTD subsyst em provides a block driver called m t dblock t hat em ulat es a hard disk over flash m em ory. You can put any filesyst em , say EXT2, over t he em ulat ed flash disk. Mt dblock hides com plicat ed flash access procedures ( such as preceding a writ e wit h an erase of t he corresponding sect or) from t he filesyst em . Device nodes creat ed by m t dblock are nam ed / dev/ m t dblock/ X, where X is t he part it ion num ber. To creat e an EXT2 filesyst em on t he pda_fs part it ion of t he handheld, as shown in Figure 17.2, do t he following: bash> mkfs.ext2 /dev/mtdblock/2 bash> mount /dev/mtdblock/2 /mnt

Create an EXT2 filesystem on the second partition Mount the partition

As you will soon see, it 's a m uch bet t er idea t o use JFFS2 rat her t han EXT2 t o hold files on flash filesyst em par t it ions. The File Translat ion Layer ( FTL) and t he NAND File Translat ion Layer ( NFTL) perform a t ransform at ion called wear leveling. Flash m em ory sect ors can wit hst and only a finit e num ber of erase operat ions ( in t he order of 100,000) . Wear leveling prolongs flash life by dist ribut ing m em ory usage across t he chip. Bot h FTL and NFTL provide device int erfaces sim ilar t o m t dblock over which you can put norm al filesyst em s. The corresponding device nodes are nam ed / dev/ nft l/ X, where X is t he part it ion num ber. Cert ain algorit hm s used in t hese m odules are pat ent ed, so t here could be rest rict ions on usage.

Ch a r D e vice Em u la t ion The m t dchar driver present s a linear view of t he underlying flash device, rat her t han t he block- orient ed view required by filesyst em s. Device nodes creat ed by m t dchar are nam ed / dev/ m t d/ X, where X is t he part it ion num ber. You m ay updat e t he boot loader part it ion of t he handheld as shown in Figure 17.2, by using dd over t he corresponding m t dchar int erface: bash> dd if=bootloader.bin of=/dev/mtd/0

An exam ple use of a raw m t dchar part it ion is t o hold POST error logs generat ed by t he boot loader on an em bedded device. Anot her use of a char flash part it ion on an em bedded syst em is t o st ore inform at ion sim ilar t o t hat present in t he CMOS or t he EEPROM on PC- com pat ible syst em s. This includes t he boot order, power- on password, and Vit al Product Dat a ( VPD) such as t he device serial num ber and m odel num ber.

JFFS2 Journaling Flash File Syst em ( JFFS) is considered t he best - suit ed filesyst em for flash m em ory. Current ly, version 2 ( JFFS2) is in use, and JFFS3 is under developm ent . JFFS was originally writ t en for NOR flash chips, but support for NAND devices is m erged wit h t he 2.6 kernel. Norm al Linux filesyst em s are designed for deskt op com put ers t hat are shut down gracefully. JFFS2 is designed

for em bedded syst em s where power failure can occur abrupt ly, and where t he st orage device can t olerat e only a finit e num ber of erases. During flash erase operat ions, current sect or cont ent s are saved in RAM. I f t here is a power loss during t he slow erase process, ent ire cont ent s of t hat sect or can get lost . JFFS2 circum vent s t his problem using a log- st ruct ured design. New dat a is appended t o a log t hat lives in an erased region. Each JFFS2 node cont ains m et adat a t o t rack disj oint file locat ions. Mem ory is periodically reclaim ed using garbage collect ion. Because of t his design, flash writ es do not have t o go t hrough a save- erase- writ e cycle, and t his im proves power- down reliabilit y. The log- st ruct ure also increases flash life span by spreading out writ es. To creat e a JFFS2 im age of a t ree living under / pat h/ t o/ filesyst em / on a flash chip having an erase size of 256KB, use m kfs.j ffs2 as follows: bash> mkfs.jffs2 -e 256KiB –r /path/to/filesystem/ -o jffs2.img

JFFS2 includes a garbage collect or ( GC) t hat reclaim s flash regions t hat are no longer in use. The garbage collect ion algorit hm depends on t he erase size, so supplying an accurat e value m akes it m ore efficient . To obt ain t he erase size of your flash part it ions, you m ay seek t he help of / proc/ m t d. The out put for t he Linux handheld shown in Figure 17.2 is as follows: bash> dev: mtd0: mtd1: mtd2:

cat /proc/mtd size erasesize name 00100000 00040000 "pda_btldr" 00200000 00040000 "pda_krnl" 01400000 00040000 "pda_fs"

JFFS2 support s com pression. Enable appropriat e opt ions under CONFIG_JFFS2_COMPRESSION_OPTIONS t o choose available com pressors, and look at fs/ j ffs2/ com pr* .c for t heir im plem ent at ions. Not e t hat JFFS2 filesyst em im ages are usually creat ed on t he host m achine where you do cross- developm ent and t hen t ransferred t o t he desired flash part it ion on t he t arget device via a suit able download m echanism such as serial port , USB, or NFS. More on t his in Chapt er 18, " Em bedding Linux."

YAFFS2 The im plem ent at ion of JFFS2 in t he 2.6 kernel includes feat ures t o work wit h t he lim it at ions of NAND flash, but Yet Anot her Flash File Syst em ( YAFFS) is a filesyst em t hat is designed t o funct ion under const raint s specific t o NAND m em ory. YAFFS is not part of t he m ainline kernel, but som e em bedded dist ribut ions prepat ch t heir kernels wit h support for YAFFS2, t he current version of YAFFS. You can download YAFFS2 source code and docum ent at ion from www.yaffs.net .

M TD - Ut ils The MTD- ut ils package, downloadable from ft p: / / ft p.infradead.org/ pub/ m t d- ut ils/ , cont ains several useful t ools t hat work on t op of MTD- enabled flash m em ory. Exam ples of included ut ilit ies are flash_eraseall, nanddum p, nandwrit e, and sum t ool. To erase t he second flash part it ion ( on NOR or NAND devices) , use flash_eraseall as follows: bash> flash_eraseall –j /dev/mtd/2

Because NAND chips m ay cont ain bad blocks, use ECC- aware program s such as nandwrit e and nanddum p t o copy raw dat a, inst ead of general- purpose ut ilit ies, such as dd. To st ore t he JFFS2 im age t hat you creat ed previously, on t o t he second NAND part it ion, do t his: bash> nandwrite /dev/mtd/2 jffs2.img

You can reduce JFFS2 m ount t im es by insert ing sum m ary inform at ion int o a JFFS2 im age using sum t ool and t urning on CONFIG_JFFS2_SUMMARY while configuring your kernel. To writ e a sum m arized JFFS2 im age t o t he previous NAND flash, do t his: bash> sumtool –e 256KiB –i jffs2.img –o jffs2.summary.img bash> nandwrite /dev/mtd/2 jffs2.summary.img bash> mount –t jffs2 /dev/mtdblock/2 /mnt

Con figu r in g M TD To MTD- enable your kernel, you have t o choose t he appropriat e configurat ion opt ions. For t he flash chip shown in Figure 17.2, t he required opt ions are as follows: CONFIG_MTD=y CONFIG_MTD_PARTITIONS=y CONFIG_MTD_GEN_PROBE=y CONFIG_MTD_CFI=y CONFIG_MTD_PDA_MAP=y CONFIG_JFFS2_FS=y

Enable the MTD subsystem Support for multiple partitions Common routines for chip probing Enable CFI chip driver Option to enable the map driver Enable JFFS2

CONFIG_MTD_PDA_MAP is assum ed t o be a new opt ion added t o enable t he m ap driver we previously wrot e. Each of t hese feat ures can also be built as a kernel m odule unless you have an MTD- resident root filesyst em . To m ount t he filesyst em part it ion in Figure 17.2 as t he root device during boot , ask your boot loader t o append root=/dev/mtdblock/2 t o t he com m and- line st ring t hat it passes t o t he kernel. You m ay reduce kernel foot print by elim inat ing redundant probing. Because our exam ple handheld has t wo parallel 16- bit banks sit t ing on a 32- bit physical bus ( t hus result ing in a t wo- way int erleave and a 2- byt e bank widt h) , you can opt im ize using t hese addit ional opt ions: CONFIG_MTD_CFI_ADV_OPTIONS=y CONFIG_MTD_CFI_GEOMETRY=y CONFIG_MTD_MAP_BANK_WIDTH_2=y CONFIG_MTD_CFI_I2=y

CONFIG_MTD_MAP_BANK_WIDTH_2 enables a CFI bus widt h of 2, and CONFIG_MTD_CFI_I2 set s an int erleave of 2.

e Xe cu t e I n Pla ce Wit h eXecut e I n Place ( XI P) , you can run t he kernel direct ly from flash. Because you do away wit h t he ext ra st ep of copying t he kernel t o RAM, your kernel boot s fast er. The downside is t hat your flash m em ory requirem ent increases because t he kernel has t o be st ored uncom pressed. Before deciding t o go t he XI P rout e, also be aware t hat t he slower inst ruct ion fet ch t im es from flash can im pact runt im e perform ance.

Th e Fir m w a r e H u b PC- com pat ible syst em s use a NOR flash chip called t he Firm ware Hub ( FWH) t o hold t he BI OS. The FWH is not direct ly connect ed t o t he processor's address and dat a bus. I nst ead, it 's int erfaced via t he Low Pin Count ( LPC) bus, which is part of Sout h Bridge chipset s. The connect ion diagram is shown in Figure 17.5.

Figu r e 1 7 .5 . Th e Fir m w a r e H u b on a PC- com pa t ible syst e m .

The MTD subsyst em includes drivers t o int erface t he processor wit h t he FWH. FWHs are usually not com pliant wit h t he CFI specificat ion. I nst ead, t hey conform t o t he JEDEC ( Joint Elect ron Device Engineering Council) st andard. To inform MTD about a yet unsupport ed JEDEC chip, add an ent ry t o t he jedec_table array in drivers/ m t d/ chips/ j edec_probe.c wit h inform at ion such as t he chip m anufact urer I D and t he com m and- set I D. Here is an exam ple: static const struct amd_flash_info jedec_table[] = { /* ... */ { .mfr_id = MANUFACTURER_ID, /* E.g.: MANUFACTURER_ST */ .dev_id = DEVICE_ID, /* E.g.: M50FW080 */ .name = "MYNAME", /* E.g.: "M50FW080" */ .uaddr = { [0] = MTD_UADDR_UNNECESSARY, }, .DevSize = SIZE_1MiB, /* E.g.: 1MB */ .CmdSet = CMDSET, /* Command-set to communicate with the flash chip e.g., P_ID_INTEL_EXT */ .NumEraseRegions = 1, /* One region */ .regions = { ERASEINFO (0x10000, 16),/* Sixteen 64K sectors */ } }, /* ... */ };

When you have your chip det ails im print ed in t he jedec_table as shown here, MTD should recognize your flash, provided you have enabled t he right kernel configurat ion opt ions. The following configurat ion m akes t he kernel aware of an FWH t hat int erfaces t o t he processor via an I nt el I CH2 or I CH4 Sout h Bridge chipset : CONFIG_MTD=y CONFIG_MTD_GEN_PROBE=y CONFIG_MTD_JEDECPROBE=y CONFIG_MTD_CFI_INTELEXT=y CONFIG_MTD_ICHXROM=y

Enable the MTD subsystem Common routines for chip probing JEDEC chip driver The command-set for communicating with the chip The map driver

CONFIG_MTD_JEDECPROBE enables t he JEDEC MTD chip driver, and CONFIG_MTD_ICH2ROM adds t he MTD m ap driver t hat m aps t he FWH t o t he processor's address space. I n addit ion, you need t o include t he appropriat e com m and- set im plem ent at ion ( for exam ple, CONFIG_MTD_CFI_INTELEXT for I nt el Ext ension com m ands) . Aft er t hese m odules have been loaded, you can t alk t o t he FWH from user- space applicat ions via device nodes export ed by MTD. You can, for exam ple, reprogram t he BI OS from user space using a sim ple applicat ion, as shown in List ing 17.5. Be warned t hat incorrect ly operat ing t his program can corrupt t he BI OS and render your syst em unboot able! List ing 17.5 operat es on t he MTD char device associat ed wit h t he FWH, which it assum es t o be / dev/ m t d/ 0 . The program issues t hree MTD- specific ioct l com m ands:

MEMUNLOCK t o unlock t he flash sect ors prior t o program m ing

MEMERASE t o erase flash sect ors prior t o rewrit ing

MEMLOCK t o relock t he sect ors aft er program m ing

List in g 1 7 .5 . Upda t in g t h e BI OS Code View: #include #include #include #include #include #include #define BLOCK_SIZE #define NUM_SECTORS #define SECTOR_SIZE

4096 16 64*1024

int main(int argc, char *argv[]) { int fwh_fd, image_fd; int usect=0, lsect=0, ret; struct erase_info_user fwh_erase_info; char buffer[BLOCK_SIZE];

struct stat statb; /* Ignore SIGINTR(^C) and SIGSTOP (^Z), lest you end up with a corrupted flash and an unbootable system */ sigignore(SIGINT); sigignore(SIGTSTP); /* Open MTD char device */ fwh_fd = open("/dev/mtd/0", O_RDWR); if (fwh_fd < 0) exit(1); /* Open BIOS image */ image_fd = open("bios.img", O_RDONLY); if (image_fd < 0) exit(2); /* Sanity check */ fstat(image_fd, &statb); if (statb.st_size != SECTOR_SIZE*NUM_SECTORS) { printf("BIOS image looks bad, exiting.\n"); exit(3); } /* Unlock and erase all sectors */ while (usect < NUM_SECTORS) { printf("Unlocking & Erasing Sector[%d]\r", usect+1); fwh_erase_info.start = usect*SECTOR_SIZE; fwh_erase_info.length = SECTOR_SIZE; ret = ioctl(fwh_fd, MEMUNLOCK, &fwh_erase_info); if (ret != 0) goto bios_done; ret = ioctl(fwh_fd, MEMERASE, &fwh_erase_info); if (ret != 0) goto bios_done; usect++; } /* Read blocks from the BIOS image and dump it to the Firmware Hub */ while ((ret = read(image_fd, buffer, BLOCK_SIZE)) != 0) { if (ret < 0) goto bios_done; ret = write(fwh_fd, buffer, ret); if (ret worst) worst = now; ts0 = ts1; /* Do work that is to be done periodically */ do_work(); /* NOP for the purpose of this measurement */ } printf("Worst latency was %8ld\n", worst); /* Disable periodic interrupts */ ioctl(fd, RTC_PIE_OFF, 0); }

The code in List ing 19.1 loops for 10,000 it erat ions and print s out t he worst - case lat ency t hat occurred during execut ion. The out put was 240899 on a Pent ium 1.8GHz box, which roughly corresponds t o 133 m icroseconds. According t o t he dat a sheet of t he RTC chipset , a t im er frequency of 8192Hz should result in a periodic int errupt rat e of 122 m icroseconds. That 's close. Rerun t he code under varying loads using SCHED_OTHER inst ead of SCHED_FIFO and observe t he result ant drift . You m ay also run kernel t hreads in t he RT m ode. For t hat , do t he following when you st art t he t hread: static int my_kernel_thread(void *i) { daemonize(); current->policy = SCHED_FIFO; current->rt_priority = 1; /* ... */ }

Ch a pt e r 1 9 . D r ive r s in Use r Spa ce I n Th is Ch a pt e r 553 Process Scheduling and Response Tim es 558 Accessing I / O Regions 562 Accessing Mem ory Regions 565 User Mode SCSI 567 User Mode USB 571 User Mode I 2 C 573 UI O 574 Looking at t he Sources

Most device drivers prefer t o lead a privileged life inside t he kernel, but som e are at hom e in t he indet erm inist ic world out side. Several kernel subsyst em s, such as SCSI , USB, and I 2 C, offer som e level of support for user m ode drivers, so you m ight be able t o cont rol t hose devices wit hout writ ing a single line of kernel code. I n spit e of t he inclem ent weat her in user land, user m ode drivers enj oy cert ain advant ages. They are easy t o develop and debug. You won't have t o reboot t he syst em every t im e you dereference a dangling point er. Som e user m ode drivers will even work across operat ing syst em s if t he device subsyst em enj oys t he services of a st andard user- space program m ing library. Here are som e rules of t hum b t o help decide whet her your driver should reside in user space:

Apply t he possibilit y t est . What can be done in user space should probably st ay in user space.

I f you have t o t alk t o a large num ber of slow devices and if perform ance requirem ent s are m odest , explore t he possibilit y of im plem ent ing t he drivers in user space. I f you have t im ecrit ical perform ance requirem ent s, st ay inside t he kernel.

I f your code needs t he services of kernel API s, access t o kernel variables, or is int ert wined wit h int errupt handling, it has a st rong case for being in kernel space.

I f m uch of what your code does can be const rued as policy, user land m ight be it s logical residence.

I f t he rest of t he kernel needs t o invoke your code's services, it 's a candidat e for st aying inside t he kernel.

You can't easily do float ing- point arit hm et ic inside t he kernel. Float ing- point unit ( FPU) inst ruct ions can, however, be used from user space.

That said, you can't accom plish t oo m uch from user space. Many im port ant device classes, such as st orage m edia and net work adapt ers, cannot be driven from user land. But even when a kernel driver is t he appropriat e solut ion, it 's a good idea t o m odel and t est as m uch code as you can in user space before m oving it t o kernel space. The t est ing cycle is fast er, and it 's easier t o t raverse all possible code pat hs and ensure t hat t hey are clean. I n t his chapt er, t he t erm user space driver ( or user m ode driver) is used in a generic sense t hat does not st rict ly conform t o t he sem ant ics of a driver im plied t hus far in t he book. An applicat ion is considered t o be a user m ode driver if it 's a candidat e for being im plem ent ed inside t he kernel, t oo. The 2.6 kernel overhauled a subsyst em t hat is of special int erest t o user space drivers. The new process scheduler offers huge response- t im e benefit s t o user m ode code, so let 's st art wit h t hat .

Pr oce ss Sch e du lin g a n d Re spon se Tim e s Many user m ode drivers need t o perform som e work in a t im e- bound m anner. I n user space, indet erm inism due

t o scheduling and paging oft en com e in t he way of fast response t im es, however. To see how you can m inim ize t he im pact of t he form er hurdle, let 's dip int o recent Linux schedulers and underst and t heir underlying philosophy.

Th e Or igin a l Sch e du le r I n t he 2.4 and earlier days, t he scheduler used t o recalculat e scheduling param et ers of each t ask before t aking it s pick. The t im e consum ed by t he algorit hm t hus increased linearly wit h t he num ber of cont ending t asks in t he syst em . I n ot her words, it used O(n) t im e, where n is t he num ber of act ive t asks. On a syst em running at high loads, t his t ranslat ed t o significant overhead. The 2.4 algorit hm also didn't work very well on SMP syst em s.

Th e O( 1 ) Sch e du le r Tim e consum ed by an O(n) algorit hm depends linearly on t he size of it s input , and an O(n2) solut ion depends quadrat ically on t he lengt h of it s input , but an O(1) t echnique is independent of t he input and t hus scales well. The 2.6 scheduler replaced t he O(n) algorit hm wit h an O(1) m et hod. I n addit ion t o being super- scalable, t he scheduler has built - in heurist ics t o im prove user responsiveness by providing preferent ial t reat m ent t o t asks involved in I / O act ivit y. Processes are of t wo kinds: I / O bound and CPU bound. I / O- bound t asks are oft en sleepwait ing for device I / O, while CPU- bound ones are workaholics addict ed t o t he processor. Paradoxically, t o achieve fast response t im es, lazy t asks get incent ives from t he O(1) scheduler, while st udious ones draw flak. Look at t he sidebar "Highlight s of t he O(1) Scheduler" t o find out som e of it s im port ant feat ures.

Highlight s of t he O(1) Scheduler The following are som e of t he im port ant feat ures of t he O(1) scheduler:

The algorit hm uses t wo run queues m ade up of 140 priorit y list s: an act ive queue t hat holds t asks t hat have t im e slices left and an expired queue t hat cont ains processes whose t im e slices have expired. When a t ask finishes it s t im e slice, it 's insert ed int o t he expired queue in sort ed order of priorit y. The act ive and expired queues are swapped when t he form er becom es em pt y. To decide which process t o run next , t he scheduler does not navigat e t hrough t he ent ire queue. I nst ead, it picks t hat t ask from t he act ive queue having t he highest priorit y. The overhead of picking t he t ask t hus depends not on t he num ber of act ive t asks, but on t he num ber of priorit ies. This m akes it a const ant - t im e or an O(1) algorit hm .

The scheduler support s t wo priorit y ranges: st andard nicevalues support ed on UNI X syst em s and int ernal priorit ies. The form er range from –20 t o + 19, while t he lat t er ext end from 0 t o 139. I n bot h cases, lower values correspond t o higher priorit ies. The t op 100 ( 0 t o 99) int ernal priorit ies are reserved for real t im e ( RT) t asks, and t he bot t om 40 ( 100 t o 139) are assigned t o norm al t asks. The 40 nice values m ap t o t he bot t om 40 int ernal priorit ies. I nt ernal priorit ies of norm al t asks can be dynam ically varied by t he scheduler, whereas nice values are st at ist ically set by t he user. Each int ernal priorit y get s an associat ed run list .

The scheduler uses a heurist ic t o figure out whet her t he nat ure of a process is I / O- int ensive or CPU- int ensive. I n sim ple t erm s, if a t ask sleeps oft en, it 's likely t o be I / O- int ensive, but if it uses it s t im e slice fast , it 's CPU- int ensive. Whenever t he scheduler finds t hat a t ask has dem onst rat ed I / O- bound charact erist ics, it rewards it by dynam ically increasing it s int ernal priorit y. CPU- bound charact erist ics, on t he ot her hand, are punished wit h negat ive m arks.

The allot t ed t im e slice is direct ly proport ional t o t he priorit y. A higher priorit y t ask get s a bigger t im e slice.

A t ask will not be preem pt ed by t he scheduler as long as it has t im e slice credit . I f it yields t he processor before using up it s t im e slice quot a, it can roll over t he rem inder of it s slice when it 's run next . Because I / O- bound processes are t he ones t hat oft en yield t he CPU, t his im proves int eract ive perform ance.

The scheduler support s RT scheduling policies. RT t asks preem pt norm al (SCHED_OTHER) t asks. Users of RT policies can override t he scheduler's dynam ic priorit y assignm ent s. Unlike SCHED_OTHER t asks, t heir priorit ies are not recalculat ed by t he kernel on- t he- fly. RT scheduling com es in t wo flavors: SCHED_FIFO and SCHED_RR. They are used for producing " soft " real- t im e behavior, rat her t han st ringent " hard" RT guarant ees. SCHED_FIFO has no concept of t im e slices; SCHED_FIFO t asks run unt il t hey sleep- wait for I / O or yield t he processor. SCHED_RR is a round- robin variant of SCHED_FIFO t hat also assigns t im e slices t o RT t asks. SCHED_RR t asks wit h expired slices are appended t o t he end of t he corresponding priorit y list .

The scheduler im proves SMP perform ance by using per- CPU run queues and per- CPU synchronizat ion.

Th e CFS Sch e du le r The Linux scheduler has undergone anot her rewrit e wit h t he 2.6.23 kernel. The Com plet ely Fair Scheduler ( CFS) for t he SCHED_OTHER class rem oves m uch of t he com plexit ies associat ed wit h t he O(1) scheduler by abandoning priorit y arrays, t im e slices, int eract ivit y heurist ics, and t he dependency on HZ. CFS's goal is t o im plem ent fairness for all scheduling ent it ies by providing each t ask t he t ot al CPU power divided by t he num ber of running t asks. Dissect ing CFS is beyond t he scope of t his chapt er. Have a look at Docum ent at ion/ sched- design- CFS.t xt for a brief t ut orial.

Re spon se Tim e s As a user m ode driver developer, you have several opt ions t o im prove your applicat ion's response t im e:

Use RT scheduling policies t hat give you a finer grain of cont rol t han usual. Look at t he m an pages of sched_setscheduler() and it s relat ives for insight s int o achieving soft RT response t im es.

I f you are using non- RT scheduling, t une t he nice values of different processes t o achieve t he required perform ance balance.

I f you are using a 2.6.23 or lat er kernel enabled wit h t he CFS scheduler, you m ay fine- t une / proc/ sys/ kernel/ sched_granularit y_ns. I f you are using a pre- 2.6.23 kernel, m odify #defines in kernel/ sched.c and include/ linux/ sched.h t o suit your applicat ion. Change t hese values caut iously t o sat isfy t he needs of your applicat ion suit e. Usage scenarios of t he scheduler are com plex. Set t ings t hat delight cert ain load condit ions can depress ot hers, so you m ay have t o experim ent by t rial and error.

Response t im es are not solely t he dom ain of t he scheduler; t hey also depend on t he solut ion archit ect ure. For exam ple, if you m ark a busy int errupt handler as fast, it disables ot her local int errupt s frequent ly and t hat can pot ent ially slow down dat a acquisit ion and t ransm ission on ot her I RQs.

Let 's im plem ent an exam ple and see how a user m ode driver can achieve fast response t im es by guarding against indet erm inism int roduced by scheduling and paging. As you learned in Chapt er 2 , " A Peek I nside t he Kernel," t he RTC is a t im er source t hat can generat e periodic int errupt s wit h high precision. List ing 19.1 im plem ent s an exam ple t hat uses int errupt report s from / dev/ rt c t o perform periodic work wit h m icrosecond precision. The Pent ium Tim e St am p Count er ( TSC) is used t o m easure response t im es. The program in List ing 19.1 first changes it s scheduling policy t o SCHED_FIFO using sched_setscheduler(). Next , it invokes mlockall() t o lock all m apped pages in m em ory t o ensure t hat swapping won't com e in t he way of det erm inist ic t im ing. Only t he super- user is allowed t o call sched_setscheduler()and mlockall() and request RTC int errupt s at frequencies great er t han 64Hz. List in g 1 9 .1 . Pe r iodic W or k w it h M icr ose con d Pr e cision Code View: #include #include #include #include #include #include /* Read the lower half of the Pentium Time Stamp Counter using the rdtsc instruction */ #define rdtscl(val) __asm__ __volatile__ ("rdtsc" : "=A" (val))

main() { unsigned long ts0, ts1, now, worst; /* Store TSC ticks */ struct sched_param sched_p; /* Information related to scheduling priority */ int fd, i=0; unsigned long data; /* Change the scheduling policy to SCHED_FIFO */ sched_getparam(getpid(), &sched_p); sched_p.sched_priority = 50; /* RT Priority */ sched_setscheduler(getpid(), SCHED_FIFO, &sched_p); /* Avoid paging and related indeterminism */ mlockall(MCL_CURRENT); /* Open the RTC */ fd = open("/dev/rtc", O_RDONLY); /* Set the periodic interrupt frequency to 8192Hz This should give an interrupt rate of 122uS */ ioctl(fd, RTC_IRQP_SET, 8192); /* Enable periodic interrupts */ ioctl(fd, RTC_PIE_ON, 0); rdtscl(ts0); worst = 0; while (i++ < 10000) { /* Block until the next periodic interrupt */ read(fd, &data, sizeof(unsigned long));

/* Use the TSC to precisely measure the time consumed. Reading the lower half of the TSC is sufficient */ rdtscl(ts1); now = (ts1-ts0); /* Update the worst case latency */ if (now > worst) worst = now; ts0 = ts1; /* Do work that is to be done periodically */ do_work(); /* NOP for the purpose of this measurement */ } printf("Worst latency was %8ld\n", worst); /* Disable periodic interrupts */ ioctl(fd, RTC_PIE_OFF, 0); }

The code in List ing 19.1 loops for 10,000 it erat ions and print s out t he worst - case lat ency t hat occurred during execut ion. The out put was 240899 on a Pent ium 1.8GHz box, which roughly corresponds t o 133 m icroseconds. According t o t he dat a sheet of t he RTC chipset , a t im er frequency of 8192Hz should result in a periodic int errupt rat e of 122 m icroseconds. That 's close. Rerun t he code under varying loads using SCHED_OTHER inst ead of SCHED_FIFO and observe t he result ant drift . You m ay also run kernel t hreads in t he RT m ode. For t hat , do t he following when you st art t he t hread: static int my_kernel_thread(void *i) { daemonize(); current->policy = SCHED_FIFO; current->rt_priority = 1; /* ... */ }

Acce ssin g I / O Re gion s PC- com pat ible syst em s have 64K I / O port s, all of which m ay be driven from user space. User access t o I / O port s on Linux is cont rolled by t wo funct ions: ioperm() and iopl(). ioperm() cont rols access perm issions t o t he first 0x3ff port s. iopl() changes t he I / O privilege level of t he calling process, t hus allowing am ong ot her t hings, unrest rict ed access t o all port s. Only t he super- user can invoke bot h t hese funct ions. To writ e dat a t o an I / O port , use outb(), outw(), outl(), or t heir cousins. To read dat a from a port , use inb(), inw(), inl(), or t heir relat ives. Let 's im plem ent a sim ple program t hat reads t he seconds t icking inside t he RTC chip. I / O regions in t he PC CMOS, of which t he RTC is a part , are accessed via an index port ( 0x70) and a dat a port (0x71) , as shown in Table 5.1 of Chapt er 5 , " Charact er Drivers." To read a byt e of dat a from offset off wit hin an I / O address range, writ e off t o t he index port and read t he associat ed dat a from t he dat a port . List ing 19.2 reads t he seconds field of t he RTC; but t o use it t o obt ain dat a from ot her I / O regions, change t he argum ent s passed t o dump_port() suit ably.

List in g 1 9 .2 . Ut ilit y t o D u m p Byt e s fr om a n I / O Re gion Code View: #include void dump_port(unsigned char addr_port, unsigned char data_port, unsigned short offset, unsigned short length) { unsigned char i, *data; if (!(data = (unsigned char *)malloc(length))) { perror("Bad Malloc\n"); exit(1); } /* Write the offset to the index port and read data from the data port */ for(i=offset; i ls –lR /proc/bus/usb/devices

... T: D: P: S: S: S: C:* I: E: E: ... T: D: P: S: S: S: C:* I: E: E:

Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 5 Spd=12 MxCh= 0 Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 Vendor=04b0 ProdID=0205 Rev= 1.00 Manufacturer=NIKON Product=NIKON DSC E5200 SerialNumber=2507597 #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 2mA If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms Bus=01 Lev=01 Prnt=01 Port=02 Cnt=01 Dev#= 12 Spd=480 MxCh= 0 Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 Vendor=0bc2 ProdID=0501 Rev= 0.01 Manufacturer=Seagate Product=USB Mass Storage SerialNumber=000000062459 #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Look at t he T: lines in t he preceding out put , which display t he t opology. As expect ed, t he hard disk has arrived on t he EHCI bus, bus1, and t he cam era has appeared on t he UHCI bus, bus3. This is how t he usbfs t ree looks now: Code View: bash> ls –lR /proc/bus/usb /proc/bus/usb: total 0 dr-xr-xr-x 2 root root 0 Dec dr-xr-xr-x 2 root root 0 Dec dr-xr-xr-x 2 root root 0 Dec dr-xr-xr-x 2 root root 0 Dec -r--r--r-- 1 root root 0 Dec /proc/bus/usb/001: total 0 -rw-r--r-- 1 root root 43 Dec -rw-r--r-- 1 root root 50 Dec /proc/bus/usb/002: total 0 -rw-r--r-- 1 root root 43 Dec /proc/bus/usb/003: total 0 -rw-r--r-- 1 root root 43 Dec -rw-r--r-- 1 root root 50 Dec

2 2 2 2 2

12:44 12:44 12:44 12:44 19:51

001 002 003 004 devices EHCI: bus1

2 12:44 001 2 19:51 007

High-speed disk UHCI: bus2

2 12:44 001 UHCI: bus3 2 12:44 001 2 19:16 003

Full-speed camera

/proc/bus/usb/004: total 0 -rw-r--r-- 1 root root 43 Dec

UHCI: bus4 2 12:44 001

The usbfs files corresponding t o plugged- in devices cont ain t he associat ed USB device and configurat ion descript ors. I n t he preceding exam ple, read / proc/ bus/ usb/ 003/ 003 t o get descript or inform at ion for t he cam era and / proc/ bus/ usb/ 001/ 007 for t he descript or associat ed wit h t he hard disk. Managing usbfs files is not st raight forward however, because t he device filenam es get reused aft er a device is plugged out . The solut ion is t o use t he libusb library, which uses usbfs under t he hood. Using libusb inst ead of direct ly operat ing on usbfs has anot her benefit : Your driver is likely t o work unchanged on ot her operat ing syst em s t hat support t his library. I f you don't find libusb bundled along wit h your dist ribut ion, download it s sources from ht t p: / / libusb.sourceforge.net / . The full list of USB access funct ions offered by t his library is available under t he doc/ direct ory of t he libusb sources. List ing 19.6 im plem ent s a skelet al user space driver for t he digit al cam era using an oft - used libusb program m ing t em plat e. The cam era's vendor I D (0x04b0) and device I D ( 0x0205) are obt ained from t he / proc/ bus/ usb/ devices out put shown previously. List in g 1 9 .6 . A Sk e le t a l Use r Spa ce USB D r ive r Usin g libu sb Code View: #include

/* From the libusb package */

#define DIGICAM_VENDOR_ID 0x04b0 /* From /proc/bus/usb/devices */ #define DIGICAM_PRODUCT_ID 0x0205 /* From /proc/bus/usb/devices */ int main(int argc, char *argv[]) { struct usb_dev_handle *mydevice_handle; struct usb_bus *usb_bus; struct usb_device *mydevice; /* Initialize libusb */ usb_init(); usb_find_buses(); usb_find_devices(); /* Walk the bus */ for (usb_bus = usb_buses; usb_bus; usb_bus = usb_bus->next) { for (mydevice = usb_bus->devices; mydevice; mydevice = mydevice->next) { if ((mydevice->descriptor.idVendor == DIGICAM_VENDOR_ID) && (mydevice->descriptor.idProduct == DIGICAM_PRODUCT_ID)) { /* Open the device */ mydevice_handle = usb_open(mydevice); /* Send commands to the camera. This is the heart of the driver. Getting information about the USB control messages to which your device responds is often a challenge since many vendors do not readily divulge hardware details */ usb_control_msg(mydevice_handle, ...); /* ... */

/* Close the device */ usb_close(mydevice_handle); } } } }

Use r M ode I 2 C I n Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol," you learned t o develop kernel m ode drivers for I 2 C devices; but som et im es, when you need t o enable support for a large num ber of slow I 2 C devices, it m akes sense t o drive t hem from user space. The i2c- dev m odule enables t he developm ent of user m ode I 2 C/ SMBus device drivers. User- space code can access I 2 C host adapt ers via device nodes. To operat e on t he n t h adapt er, open / dev/ i2c- n. Aft er you have a file descript or t ied t o a host adapt er device node, you can com m and it t hrough ioct ls t o connect t o specific slave devices at t ached t o it . You can t hen use t he services of a fam ily of dat a access rout ines t o exchange dat a wit h t he slaves. List ing 19.7 is a sim ple user m ode driver t hat perform s com m on operat ions on an I 2 C EEPROM from user space. The EEPROM is t he sam e as t he one discussed in Chapt er 8 and has t wo m em ory banks and a slave address corresponding t o each bank. The list ing uses inline funct ions from i2c- dev.h t o operat e on device nodes associat ed wit h t he banks. Get t his header file from t he lm - sensors package ( also discussed in Chapt er 8 ) downloadable from www.lm - sensors.org. This file cont ains user space equivalent s for all kernel space I 2 C access funct ions list ed in Table 8.1 of Chapt er 8 . List in g 1 9 .7 . A Use r Spa ce I 2 C/ SM Bu s D r ive r Code View: #include #include /* Bus addresses of the memory banks */ #define SLAVE_ADDR1 0x60 #define SLAVE_ADDR2 0x61 int main(int argc, char *argv[]) { /* Open the host adapter */ if ((smbus_fp = open("/dev/i2c-0", O_RDWR)) < 0) { exit(1); } /* Connect to the first bank */ if (ioctl(smbus_fp, I2C_SLAVE, EEPROM_SLAVE_ADDR1) < 0) { exit(1); } /* ... */ /* Dump data from the device */ for (reg=0; reg < length; reg++) { /* See i2c-dev.h from the lm-sensors package for the implementation of the following inline function */ res = i2c_smbus_read_byte_data(smbus_fp, (unsigned char) reg); if (res < 0) { exit(1); } /* Dump data */ /* ... */ } /* ... */

/* Switch to bank 2 */ if (ioctl(smbus_fp, I2C_SLAVE, SLAVE_ADDR2) < 0) { exit(1); } /* Clear bank 2 */ for (reg=0; reg < length; reg+=2){ i2c_smbus_write_word_data(smbus_fp, (unsigned char) reg, 0x0); } /* ... */ close(smbus_fp); }

UI O St art ing wit h t he 2.6.23 release, t he kernel includes a subsyst em called UI O ( Userspace I O) t hat eases t he im plem ent at ion of som e user- space drivers. UI O's int ent is t o allow t he developm ent of a bare- bones kernel driver for t asks such as int errupt handling, and push m ost of t he device I / O logic t o user space. UI O is especially relevant for som e classes of indust rial I / O cards. Look inside drivers/ uio/ for t he UI O sources. A user guide is available under Docum ent at ion/ DocBook/ uiohowt o.t m pl. Exploring UI O is beyond t he scope of t his chapt er.

Look in g a t t h e Sou r ce s The Linux scheduler lives in kernel/ sched.c. The SCSI generic im plem ent at ion is in drivers/ scsi/ sg.c, and drivers/ usb/ core/ devio.c is responsible for support ing user space USB drivers. The i2c- dev driver t hat enables support for user m ode I 2 C program m ing resides in drivers/ i2c/ i2c- dev.c. Table 19.1 cont ains t he m ain dat a st ruct ures used in t his chapt er, and Table 19.2 list s t he funct ions t hat we used t o aid user m ode driver developm ent .

Ta ble 1 9 .1 . Su m m a r y of D a t a St r u ct u r e s D a t a St r u ct u r e

Loca t ion ( Use r Spa ce )

D e scr ipt ion

sched_param

/ usr/ include/ bit s/ sched.h I nform at ion relat ed t o scheduling priorit ies.

fb_var_screeninfo

/ usr/ include/ linux/ fb.h

Used t o operat e on fram e buffers. Cont ains variable screen inform at ion such as resolut ion and pixclock. See Chapt er 11 for m ore det ails.

sg_io_hdr_t

/ usr/ include/ scsi/ sg.h

I nform at ion t o m anage SCSI generic devices.

usb_dev_handle usb_bus usb_device

Header files in t he libusb St ruct ures t o operat e on USB devices from user package. space.

Ta ble 1 9 .2 . Su m m a r y of Use r - Spa ce Fu n ct ion s Use r - Spa ce Fu n ct ion

D e scr ipt ion

sched_getparam()

Obt ains scheduling param et ers associat ed wit h a given process

sched_setscheduler()

Set s scheduling param et ers associat ed wit h a given process

mlockall()

Locks pages of t he calling process in m em ory and t hus avoids page fault s

ioperm()

Cont rols access perm issions t o t he first 0x3FF I / O port s

iopl()

Cont rols access privileges t o all I / O port s

outb()/outw()/outl()

Out put s a byt e/ word/ long t o a specified port

inb()/inw()/inl()

I nput s a byt e/ word/ long from a specified port

mmap()

Associat es a file or a device address region wit h a chunk of user virt ual m em ory

msync()

Flushes changes m ade t o an m m ap- ed m em ory area

munmap()

Reverse of mmap()

usb_init() usb_find_buses()

Funct ions provided by t he libusb library t o help you operat e over usbfs

Use r - Spa ce Fu n ct ion usb_find_buses() usb_find_devices() usb_open() usb_control_msg() usb_close()

D e scr ipt ion

i2c_smbus_read_byte_data() i2c_smbus_write_word_data()

User- space I 2 C/ SMBus dat a access rout ines available as part of t he lm - sensors package

Ch a pt e r 2 0 . M or e D e vice s a n d D r ive r s I n Th is Ch a pt e r 578 ECC Report ing 583 Frequency Scaling 584 Em bedded Cont rollers 585 ACPI 587 I SA and MCA 588 FireWire 589 I nt elligent I nput / Out put 590 Am at eur Radio 590 Voice over I P 591 High- Speed I nt erconnect s

So far, we have devot ed a full chapt er t o each m aj or device driver class, but t here are several subdirect ories under drivers/ t hat we haven't yet descended int o. I n t his chapt er let 's vent ure inside som e of t hem at a brisk pace.

ECC Re por t in g Several m em ory cont rollers cont ain special silicon t o m easure t he fidelit y of st ored dat a using error correct ing codes ( ECCs) . The Error Det ect ion And Correct ion ( EDAC) driver subsyst em announces occurrences of m em ory error event s generat ed by ECC- aware m em ory cont rollers. Typical ECC DRAM chips have t he capabilit y t o correct single- bit errors ( SBEs) and det ect m ult ibit errors ( MBEs) . I n EDAC parlance, t he form er errors are correct able errors ( CEs) , whereas t he lat t er are uncorrect able errors ( UEs) . ECC operat ions are t ransparent t o t he operat ing syst em . This m eans t hat if your DRAM cont roller support s ECC, error correct ion and det ect ion occurs silent ly wit hout operat ing syst em part icipat ion. EDAC's t ask is t o report such event s and allow users t o fashion error handling policies ( such as replace a suspect DRAM chip) . The EDAC driver subsyst em consist s of t he following:

A core m odule called edac_m c t hat provides a set of library rout ines.

Separat e drivers for int eract ing wit h support ed m em ory cont rollers. For exam ple, t he driver m odule t hat works wit h t he m em ory cont roller t hat is part of t he I nt el 82860 Nort h Bridge is called i82860_edac.

EDAC report s errors via files in t he sysfs direct ory, / sys/ devices/ syst em / edac/ . I t also generat es m essages t hat can be gleaned from t he kernel error log. The layout of DRAM chips is specified in t erm s of t he num ber of chip- select s em anat ing from t he m em ory cont roller and t he dat a- t ransfer widt h ( channels) bet ween t he m em ory cont roller and t he CPU. The num ber of rows in t he DRAM chip array depends on t he form er, whereas t he num ber of colum ns hinge on t he lat t er. One of t he m ain aim s of EDAC is t o point t he needle of suspicion at problem DRAM chips, so t he EDAC sysfs node st ruct ure is designed according t o t he physical chip layout : / sys/ devices/ syst em / edac/ m c/ m cX/ csrowY/ corresponds t o chip- select row Y in m em ory cont roller X. Each such direct ory cont ains det ails such as t he num ber of det ect ed CEs ( ce_count) , UEs (ue_count) , channel locat ion, and ot her at t ribut es.

D e vice Ex a m ple : ECC- Aw a r e M e m or y Con t r olle r Let 's add EDAC support for a yet - unsupport ed m em ory cont roller. Assum e t hat you're put t ing Linux ont o a m edical grade device t hat is an em bedded x86 derivat ive. The Nort h Bridge chipset ( which includes t he m em ory cont roller as discussed in t he sidebar " The Nort h Bridge" in Chapt er 12, " Video Drivers" ) on your board is t he I nt el 855GME t hat is capable of ECC report ing. All DRAM banks connect ed t o t he 855GME on t his syst em are ECC- enabled chips because t his is a life- crit ical device. EDAC does not yet support t he 855GME, so let 's t ake a st ab at im plem ent ing it . ECC DRAM cont rollers have t wo m aj or ECC- relat ed regist ers: an error st at us regist er and an error address point er regist er, as shown in Table 20.1. When an ECC error occurs, t he form er cont ains t he st at us ( whet her t he error is an SBE or an MBE) , whereas t he lat t er cont ains t he physical address where t he error occurred. The EDAC core periodically checks t hese regist ers and report s result s t o user space via sysfs. From a configurat ion st andpoint , all devices inside t he 855GME appear t o be on PCI bus 0. The DRAM cont roller resides on device 0 of t his bus. DRAM int erface cont rol regist ers ( including t he ECC- specific regist ers) m ap int o t he corresponding PCI configurat ion space. To add EDAC support for t he 855GME, add hooks t o read t hese regist ers, as shown in List ing 20.1. Refer back t o Chapt er 10, " Peripheral Com ponent I nt erconnect ," for explanat ions on PCI device driver m et hods and dat a st ruct ures.

Ta ble 2 0 .1 . ECC- Re la t e d Re gist e r s on t h e D RAM Con t r olle r

ECC- Specific Regist ers Residing in t he DRAM Cont roller's PCI Configurat ion Space

Descript ion

I855_ERRSTS_REGISTER

The error st at us regist er, which signals occurrence of an ECC error. Shows whet her t he error is an SBE or an MBE.

I855_EAP_REGISTER

The error address point er regist er, which cont ains t he physical address where t he m ost recent ECC error occurred.

List in g 2 0 .1 . An ED AC D r ive r for t h e 8 5 5 GM E Code View: /* Based on drivers/edac/i82860_edac.c */ #define I855_PCI_DEVICE_ID

0x3584 /* PCI Device ID of the memory controller in the 855 GME */ #define I855_ERRSTS_REGISTER 0x62 /* Error Status Register's offset in the PCI configuration space */ #define I855_EAP_REGISTER 0x98 /* Error Address Pointer Register's offset in the PCI configuration space */ struct i855_error_info { u16 errsts; /* Error Type */ u32 eap; /* Error Location */ }; /* Get error information */ static void i855_get_error_info(struct mem_ctl_info *mci, struct i855_error_info *info) { struct pci_dev *pdev; pdev = to_pci_dev(mci->dev); /* Read error type */ pci_read_config_word(pdev, I855_ERRSTS_REGISTER, &info->errsts); /* Read error location */ pci_read_config_dword(pdev, I855_EAP_REGISTER, &info->eap); } /* Process errors */ static int i855_process_error_info(struct mem_ctl_info *mci, struct i855_error_info *info, int handle_errors) { int row; info->eap >>= PAGE_SHIFT; row = edac_mc_find_csrow_by_page(mci, info->eap); /* Find culprit row */ /* Handle using services provided by the EDAC core. Populate sysfs, generate error messages, and so on */ if (is_MBE()) { /* is_MBE() looks at I855_ERRSTS_REGISTER and checks for an MBE. Implementation not shown */

edac_mc_handle_ue(mci, info->eap, 0, row, "i855 UE"); } else if (is_SBE()) { /* is_SBE() looks at I855_ERRSTS_REGISTER and checks for an SBE. Implementation not shown */ edac_mc_handle_ce(mci, info->eap, 0, info->derrsyn, row, 0, "i855 CE"); } return 1; } /* This method is registered with the EDAC core from i855_probe() */ static void i855_check(struct mem_ctl_info *mci) { struct i855_error_info info; i855_get_error_info(mci, &info); i855_process_error_info(mci, &info, 1); } /* The PCI driver probe method, part of the pci_driver structure */ static int i855_probe(struct pci_dev *pdev, int dev_idx) { struct mem_ctl_info *mci; /* ... */ pci_enable_device(pdev); /* Allocate control memory for this memory controller. The 3 arguments to edac_mc_alloc() correspond to the amount of requested private storage, number of chip-select rows, and number of channels in your memory layout */ mci = edac_mc_alloc(0, CSROWS, CHANNELS); /* ... */ mci->edac_check = i855_check; /* Supply the check method to the EDAC core */ /* Do other memory controller initializations */ /* ... */ /* Register this memory controller with the EDAC core */ edac_mc_add_mc(mci, 0); /* ... */ } /* Remove method */ static void __devexit i855_remove(struct pci_dev *pdev) { struct mem_ctl_info *mci = edac_mc_find_mci_by_pdev(pdev); if (mci && !edac_mc_del_mc(mci)) { edac_mc_free(mci); /* Free memory for this controller. Reverse of edac_mc_alloc() */ } } /* PCI Device ID Table */ static const struct pci_device_id i855_pci_tbl[] __devinitdata = { {PCI_VEND_DEV(INTEL, I855_PCI_DEVICE_ID), PCI_ANY_ID, PCI_ANY_ID, 0, 0,},

{0,}, }; MODULE_DEVICE_TABLE(pci, i855_pci_tbl); /* pci_driver structure for this device. Re-visit Chapter 10 for a detailed explanation */ static struct pci_driver i855_driver = { .name = "855", .probe = i855_probe, .remove = __devexit_p(i855_remove), .id_table = i855_pci_tbl, }; /* Driver Initialization */ static int __init i855_init(void) { /* ... */ pci_rc = pci_register_driver(&i855_driver); /* ... */ }

Look at drivers/ edac/ * for EDAC source files and at Docum ent at ion/ drivers/ edac/ edac.t xt for det ailed sem ant ics of EDAC sysfs nodes.

Ch a pt e r 2 0 . M or e D e vice s a n d D r ive r s I n Th is Ch a pt e r 578 ECC Report ing 583 Frequency Scaling 584 Em bedded Cont rollers 585 ACPI 587 I SA and MCA 588 FireWire 589 I nt elligent I nput / Out put 590 Am at eur Radio 590 Voice over I P 591 High- Speed I nt erconnect s

So far, we have devot ed a full chapt er t o each m aj or device driver class, but t here are several subdirect ories under drivers/ t hat we haven't yet descended int o. I n t his chapt er let 's vent ure inside som e of t hem at a brisk pace.

ECC Re por t in g Several m em ory cont rollers cont ain special silicon t o m easure t he fidelit y of st ored dat a using error correct ing codes ( ECCs) . The Error Det ect ion And Correct ion ( EDAC) driver subsyst em announces occurrences of m em ory error event s generat ed by ECC- aware m em ory cont rollers. Typical ECC DRAM chips have t he capabilit y t o correct single- bit errors ( SBEs) and det ect m ult ibit errors ( MBEs) . I n EDAC parlance, t he form er errors are correct able errors ( CEs) , whereas t he lat t er are uncorrect able errors ( UEs) . ECC operat ions are t ransparent t o t he operat ing syst em . This m eans t hat if your DRAM cont roller support s ECC, error correct ion and det ect ion occurs silent ly wit hout operat ing syst em part icipat ion. EDAC's t ask is t o report such event s and allow users t o fashion error handling policies ( such as replace a suspect DRAM chip) . The EDAC driver subsyst em consist s of t he following:

A core m odule called edac_m c t hat provides a set of library rout ines.

Separat e drivers for int eract ing wit h support ed m em ory cont rollers. For exam ple, t he driver m odule t hat works wit h t he m em ory cont roller t hat is part of t he I nt el 82860 Nort h Bridge is called i82860_edac.

EDAC report s errors via files in t he sysfs direct ory, / sys/ devices/ syst em / edac/ . I t also generat es m essages t hat can be gleaned from t he kernel error log. The layout of DRAM chips is specified in t erm s of t he num ber of chip- select s em anat ing from t he m em ory cont roller and t he dat a- t ransfer widt h ( channels) bet ween t he m em ory cont roller and t he CPU. The num ber of rows in t he DRAM chip array depends on t he form er, whereas t he num ber of colum ns hinge on t he lat t er. One of t he m ain aim s of EDAC is t o point t he needle of suspicion at problem DRAM chips, so t he EDAC sysfs node st ruct ure is designed according t o t he physical chip layout : / sys/ devices/ syst em / edac/ m c/ m cX/ csrowY/ corresponds t o chip- select row Y in m em ory cont roller X. Each such direct ory cont ains det ails such as t he num ber of det ect ed CEs ( ce_count) , UEs (ue_count) , channel locat ion, and ot her at t ribut es.

D e vice Ex a m ple : ECC- Aw a r e M e m or y Con t r olle r Let 's add EDAC support for a yet - unsupport ed m em ory cont roller. Assum e t hat you're put t ing Linux ont o a m edical grade device t hat is an em bedded x86 derivat ive. The Nort h Bridge chipset ( which includes t he m em ory cont roller as discussed in t he sidebar " The Nort h Bridge" in Chapt er 12, " Video Drivers" ) on your board is t he I nt el 855GME t hat is capable of ECC report ing. All DRAM banks connect ed t o t he 855GME on t his syst em are ECC- enabled chips because t his is a life- crit ical device. EDAC does not yet support t he 855GME, so let 's t ake a st ab at im plem ent ing it . ECC DRAM cont rollers have t wo m aj or ECC- relat ed regist ers: an error st at us regist er and an error address point er regist er, as shown in Table 20.1. When an ECC error occurs, t he form er cont ains t he st at us ( whet her t he error is an SBE or an MBE) , whereas t he lat t er cont ains t he physical address where t he error occurred. The EDAC core periodically checks t hese regist ers and report s result s t o user space via sysfs. From a configurat ion st andpoint , all devices inside t he 855GME appear t o be on PCI bus 0. The DRAM cont roller resides on device 0 of t his bus. DRAM int erface cont rol regist ers ( including t he ECC- specific regist ers) m ap int o t he corresponding PCI configurat ion space. To add EDAC support for t he 855GME, add hooks t o read t hese regist ers, as shown in List ing 20.1. Refer back t o Chapt er 10, " Peripheral Com ponent I nt erconnect ," for explanat ions on PCI device driver m et hods and dat a st ruct ures.

Ta ble 2 0 .1 . ECC- Re la t e d Re gist e r s on t h e D RAM Con t r olle r

ECC- Specific Regist ers Residing in t he DRAM Cont roller's PCI Configurat ion Space

Descript ion

I855_ERRSTS_REGISTER

The error st at us regist er, which signals occurrence of an ECC error. Shows whet her t he error is an SBE or an MBE.

I855_EAP_REGISTER

The error address point er regist er, which cont ains t he physical address where t he m ost recent ECC error occurred.

List in g 2 0 .1 . An ED AC D r ive r for t h e 8 5 5 GM E Code View: /* Based on drivers/edac/i82860_edac.c */ #define I855_PCI_DEVICE_ID

0x3584 /* PCI Device ID of the memory controller in the 855 GME */ #define I855_ERRSTS_REGISTER 0x62 /* Error Status Register's offset in the PCI configuration space */ #define I855_EAP_REGISTER 0x98 /* Error Address Pointer Register's offset in the PCI configuration space */ struct i855_error_info { u16 errsts; /* Error Type */ u32 eap; /* Error Location */ }; /* Get error information */ static void i855_get_error_info(struct mem_ctl_info *mci, struct i855_error_info *info) { struct pci_dev *pdev; pdev = to_pci_dev(mci->dev); /* Read error type */ pci_read_config_word(pdev, I855_ERRSTS_REGISTER, &info->errsts); /* Read error location */ pci_read_config_dword(pdev, I855_EAP_REGISTER, &info->eap); } /* Process errors */ static int i855_process_error_info(struct mem_ctl_info *mci, struct i855_error_info *info, int handle_errors) { int row; info->eap >>= PAGE_SHIFT; row = edac_mc_find_csrow_by_page(mci, info->eap); /* Find culprit row */ /* Handle using services provided by the EDAC core. Populate sysfs, generate error messages, and so on */ if (is_MBE()) { /* is_MBE() looks at I855_ERRSTS_REGISTER and checks for an MBE. Implementation not shown */

edac_mc_handle_ue(mci, info->eap, 0, row, "i855 UE"); } else if (is_SBE()) { /* is_SBE() looks at I855_ERRSTS_REGISTER and checks for an SBE. Implementation not shown */ edac_mc_handle_ce(mci, info->eap, 0, info->derrsyn, row, 0, "i855 CE"); } return 1; } /* This method is registered with the EDAC core from i855_probe() */ static void i855_check(struct mem_ctl_info *mci) { struct i855_error_info info; i855_get_error_info(mci, &info); i855_process_error_info(mci, &info, 1); } /* The PCI driver probe method, part of the pci_driver structure */ static int i855_probe(struct pci_dev *pdev, int dev_idx) { struct mem_ctl_info *mci; /* ... */ pci_enable_device(pdev); /* Allocate control memory for this memory controller. The 3 arguments to edac_mc_alloc() correspond to the amount of requested private storage, number of chip-select rows, and number of channels in your memory layout */ mci = edac_mc_alloc(0, CSROWS, CHANNELS); /* ... */ mci->edac_check = i855_check; /* Supply the check method to the EDAC core */ /* Do other memory controller initializations */ /* ... */ /* Register this memory controller with the EDAC core */ edac_mc_add_mc(mci, 0); /* ... */ } /* Remove method */ static void __devexit i855_remove(struct pci_dev *pdev) { struct mem_ctl_info *mci = edac_mc_find_mci_by_pdev(pdev); if (mci && !edac_mc_del_mc(mci)) { edac_mc_free(mci); /* Free memory for this controller. Reverse of edac_mc_alloc() */ } } /* PCI Device ID Table */ static const struct pci_device_id i855_pci_tbl[] __devinitdata = { {PCI_VEND_DEV(INTEL, I855_PCI_DEVICE_ID), PCI_ANY_ID, PCI_ANY_ID, 0, 0,},

{0,}, }; MODULE_DEVICE_TABLE(pci, i855_pci_tbl); /* pci_driver structure for this device. Re-visit Chapter 10 for a detailed explanation */ static struct pci_driver i855_driver = { .name = "855", .probe = i855_probe, .remove = __devexit_p(i855_remove), .id_table = i855_pci_tbl, }; /* Driver Initialization */ static int __init i855_init(void) { /* ... */ pci_rc = pci_register_driver(&i855_driver); /* ... */ }

Look at drivers/ edac/ * for EDAC source files and at Docum ent at ion/ drivers/ edac/ edac.t xt for det ailed sem ant ics of EDAC sysfs nodes.

Fr e qu e n cy Sca lin g The CPU frequency (cpufreq) driver subsyst em aids power m anagem ent by scaling CPU frequencies on- t he- fly. I f you use a suit able scaling algorit hm ( called a governor) , your device's bat t ery can pot ent ially last longer. Cpufreq support s several archit ect ures such as x86, ARM, and PowerPC. To obt ain cpufreq capabilit ies, you also need t o enable a suit able processor driver ( say, t he I nt el Enhanced SpeedSt ep driver if you are using a SpeedSt ep- enabled CPU such as Pent ium M) . You can cont rol cpufreq's behavior via files in t he / sys/ devices/ syst em / cpu/ cpuX/ cpufreq/ direct ory, where X is t he CPU num ber. To set m axim um and m inim um frequency scaling lim it s, writ e desired values t o scaling_max_freq and scaling_min_freq, respect ively. To see a list of support ed cpufreq governors, look at t he cont ent s of scaling_available_governors. The kernel support s several governors:

The perform ance governor st at ically set s t he CPU frequency t o scaling_max_freq.

Powersave set s t he CPU frequency t o scaling_min_freq.

Ondem and adj ust s t he frequency depending on CPU load.

Conservat ive is a variant of ondem and where t he speed change occurs sm oot hly in gradual st eps.

Userspace let s applicat ions dict at e t he scaling t echnique. Som e dist ribut ions set t he governor t o userspace and im plem ent t he scaling algorit hm via a daem on called cpuspeed, which is spawned during boot .

You m ay also im plem ent your own kernel governor using t he cpufreq_register_governor() int erface.

Each support ed governor is im plem ent ed as a kernel m odule. To see cpufreq in act ion, assign a governor and vary t he syst em load: bash> cd /sys/devices/system/cpu/cpu0/cpufreq bash>cat scaling_max_freq Maximum frequency 1700000 bash> cat scaling_min_freq Minimum frequency 600000 bash> cat cpuinfo_cur_freq Current frequency 600000 bash> cat scaling_governor Scaling algorithm in use powersave bash> cat scaling_available_frequencies 1700000 1400000 1200000 1000000 800000 600000 bash> cat scaling_available_governors conservative ondemand powersave userspace performance bash> echo conservative > scaling_governor Assign 'conservative' governor bash> ls -lR / Switch to another terminal and load your system by recursively

traversing all directories.

I f you now m onit or t he running frequency by looking at / sys/ devices/ syst em / cpu/ cpu0/ cpufreq/ cpuinfo_cur_freq, you will see it dancing t o t he t une of t he CPU load. The CPU scaling code lives in t he drivers/ cpufreq/ direct ory. Look at Docum ent at ion/ cpu- freq/ * for t he det ailed sem ant ics of cpufreq sysfs nodes.

Em be dde d Con t r olle r s Not ebook com put ers and t heir derivat ives usually cont ain a built - in em bedded cont roller ( EC) t o t ake care of various side responsibilit ies, including t he following:

I nt erfacing wit h t he keyboard cont roller

Managing t herm al event s

Handling special but t ons and LEDs

Cont rolling syst em and CPU fan speeds

Monit oring bat t ery volt age

Most of t hese funct ions are specific t o t he OEM's hardware im plem ent at ion. Different OEMs use different ECs; I BM/ Lenovo lapt ops, for exam ple, em bed a Renesas H8 m icrocont roller t o assist t he m ain processor. The int erface t o access t he EC, however, is st andard irrespect ive of t he m ake of t he cont roller. The BI OS and t he operat ing syst em operat e on I / O port 0x80 t o read inform at ion from t he EC and I / O port 0x81 t o writ e dat a t o t he EC. On deskt ops, t hese port s provide access t o t he keyboard cont roller rat her t han t o a general- purpose EC. The next sect ion refers t o an exam ple driver t hat det ect s t elem et ry st rengt h by accessing EC m em ory space.

ACPI Advanced Configurat ion and Power I nt erface ( ACPI ) is a power- m anagem ent specificat ion t hat replaces earlier st andards such as Advanced Power Managem ent ( APM) . ACPI is responsible for t ransit ioning t he syst em bet ween power st at es. I t also has t he t ask of int erfacing wit h devices and sensors connect ed t o t he EC. Such devices are called ACPI devices, and m em ory devot ed t o handle t hem is called ACPI space. As you saw elsewhere in t his book, low- level code is not t he place t o im plem ent policy. This was t he m ain problem wit h APM, where m ost of t he power- m anagem ent policies were part of BI OS firm ware. ACPI shift s policy one level up, t o t he operat ing syst em . Using a daem on called acpid, ACPI even allows policy t o be pushed one m ore level up, t o user- space configurat ion files. By adding rules t o an acpid configurat ion file, you can decide what t o do when a hot key is pressed or when a t herm al t rip occurs. Even wit h ACPI , low- level BI OS firm ware ret ains t he responsibilit y of int erfacing wit h hardware and det ect ing ACPI event s such as a power but t on press or a t herm al sensor report . To perform t his, t he BI OS ut ilizes a special x86 execut ion m ode t riggered via syst em m anagem ent int errupt s ( SMI s) . The SMI execut ion m ode is t ransparent t o t he operat ing syst em . To not ify t he operat ing syst em about ACPI event s det ect ed in SMI m ode, t he BI OS assert s a syst em cont rol int errupt ( SCI ) . Look at drivers/ acpi/ osl.c for t he Linux ACPI code t hat request s t he SCI I RQ. Linux ACPI com ponent s include t he following:

1 . A core layer t hat provides ACPI essent ials such as t he ACPI Machine Language ( AML) int erpret er. ACPI specific BI OS code is writ t en in AML, a language t hat runs on a virt ual m achine im plem ent ed by t he operat ing syst em 's AML int erpret er.

2 . ACPI drivers for int erfacing wit h st andard com ponent s such as t he EC (drivers/ acpi/ ec.c) , but t ons (drivers/ acpi/ but t on.c) , and fan (drivers/ acpi/ fan.c) . OEM- specific drivers offer support for feat ures not support ed by t he st andard ACPI drivers. For exam ple, drivers/ m isc/ t hinkpad_acpi.c [ 1 ] is t he OEM- specific driver t hat im plem ent s ext ras for I BM/ Lenovo Thinkpads. On an I BM/ Lenovo Thinkpad, t he files under / proc/ acpi/ are generat ed by t he st andard ACPI drivers, whereas t hose in / proc/ acpi/ ibm / are produced by t he OEM- specific driver. So, t o get t he current t em perat ure, do t his:

[ 1]

Prior t o 2.6.22, t his driver used t o be drivers/ acpi/ ibm _acpi.c.

bash> cat /proc/acpi/thermal_zone/THM0/temperature temperature: 39 C But t o t urn on t he night light on t op of t he LCD display, get help from t he OEM- specific driver: bash> echo on > /proc/acpi/ibm/light 3 . A kernel t hread kacpid t hat ACPI uses t o queue work for execut ion.

4 . I ndividual device drivers t hat use ACPI 's services t o respond t o t ransit ions in t he syst em 's power st at e. To achieve t his, drivers regist er suspend() and resume() m et hods wit h t he kernel's device m odel. We alluded t o t hese m et hods while discussing t he platform_driver st ruct ure in Chapt er 6 , " Serial Drivers," t he spi_driver st ruct ure in Chapt er 8 , " The I nt er- I nt egrat ed Circuit Prot ocol," t he pcmcia_driver st ruct ure in Chapt er 9 , " PCMCI A and Com pact Flash," and t he pci_driver st ruct ure in Chapt er 10.

5 . User- space t ools such as acpit ool, which report t he st at e of various ACPI devices, show t herm al zone inform at ion and suspend t he syst em t o different sleep st at es: bash> acpitool Battery #1 : charging, 69.08%, 01:14:02 AC adapter : on-line Thermal zone 1 : ok, 38 C 6 . The acpid daem on, which is t he policy enabler for ACPI event s. I t list ens on / proc/ acpi/ event s for powerm anagem ent event s report ed by t he kernel. When you press t he power but t on or when a t herm al t rip occurs, t he kernel ACPI driver dispat ches an event t o user space via / proc/ acpi/ event s. Acpid reads t his, passes it t hrough configurat ion script s present in / et c/ acpi/ event s/ and t akes specified act ions. Assum e t hat you want t o execut e a specific program ( / bin/ lidhandler ) when your lapt op's lid but t on is pressed. For t his, add t he following t o / et c/ acpi/ event s/ acpi_handler.sh : event=button/lid.* action=/bin/lidhandler You m ay use cpufreq in t andem wit h ACPI . You can, for exam ple, add t his line inside / bin/ lidhandler t o drop down t he processor frequency when you shut your lapt op's lid: echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor You can download t he ACPI specificat ions from www.acpi.info. As an exercise, consider t hat you have a t elem et ry card [ 2] built in t o an em bedded lapt op derivat ive, and t hat t he EC is connect ed t o a sensor t hat m easures t elem et ry st rengt h. To access t elem et ry st rengt h via / proc/ acpi/ ( or / sys/ bus/ acpi/ ) , updat e t he corresponding lapt op m odel's " ext ras" driver present in drivers/ m isc/ . I f your board is a derivat ive of an I BM/ Lenovo Thinkpad, for exam ple, m odify drivers/ m isc/ t hinkpad_acpi.c accordingly. You m ay use t he ec_read() and ec_write() kernel funct ions t o access t he locat ion t hat st ores t elem et ry st rengt h in t he EC's ACPI space.

[ 2]

We developed a driver for an exam ple t elem et ry card in Chapt er 11, " Universal Serial Bus."

I SA a n d M CA The I ndust ries St andard Archit ect ure ( I SA) st art ed as a bus for int erfacing I / O devices wit h t he PC but evolved int o a de fact o st andard. I SA drivers would have m erit ed a separat e chapt er several years earlier; but t oday, wit h t he advent of t he PCI bus, I SA has all but disappeared. There are t wo m ain bus- specific fact ors t hat I SA device drivers have t o cont end wit h:

I SA does not offer st andard int erfaces t hat drivers can use t o det ect resource inform at ion t hat is elect rically wired or assigned by boot firm ware. I m plem ent ing com plex probing logic, oft en leveraging device- specific quirks, is an im port ant part of I SA driver init ializat ion. This is unlike t he PCI bus, where t he device driver can cleanly decipher t he ident it y of resources such as int errupt request lines and I / O base addresses assigned by boot firm ware. You learned how t o do t his when we discussed t he PCI configurat ion space in Chapt er 10. We also briefly looked at I SA probing in t he sect ion " I SA Net work Drivers" in Chapt er 15, " Net work I nt erface Cards." The I SA Plug- and- Play ( PnP) specificat ion at t em pt s t o bring a degree of aut oconfigurabilit y t o I SA, however.

The I SA bus has a widt h of 24 bit s, so devices can access only t he low 16MB of syst em m em ory. To DMA net work dat a from an I SA Et hernet card, for exam ple, DMA buffers have t o reside in t he low 16MB range called ZONE_DMA. The Ext ended I ndust ry St andard Archit ect ure ( EI SA) , however, widens t he I SA bus t o 32 bit s. You can plug I SA devices int o EI SA slot s.

Today, t he LPC bus is used rat her t han t he I SA bus t o connect legacy peripherals t o t he CPU on PC- com pat ible syst em s. We discussed LPC devices such as Super I / O chipset s, firm ware hubs, and t herm al sensors in earlier chapt ers. The Micro- Channel Archit ect ure ( MCA) bus overcom es m any of t he lim it at ions of t he I SA fam ily. MCA support s bus m ast ering, aut oconfigurat ion, and 32- bit bus widt hs. Though t echnologically superior t o I SA, MCA didn't becom e as popular because of it s propriet ary nat ure. Look at drivers/ net / t okenring/ skisa.c for a sam ple I SA driver for a Token Ring card. The I BM Token Ring driver, drivers/ net / t okenring/ ibm t r.c, support s I SA, PnP, and MCA form fact ors of I BM Token Ring hardware. The 3COM Et hernet driver, drivers/ net / 3c509 .c, drives MCA, PnP, and EI SA form fact ors of a 3COM Et hernet card. The kernel provides core rout ines for t he use of PnP, EI SA, and MCA drivers. These im plem ent at ions live in drivers/ pnp/ , drivers/ eisa/ , and drivers/ m ca/ , respect ively.

Fir e W ir e FireWire, or I EEE 1394, is a high- speed serial bus prot ocol invent ed by Apple for connect ing peripheral devices t o a syst em . I t provides dat a rat es of up t o 800Mbps ( I EEE 1394b) . Figure 10.1 in Chapt er 10 shows t he connect ion of t he FireWire cont roller on an x86- based lapt op. FireWire is sim ilar t o USB 2.0 in t hat bot h are ext ernal I / O buses t hat support high speeds and device hot plugging. FireWire, however, is a peer- t o- peer prot ocol, unlike t he m ast er- slave USB 2.0, so t wo FireWireenabled devices can exchange inform at ion wit hout t he int ervent ion of a PC. Because of t his charact erist ic, FireWire is popular on m ult im edia devices such as cam corders. As you learned in Chapt er 11, t he On- The- Go supplem ent brings peer- t o- peer capabilit y t o USB 2.0, t oo. FireWire on Linux is archit ect ed as follows:

Device drivers such as ohci1394 t hat int erface wit h FireWire cont rollers.

Prot ocol drivers for applicat ions such as st orage, video, and net working. The FireWire Serial Bus Prot ocol 2 ( SBP2) driver is a low- level FireWire prot ocol driver t hat let s you use your FireWire st orage m edia as you would use a SCSI disk or a USB m ass st orage device. SBP2 has t o be used in t andem wit h a high- level SCSI driver such as sd_m od ( for disks) or sr _m od ( for CD- ROMs) . Applicat ions such as cdrecord work over FireWire CD drives j ust as t hey work wit h USB CD drives. The dv1394 and video1394 prot ocol drivers enable you t o capt ure video via FireWire, and t he et h1394 prot ocol driver let s you run TCP/ I P over FireWire.

A FireWire core t hat provides services t o bot h previously m ent ioned.

User- space libraries such as libraw1394 t hat assist in developing FireWire- aware applicat ions.

Look at drivers/ ieee1394/ * for FireWire kernel sources and go t o www.linux1394.org for det ailed docum ent at ion. St art ing wit h t he 2.6.22 release, t he kernel has an alt ernat e, slim m er FireWire st ack living in t he drivers/ firewire/ direct ory.

I n t e llige n t I n pu t / Ou t pu t I nt elligent I nput / Out put ( or I 2O) is a st andard t hat calls for offloading I / O act ivit ies from t he m ain processor t o an I / O coprocessor residing on an I 2O adapt er. I 2O is largely defunct t oday, and t he I 2O Special I nt erest Group ( I 2O SI G) has ceased t o exist . However, m any operat ing syst em s, including Linux, cont inue support for I 2O. I 2O hardware is available for t echnologies such as SCSI , RAI D, and net working. I 2O part it ions t he soft ware archit ect ure int o an OS- specific m odule ( OSM) running on t he m ain processor and a hardware- specific m odule ( HDM) execut ing on t he I 2O adapt er. HDMs are OS- agnost ic and can be reused across operat ing syst em s, so t he OSMs are rendered sim pler. Linux support s I 2O in t he form of an I 2O core, drivers for I 2O adapt ers, and various OSMs. Look at t he Linux I 2O hom e page at ht t p: / / i2o.shadowconnect .com and t he sources in drivers/ m essage/ i2o/ for m ore det ails.

Am a t e u r Ra dio Am at eur ( ham ) radio is a packet radio t echnology used for round- t he- world com m unicat ion by hobbyist s. I t 's also oft en used t o respond t o calam it ies such as floods and cyclones. To use am at eur radio on Linux, you need t he following:

A low- level m odem driver t o access your radio. Modem drivers for several am at eur radio devices are present in drivers/ net / ham radio/ .

One or m ore packet prot ocols such as AX.25, Rose, and Net rom . The AX.25 prot ocol is an adapt at ion of t he X.25 prot ocol for am at eur radio. Look at t he Linux Am at eur Radio AX.25 HOWTO for an explanat ion of t he prot ocol, t he net / ax25/ direct ory for t he sources, and ht t p: / / ham s.sourceforge.net for user- space ut ilit ies and libraries t hat operat e over AX.25. Rose ( net / rose/ ) and Net rom ( net / net rom / ) are net work prot ocols t hat use AX.25 as t he dat a link layer. You can writ e Linux socket applicat ions t hat run over AX.25, Rose, and Net rom using t he AF_AX25, AF_ROSE, and AF_NETROM prot ocol fam ilies, respect ively.

Voice ove r I P Voice over I nt ernet Prot ocol ( VoI P) is a t echnology t hat uses t he I nt ernet t o carry voice t raffic. VoI P let s you m ake voice- qualit y t elephone calls at cheap rat es. There are several PCI - , PC Card- , and USB- based VoI P solut ions available for t he PC environm ent . Device drivers for several of t hese cards are available on Linux. Not m any are int egrat ed int o t he m ainline kernel, however. The drivers/ t elephony/ direct ory cont ains drivers for a few VoI P devices and a regist rat ion API t hat fut ure drivers can use. Wit h t he increasing popularit y of Linux in t he em bedded t elecom space, t here are several Linux I P t elephones in t he m arket t oday. Figure 20.1 shows a VoI P- enabled device having a hardware voice codec t hat im plem ent s st andards such as G.711 and G.729 for encoding and decoding voice st ream s. The device draws power using a t echnology called Power over Et hernet ( PoE) t hat t ransm it s power along wit h t he Et hernet cable. A device driver com m unicat es wit h t he VoI P hardware.

Figu r e 2 0 .1 . A VoI P ph on e .

[ View full size im age]

VoI P drivers work in t andem wit h t ransport prot ocols such as Real Tim e Transport Prot ocol ( RTP) and call cont rol signaling st acks such as Session I nit iat ion Prot ocol ( SI P) and H.323. On t op of t hese prot ocols sit various I P t elephony applicat ions. Solut ions t hat im plem ent VoI P codecs in soft ware are also popular in t he em bedded space. They usually reside in user space and int eract wit h t he following:

Kernel audio drivers using OSS or ALSA API s

Kernel net work drivers using t he socket API

SoCs orient ed t oward t he Video- and- Voice over I P ( V2I P) m arket usually cont ain hardware support for video codecs such as H.264. I f you are put t ing Linux ont o a V2I P phone, you need t o im plem ent drivers t o int erface wit h such codecs, t oo.

H igh - Spe e d I n t e r con n e ct s High- speed int erconnect ing t echnologies such as I nfiniBand, RapidI O, Hyper- Transport , and 10 Gigabit Et hernet are not com m on in t he PC or low- end em bedded environm ent s. You are m ore likely t o find t hem on clust ers, blade servers, gam ing syst em s, swit ches, or high- speed rout ers. Net working t echnologies such as Fibre Channel and I nt ernet SCSI ( iSCSI ) can be found in ent erprise environm ent s served by st orage- area net works ( SANs) . Let 's peek at t he driver subsyst em s for som e of t hese t echnologies.

I n fin iBa n d I nfiniBand is a high- speed serial bus st andard originally int ended t o replace PCI . PCI Express, however, has becom e t he accept ed fut ure of syst em buses. Today, I nfiniBand t echnology is com m only used in blade server designs t o provide a high- perform ance st orage and net working fabric. I nfiniBand support s Rem ot e DMA ( RDMA) , which allows dat a t o be DMA- ed from t he m em ory of one com put er syst em t o anot her. The Linux I nfiniBand subsyst em includes core support for I nfiniBand, device drivers for host channel adapt ers, and an im plem ent at ion of I P over I nfiniBand. Look inside drivers/ infiniband/ for t he Linux I nfiniBand subsyst em and at Docum ent at ion/ infiniband/ * for relat ed docum ent at ion.

Ra pidI O RapidI O is anot her high- speed serial bus t echnology, which is used for connect ing boards via a back plane. I t support s speeds of t he order of 10Gbps. An exam ple processor t hat support s RapidI O is t he power- based MPC8540 from Freescale, t arget ed at em bedded devices such as net work rout ers and swit ches. The Linux RapidI O subsyst em provides a core set of rout ines t hat can be used t o drive devices on t he RapidI O bus. There are t wo ways t o com m unicat e over a - RapidI O int erconnect :

1 . Short , out - of- band m essages using doorbells. Doorbell services provided by t he RapidI O core are rio_request_inb_dbell(), rio_release_inb_dbell(), rio_request_outb_dbell(), and rio_release_outb_dbell().

2 . High- bandwidt h dat a delivery using m ailboxes. Mailbox services provided by t he RapidI O core are rio_request_inb_mbox(), rio_release_inb_mbox(), rio_request_outb_mbox(), and rio_release_outb_mbox().

Take a look inside drivers/ rapidio/ for t he sources.

Fibr e Ch a n n e l Fibre Channel is a m odern high- speed serial bus prot ocol used t o t alk wit h st orage syst em s. Fibre Channel int erface cards have fiber- opt ic port s t o t alk t o st orage devices on SANs. Fibre Channel is com pat ible wit h SCSI , so a Fibre Channel device driver is essent ially a SCSI driver wit h ext ras t o handle fiber channels. Linux support s a Fibre Channel core and device drivers t o handle Fibre Channel hardware. Look inside drivers/ fc4/ for t he sources.

iSCSI iSCSI is anot her SAN t echnology. I t allows t he t ransport of SCSI packet s over TCP/ I P net works. Wit h iSCSI , a rem ot e block device appears t o your syst em as local st orage. The rem ot e syst em owning t he st orage is called an iSCSI t arget , and local syst em s using t he st orage are called iSCSI init iat ors. Linux support s iSCSI via a kernel driver, drivers/ scsi/ iscsi_t cp.c, and a user- space daem on called iscsid. The hom e page of t he Linux- iSCSI proj ect is at ht t p: / / linux- iscsi.sourceforge.net .

Ch a pt e r 2 1 . D e bu ggin g D e vice D r ive r s I n Th is Ch a pt e r 596 Kernel Debuggers 609 Kernel Probes 620 Kexec and Kdum p 629 Profiling 634 Tracing 638 Linux Test Proj ect 638 User Mode Linux 638 Diagnost ic Tools 639 Kernel Hacking Config Opt ions 640 Test Equipm ent

Now t hat we have learned how t o im plem ent diverse classes of device drivers, let 's t ake a st ep back and explore som e debugging t echniques. I nvest ing t im e in logic design and soft ware engineering before code developm ent and st aring hard at t he code aft er developm ent can m inim ize or even elim inat e bugs. But because t hat is easier said t han done, and because we are all hum ans, developers need debugging t ools. I n t his chapt er, let 's look at a variet y of m et hods t o debug kernel code.

Reliabilit y, Availabilit y, Serviceabilit y Many syst em s, especially m ission crit ical ones, have a need for reliabilit y, availabilit y, and serviceabilit y ( RAS) . The Linux RAS effort has result ed in t he developm ent of several powerful t ools. Exercisers such as t he Linux Test Proj ect ( LTP) m easure t he reliabilit y and robust ness of your kernel port . CPU hot plugging and t he Linux High Availabilit y ( HA) proj ect can be seen in t he cont ext of availabilit y. Kernel debuggers, Kprobes, Kdum p, EDAC, and t he Linux Trace Toolkit ( LTT) com e under t he am bit of serviceabilit y. The line dividing t hese classificat ions is som et im es t hin; you can use any or a com binat ion of t hese m et hods t o suit your debugging needs. For exam ple, out put from a kernel profiler such as OProfile can be used eit her t o search for pot ent ial code bot t lenecks ( reliabilit y) or t o debug a field problem ( serviceabilit y) .

Ke r n e l D e bu gge r s The Linux kernel has no built - in debugger support . Whet her t o include a debugger as part of t he st ock kernel is an oft - debat ed point in kernel m ailing list s. The inst ruct ion level Kernel Debugger ( kdb) and t he source- level Kernel GNU Debugger ( kgdb) are t he t wo m ain Linux kernel debuggers. As of t oday, whet her you use kdb or kgdb, you need t o download relevant pat ches and apply t hem t o your kernel sources. Even if you want t o st ay away from t he hassle of pat ching your kernel sources wit h debugger support , you can glean inform at ion about kernel panics and peek at kernel variables via t he plain GNU Debugger ( gdb) . JTAG debuggers use hardwareassist ed debugging and are powerful, but expensive. Kernel debuggers m ake kernel int ernals m ore t ransparent . You can single- st ep t hrough inst ruct ions, disassem ble inst ruct ions, display and m odify kernel variables, and look at st ack t races. I n t his chapt er, let 's learn t he basics of kernel debuggers wit h t he help of som e exam ples.

En t e r in g a D e bu gge r You can ent er a kernel debugger in m ult iple ways. One way is t o pass com m and- line argum ent s t hat ask t he kernel t o ent er t he debugger during boot . Anot her way is via soft ware or hardware br eakpoints. A breakpoint is an address where you want execut ion st opped and cont rol t ransferred t o t he debugger. A soft ware breakpoint replaces t he inst ruct ion at t hat address wit h som et hing else t hat causes an except ion. You m ay set soft ware breakpoint s eit her using debugger com m ands or by insert ing t hem int o your code. For x86- based syst em s, you can set a soft ware breakpoint in your kernel source code as follows: asm(" int $3");

Alt ernat ively, you can invoke t he BREAKPOINT m acro, which t ranslat es t o t he appropriat e archit ect ure- dependent inst ruct ion. You m ay use hardware breakpoint s in place of soft ware breakpoint s if t he inst ruct ion where you need t o st op is in flash m em ory, where it cannot be replaced by t he debugger. A hardware breakpoint needs processor support . The corresponding address has t o be added t o a debug regist er. You can only have as m any hardware

breakpoint s as t he num ber of debug regist ers support ed by t he processor. You m ay also ask a debugger t o set a wat chpoint on a variable. The debugger st ops execut ion whenever an inst ruct ion m odifies dat a at t he wat chpoint address. Yet anot her com m on m et hod t o ent er a debugger is by hit t ing an at t ent ion key, but t here are inst ances when t his won't work. I f your code is sit t ing in a t ight loop aft er disabling int errupt s, t he kernel will not get a chance t o process t he at t ent ion key and ent er t he debugger. For exam ple, you can't ent er t he debugger via an at t ent ion key if your code does som et hing like t his: unsigned long flags; local_irq_save(flags); while (1) continue; local_irq_restore(flags);

When cont rol is t ransferred t o t he debugger, you can st art your analysis using various debugger com m ands.

Ke r n e l D e bu gge r ( k db) Kdb is an inst ruct ion- level debugger used for debugging kernel code and device drivers. Before you can use it , you need t o pat ch your kernel sources wit h kdb support and recom pile t he kernel. ( Refer t o t he sect ion "Dow nloads" for inform at ion on downloading kdb pat ches.) The m ain advant age of kdb is t hat it 's easy t o set up, because you don't need an addit ional m achine t o do t he debugging ( unlike kgdb) . The m ain disadvant age is t hat you need t o correlat e your sources wit h disassem bled code ( again, unlike kgdb) . Let 's wet our t oes in kdb wit h t he help of an exam ple. Here's t he crim e scene: You have m odified a kernel serial driver t o work wit h your x86- based hardware. But t he driver isn't working, and you would like kdb t o help nab t he culprit . Let 's st art our search for fingerprint s by set t ing a breakpoint at t he serial driver open() ent ry point . Rem em ber, because kdb is not a source- level debugger, you have t o open your sources and t ry t o m at ch t he inst ruct ions wit h your C code. Let 's list t he source snippet in quest ion: drivers/serial/myserial.c: static int rs_open(struct tty_struct *tty, struct file *filp) { struct async_struct *info; /* ... */ retval = get_async_struct(line, &info); if (retval) return(retval); tty->driver_data = info; /* Point A */ /* ... */ }

Press t he Pause key and ent er kdb. Let 's find out how t he disassem bled rs_open() looks. As usual, all debug sessions shown in t his chapt er at t ach explanat ions using t he sym bol. Entering kdb (current=0xc03f6000, pid 0) on processor 0 due to Keyboard Entry

kdb> id rs_open 0xc01cce00 rs_open: 0xc01cce03 rs_open+0x03: ... 0xc01cce4b rs_open+0x4b: ... 0xc01cce56 rs_open+0x56: 0xc01cce5a rs_open+0x5a: ...

Disassemble rs_open sub $0x1c, %esp mov $ffffffed, %ecx call 0xc01ccca0, get_async_struct mov 0xc(%esp,1), %eax mov %eax, 0x9a4(%ebx)

Point A in t he source code is a good place t o at t ach a breakpoint because you can peek at bot h t he tty st ruct ure and t he info st ruct ure t o see what 's going on. Looking side by side at t he source and t he disassem bly, rs_open+0x5a corresponds t o Point A. Not e t hat correlat ion is easier if t he kernel is com piled wit hout opt im izat ion flags. Set a breakpoint at rs_open+0x5a ( which is address 0xc01cce5a) and cont inue execut ion by exit ing t he debugger : kbd> bp rs_open+0x5a kbd> go

Set breakpoint Continue execution

Now you need t o get t he kernel t o call rs_open() t o hit t he breakpoint . To t rigger t his, execut e an appropriat e user- space program . I n t his case, echo som e charact ers t o t he corresponding serial port ( / dev/ t t ySX) : bash> echo "kerala monsoons" > /dev/ttySX

This result s in t he invocat ion of rs_open(). The breakpoint get s hit , and kdb assum es cont rol: Entering kdb on processor 0 due to Breakpoint @ 0xc01cce5a kdb>

Let 's now find out t he cont ent s of t he info st ruct ure. I f you look at t he disassem bly, one inst ruct ion before t he breakpoint ( rs_open+0x56) , you will see t hat t he EAX regist er cont ains t he address of t he info st ruct ure. Let 's look at t he regist er cont ent s: kbd> r Dump register contents eax = 0xcf1ae680 ebx = 0xce03b000 ecx = 0x00000000 ...

So, 0xcf1ae680 is t he address of t he info st ruct ure. Dum p it s cont ent s using t he md com m and: kbd> md 0xcf1ae680 Memory dump 0xcf1ae680 00005301 0000ABC 00000000 10000400 ...

To m ake sense of t his dum p, let 's look at t he corresponding st ruct ure definit ion. info is defined as struct async_struct in include/ linux/ serialP.h as follows:

struct async_struct { int magic; unsigned long port; int hub6; /* ... */ };

/* Magic Number */ /* I/O Port */

I f you m at ch t he dum p wit h t he definit ion, 0x5301 is t he m agic num ber and 0xABC is t he I / O port . Well, isn't t his int er est ing! 0xABC doesn't look like a valid port . I f you have done enough serial port debugging, you will know t hat t he I / O port base addresses and I RQs are configured in include/ asm - x86/ serial.h for x86- based hardware. Change t he port definit ion t o t he correct value, recom pile t he kernel, and cont inue your t est ing!

Ke r n e l GN U D e bu gge r ( k gdb) Kgdb is a source- level debugger. I t is easier t o use t han kdb because you don't have t o spend t im e correlat ing assem bly code wit h your sources. However it 's m ore difficult t o set up because an addit ional m achine is needed t o front - end t he debugging. You have t o use gdb in t andem wit h kgdb t o st ep t hrough kernel code. gdb runs on t he host m achine, while t he kgdb- pat ched kernel ( refer t o t he " Dow nloads" sect ion for inform at ion on downloading kgdb pat ches) runs on t he t arget hardware. The host and t he t arget are connect ed via a serial null- m odem cable, as shown in Figure 21.1. [ 1]

[ 1]

You can also launch kgdb debug sessions over Et hernet .

Figu r e 2 1 .1 . Kgdb se t u p.

You have t o inform t he kernel about t he ident it y and baud rat e of t he serial port via com m and- line argum ent s. Depending on t he boot loader used, add t he following kernel argum ent s t o eit her syslinux.cfg, lilo.conf, or grub.conf: kgdbwait kgdb8250=X,115200

kgdbwait asks t he kernel t o wait unt il a connect ion is est ablished wit h t he host - side gdb, X is t he serial port connect ed t o t he host , and 115200 is t he baud rat e used for com m unicat ion. Now configure t he sam e baud rat e on t he host side:

bash> stty speed 115200 < /dev/ttySX

I f your host com put er is a lapt op t hat does not have a serial port , you m ay use a USB- t o- serial convert er for t he debug session. I n t hat case, inst ead of / dev/ t t ySX, use t he / dev/ t t yUSBX node creat ed by t he usbserial driver. Figure 6.4 of Chapt er 6 , " Serial Drivers," illust rat es t his scenario. Let 's learn kgdb basics using t he exam ple of a buggy kernel m odule. Modules are easier t o debug because t he ent ire kernel need not be recom piled aft er m aking code changes, but rem em ber t o com pile your m odule wit h t he -g opt ion t o generat e sym bolic inform at ion. Because m odules are dynam ically loaded, t he debugger needs t o be inform ed about t he sym bolic inform at ion t hat t he m odule cont ains. List ing 21.1 cont ains a buggy trojan_function(). Assum e t hat it 's defined in drivers/ char/ m y_m odule.c. List in g 2 1 .1 . Bu ggy Fu n ct ion char buffer; int trojan_function() { int *my_variable = 0xAB, i; /* ... */ Point A: i = *my_variable; /* Kernel Panic: my_variable points to bad memory */ return(i); }

I nsert m y_m odule.ko on t he t arget and look inside / sys/ m odule/ m y_m odule/ sect ions/ t o decipher ELF (Execut able and Linking Form at) sect ion addresses. [ 2] The .text sect ion in ELF files cont ains code, .data cont ains init ialized variables, .rodata cont ains init ialized read- only variables such as st rings, and .bss cont ains variables t hat are not init ialized during st art up. The addresses of t hese sect ions are available in t he form of t he files .t ext , .dat a, .r odat a, and .bss in / sys/ m odule/ m y_m odule/ sect ions/ if you enable CONFIG_KALLSYMS during kernel configurat ion. To obt ain t he code sect ion address, for inst ance, do t his:

[ 2]

I f you are st ill using a 2.4 kernel, get t he sect ion addresses using t he –m opt ion t o insm od inst ead:

bash> insmod my_module.o –m Using /lib/modules/2.x.y/kernel/drivers/char/my_module.o Sections: Size Address Align .this 00000060 e091a000 2**2 .text 00001ec0 e091a060 2**4 ... .rodata 0000004c e091d1fc 2**2 .data 00000048 e091d260 2**5 .bss 000000e4 e091d2c0 2**5 ...

bash> cat /sys/module/my_module/sections/.text 0xe091a060

More m odule load inform at ion is available from / proc/ m odules and / proc/ kallsym s. Aft er you have t he sect ion addresses, invoke gdb on t he host - side m achine: bash> gdb vmlinux

vmlinux is the uncompressed kernel

(gdb) target remote /dev/ttySX

Connect to the target

Because you passed kgdbwait as a kernel com m and- line argum ent , gdb get s cont rol when t he kernel boot s on t he t arget . Now inform gdb about t he preceding sect ion addresses using t he add-symbol-file com m and: (gdb) add-symbol-file drivers/char/mymodule.ko 0xe091a060 -s .rodata 0xe091d1fc -s .data 0xe091d260 -s .bss 0xe091d2c0 add symbol table from file "drivers/char/my_module.ko" at .text_addr = 0xe091a060 .rodata_addr = 0xe091d1fc .data_addr = 0xe091d260 .bss_addr = 0xe091d2c0 (y or n) y Reading symbols from drivers/char/mymodule.ko ...done.

To debug t he kernel panic, let 's set a breakpoint at trojan_function(): (gdb) b trojan_function (gdb) c

Set breakpoint Continue execution

When kgdb hit s t he breakpoint , let 's look at t he st ack t race, single- st ep unt il Point A, and display t he value of my_variable: (gdb) bt Back (stack) trace #0 trojan_function () at my_module.c :124 #1 0xe091a108 in my_parent_function (my_var1=438, my_var2=0xe091d288) .. (gdb) step (gdb) step

Continue to single-step up to Point A

(gdb) p my_variable $0 = 0

There is an obvious bug here. my_variable point s t o NULL because trojan_function() forgot t o allocat e m em ory for it . Let 's j ust allocat e t he m em ory using kgdb, circum vent t he kernel crash, and cont inue t est ing: (gdb) p &buffer $1 = 0xe091a100 "" (gdb) set my_variable=0xe091a100

Print address of buffer my_variable = &buffer

(gdb) c

Continue your testing

Kgdb port s are available for several archit ect ures such as x86, ARM, and PowerPC. When you use kgdb t o debug a t arget em bedded device ( inst ead of t he PC shown in Figure 21.1) , t he gdb front - end t hat you run on your host syst em needs t o be com piled t o work wit h your t arget plat form . For exam ple, t o debug a device driver developed for an ARM- based em bedded device from your x86- based host developm ent syst em , you have t o use t he appropriat ely generat ed gdb, oft en nam ed arm - linux- gdb. The exact nam e depends on t he dist ribut ion you use.

GN U D e bu gge r ( gdb) As m ent ioned earlier, you can use plain gdb t o gat her som e kernel debug inform at ion. However, you can't st ep t hrough kernel code, set breakpoint s, or m odify kernel variables. Let 's use gdb t o debug t he kernel panic caused by t he buggy funct ion in List ing 21.1, but assum e t his t im e t hat trojan_function() is com piled as part of t he kernel and not as a m odule, because you can't easily peek inside m odules using gdb. This is part of t he " oops" m essage generat ed when trojan_function() is execut ed: Unable to handle kernel NULL pointer dereference at virtual address 000000ab ... eax: f7571de0 ebx: ffffe000 ecx: f6c78000 edx: f98df870 ... Stack: c019d731 00000000 ... bffffbe8 c0108fab Call Trace: [] [] [] ...

Copy t his crypt ic " oops" m essage t o oops.t xt and use t he ksym oops ut ilit y t o obt ain m ore verbose out put . You m ight need t o hand- copy t he m essage if t he syst em is hung: bash> ksymoops oops.txt Code; c019d710 00000000 : Code; c019d710 0: a1 ab 00 00 00 mov Code; c019d715 5: c3 ret

driver_data = info; /* Point A */ /* ... */ }

Press t he Pause key and ent er kdb. Let 's find out how t he disassem bled rs_open() looks. As usual, all debug sessions shown in t his chapt er at t ach explanat ions using t he sym bol. Entering kdb (current=0xc03f6000, pid 0) on processor 0 due to Keyboard Entry

kdb> id rs_open 0xc01cce00 rs_open: 0xc01cce03 rs_open+0x03: ... 0xc01cce4b rs_open+0x4b: ... 0xc01cce56 rs_open+0x56: 0xc01cce5a rs_open+0x5a: ...

Disassemble rs_open sub $0x1c, %esp mov $ffffffed, %ecx call 0xc01ccca0, get_async_struct mov 0xc(%esp,1), %eax mov %eax, 0x9a4(%ebx)

Point A in t he source code is a good place t o at t ach a breakpoint because you can peek at bot h t he tty st ruct ure and t he info st ruct ure t o see what 's going on. Looking side by side at t he source and t he disassem bly, rs_open+0x5a corresponds t o Point A. Not e t hat correlat ion is easier if t he kernel is com piled wit hout opt im izat ion flags. Set a breakpoint at rs_open+0x5a ( which is address 0xc01cce5a) and cont inue execut ion by exit ing t he debugger : kbd> bp rs_open+0x5a kbd> go

Set breakpoint Continue execution

Now you need t o get t he kernel t o call rs_open() t o hit t he breakpoint . To t rigger t his, execut e an appropriat e user- space program . I n t his case, echo som e charact ers t o t he corresponding serial port ( / dev/ t t ySX) : bash> echo "kerala monsoons" > /dev/ttySX

This result s in t he invocat ion of rs_open(). The breakpoint get s hit , and kdb assum es cont rol: Entering kdb on processor 0 due to Breakpoint @ 0xc01cce5a kdb>

Let 's now find out t he cont ent s of t he info st ruct ure. I f you look at t he disassem bly, one inst ruct ion before t he breakpoint ( rs_open+0x56) , you will see t hat t he EAX regist er cont ains t he address of t he info st ruct ure. Let 's look at t he regist er cont ent s: kbd> r Dump register contents eax = 0xcf1ae680 ebx = 0xce03b000 ecx = 0x00000000 ...

So, 0xcf1ae680 is t he address of t he info st ruct ure. Dum p it s cont ent s using t he md com m and: kbd> md 0xcf1ae680 Memory dump 0xcf1ae680 00005301 0000ABC 00000000 10000400 ...

To m ake sense of t his dum p, let 's look at t he corresponding st ruct ure definit ion. info is defined as struct async_struct in include/ linux/ serialP.h as follows:

struct async_struct { int magic; unsigned long port; int hub6; /* ... */ };

/* Magic Number */ /* I/O Port */

I f you m at ch t he dum p wit h t he definit ion, 0x5301 is t he m agic num ber and 0xABC is t he I / O port . Well, isn't t his int er est ing! 0xABC doesn't look like a valid port . I f you have done enough serial port debugging, you will know t hat t he I / O port base addresses and I RQs are configured in include/ asm - x86/ serial.h for x86- based hardware. Change t he port definit ion t o t he correct value, recom pile t he kernel, and cont inue your t est ing!

Ke r n e l GN U D e bu gge r ( k gdb) Kgdb is a source- level debugger. I t is easier t o use t han kdb because you don't have t o spend t im e correlat ing assem bly code wit h your sources. However it 's m ore difficult t o set up because an addit ional m achine is needed t o front - end t he debugging. You have t o use gdb in t andem wit h kgdb t o st ep t hrough kernel code. gdb runs on t he host m achine, while t he kgdb- pat ched kernel ( refer t o t he " Dow nloads" sect ion for inform at ion on downloading kgdb pat ches) runs on t he t arget hardware. The host and t he t arget are connect ed via a serial null- m odem cable, as shown in Figure 21.1. [ 1]

[ 1]

You can also launch kgdb debug sessions over Et hernet .

Figu r e 2 1 .1 . Kgdb se t u p.

You have t o inform t he kernel about t he ident it y and baud rat e of t he serial port via com m and- line argum ent s. Depending on t he boot loader used, add t he following kernel argum ent s t o eit her syslinux.cfg, lilo.conf, or grub.conf: kgdbwait kgdb8250=X,115200

kgdbwait asks t he kernel t o wait unt il a connect ion is est ablished wit h t he host - side gdb, X is t he serial port connect ed t o t he host , and 115200 is t he baud rat e used for com m unicat ion. Now configure t he sam e baud rat e on t he host side:

bash> stty speed 115200 < /dev/ttySX

I f your host com put er is a lapt op t hat does not have a serial port , you m ay use a USB- t o- serial convert er for t he debug session. I n t hat case, inst ead of / dev/ t t ySX, use t he / dev/ t t yUSBX node creat ed by t he usbserial driver. Figure 6.4 of Chapt er 6 , " Serial Drivers," illust rat es t his scenario. Let 's learn kgdb basics using t he exam ple of a buggy kernel m odule. Modules are easier t o debug because t he ent ire kernel need not be recom piled aft er m aking code changes, but rem em ber t o com pile your m odule wit h t he -g opt ion t o generat e sym bolic inform at ion. Because m odules are dynam ically loaded, t he debugger needs t o be inform ed about t he sym bolic inform at ion t hat t he m odule cont ains. List ing 21.1 cont ains a buggy trojan_function(). Assum e t hat it 's defined in drivers/ char/ m y_m odule.c. List in g 2 1 .1 . Bu ggy Fu n ct ion char buffer; int trojan_function() { int *my_variable = 0xAB, i; /* ... */ Point A: i = *my_variable; /* Kernel Panic: my_variable points to bad memory */ return(i); }

I nsert m y_m odule.ko on t he t arget and look inside / sys/ m odule/ m y_m odule/ sect ions/ t o decipher ELF (Execut able and Linking Form at) sect ion addresses. [ 2] The .text sect ion in ELF files cont ains code, .data cont ains init ialized variables, .rodata cont ains init ialized read- only variables such as st rings, and .bss cont ains variables t hat are not init ialized during st art up. The addresses of t hese sect ions are available in t he form of t he files .t ext , .dat a, .r odat a, and .bss in / sys/ m odule/ m y_m odule/ sect ions/ if you enable CONFIG_KALLSYMS during kernel configurat ion. To obt ain t he code sect ion address, for inst ance, do t his:

[ 2]

I f you are st ill using a 2.4 kernel, get t he sect ion addresses using t he –m opt ion t o insm od inst ead:

bash> insmod my_module.o –m Using /lib/modules/2.x.y/kernel/drivers/char/my_module.o Sections: Size Address Align .this 00000060 e091a000 2**2 .text 00001ec0 e091a060 2**4 ... .rodata 0000004c e091d1fc 2**2 .data 00000048 e091d260 2**5 .bss 000000e4 e091d2c0 2**5 ...

bash> cat /sys/module/my_module/sections/.text 0xe091a060

More m odule load inform at ion is available from / proc/ m odules and / proc/ kallsym s. Aft er you have t he sect ion addresses, invoke gdb on t he host - side m achine: bash> gdb vmlinux

vmlinux is the uncompressed kernel

(gdb) target remote /dev/ttySX

Connect to the target

Because you passed kgdbwait as a kernel com m and- line argum ent , gdb get s cont rol when t he kernel boot s on t he t arget . Now inform gdb about t he preceding sect ion addresses using t he add-symbol-file com m and: (gdb) add-symbol-file drivers/char/mymodule.ko 0xe091a060 -s .rodata 0xe091d1fc -s .data 0xe091d260 -s .bss 0xe091d2c0 add symbol table from file "drivers/char/my_module.ko" at .text_addr = 0xe091a060 .rodata_addr = 0xe091d1fc .data_addr = 0xe091d260 .bss_addr = 0xe091d2c0 (y or n) y Reading symbols from drivers/char/mymodule.ko ...done.

To debug t he kernel panic, let 's set a breakpoint at trojan_function(): (gdb) b trojan_function (gdb) c

Set breakpoint Continue execution

When kgdb hit s t he breakpoint , let 's look at t he st ack t race, single- st ep unt il Point A, and display t he value of my_variable: (gdb) bt Back (stack) trace #0 trojan_function () at my_module.c :124 #1 0xe091a108 in my_parent_function (my_var1=438, my_var2=0xe091d288) .. (gdb) step (gdb) step

Continue to single-step up to Point A

(gdb) p my_variable $0 = 0

There is an obvious bug here. my_variable point s t o NULL because trojan_function() forgot t o allocat e m em ory for it . Let 's j ust allocat e t he m em ory using kgdb, circum vent t he kernel crash, and cont inue t est ing: (gdb) p &buffer $1 = 0xe091a100 "" (gdb) set my_variable=0xe091a100

Print address of buffer my_variable = &buffer

(gdb) c

Continue your testing

Kgdb port s are available for several archit ect ures such as x86, ARM, and PowerPC. When you use kgdb t o debug a t arget em bedded device ( inst ead of t he PC shown in Figure 21.1) , t he gdb front - end t hat you run on your host syst em needs t o be com piled t o work wit h your t arget plat form . For exam ple, t o debug a device driver developed for an ARM- based em bedded device from your x86- based host developm ent syst em , you have t o use t he appropriat ely generat ed gdb, oft en nam ed arm - linux- gdb. The exact nam e depends on t he dist ribut ion you use.

GN U D e bu gge r ( gdb) As m ent ioned earlier, you can use plain gdb t o gat her som e kernel debug inform at ion. However, you can't st ep t hrough kernel code, set breakpoint s, or m odify kernel variables. Let 's use gdb t o debug t he kernel panic caused by t he buggy funct ion in List ing 21.1, but assum e t his t im e t hat trojan_function() is com piled as part of t he kernel and not as a m odule, because you can't easily peek inside m odules using gdb. This is part of t he " oops" m essage generat ed when trojan_function() is execut ed: Unable to handle kernel NULL pointer dereference at virtual address 000000ab ... eax: f7571de0 ebx: ffffe000 ecx: f6c78000 edx: f98df870 ... Stack: c019d731 00000000 ... bffffbe8 c0108fab Call Trace: [] [] [] ...

Copy t his crypt ic " oops" m essage t o oops.t xt and use t he ksym oops ut ilit y t o obt ain m ore verbose out put . You m ight need t o hand- copy t he m essage if t he syst em is hung: bash> ksymoops oops.txt Code; c019d710 00000000 : Code; c019d710 0: a1 ab 00 00 00 mov Code; c019d715 5: c3 ret

blocked, &info); /* Point A */ /* Free npages pages when SIGUSR1 is received */ if (sig == SIGUSR1) { /* Point B */ /* Problem manifests when npages crosses 10 in the following loop. Let's apply run time medication here via Kprobes */ for (i=0; i < npages; i++, curr_pfn++) { /* ... */ } } /* ... */ } /* ... */ }

List in g 2 1 .3 . Re gist e r in g Kpr obe H a n dle r s Code View: #include #include #include #include #include extern int npages; /* Defined in Listing 21.2 */ /* Per-probe structure */ static struct kprobe bandaid; /* Pre Handler: Invoked before running probed instruction */ int bandaid_pre(struct kprobe *p, struct pt_regs *regs) { if (npages > 10) npages = 10; return 0; }

/* Post Handler: Invoked after running probed instruction */ void bandaid_post(struct kprobe *p, struct pt_regs *regs, unsigned long flags) { /* Nothing to do */ } /* Fault Handler: Invoked if the pre/post-handlers encounter a fault */ int bandaid_fault(struct kprobe *p, struct pt_regs *regs, int trapnr) { return 0; } int init_module(void) { int retval; /* Fill the kprobe structure */ bandaid.pre_handler = bandaid_pre; bandaid.post_handler = bandaid_post; bandaid.fault_handler = bandaid_fault; /* Arrive at the target address as explained */ bandaid.addr = (kprobe_opcode_t*) kallsyms_lookup_name("memwalkd") + 0xaa; if (!bandaid.addr) { printk("Bad Probe Point\n"); return -1; } /* Register the kprobe */ if ((retval = register_kprobe(&bandaid)) < 0) { printk("register_kprobe error, return value=%d\n", retval); return -1; } return 0; } void module_cleanup(void) { unregister_kprobe(&bandaid); } MODULE_LICENSE("GPL"); /* You can't link the Kprobes API unless your user module is GPL'ed */

List ing 21.3 uses Kprobes t o insert a pat ch at kallsyms_lookup_name("memwalkd") + 0xaa, which lim it s npages t o 10. To figure out how t o arrive at t his probe address, t ake anot her look at List ing 21.2. You want t he pat ch t o be insert ed at Point B. To calculat e t he kernel address at Point B, disassem ble t he cont ent s of m ydrv.ko using obj dum p:

Code View: bash> objdump -D mydrv.ko mydrv.ko:

file format elf32-i386

Disassembly of section .text: 00000000 0: 1: 6: 7: 8: 9: e: ... ... 7a: 7d: 7f: 82: ... a9: aa: af: b1: b7: ... fa:

: 55 bd 00 40 00 00 57 56 53 bb 00 f0 ff ff 81 ec 90 00 00 00

push mov push push push mov sub

%ebp $0x4000,%ebp %edi %esi %ebx $0xfffff000,%ebx $0x90,%esp

83 74 83 75

f8 0a 2b f8 09 cc

cmp je cmp jne

$0xa,%eax Point A aa $0x9,%eax 50

c3 a1 85 0f 81

ret 00 00 00 00 c0 8e b5 00 00 00 fd 7b f6 00 00

mov test jle cmp

0x0,%eax Point B %eax,%eax 16c $0xf67b,%ebp

mov

0x0,%eax

a1 00 00 00 00

You have t o use an archit ect ure- specific obj dum p if you're cross- com piling for a different processor plat form . You will need som et hing like arm - linux- obj dum p if you're disassem bling a binary crosscom piled for an ARM- based t arget device. Pass t he - S opt ion t o obj dum p t o m ix source code wit h t he disassem bled out put : bash> arm-linux-objdump –d –S mydrv.ko

I f you t ry and m at ch t he C code in List ing 21.2 wit h it s disassem bled dum p above, you can associat e Point A and Point B wit h t he shown kernel addresses. kallsyms_lookup_name()[ 6] locat es t he address of memwalkd(), and 0xaa is t he offset where Point B resides, so apply t he kprobe at kallsyms_lookup_name("memwalkd") + 0xaa.

[ 6]

You have t o enable CONFIG_KALLSYMS during kernel configurat ion t o obt ain t he services of t his funct ion.

Aft er you regist er t he kprobe, memwalkd() equivalent ly looks like t his:

static int memwalkd(void *unused) { /* ...*/ for (;;) { /* ... */ /* Point A */ /* Free npages pages when SIGUSR1 is received */ if (sig == SIGUSR1) { /* Point B */ if (npages > 10) npages = 10; /* The medicated patch! */ for (i=0; i < npages; i++, curr_pfn++) { /* ... */ } } /* ... */ } /* ... */ }

Whenever npages is assigned a value great er t han 10, t he kprobed pat ch pulls it back t o 10, t hus st epping around t he problem . I n t he next t wo sect ions, let 's look at a couple of helper facilit ies t hat m ake it easier t o use Kprobes during funct ion ent ry and exit .

Jpr obe s A j probe is a specialized kprobe. I t eases t he work of adding a probe when t he point of invest igat ion is at t he ent ry t o a kernel funct ion. The j probe handler has t he sam e prot ot ype as t he probed funct ion. I t 's invoked wit h t he sam e argum ent list as t he probed funct ion, so you can easily access t he funct ion argum ent s from t he j probe handler. I f you use Kprobes rat her t han Jprobes, im agine t he hassles your probe handler needs t o undergo, wading t hrough t he dark alleys of t he funct ion st ack t o ext ract funct ion argum ent s! And t his code t hat delves int o t he st ack t o elicit argum ent values has t o be heavily funct ion- specific, not t o m ent ion being archit ect uredependent and unport able. To learn how t o use Jprobes, let 's revert t o an exam ple. Assum e t hat you're debugging a net work device driver ( t hat is built as part of t he kernel) by looking at t he printk() m essages it 's generat ing. The driver is em it t ing crucial values in oct al ( base 8) , but t o your horror, t he driver writ er has int roduced a t ypo in t he print form at st ring by coding %O rat her t han %o. So, all you can see are m essages such as t his: Number of Free Receive buffers = %O.

Jprobes t o t he rescue. You can fix t his in a few seconds, wit hout recom piling or reboot ing t he kernel. First , t ake a look at printk() defined in kernel/ print k.c: asmlinkage int printk(const char *fmt, ...) { va_list args; int r; va_start(args, fmt); r = vprintk(fmt, args); va_end(args);

return r; }

Let 's add a sim ple j probe at t he ent ry t o printk() and t ransform every % O int o %o. List ing 21.4 does t his j ob. Not e t hat t he j probe handler needs t o have t he sam e prot ot ype as printk(). Bot h funct ions are m arked wit h t he asmlinkage t ag t hat asks t hem t o expect argum ent s from t he st ack, rat her t han from CPU regist ers. List in g 2 1 .4 . Re gist e r in g Jpr obe H a n dle r s Code View: #include #include #include #include /* Jprobe the entrance to printk */ asmlinkage int jprintk(const char *fmt, ...) { for (; *fmt; ++fmt) { if ((*fmt=='%')&&(*(fmt+1) == 'O')) *(char *)(fmt+1) = 'o'; } jprobe_return(); return 0; } /* Per-probe structure */ static struct jprobe jprobe_eg = { .entry = (kprobe_opcode_t *) jprintk }; int init_module(void) { int retval; jprobe_eg.kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("printk"); if (!jprobe_eg.kp.addr) { printk("Bad probe point\n"); return -1; } /* Register the Jprobe */ if ((retval = register_jprobe(&jprobe_eg) < 0)) { printk("register_jprobe error, return value=%d\n", retval); return -1; } printk("Jprobe registered.\n"); return 0; } void module_cleanup(void) {

unregister_jprobe(&jprobe_eg); } MODULE_LICENSE("GPL");

When List ing 21.4 invokes register_jprobes() t o regist er t he j probe, a kprobe is insert ed at t he beginning of printk(). When t his probe is hit , Kprobes replace t he saved ret urn address wit h t hat of t he regist ered j probe handler, jprintk(). I t t hen copies a port ion of t he st ack and ret urns, t hus passing cont rol t o jprintk() wit h printk()'s argum ent list . When jprintk() calls jprobe_return(), t he original call st at e is rest ored, and printk() cont inues t o execut e norm ally. When you insert t his j probe user m odule, t he net work driver no longer em it s useless m essages announcing %O buffers, rat her it print s saner inform at ion such as t his: Number of Free Receive buffers = 12.

Re t u r n Pr obe s A ret urn probe ( or a kret probe in Kprobes t erm inology) is anot her specialized Kprobes helper. I t eases t he work of insert ing a kprobe when you need t o probe a funct ion's ret urn point . I f you use vanilla Kprobes t o invest igat e ret urn point s, you m ight need t o regist er t hem at m ult iple places because a funct ion can ret urn via m ult iple code pat hs. However, if you use ret urn probes, you need t o insert only one kret probe, rat her t han regist er, say, 20 Kprobes t o cover a funct ion's 20 ret urn pat hs. The funct ion tty_open() defined in drivers/ char/ t t y_io.c has seven ret urn pat hs. The successful pat h ret urns 0, and ot hers ret urn error values such as –ENXIO and -ENODEV. A single kret probe is sufficient t o alert you about failures, irrespect ive of t he associat ed code pat h. List ing 21.5 im plem ent s t his kret probe. List in g 2 1 .5 . Re gist e r in g Re t u r n Pr obe H a n dle r s Code View: #include #include #include #include /* kretprobe at exit from tty_open() */ static int kret_tty_open(struct kretprobe_instance *kreti, struct pt_regs *regs) { /* The EAX register contains the function return value on x86 systems */ if ((int) regs->eax) { /* tty_open() failed. Announce the return code */ printk("tty_open returned %d\n", (int)regs->eax); } return 0; } /* Per-probe structure */ static struct kretprobe kretprobe_eg = { .handler = (kretprobe_handler_t)kret_tty_open };

int init_module(void) { int retval; kretprobe_eg.kp.addr = (kprobe_opcode_t*) kallsyms_lookup_name("tty_open"); if (!kretprobe_eg.kp.addr) { printk("Bad Probe Point\n"); return -1; } /* Register the kretprobe */ if ((retval = register_kretprobe(&kretprobe_eg) < 0)) { printk("register_kretprobe error, return value=%d\n", retval); return -1; } printk("kretprobe registered.\n"); return 0; } void module_cleanup(void) { unregister_kretprobe(&kretprobe_eg); } MODULE_LICENSE("GPL");

When List ing 21.5 invokes register_kretprobes(), a kprobe is int ernally insert ed at t he beginning of tty_open(). When t his probe get s hit , t his int ernal kprobe handler replaces t he funct ion ret urn address wit h t hat of a special rout ine ( called a t ram poline in Kprobes t erm inology) . Look at arch/ your- arch/ kernel/ kprobes.c for t he im plem ent at ion of t he t ram poline. When tty_open() ret urns via any of it s seven ret urn pat hs, cont rol ret urns t o t he t ram poline inst ead of t he caller funct ion. The t ram poline invokes t he kret probe handler, kret_tty_open() regist ered by List ing 21.5, which print s t he ret urn value if tty_open() has not ret urned successfully.

Lim it a t ion s Kprobes has it s lim it at ions. Som e of t hem are obvious. You won't , for exam ple, see desired result s if you insert a kprobe inside an inline funct ion. And, of course, you can't probe Kprobes code. Kprobes are m ore useful for applying probes inside t he base kernel. I f t he subj ect code is part of a dynam ically loadable m odule, you m ight as well rewrit e and recom pile your m odule rat her t han writ e and com pile a new m odule t o " kprobe" it . However, you m ight st ill want t o use Kprobes if bringing down t he m odule is not accept able. There are less- obvious lim it at ions, t oo. Opt im izat ions are done at com pile t im e, whereas Kprobes are insert ed during runt im e. So, t he effect of insert ing inst ruct ions via Kprobes is not equivalent t o adding code in t he original source files. For exam ple, t he buggy code snippet volatile int *integerp = 0xFF;

int integerd = *integerp;

is reduced by t he com piler t o mov 0xff, %eax

So, you can't easily use Kprobes if you want t o sneak in bet ween t hose t wo lines of C code, allocat e a word of m em ory, point integerp t o t he allocat ed word, and circum vent a kernel crash.

Syst em Tap ( ht t p: / / sourceware.org/ syst em t ap/ ) is a diagnost ic t ool t hat eases t he use of Kprobes.

Look in g a t t h e Sou r ce s The Kprobes im plem ent at ion consist s of a generic port ion defined in kernel/ kprobes.c ( and include/ linux/ kprobes.h) and an archit ect ure- dependent part t hat lives in arch/ your- arch/ kernel/ kprobes.c ( and include/ asm - your- arch/ kprobes.h ) . Peek inside Docum ent at ion/ kprobes.t xt for furt her inform at ion about Kprobes, Jprobes, and Kret probes.

Ke x e c a n d Kdu m p Now t hat you have learned how t o use Kprobes, let 's cont inue and look at m ore facet s of Linux RAS. Kexec and kdum p are serviceabilit y feat ures int roduced in t he 2.6 kernel. Kexec uses t he im age overlay philosophy of t he UNI X exec() syst em call t o spawn a new kernel over a running kernel wit hout t he overhead of boot firm ware. This can save several seconds of reboot t im e because boot firm ware spends cycles walking buses and recognizing devices. The less t he reboot lat ency, t he less t he syst em downt im e; so, t his was one of t he m ain m ot ivat ions for developing kexec. However, kexec's m ost popular user is kdum p. Capt uring a dum p aft er a kernel crash is inherent ly unreliable because kernel code t hat accesses t he dum p device m ight be in an unst able st at e. Kdum p circum vent s t his problem by collect ing t he dum p aft er boot ing int o a healt hy kernel via kexec.

Ke x e c Before you can kexec a kernel, you need t o do som e preparat ions: 1.

Com pile and boot int o a kernel t hat has kexec support . For t his, t urn on CONFIG_KEXEC ( Processor Type Kexec Syst em Call) in t he kernel configurat ion m enu. This kernel is called t he first kernel and Feat ures or t he running kernel.

2.

Prepare t he kernel t hat is t o be kexec- ed. This second kernel can be t he sam e as t he first kernel.

3.

Download t he kexec- t ools package source t ar ball from www.kernel.org/ pub/ linux/ kernel/ people/ horm s/ kexec- t ools/ kexec- t ools- t est ing.t ar.gz. Build and produce t he user- space t ool called kexec.

The kexec t ool built in St ep 3 is invoked in t wo st ages. The first st age loads t he second kernel im age int o t he buffers of t he running kernel, while t he second st age act ually overlays t he running kernel: 1.

Load t he second ( overlay) kernel using t he kexec com m and:

bash> kexec -l /path/to/kernelsources/arch/x86/boot/bzImage -append="root=/dev/hdaX" --initrd=/boot/myinitrd.img bzI m age is t he second kernel, hdaX is t he root device, and m yinit rd.im g is t he init ial root filesyst em . The kernel im plem ent at ion of t his st age is m ost ly archit ect ure- independent . At t he heart of t his st age is t he sys_kexec() syst em call. The kexec com m and loads t he new kernel im age int o t he running kernel's buffers using t he services of t his syst em call. 2.

Boot int o t he second kernel:

bash> kexec -e ...

Kernel boot up messages

Kexec abrupt ly st art s t he new kernel wit hout gracefully halt ing t he operat ing syst em . To shut down prior t o reboot , invoke kexec from t he bot t om of t he halt script ( usually / et c/ rc.d/ rc0.d/ S01halt ) and invoke halt inst ead. The im plem ent at ion of t he second st age is archit ect ure- dependent . The crux of t his st age is a reboot_code_buffer t hat cont ains assem bly code t o put t he new kernel in place t o boot .

Kexec bypasses t he init ial kernel code t hat invokes t he services of boot firm ware and direct ly j um ps t o t he prot ect ed m ode ent ry point ( for x86 processors) . An im port ant challenge t o im plem ent kexec is t he int eract ion t hat happens bet ween t he kernel and t he boot firm ware ( BI OS on x86- based syst em s, Openfirm ware on POWER- based m achines, and so on) . On x86 syst em s, inform at ion such as t he e820 m em ory m ap passed t o t he kernel by t he BI OS ( see Appendix B, " Linux and t he BI OS" ) needs t o be supplied t o t he kexec- ed kernel, t oo.

Ke x e c w it h Kdu m p The kexec invocat ion sem ant ics is som ewhat special when it 's used in t andem wit h kdum p. I n t his case, kexec is required t o aut om at ically boot a new kernel when it encount ers a kernel panic. I f t he running kernel crashes, t he new kernel ( called t he capt ure kernel) is boot ed t o reliably collect t he dum p. A t ypical call synt ax is t his: bash> kexec -p /path/to/capture-kernel-sources/vmlinux --args-linux --append="root=/dev/hdaX irqpoll" --initrd=/boot/myinitrd.img

The -p opt ion asks kexec t o t rigger a reboot when a kernel panic occurs. A vm linux ELF kernel im age is used as t he capt ure kernel. Because vm linux is a general ELF boot im age and because kexec is t heoret ically OS agnost ic, you need t o specify via t he --args-linux opt ion t hat t he following argum ent s have t o be int erpret ed in a Linux- specific m anner. The capt ure kernel boot s asynchronously during a kernel crash, so device drivers using shared int errupt s m ay fat ally express t heir unhappiness during boot . To be nice t o such drivers, specify irqpoll in t he com m and line passed t o t he capt ure kernel using --append. To use kexec wit h kdum p, you need som e addit ional kernel configurat ion set t ings. The capt ure kernel requires access t o kernel m em ory of t he crashed kernel t o generat e a full dum p, so t he lat t er cannot j ust overwrit e t he form er as was done by kexec in t he non- kdum p case. The running kernel needs t o reserve a m em ory region t o run t he capt ure kernel. To m ark t his region

Boot t he first kernel wit h t he com m and- line argum ent crashkernel=64M@16M ( or ot her suit able size@start values) . Also include debug sym bols in t he kernel im age by enabling CONFIG_DEBUG_INFO (Kernel Hacking Com pile t he Kernel wit h Debug I nfo) in t he configurat ion m enu.

While configuring t he capt ure kernel, set CONFIG_PHYSICAL_START t o t he sam e start value assigned above ( 16M in t his case) . I f you kexec int o t he capt ure kernel and peek inside / proc/ m em info, you will find t hat size ( 64M in t his case) is t he t ot al am ount of physical m em ory t hat t his kernel can see.

Now t hat you're com fort able wit h kexec and have m ast ered it from t he perspect ive of a kdum p user, let 's delve int o kdum p and use it t o analyze som e real- world kernel crashes.

Kdu m p An im age of syst em m em ory capt ured aft er a kernel crash or hang is called a crash dum p. Analyzing a crash dum p can give valuable clues for post m ort em analysis of kernel problem s. However, obt aining a dum p aft er a kernel crash is inherent ly unreliable because t he st orage driver responsible for logging dat a ont o t he dum p device m ight be in an undefined st at e. Unt il t he advent of kdum p, Linux Kernel Crash Dum p ( LKCD) was t he popular m echanism t o obt ain and analyze dum ps. LKCD uses a t em porary dum p device ( such as t he swap part it ion) t o capt ure t he dum p. I t t hen warm reboot s back t o a healt hy st at e and copies t he dum p from t he t em porary device t o a perm anent locat ion. A t ool called lcrash is used t o analyze t he dum p. The disadvant ages wit h LKCD include t he following:

Even copying t he dum p t o a t em porary device m ight be unreliable on a crashed kernel.

Dum p device configurat ion is nont rivial.

The reboot m ight be slow because swap space can be act ivat ed only aft er t he dum p has been safely saved away t o a perm anent locat ion.

LKCD is not part of t he m ainline kernel, so inst alling t he proper pat ches for your kernel version is a hurdle.

Kdum p is not burdened wit h t hese short falls. I t elim inat es indet erm inism by collect ing t he dum p aft er boot ing int o a healt hy kernel via kexec. Also, because m em ory st at e is preserved aft er a kexec reboot , t he m em ory im age can be accurat ely accessed from t he capt ure kernel. Let 's first get t he prelim inary kdum p set up out of t he way: 1.

Ask t he running kernel t o kexec int o a capt ure kernel if it encount ers a panic. The capt ure kernel should addit ionally have CONFIG_HIMEM and CONFIG_CRASH_DUMP t urned on. ( Bot h t hese opt ions sit inside Processor t ype and Feat ures in t he kernel configurat ion m enu.)

2.

Aft er t he capt ure kernel boot s, copy t he collect ed dum p inform at ion from /proc/vmcore ( obt ained by enabling CONFIG_PROC_VMCORE in t he kernel configurat ion m enu) t o a file on your hard disk:

bash> cp /proc/vmcore /dump/vmcore.dump You can also save ot her inform at ion such as t he raw m em ory snapshot of t he crashed kernel, via / dev/ oldm em . 3.

Boot back int o t he first kernel. You are now ready t o st art dum p analysis.

Let 's use t he collect ed dum p file and t he crash t ool t o analyze som e exam ple kernel crashes. I nt roduce t his bug inside t he int errupt handler of t he RTC driver (drivers/ char/ rt c.c) : irqreturn_t rtc_interrupt(int irq, void *dev_id) { + volatile int *integerp = 0xFF; + int integerd = *integerp; /* Bad memory reference! */ spin_lock(&rtc_lock); /* ... */

Trigger execut ion of t he handler by enabling int errupt s via t he hwclock com m and: bash> hwclock ... Kernel panic occurs when execution hits rtc_interrupt() causing kexec to boot into the capture kernel.

Save / proc/ vm core t o / dum p/ vm core.dum p, reboot back int o t he first ( crashed) kernel, and st art analysis using t he crash t ool. I n a real- world sit uat ion, of course, t he dum p m ight be capt ured at a cust om er sit e, whereas t he

analysis is done at a support cent er: bash> crash /usr/src/linux/vmlinux /dump/vmcore.dump crash 4.0-2.24 ... KERNEL: DUMPFILE: CPUS: DATE: UPTIME: LOAD AVERAGE: TASKS: ... PANIC: crash>

/usr/src/linux/vmlinux /root/vmcore.dumpfile 1 Mon Nov 26 04:15:49 2007 00:17:22 0.82, 1.02, 0.87 63 "Oops: 0000 [#1]" (check log for details)

Exam ine t he st ack t race t o underst and t he cause of t he crash: crash> bt PID: 0 TASK: c03a32e0 CPU: 0 COMMAND: "swapper" #0 [c0431eb8] crash_kexec at c013a8e7 #1 [c0431f04] die at c0103a73 #2 [c0431f44] do_page_fault at c0343381 #3 [c0431f84] error_code (via page_fault) at c010312d EAX: 00000008 EBX: c59a5360 ECX: c03fbf90 EDX: 00000000 EBP: 00000000 DS: 007b ESI: 00000000 ES: 007b EDI: c03fbf90 CS: 0060 EIP: f8a8c004 ERR: ffffffff EFLAGS: 00010092 #4 [c0431fb8] rtc_interrupt at f8a8c004 #5 [c0431fc4] handle_IRQ_event at c013de51 #6 [c0431fdc] __do_IRQ at c013df0f

The st ack t race point s t he needle of suspicion at rtc_interrupt(). Let 's disassem ble t he inst ruct ions near rtc_interrupt(): crash> dis 0xf8a8c000 0xf8a8c001 0xf8a8c004 0xf8a8c009 0xf8a8c00e

0xf8a8c000 5 : : : : :

push sub mov mov call

%ebx $0x4,%esp 0xff,%eax $0xc03a6640,%eax 0xc0342300

The inst ruct ion at address 0xf8a8c004 is at t em pt ing t o m ove t he cont ent s of t he EAX regist er t o address 0xff, which is clearly t he invalid deference t hat caused t he crash. Fix t his and build a new kernel. I f you use t he irq com m and, you can figure out t he ident it y of t he int errupt t hat was in progress during t he t im e of t he crash. I n t his case, t he out put confirm s t hat t he RTC int errupt was indeed act ive: crash> irq IRQ: 8 STATUS: 1 (IRQ_INPROGRESS) ...

... handler: f8a8c000 flags: mask: name:

20000000 (SA_INTERRUPT) 0 f8a8c29d "rtc"

crash> quit bash>

Let 's now shift gears and look at a case where t he kernel freezes, rat her t han generat e an " oops." Consider t he following buggy driver init() rout ine: static int __init mydrv_init(void) { spin_lock(&mydrv_wq.lock); /* Usage before initialization! */ spin_lock_init(&mydrv_wq.lock); /* ... */ }

The code is erroneously using a spinlock before init ializing it . Effect ively, t he CPU spins forever t rying t o acquire t he lock, and t he kernel appears t o hang. Let 's debug t his problem using kdum p. I n t his case, t here will be no aut o- t rigger because t here is no panic, so force a crash dum p by pressing t he m agic Sysrq key com binat ion, Alt Sysrq- c. You m ay need t o enable Sysrq by writ ing a 1 t o / proc/ sys/ kernel/ sysrq: bash> echo 1 > /proc/sys/kernel/sysrq bash> insmod mydrv.ko

This induces t he kernel t o hang inside mydrv_init(). Press t he Alt - Sysrq- c key com binat ion t o t rigger a crash dum p: Alt-Sysrq-c ...

Messages announcing that a crash dump has been triggered

Save t he dum p t o disk aft er kexec boot s t he capt ure kernel, boot back t o t he original kernel, and run crash on t he saved dum p: bash> crash vmlinux vmcore.dump ... PANIC: "SysRq : Trigger a crashdump" PID: 2115 COMMAND: "insmod" TASK: f7c57000 [THREAD_INFO: f6170000] CPU: 0 STATE: TASK_RUNNING (SYSRQ) crash>

Test t he wat ers by checking t he ident it y of t he process t hat was running at t he t im e of t he crash. I n t his case, it

was apparent ly insm od ( of m ydrv.ko) : crash> ps ... 2171 2214 2230 > 2261

2137 1 2214 2230

0 0 0 0

f6bb7000 f6b5b000 f6bb0550 c596f550

IN IN IN RU

0.5 0.1 0.1 0.0

11728 2732 4580 1572

5352 1192 1528 376

emacs-x login bash insmod

The st ack t race doesn't yield m uch inform at ion ot her t han t elling you t hat a Sysrq key press was responsible for t he dum p: crash> bt PID: 2115 TASK: f7c57000 CPU: 0 COMMAND: "insmod" #0 [c0431e68] crash_kexec at c013a8e7 #1 [c0431eb4] __handle_sysrq at c0254664 #2 [c0431edc] handle_sysrq at c0254713

Let 's next t ry peeking at t he log m essages generat ed by t he crashed kernel. The log com m and reads t he m essages from t he kernel printk ring buffer present on t he dum p file: crash> log ... BUG: soft lockup detected on CPU#0! Pid: 2261, comm: insmod EIP: 0060:[] CPU: 0 EIP is at delay_pmtmr+0xb/0x20 EFLAGS: 00000246 Tainted: P (2.6.16.16 #11) EAX: 5caaa48c EBX: 00000001 ECX: 5caaa459 EDX: 00000012 ESI: 02e169c9 EDI: 00000000 EBP: 00000001 DS: 007b ES: 007b CR0: 8005003b CR2: 08062017 CR3: 35e89000 CR4: 000006d0 [] __delay+0x9/0x10 [] _raw_spin_lock+0xa9/0x150 [] mydrv_init+0xd/0xb2 [wqdrv] [] sys_init_module+0x175/0x17a2 [] do_sync_read+0xc4/0x100 [] audit_syscall_entry+0x13d/0x170 [] do_syscall_trace+0x208/0x21a [] syscall_call+0x7/0xb SysRq : Trigger a crashdump crash>

The log offers t wo useful pieces of debug inform at ion. First , it let s you know t hat a soft lockup was det ect ed on t he crashed kernel. As discussed in t he sect ion " Device Exam ple: Wat chdog Tim er" in Chapt er 5 , " Charact er Drivers," t he kernel det ect s soft lockups as follows: A kernel wat chdog t hread runs once a second and t ouches a per- CPU t im est am p variable. I f t he CPU sit s in a t ight loop, t he wat chdog t hread cannot updat e t his t im est am p. An updat e check is carried out during t im er int errupt s using softlockup_tick() ( defined in kernel/ soft lockup.c) . I f t he wat chdog t im est am p is m ore t han 10 seconds old, it concludes t hat a soft lockup has occurred and em it s a kernel m essage t o t hat effect . Second, t he log frowns upon mydrv_init()+0xd ( or 0xf893d00), so let 's look at t he disassem bly of t he surrounding code region:

crash> dis f893d000 5 dis: WARNING: f893d000: no associated kernel symbol found 0xf893d000: mov $0xf89f1208,%eax 0xf893d005: sub $0x8,%esp 0xf893d008: call 0xc0342300 0xf893d00d: movl $0xffffffff,0xf89f1214 0xf893d017: movl $0xffffffff,0xf89f1210

The ret urn address in t he st ack is 0xf893d00d, so t he kernel is hanging inside t he previous inst ruct ion, which is a call t o spin_lock(). I f you co- relat e t his wit h t he earlier source snippet and look at it in t he eye, you can see t he error sequence, spin_lock()/spin_lock_init(), st aring sorrowfully back at you. Fix t he problem by swapping t he sequence. You m ay also use crash t o peek at dat a st ruct ures of int erest , but be aware t hat m em ory regions t hat were swapped out during t he crash are not part of t he dum p. I n t he preceding exam ple, you can exam ine mydrv_wq as follows: crash> rd mydrv_wq 100 f892c200: 00000000 00000000 00000000 00000000 ... f892c230: 2e636373 00000068 00000000 00000011

................ scc.h...........

Gdb is int egrat ed wit h crash, so you can pass com m ands from crash t o gdb for evaluat ion. For exam ple, you can use gdb's p com m and t o print dat a st ruct ures.

Look in g a t t h e Sou r ce s Archit ect ure- dependent port ions of kexec reside in arch/ your- arch/ kernel/ m achine_kexec.c and arch/ yourarch/ kernel/ relocat e_kernel.S. The generic part s live in kernel/ kexec.c ( and include/ linux/ kexec.h) . Peek inside arch/ your- arch/ kernel/ crash.c and arch/ your- arch/ kernel/ crash_dum p.c for t he kdum p im plem ent at ion. Docum ent at ion/ kdum p/ kdum p.t xt cont ains inst allat ion inform at ion.

Pr ofilin g Profiling point s you t o t hose regions of code t hat burn m ore CPU cycles. Profilers help sense t he presence of code bot t lenecks and com e in different flavors. The OProfile kernel profiler, included wit h t he 2.6 kernel, uses hardware assist t o gat her profile dat a. The gprof applicat ion profiler, on t he ot her hand, relies on com piler assist t o collect profiling inform at ion.

Ke r n e l Pr ofilin g w it h OPr ofile OProfile sam ples dat a at regular int ervals using hardware perform ance count ers support ed by m any processors. The perform ance count ers can be program m ed t o count event s such as t he num ber of cache m isses. On syst em s where t he processor does not support perform ance count ers, OProfile obt ains lim it ed inform at ion by collect ing dat a during t im er event s. OProfile consist s of t he following:

A kernel layer t hat collect s profiling inform at ion. [ 7] To enable OProfile in your kernel, enable CONFIG_PROFILING, CONFIG_OPROFILE, and CONFIG_APIC and recom pile.

[ 7]

I f you are st ill using a 2.4 kernel, you have t o pat ch your kernel sources wit h OProfile support .

The oprofiled daem on.

A suit e of post - profiling t ools such as opcont r ol, opreport , and op_help t hat help in det ailed analysis of t he collect ed dat a. These t ools are included wit h several dist ribut ions; if your dist ribut ion doesn't have t hem , however, you can download precom piled binaries.

To illust rat e t he basics of kernel profiling, let 's sim ulat e a bot t leneck in t he filesyst em layer and use OProfile t o det ect it . Our code area of int erest is t he port ion of t he filesyst em layer t hat reads direct ories ( funct ion vfs_readdir() in fs/ readdir.c) First , use opcont rol t o configure OProfile: bash> opcontrol --setup --vmlinux=/path/to/kernelsources/vmlinux --event=GLOBAL_POWER_EVENTS:100000:1:1:1

The event specifier asks OProfile t o collect sam ples during GLOBAL_POWER_EVENTS ( t im e during which t he processor is not st opped) . The num erals adj acent t o t he event specifier denot e t he sam pling count in clock cycles, unit m ask filt er, kernel- space count ing, and user- space count ing, respect ively. I f you would like t o sam ple x t im es every second and your processor is running at a frequency of cpu_speed HZ, your sam ple count should approxim at ely be (cpu_speed/x). A larger count generat es a finer profile but also result s in m ore CPU overhead. The event s support ed by OProfile depend on your processor: bash> opcontrol -l List available events on a Pentium 4 CPU GLOBAL_POWER_EVENTS: (counter: 0, 4)

time during which processor is not stopped (min count: 3000) BRANCH_RETIRED: (counter: 3, 7) retired branches (min count: 3000) MISPRED_BRANCH_RETIRED: (counter: 3, 7) retired mispredicted branches (min count: 3000) BSQ_CACHE_REFERENCE: (counter: 0, 4) ...

Next , st art OProfile and run a benchm arking t ool t hat st resses t hose part s of t he kernel you would like t o profile. Look at ht t p: / / lbs.sourceforge.net / for a list of benchm arking proj ect s on Linux. For t his exam ple, let 's exercise t he Virt ual File Syst em ( VFS) layer by recursively list ing all files in t he syst em : bash> opcontrol --start bash> ls -lR / bash> opcontrol --dump

Start data collection Stress test Save profiled data

Use opreport t o look at t he profiling result s. The % colum n provides a m easure of t he funct ion's load on t he syst em : Code View: bash> opreport -l /path/to/kernelsources/vmlinux CPU: P4 / Xeon, speed 2992.9 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000 samples % symbol name 914506 24.2423 vgacon_scroll ls output printed to console 406619 10.7789 do_con_write 273023 7.2375 vgacon_cursor 206611 5.4770 __d_lookup ... 1380 0.0366 vfs_readdir Our routine of interest ... 1 2.7e-05 vma_prio_tree_next

Let 's now sim ulat e a bot t leneck in t he VFS code by int roducing a 1- m illisecond delay in vfs_readdir(). This is done in List ing 21.6. List in g 2 1 .6 . vfs_readdir() D e fin e d in fs/ r e a d_ dir .c

int vfs_readdir(struct file *file, filldir_t filler, void *buf) { struct inode *inode = file->f_ dentry->d_inode; int res = -ENOTDIR; + + + + +

/* Introduce a millisecond bottleneck (HZ is set to 1000 on this system) */ unsigned long timeout = jiffies+1; while (time_before(jiffies, timeout)); /* End of bottleneck */ /* ... */

}

Com pile t he kernel wit h t his change and recollect t he profile. The new dat a looks like t his: Code View: bash> opreport -l /path/to/kernelsources/vmlinux CPU: P4 / Xeon, speed 2993.08 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000 samples % symbol name 6178015 57.1640 vfs_readdir Our routine of interest 1065197 9.8561 vgacon_scroll ls output printed to console 479801 4.4395 do_con_write ...

As you can see, t he bot t leneck is clearly reflect ed in t he profiled dat a. vfs_readdir() has now j um ped t o t he t op of t he list ! You can use OProfile t o obt ain a lot m ore inform at ion. You can, for exam ple, gat her t he percent age of dat a cache line m isses. Caches are fast m em ory close t o t he processor. Fet ches t o cache are done in unit s of t he processor cache line ( 32 byt es for Pent ium 4) . I f t he dat a you need t o access is not already present in t he cache ( a cache m iss) , t he processor has t o fet ch it from m ain m em ory, and t his burns m ore CPU cycles. Subsequent accesses t o t hat m em ory ( and t he surrounding byt es t ouched int o t he cache) will be fast er unt il t he corresponding cache line get s invalidat ed. You can configure OProfile t o count t he num ber of cache m isses by profiling your kernel code for t he BSQ_CACHE_REFERENCE event ( for Pent ium 4) . You can t hen t une your code, possibly by realigning fields in dat a st ruct ures, t o achieve bet t er cache ut ilizat ion: Code View: bash> opcontrol --setup --event=BSQ_CACHE_REFERENCE:50000:0x100:1:1 --vmlinux=/path/to/kernelsources/vmlinux Unit mask 0x100 denotes an L2 cache miss bash> opcontrol --start Start data collection bash> ls -lR / Stress bash> opcontrol --dump Save profile bash> opreport -l /path/to/kernelsources/vmlinux CPU: P4 / Xeon, speed 2993.68 MHz (estimated)

Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x100 (read 2nd level cache miss) count 50000 samples % symbol name 73 29.6748 find_inode_fast 59 23.9837 __d_lookup 27 10.9756 inode_init_once ...

I f you run OProfile on different kernel versions and look at t he corresponding change logs, you m ight be able t o figure out reasons for code changes in different part s of t he kernel. You have only t ouched t he surface of what can be accom plished using OProfile. For m ore inform at ion, visit ht t p: / / oprofile.sourceforge.net / .

Applica t ion Pr ofilin g w it h Gpr of I f you need t o profile only an applicat ion process in isolat ion wit hout profiling t he kernel code t hat m ight get execut ed on it s behalf, use gprof rat her t han OProfile. Gprof relies on addit ional code generat ed by t he com piler t o profile C, Pascal, or Fort ran program s. Let 's use gprof t o profile t he following code snippet : main(int argc, char *argv[]) { int i; for (i=0; i gcc -pg -g -o myprog myprog.c bash> ./myprog

This produces gm on.out , which is a call graph of m y pr og. Run gprof t o view t he profile: bash> gprof -p -b myprog Flat profile: Each sample counts as 0.01 seconds. % cumulative self time seconds seconds calls 65.17 2.75 2.75 2 34.83 4.22 1.47 10

self s/call 1.38 0.15

total s/call 1.38 0.15

name do_error_handling do_task

This shows t hat t he error pat h was hit t wice during execut ion. You can t une t he code t o produce fewer

t raversals of t he error pat h and rerun gprof t o generat e an updat ed profile.

Tr a cin g Tracing provides insight int o behavioral problem s t hat m anifest during int eract ions bet ween different code m odules. A com m on way t o obt ain execut ion t races is by using printks. While printk is perhaps t he m ost heavily used m et hod for kernel debugging ( t here are m ore t han 62,000 printk() st at em ent s in t he 2.6.23 source t ree) , it is not sophist icat ed enough for high- volum e t racing. Linux Trace Toolkit ( LTT) is a powerful t ool t hat let s you obt ain com plex syst em level t races wit h m inim um overhead.

Lin u x Tr a ce Toolk it LTT ext ract s execut ion t races t hat are useful for post m ort em analysis and is valuable in sit uat ions where it m ay not be possible t o use a debugger. Unlike OProfile, which collect s dat a by sam pling event s at regular int ervals, LTT provides exact t races of event s as and when t hey occur. LTT consist s of four com ponent s:

A core m odule t hat provides t race services t o t he rest of t he kernel. The collect ed t races are copied t o a kernel buffer.

Code t hat m akes use of t he t race services. These are insert ed at point s where you want t o capt ure t race dum ps.

A t race daem on t hat pulls t race inform at ion from t he kernel buffer t o a perm anent locat ion in t he filesy st em .

Ut ilit ies such as t racereader and t racevisualizer t hat int erpret raw t race dat a and convert it int o hum anreadable form . I f you are developing code for an em bedded device having no GUI support , you can t ransparent ly run t hese t ools on anot her m achine.

LTT is not part of t he m ainline kernel. [ 8] You m ay download LTT kernel pat ches, t race daem on, and user- space t race ut ilit ies from www.opersys.com / LTT.

[ 8]

LTT was included as a release candidat e in t he 2.6.11- rc1- m m 1 pat ch, downloadable from www.kernel.org .

Let 's find out what LTT offers wit h t he help of a sim ple exam ple. Assum e t hat you are seeing dat a corrupt ion when your applicat ion is reading from a device. You first want t o figure out whet her t he device is sending bad dat a or whet her a kernel layer is int roducing t he corrupt ion. To do t hat , dum p dat a packet s and dat a st ruct ures at t he device driver level. List ing 21.7 init ializes t he LTT event s t hat you plan t o generat e. List in g 2 1 .7 . Cr e a t in g LTT Eve n t s

#include int data_packet, driver_data; /* Trace events */ /* Driver init */ static int __init mydriver_init(void) { /* ... */ /* Event to dump packets received from the device */ data_packet = trace_create_event("data_pkt", NULL, CUSTOM_EVENT_FORMAT_TYPE_HEX, NULL); /* Event to dump a driver structure */ driver_data = trace_create_event("dvr_data", NULL, CUSTOM_EVENT_FORMAT_TYPE_HEX, NULL); /* ... */ }

Next , let 's add t race hooks t o dum p received packet s and dat a st ruct ures when t he driver reads dat a from t he device. This is done in List ing 21.8 in t he driver read() m et hod. List in g 2 1 .8 . Obt a in in g Tr a ce D u m ps Code View: struct mydriver_data driver_data; /* Private device structure */ /* Driver read() method */ ssize_t mydriver_read(struct file *file, char *buf, size_t count, loff_t *ppos) { char *buffer; /* Read numbytes bytes of data from the device into buffer */ /* ... */ /* Dump data Packet. If you see the problem only under certain conditions, say, when the packet length is greater than a value, use that as a filter */ if (condition) { /* See Listing 21.7 for the definition of data_packet*/ trace_raw_event(data_packet, numbytes, buffer); } /* Dump driver data structures */ if (some_other_condition) { /* See Listing 21.7 for the definition of driver_data */ trace_raw_event(driver_data, sizeof(driver_data), &driver_data); }

/* ... */ }

Com pile and run t his code as part of t he kernel or as a m odule. Rem em ber t o t urn on t race support in t he kernel by set t ing CONFIG_TRACE while configuring t he kernel. The next st ep is t o st art t he t race daem on: bash> tracedaemon -ts60 /dev/tracer mylog.txt mylog.proc

/ dev/ t racer is t he int erface used by t he t race daem on t o access t he t race buffer, -ts60 asks t he daem on t o run for 60 seconds, m ylog.t xt is t he file where you want t o st ore t he generat ed raw t race, and m y log.pr oc is where you want t o save t he syst em st at e obt ained from procfs. Lat er versions of LTT use a m echanism called relayfs rat her t han t he / dev/ t racer device for efficient ly t ransferring dat a from t he kernel t race buffer t o user space. Run your applicat ion t hat reads dat a from t he device: bash> ./application

Trigger invocation of mydriver_read()

m ylog.t xt should now cont ain t he request ed t race dat a. The generat ed raw t race can be analyzed using t he t racevisualizer t ool. Choose t he Cust om Event s opt ion and search for data_pkt and dvr_data event s. The out put looks like t his: #################################################################### Event Time SECS MICROSEC PID Length Description #################################################################### data_pkt 1,110,563,008,742,457 0 27 12 43 AB AC 00 01 0D 56 data_pkt 1,110,563,008,743,151 0 27 01 D4 73 F1 0A CB DD 06 dvr_data 1,110,563,008,743,684 0 25 0D EF 97 1A 3D 4C ...

The last colum n holds t he t race dat a. The t im est am p shows t he inst ant when t he t race was collect ed. I f t he dat a looks corrupt , t he device could be sending bad dat a. Ot herwise, t he problem m ust be in a higher kernel layer and can be furt her isolat ed by obt aining t races from a different point in t he dat a- flow pat h. The next generat ion of LTT called LTTng is available for download from ht t p: / / lt t .polym t l.ca/ . This proj ect also includes a post - t race analyzer called Linux Trace Toolkit Viewer ( LTTV) . I f your need is only t o perform lim it ed t racing of a user applicat ion, you can use t he st race ut ilit y rat her t han LTT. St race uses t he pt race support in t he kernel t o int ercept syst em calls. I t print s out a list of syst em calls m ade by your applicat ion, along wit h t he corresponding argum ent s and ret urn values.

Lin u x Te st Pr oj e ct Linux Test Proj ect ( LTP) , host ed at ht t p: / / lt p.sourceforge.net / , is a suit e consist ing of around 3,000 t est s designed t o exercise different part s of t he kernel. Most t est s run wit hout user int ervent ion. Ot hers such as net working and st orage m edia t est s need som e m anual configurat ion. Download t he source t ar ball from t he LTP hom e page, run m ake, and invoke t he wrapper script runlt p from t he root of t he source t ree t o st art t he t est s. To capt ure t he result s in logfile in t he result s/ direct ory, do t his: bash> runltp –p –l logfile

Som e errors generat ed by LTP are " expect ed." The LTP websit e docum ent s t he list of expect ed errors for various kernel versions. Also in t he websit e is an int erest ing analysis of LTP's code coverage ( overall coverage, lines in pat h, and dist inct lines hit ) for a few kernel versions, split across direct ories in t he kernel t ree.

Use r M ode Lin u x User Mode Linux ( UML) , host ed at ht t p: / / user- m ode- linux.sourceforge.net / , let s you debug t he kernel wit hout " oops" ing t he m achine. To accom plish t his, an inst ance of t he kernel ( called t he guest kernel) runs as a user m ode process over t he running kernel ( called t he host kernel) . UML has diverse users. I t can funct ion as an environm ent for t est ing kernel and applicat ion code, a vehicle t o experim ent wit h unst able kernels, a secure pseudo com put er for host ing services such as web servers, or a t ool t o learn Linux int ernals. UML is m ore useful for debugging hardware- independent port ions of t he kernel t han for device driver debugging.

D ia gn ost ic Tools The sysfsut ils package helps you navigat e t he volum inous am ount of dat a present inside sysfs. This, and ot her Linux diagnost ic t ools such as sysdiag and lsvpd , can be downloaded from ht t p: / / linux- diag.sourceforge.net / .

Ke r n e l H a ck in g Con fig Opt ion s Several opt ions exist under Kernel hacking in t he kernel configurat ion m enu t hat can em it valuable debug inform at ion. I f you enable an opt ion, corresponding debug code get s com piled when you build t he kernel. [ 9] Here are a few exam ples:

[ 9]

Som e kernel hacking opt ions are archit ect ure- dependent .

1.

Show Tim ing inform at ion on print ks ( CONFIG_PRINTK_TIME) adds t im ing inst rum ent at ion t o printk() out put , so you can use printks as checkpoint s for m easuring execut ion t im es and ident ifying slow code regions.

2.

Using freed m em ory result s in m em ory poisoning. Debug slab m em ory allocat ions ( CONFIG_DEBUG_SLAB) helps you det ect such problem s.

3.

Spinlock and rw- lock debugging: basic checks ( CONFIG_DEBUG_SPINLOCK) finds lock- relat ed problem s such as uninit ialized spinlock usage and helps cat ch code t hat is not SMP- safe.

4.

You have already worked wit h Magic SysRq key(CONFIG_MAGIC_SYSRQ) when you learned t o use kdum p. I f you t urn t his on, you will have som e avenues left even if t he kernel crashes during debugging. For exam ple, pressing Alt - Sysrq- t produces a dum p of current t asks, whereas Alt - Sysrq- p print s t he cont ent s of processor regist ers.

5.

Det ect Soft Lockups ( CONFIG_DETECT_SOFTLOCKUP) ut ilizes t he services of a wat chdog t o det ect t ight loops in kernel code t hat last for m ore t han 10 seconds. We looked at t his when we analyzed a kernel hang using kdum p. Not e t hat hardware lockups cannot be found t his way. For t hat , use t he services of a Non- Maskable I nt errupt ( NMI ) - wat chdog if your plat form support s it .

6.

I f you enable CONFIG_DEBUG_SLAB, CONFIG_DEBUG_HIMEM, or CONFIG_DEBUG_PAGE_ALLOC while configuring your kernel, addit ional error- checking code get s com piled t hat help debug problem s relat ed t o m em ory m anagem ent .

7.

Check for st ack overflows ( CONFIG_DEBUG_STACKOVERFLOW) adds code t o em it warnings if t he available st ack space falls below a t hreshold. St ack ut ilizat ion inst rum ent at ion ( CONFIG_DEBUG_STACK_USAGE) adds st ack space inst rum ent at ion t o t he m agic Sysrq key out put . Anot her relat ed opt ion, CONFIG_4KSTACKS, let s you set t he kernel st ack size t o 4KB rat her t han 8KB.

8.

Verbose BUG( ) report ing (CONFIG_DEBUG_BUGVERBOSE) produces ext ra debug inform at ion when any kernel code invokes BUG(), assum ing t hat you have CONFIG_BUG t urned on during kernel configurat ion.

Som e debug opt ions live out side t he Kernel hacking subm enu, t oo. For exam ple, we enabled CONFIG_KALLSYMS in t his chapt er t o debug an " oops" m essage using gdb and t o kprobe a kernel m odule. This opt ion lives under Configure St andard Kernel Feat ures ( for sm all syst em s) in t he configurat ion m enu. General set up Kernel hacking opt ions result in overhead and increased foot print , so don't leave t hem on in product ion- level kernels.

Te st Equ ipm e n t I t goes wit hout saying t hat you need t he full com plem ent of relevant t est equipm ent for device driver debugging. I f you are t est ing a m odem int erface in a digit al- only laborat ory environm ent for exam ple, you will be well served by a phone sim ulat or. I f a high- speed serial driver is m anifest ing parit y errors, an oscilloscope will aid your problem analysis. I f you are writ ing an I / O device driver, it will help if you have t he associat ed bus analyzer. I f you are writ ing a net work driver, t he corresponding prot ocol line sniffer will ease your debugging effort .

Ch a pt e r 2 2 . M a in t e n a n ce a n d D e live r y I n Th is Ch a pt e r 642 Coding St yle 642 Change Markers 643 Version Cont rol 643 Consist ent Checksum s 645 Build Script s 647 Port able Code

You have reached t he end of t he device driver t our, but im plem ent ing a driver is only a part of t he soft ware developm ent life cycle. Before wrapping up, let 's discuss a few ideas t hat cont ribut e t o operat ional efficiency during soft ware m aint enance and delivery.

Codin g St yle The life span of m any Linux devices range from 5 t o 10 years, so adherence t o a st andard coding st yle helps support t he product long aft er you have m oved out of t he proj ect . A powerful edit or coupled wit h an organized writ ing st yle m akes it easier t o correlat e code wit h t hought . There can be no infallible guidelines for good st yle because it 's a m at t er of personal preference, but a uniform m anner of coding is invaluable if t here are m ult iple developers working on a proj ect . Agree on com m on coding st andards wit h t eam m em bers and t he cust om er before st art ing a proj ect . The coding st yle preferred by kernel developers is described in Docum ent at ion/ CodingSt yle in t he source t ree.

Ch a pt e r 2 2 . M a in t e n a n ce a n d D e live r y I n Th is Ch a pt e r 642 Coding St yle 642 Change Markers 643 Version Cont rol 643 Consist ent Checksum s 645 Build Script s 647 Port able Code

You have reached t he end of t he device driver t our, but im plem ent ing a driver is only a part of t he soft ware developm ent life cycle. Before wrapping up, let 's discuss a few ideas t hat cont ribut e t o operat ional efficiency during soft ware m aint enance and delivery.

Codin g St yle The life span of m any Linux devices range from 5 t o 10 years, so adherence t o a st andard coding st yle helps support t he product long aft er you have m oved out of t he proj ect . A powerful edit or coupled wit h an organized writ ing st yle m akes it easier t o correlat e code wit h t hought . There can be no infallible guidelines for good st yle because it 's a m at t er of personal preference, but a uniform m anner of coding is invaluable if t here are m ult iple developers working on a proj ect . Agree on com m on coding st andards wit h t eam m em bers and t he cust om er before st art ing a proj ect . The coding st yle preferred by kernel developers is described in Docum ent at ion/ CodingSt yle in t he source t ree.

Ch a n ge M a r k e r s Using a m arker such as CONFIG_MYPROJECT t o t ag addit ions and delet ions t o exist ing kernel source files helps highlight proj ect - specific changes t o t he source t ree. One can recursively grep for t he m arker st ring from t he root of t he code t ree t o learn t he locat ion of all kernel changes im plem ent ed for t he proj ect . The following exam ple m arks code changes t o drivers/ i2c/ busses/ i2c- i801.c. The m odificat ion int roduces a check for a new PCI device I D during set up and elim inat es a configurat ion byt e access: /* ... */ switch(dev->device) { case PCI_DEVICE_ID_INTEL_82801DB_3: #if defined (CONFIG_MYPROJECT) case PCI_DEVICE_ID_MYID : #endif /* ... */ } /* ... */ #if !defined (CONFIG_MYPROJECT) pci_write_config_byte(I801_dev, SMBHSTCFG, temp); #endif return 0; /* ... */

CONFIG_MYPROJECT also funct ions as a configurat ion- t im e swit ch t o enable or disable proj ect - specific changes. I t 's a good idea t o have subm arkers for various subt asks in your proj ect . So, if you are m odifying t he kernel for fast boot as part of your proj ect , wrap t hose changes wit hin a subm arker such as CONFIG_MYPROJECT_FASTBOOT.

Ve r sion Con t r ol You can't execut e a serious proj ect wit hout t he services of a robust version cont rol reposit ory. A version cont rol syst em helps m anage m ult iple versions of source code and regulat es file accesses by t eam m em bers. Concurrent Versions Syst em or CVS ( www.nongnu.org/ cvs) is an open source revision cont rol soft ware t hat has been around for a long t im e and com es bundled wit h m any Linux dist ribut ions. Anot her versioning syst em called subversion ( ht t p: / / subversion.t igris.org) was developed as an int ended replacem ent for CVS. Git (ht t p: / / git .or.cz) is t he version cont rol syst em of choice for kernel developers and is used t o m aint ain several open source proj ect s, including t he Linux kernel. Am ple docum ent at ion on t hese syst em s is available on t he I nt ernet .

Con sist e n t Ch e ck su m s Because of legal issues lat ent in dist ribut ing t he kernel, com panies oft en ship kernel m odificat ions t o cust om ers in t he form of a source pat ch generat ed against an agreed- upon base. Cust om ers, in t urn, int egrat e t he pat ch int o an in- house code reposit ory and build t he soft ware locally. Com paring t he MD5 checksum of your binary im ages wit h t hat of your cust om er's is a guard against pat ching errors, but t he values m ay not m at ch as- is because t he kernel and m odule im ages oft en cont ain inform at ion specific t o t he build environm ent . To force ident ical MD5 sum s, exclude such dat a while generat ing kernel and m odule im ages at eit her end. Here are som e t ypical scenarios t hat inj ect environm ent al dat a int o t he obj ect im age:

Som e driver m et hods t oss a call t o BUG() t o announce condit ions t hat are never supposed t o occur. BUG() spit s out , am ong ot her t hings, t he offending filenam e and line num ber. The pat hnam e of t he file depends on t he locat ion of your build sandbox. I t get s im print ed in t he produced im age and cont ribut es t o MD5 variance. For exam ple, look at nfs_unlock_request() in fs/ nfs/ pagelist .c: void nfs_unlock_request(struct nfs_page *req) { if (!NFS_WBACK_BUSY(req)) { printk(KERN_ERR "NFS: Invalid unlock attempted\n"); BUG(); } /* ... */ } BUG() is defined in include/ asm - your- arch/ bug.h: #define BUG() do {\ __asm__ __volatile__ ("ud2\n"\ ... : : "I" (__LINE__), "I"(__FILE__)) You can com pile BUG() away by disabling CONFIG_BUG during kernel configurat ion. Or you m ay get rid of t he line num ber and filenam e inform at ion em it t ed by BUG() by swit ching off CONFIG_DEBUG_BUGVERBOSE.

The wd33c93 driver ( drivers/ scsi/ wd33c93.c) announces t he t im e of com pilat ion during init ializat ion. You will find t his snippet if you go t o t he end of t he init ializat ion rout ine, wd33c93_init(): void wd33c93_init(struct Scsi_Host *instance, const wd33c93_regs regs, dma_setup_t setup, dma_stop_t stop, int clock_freq) { /* ... */ printk(" Version %s - %s, Compiled %s at %s\n", WD33C93_VERSION, WD33C93_DATE, __DATE__, __TIME__); } The build t im est am p t hus get s em bedded inside t he im age, causing t he MD5 checksum t o depend on it .

The CONFIG_IKCONFIG_PROC configurat ion opt ion, if enabled, int roduces t he configurat ion t im est am p in t he kernel im age. This inform at ion is available as part of / proc/ config.gz at runt im e.

Ut ilit ies living inside t he script s/ direct ory in t he kernel t ree also cont ribut e t o MD5 variance by inj ect ing t he out put of program s such as host nam e, dat e, whoam i and dom ainnam e, int o kernel header files such as include/ linux/ - com pile.h.

Hunt down and m ask out such direct and indirect references t o environm ent al inform at ion t o generat e ident ical checksum s at bot h ends. Of course, you need not bot her about kernel m odules t hat aren't relevant . Envelope your code m odificat ions wit hin a change m arker such as CONFIG_MYPROJECT_SAME_MD5 and creat e a kernel configurat ion swit ch t o t urn consist ent MD5 generat ion on or off. When you finish, run m d5sum on t he st ripped vm linux im age.

Bu ild Scr ipt s Cust om ers generally ask for periodic soft ware builds during t he developm ent cycle. Each build includes new feat ures or bug fixes. The deliverables for an em bedded PC derivat ive, for exam ple, m ay include firm ware com ponent s such as t he base kernel, loadable device driver m odules, filesyst em ut ilit ies, boot loader, BI OS, and on- card m icrocode. To aut om at e build generat ion, it 's a good idea t o im plem ent a set of versat ile build script s t hat obt ain a source code snapshot from t he version cont rol reposit ory and generat e a packaged deliverable. List ing 22.1 shows a skelet al build script t hat assum es use of CVS for version cont rol. This is a sim ple script t hat shows only t he kernel build. I n t he real world, you m ight need a sophist icat ed suit e of script s t hat package several soft ware com ponent s and m anage different inst allat ion scenarios.

List in g 2 2 .1 . A Sim ple Bu ild Scr ipt Code View: # Check that compilation tools are installed #... # Assume that $user contains the user name, $cvsserver contains # the CVS server name and /path/to/repository is the location # of your project's repository on the CVS server CVS="cvs –d :pserver:$user@$cvsserver:/path/to/repository" $CVS login # Check-out the kernel $CVS checkout kernel # Build the kernel cd kernel make mrproper #Get the .config file for your platform cp arch/your-arch/configs/your_platform_defconfig .config make oldconfig make –j5 bzImage # Accelerate by spawning 5 instances of 'make' if [ $? != 0 ] then echo "Error building Kernel. Bailing out.." exit 1 fi # Copy the kernel image to a target directory cp arch/x86/boot/bzImage /path/to/target_directory/productname.kernel # Build modules and install them in an appropriate directory make modules if [ $? != 0 ] then echo "Error building modules. Bailing.." exit 2 fi export INSTALL_MOD_PATH=»$TARGET_DIRECTORY/modules» make modules_install # Rebuild after forcing generation of identical MD5 sums and

# package the resulting checksum information. #... # Generate a source patch from the base starting point, assuming # that KERNELBASE is the CVS tag for the vanilla kernel cvs rdiff –u –r KERNELBASE kernel > patch.kernel # Generate a changelog using "cvs log" #... # Package everything nicely into a tar ball #...

Aft er you sat isfact orily com plet e build verificat ion t est s on t he generat ed deliverable, init iat e a post - build process t o t ag t he current st at e of t he version cont rol syst em wit h a build ident ifier. This essent ially at t aches a nam e t o t he source snapshot corresponding t o t he build and helps t race problem s t o code versions. You can check out source versions based on t he relevant build ident ifier when you lat er at t em pt t o re- creat e report ed field problem s in your lab.

Por t a ble Code Port abilit y direct ly t ranslat es t o code reusabilit y and easier m aint enance. This is significant in t oday's m arket place, where t here are an assort m ent of processors and innum erable peripheral chipset s. Things will fast spin out of cont rol if you have t o code separat e bus drivers for each processor and different client device drivers for each host cont roller. Here are som e hint s for writ ing port able drivers:

Make port abilit y a design goal while archit ect ing your driver.

Using appropriat e kernel API s aut om at ically inj ect s a degree of port abilit y. A USB driver using t he services of t he USB core is rendered independent of t he USB host cont roller. I t will work unchanged on different syst em s, irrespect ive on whet her t hey use UHCI , OHCI , or som et hing else.

Writ e SMP- safe code.

Writ e code t hat is 64- bit clean. Do not , for exam ple, assign a point er t o an int eger, even wit h valid t ypecast s.

Writ e drivers such t hat t hey can be easily adapt ed for ot her sim ilar devices.

Use archit ect ure- independent API s wherever available. For exam ple, calls t o outb() or inb() will work irrespect ive of whet her t he processor uses I / O- m apped or m em ory- m apped addressing. I f you do need t o use archit ect ure- specific code such as inline assem bly, st ow it away inside t he appropriat e arch/ your- arch/ direct ory.

Push policy t o header files and user space. Use m acros and definit ions wherever suit able.

Ch a pt e r 2 3 . Sh u t t in g D ow n I n Th is Ch a pt e r 650 Checklist 651 What Next ?

Before t ransit ioning t o init runlevel 0, let 's sum m arize how t o set fort h on your way t o Linuxenablem ent when you get hold of a new device. Here's a quick checklist .

Che ck list 1 . I dent ify t he device's funct ionalit y and int erface t echnology. Depending on what you find, review t he chapt er describing t he associat ed device driver subsyst em . As you learned, alm ost every driver subsyst em on Linux cont ains a core layer t hat offers driver services, and an abst ract ion layer t hat renders applicat ions independent of t he underlying hardware ( revisit Figure 18.3 in Chapt er 18, " Em bedding Linux" ) . Your driver needs t o fit int o t his fram ework and int eract wit h ot her com ponent s in t he subsyst em . I f your device is a m odem , learn how t he UART, t t y, and line discipline layers operat e. I f your chip is an RTC or a wat chdog, learn how t o conform t o t he respect ive kernel API s. I f what you have is a m ouse, find out how t o t ie it wit h t he input event layer. I f your hardware is a video cont roller, glean expert ise on t he fram e buffer subsyst em . Before em barking on driving an audio codec, invest igat e t he ALSA fram ework.

2 . Obt ain t he device's dat a sheet and underst and it s regist er program m ing m odel. For an I 2 C DVI t ransm it t er, for exam ple, get t he device's slave address and t he program m ing sequence for init ializat ion. For an SPI t ouch cont roller, underst and how t o im plem ent it s finit e st at e m achine. For a PCI Et hernet card, find out t he configurat ion space sem ant ics. For a USB device, figure out t he support ed endpoint s and learn how t o com m unicat e wit h t hem .

3 . Search for a st art ing point driver inside t he m ight y kernel source t ree. Research candidat e drivers and hone in on a suit able one. Cert ain subsyst em s offer skelet al drivers t hat you can m odel aft er, if you don't find a close m at ch. Exam ples are sound/ drivers/ dum m y.c, drivers/ usb/ usb- skelet on.c, drivers/ net / pciskelet on.c, and drivers/ video/ skelet onfb.c.

4 . I f you obt ain a st art ing point driver, invest igat e t he exact differences bet ween t he associat ed device and your hardware by com paring t he respect ive dat a sheet s and schem at ics. For illust rat ion, assum e t hat you are put t ing Linux on a cust om board t hat is based on a dist ribut ion- support ed reference hardware. Your dist ribut ion includes t he USB cont roller driver t hat is t est ed on t he reference hardware, but does your

cust om board use different USB t ransceivers? You have a fram e buffer driver for t he LCD cont roller, but does your board use a different display panel int erface such as LVDS? Perhaps an EEPROM t hat sat on t he I 2 C bus on t he reference board now sit s on a 1- wire bus. I s t he Et hernet cont roller now connect ed t o a different PHY chip or even t o a Layer 2 swit ch chip? Or perhaps t he RS- 232 int erface t o t he UART has given way t o RS- 485 for bet t er range and fidelit y.

5 . I f you don't have a close st art ing point or if you decide t o writ e your own driver from scrat ch, invest t im e in designing and archit ect ing t he driver and it s dat a st ruct ures.

6 . Now t hat you have all t he inform at ion you need, arm yourself wit h soft ware t ools ( such as ct ags, cscope, and debuggers) and lab equipm ent ( such as oscilloscopes, m ult im et ers, and analyzers) and st art writ ing code.

Ch a pt e r 2 3 . Sh u t t in g D ow n I n Th is Ch a pt e r 650 Checklist 651 What Next ?

Before t ransit ioning t o init runlevel 0, let 's sum m arize how t o set fort h on your way t o Linuxenablem ent when you get hold of a new device. Here's a quick checklist .

Che ck list 1 . I dent ify t he device's funct ionalit y and int erface t echnology. Depending on what you find, review t he chapt er describing t he associat ed device driver subsyst em . As you learned, alm ost every driver subsyst em on Linux cont ains a core layer t hat offers driver services, and an abst ract ion layer t hat renders applicat ions independent of t he underlying hardware ( revisit Figure 18.3 in Chapt er 18, " Em bedding Linux" ) . Your driver needs t o fit int o t his fram ework and int eract wit h ot her com ponent s in t he subsyst em . I f your device is a m odem , learn how t he UART, t t y, and line discipline layers operat e. I f your chip is an RTC or a wat chdog, learn how t o conform t o t he respect ive kernel API s. I f what you have is a m ouse, find out how t o t ie it wit h t he input event layer. I f your hardware is a video cont roller, glean expert ise on t he fram e buffer subsyst em . Before em barking on driving an audio codec, invest igat e t he ALSA fram ework.

2 . Obt ain t he device's dat a sheet and underst and it s regist er program m ing m odel. For an I 2 C DVI t ransm it t er, for exam ple, get t he device's slave address and t he program m ing sequence for init ializat ion. For an SPI t ouch cont roller, underst and how t o im plem ent it s finit e st at e m achine. For a PCI Et hernet card, find out t he configurat ion space sem ant ics. For a USB device, figure out t he support ed endpoint s and learn how t o com m unicat e wit h t hem .

3 . Search for a st art ing point driver inside t he m ight y kernel source t ree. Research candidat e drivers and hone in on a suit able one. Cert ain subsyst em s offer skelet al drivers t hat you can m odel aft er, if you don't find a close m at ch. Exam ples are sound/ drivers/ dum m y.c, drivers/ usb/ usb- skelet on.c, drivers/ net / pciskelet on.c, and drivers/ video/ skelet onfb.c.

4 . I f you obt ain a st art ing point driver, invest igat e t he exact differences bet ween t he associat ed device and your hardware by com paring t he respect ive dat a sheet s and schem at ics. For illust rat ion, assum e t hat you are put t ing Linux on a cust om board t hat is based on a dist ribut ion- support ed reference hardware. Your dist ribut ion includes t he USB cont roller driver t hat is t est ed on t he reference hardware, but does your

cust om board use different USB t ransceivers? You have a fram e buffer driver for t he LCD cont roller, but does your board use a different display panel int erface such as LVDS? Perhaps an EEPROM t hat sat on t he I 2 C bus on t he reference board now sit s on a 1- wire bus. I s t he Et hernet cont roller now connect ed t o a different PHY chip or even t o a Layer 2 swit ch chip? Or perhaps t he RS- 232 int erface t o t he UART has given way t o RS- 485 for bet t er range and fidelit y.

5 . I f you don't have a close st art ing point or if you decide t o writ e your own driver from scrat ch, invest t im e in designing and archit ect ing t he driver and it s dat a st ruct ures.

6 . Now t hat you have all t he inform at ion you need, arm yourself wit h soft ware t ools ( such as ct ags, cscope, and debuggers) and lab equipm ent ( such as oscilloscopes, m ult im et ers, and analyzers) and st art writ ing code.

W hat Next? Linux is here t o st ay, but int ernal kernel int erfaces t end t o get fossilized as soon as som eone figures out a cleverer way of doing t hings. No kernel code is et ched in st one. As you learned, even t he scheduler, considered sacred, has undergone t wo rewrit es since t he 2.4 days. The num ber of new lines of code appearing in t he kernel t ree runs int o t he m illions each year. As t he kernel evolves, new feat ures and abst ract ions keep get t ing added, program m ing int erfaces redesigned, subsyst em s rest ruct ured for ext ract ing bet t er perform ance, and reusable regions filt ered int o com m on cores. You now have a solid foundat ion, so you can adapt t o t hese changes. To m aint ain your cut t ing- edge, refresh your kernel t ree regularly, browse t he kernel m ailing list frequent ly, and writ e code whenever you can. Linux is t he fut ure, and being a kernel guru pays. St ay at t he front lines!

Appe n dix A. Lin u x Asse m bly

Device drivers som et im es need t o im plem ent som e code snippet s in assem bly, so let 's t ake a look at t he different facet s of assem bly program m ing on Linux.

Figure A.1 shows t he Linux boot sequence on a PC- com pat ible syst em and is a sim pler version of Figure 2.1 in Chapt er 2 , " A Peek I nside t he Kernel." The firm ware com ponent s in t he figure are im plem ent ed using different assem bly synt axes:

The BI OS is t ypically writ t en wholly in assem bly. Som e of t he popular PC BI OSes are coded using assem blers such as t he Microsoft Macro Assem bler ( MASM) .

Linux boot loaders such as LI LO and GRUB are im plem ent ed using a m ix of C and assem bly. The SYSLI NUX boot loader is ent irely writ t en in assem bly using t he Net wide Assem bler ( NASM) .

Real m ode Linux st art up code uses t he GNU Assem bler ( GAS) .

Prot ect ed m ode BI OS invocat ions are done in inline assem bly , which is a const ruct support ed by GCC t o insert assem bly in bet ween C st at em ent s.

Figu r e A.1 . Fir m w a r e com pon e n t s a n d a sse m bly syn t a x e s.

I n Figure A.1, t he t op t wo com ponent s generally follow I nt el- based assem bly synt ax, whereas t he bot t om t wo are coded in AT&T ( or GAS) synt ax. There are except ions; t he assem bly part s of GRUB use GAS. To illust rat e t he differences bet ween t hese t wo synt axes, consider code t hat out put s a byt e t o t he parallel port . I n I nt el form at used by t he BI OS or t he boot loader, you would writ e t he following: mov dx, 03BCh mov al, 0ABh out dx, al

;0x3BC is the I/O address of the parallel port ;0xAB is the data to be output ;Send data to the parallel port

However, if you want t o perform t he sam e t ask from Linux real m ode st art up code, you need t o do t his: movw $0x3BC, %dx movb $0xAB, %al outb %al, %dx

You can see t hat unlike in I nt el form at , in AT&T synt ax, t he source operand com es first , and t he dest inat ion operand com es second. Regist er nam es in AT&T form at are preceded by %, and im m ediat e operands are preceded by $. AT&T opcodes have suffixes such as b ( for byt e) and w ( for word) t o specify t he size of m em ory operands, whereas I nt el synt ax accom plishes t his by looking at t he operands rat her t han t he opcodes. To m ove point er references in I nt el synt ax, you have t o specify operand prefixes such as byte ptr.

The advant age of learning AT&T synt ax is t hat it 's underst ood by GAS and inline GCC, which work not only on I nt el- based syst em s, but also on a variet y of processor archit ect ures.

Next , let 's rewrit e t he preceding snippet using GCC inline assem bly, which is what you would use from t he prot ect ed m ode kernel: unsigned short port = 0x3BC; unsigned char data = 0xAB; asm("outb %%al, %%dx\n\t" : : "a" (data), "d" (port) );

The general form at of t he asm const ruct support ed by GCC is as follows: asm(assembly : output operand constraints : input operand constraints : clobbered operand specifier );

I n t he operand sect ions, a, b, c, d, S, and D st and for EAX, EBX, ECX, EDX, ESI, and EDI regist ers, respect ively. I nput operand const raint s copy dat a from t he supplied variables t o t he specified regist ers before execut ing t he assem bly inst ruct ions, whereas out put operand const raint s ( writ t en as =a, =b, and so on) copy dat a from t he specified regist ers t o t he supplied variables aft er execut ing t he assem bly inst ruct ions. The clobbered operand const raint s ask GCC t o assum e t hat t he list ed regist ers are not available for use. Look at t he GCC I nline Assem bly HOWTO (www.ibiblio.org/ gferg/ ldp/ GCC- I nline- Assem bly- HOWTO.ht m l) for m ore det ails on t he GCC inline assem bly synt ax. The only const raint used in our exam ple is specific t o input operands. This effect ively copies t he value of data t o t he AL regist er and t he value of port t o t he DX regist er. Regist er nam es are preceded by %% in inline assem bly, because % is used t o refer t o t he supplied operands. %i st ands for t he i t h operand; so, if you want t o refer t o data and port inside t he exam ple inline assem bly snippet , you m ay respect ively use %0 and %1. To obt ain a clearer pict ure of inline assem bly t ranslat ion, let 's look at t he assem bly code generat ed by t he com piler corresponding t o t he preceding inline assem bly snippet by supplying t he -s com m and- line argum ent t o GCC. Look at t he com m ent against each generat ed code line for explanat ions: movw movb movb movw

$956, -2(%ebp) $-85, -3(%ebp) -3(%ebp), %al -2(%ebp), %dx

# # # # # # #

#APP outb %al, %dx #NO_APP

Value of data in stack set to 0x3BC Value of port in stack set to 0xAB movb 0xAB, %al movw 0x3BC, %dx Marker to note start of inline assembly Write to parallel port Marker to note end of inline assembly

You m ay use inline assem bly from user m ode program s, t oo. Here is an applicat ion writ t en using inline assem bly t hat invokes t he syslog() syst em call t o read t he last 128 byt es from t he kernel printk() ring buffer: Code View: #define READ_COMMAND

3

#define MSG_LENGTH

128

/* First argument to syslog() system call */ /* Third argument to syslog() */

int main(int argc, char *argv[]) { int syslog_command = READ_COMMAND; int bytes_to_read = MSG_LENGTH; int retval; char buffer[MSG_LENGTH]; /* Second argument to syslog() */ asm volatile( "movl %1, %%ebx\n" /* READ_COMMAND */ "movl %2, %%ecx\n" /* buffer */ "movl %3, %%edx\n" /* bytes_to_read */ "movl $103, %%eax\n" /* __NR_syslog */ "int $128\n" /* Generate System Call */ "movl %%eax, %0" /* retval */ :"=r" (retval) :"m"(syslog_command),"r"(buffer),"m"(bytes_to_read) :"%eax","%ebx","%ecx","%edx"); if (retval > 0) printf("%s\n", buffer); }

As you learned in Chapt er 4 , " Laying t he Groundwork," t he int $128 ( or int 0x80) inst ruct ion generat es a soft ware int errupt t hat t raps int o syst em calls. Because syst em calls result in t ransit ion from user m ode t o kernel m ode, t he funct ion argum ent s are not passed in user or kernel st acks, but in CPU regist ers. The syst em call num ber ( include/ asm - your- arch/ unist d.h has t he full list ) is st ored in t he EAX regist er. For t he syslog() syst em call, t his num ber is 103. I f you look at t he m an page for syslog(), you will see t hat it t akes t hree argum ent s: a com m and, t he address of a buffer t o hold ret urned dat a, and t he lengt h of t he buffer. These are passed in regist ers EBX, ECX and EDX, respect ively. The ret urn value is t ransferred from EAX t o retval. The inline assem bly invocat ion effect ively t ranslat es t o t his: retval = syslog(syslog_command, buffer, bytes_to_read);

I f you com pile and run t he code, you will see out put like t his, fet ched from t he kernel ring buffer: 0:0:0:0: Attached scsi removable disk sda sd 0:0:0:0: Attached scsi generic sg0 type 0 usb-storage: device scan complete ...

The kernel syst em call t rap in arch/ x86/ kernel/ ent ry_32.S saves all regist er cont ent s t o st ack, so t he real syst em calls see t heir argum ent s on st ack, even t hough user- space code passes t hem in CPU regist ers. To ensure t hat syst em call rout ines expect argum ent s on st ack, t hey are all t agged wit h t he GCC at t ribut e, asmlinkage. Not e t hat asmlinkage has not hing t o do wit h t he asm ( or __asm__) t hat is used t o declare inline assem bly. Let 's end t his sect ion by illust rat ing an exam ple of inline assem bly m odificat ion t o a Linux boot loader for a PowerPC- based board. Assum e t hat t he flash m em ory on t he board does not support BackGround Operat ion ( BGO) . This m eans t hat t he boot loader code cannot writ e t o flash while execut ing from flash, which is needed, for exam ple, if t he boot loader needs t o updat e a kernel im age t hat is residing in anot her part of t he flash. One solut ion is t o m odify t he boot loader so t hat t he boot code used t o writ e and erase t he flash get s execut ed ent irely from I nst ruct ion Cache ( I - cache) wit h t he dat a segm ent residing in Dat a Cache ( D- cache) . The sam ple

m acro writ t en here in GCC inline assem bly does t he j ob of pret ouching t he necessary boot loader inst ruct ions ont o I - cache. You need a working knowledge of PowerPC assem bly t o underst and t his code snippet : Code View: /* instr_length is the number of instructions to touch into I-cache. _load_i$_copy and _end_i$_copy are program labels */ #define load_into_icache_copy(instr_length) \ asm volatile("lis %%r3, 0x1@h\n \ ori %%r3, %%r3, 0x1@l\n \ mticcr %%r3\n \ isync\n \ \n \ lis %%r6, _end_i$_copy@h\n \ ori %%r6, %%r6, _end_i$_copy@l\n \ icbt %%r0, %%r6\n \ lis %%r4, %0@h\n \ ori %%r4, %%r4, %0@l\n \ mtctr %%r4\n \ _load_i$_copy: \ addis %%r6, %%r6, 32@ha\n \ addi %%r6, %%r6, 32@l\n \ icbt %%r0, %%r6\n \ bdnz _load_i$_copy\n \ _end_i$_copy: \ nop\n" \ : \ : "i"(instr_length) \ :"%r6","%r4","%r0","r8","r9");

D e bu ggin g To debug t he real m ode kernel, you cannot use debuggers such as t he Kernel Debugger ( kdb) or t he Kernel GNU Debugger ( kgdb) , which we discussed in Chapt er 21, " Debugging Device Drivers." A quick way t o debug kernel assem bly snippet s is by using t he DOS debug t ool aft er convert ing your code t o I nt el- st yle synt ax. But debug was creat ed in t he 16- bit era, so you can't , for inst ance, st ep t hrough code t hat init ializes t he EAX regist er. You can find 32- bit debug- t ype freeware t ools available for download on t he I nt ernet . JTAG debuggers, also discussed in Chapt er 21, are a kind of panacea because a single t ool can be used t o debug t he BI OS, boot loader, Linux real m ode code, and kernel- BI OS int eract ions.

Appe n dix A. Lin u x Asse m bly

Device drivers som et im es need t o im plem ent som e code snippet s in assem bly, so let 's t ake a look at t he different facet s of assem bly program m ing on Linux.

Figure A.1 shows t he Linux boot sequence on a PC- com pat ible syst em and is a sim pler version of Figure 2.1 in Chapt er 2 , " A Peek I nside t he Kernel." The firm ware com ponent s in t he figure are im plem ent ed using different assem bly synt axes:

The BI OS is t ypically writ t en wholly in assem bly. Som e of t he popular PC BI OSes are coded using assem blers such as t he Microsoft Macro Assem bler ( MASM) .

Linux boot loaders such as LI LO and GRUB are im plem ent ed using a m ix of C and assem bly. The SYSLI NUX boot loader is ent irely writ t en in assem bly using t he Net wide Assem bler ( NASM) .

Real m ode Linux st art up code uses t he GNU Assem bler ( GAS) .

Prot ect ed m ode BI OS invocat ions are done in inline assem bly , which is a const ruct support ed by GCC t o insert assem bly in bet ween C st at em ent s.

Figu r e A.1 . Fir m w a r e com pon e n t s a n d a sse m bly syn t a x e s.

I n Figure A.1, t he t op t wo com ponent s generally follow I nt el- based assem bly synt ax, whereas t he bot t om t wo are coded in AT&T ( or GAS) synt ax. There are except ions; t he assem bly part s of GRUB use GAS. To illust rat e t he differences bet ween t hese t wo synt axes, consider code t hat out put s a byt e t o t he parallel port . I n I nt el form at used by t he BI OS or t he boot loader, you would writ e t he following: mov dx, 03BCh mov al, 0ABh out dx, al

;0x3BC is the I/O address of the parallel port ;0xAB is the data to be output ;Send data to the parallel port

However, if you want t o perform t he sam e t ask from Linux real m ode st art up code, you need t o do t his: movw $0x3BC, %dx movb $0xAB, %al outb %al, %dx

You can see t hat unlike in I nt el form at , in AT&T synt ax, t he source operand com es first , and t he dest inat ion operand com es second. Regist er nam es in AT&T form at are preceded by %, and im m ediat e operands are preceded by $. AT&T opcodes have suffixes such as b ( for byt e) and w ( for word) t o specify t he size of m em ory operands, whereas I nt el synt ax accom plishes t his by looking at t he operands rat her t han t he opcodes. To m ove point er references in I nt el synt ax, you have t o specify operand prefixes such as byte ptr.

The advant age of learning AT&T synt ax is t hat it 's underst ood by GAS and inline GCC, which work not only on I nt el- based syst em s, but also on a variet y of processor archit ect ures.

Next , let 's rewrit e t he preceding snippet using GCC inline assem bly, which is what you would use from t he prot ect ed m ode kernel: unsigned short port = 0x3BC; unsigned char data = 0xAB; asm("outb %%al, %%dx\n\t" : : "a" (data), "d" (port) );

The general form at of t he asm const ruct support ed by GCC is as follows: asm(assembly : output operand constraints : input operand constraints : clobbered operand specifier );

I n t he operand sect ions, a, b, c, d, S, and D st and for EAX, EBX, ECX, EDX, ESI, and EDI regist ers, respect ively. I nput operand const raint s copy dat a from t he supplied variables t o t he specified regist ers before execut ing t he assem bly inst ruct ions, whereas out put operand const raint s ( writ t en as =a, =b, and so on) copy dat a from t he specified regist ers t o t he supplied variables aft er execut ing t he assem bly inst ruct ions. The clobbered operand const raint s ask GCC t o assum e t hat t he list ed regist ers are not available for use. Look at t he GCC I nline Assem bly HOWTO (www.ibiblio.org/ gferg/ ldp/ GCC- I nline- Assem bly- HOWTO.ht m l) for m ore det ails on t he GCC inline assem bly synt ax. The only const raint used in our exam ple is specific t o input operands. This effect ively copies t he value of data t o t he AL regist er and t he value of port t o t he DX regist er. Regist er nam es are preceded by %% in inline assem bly, because % is used t o refer t o t he supplied operands. %i st ands for t he i t h operand; so, if you want t o refer t o data and port inside t he exam ple inline assem bly snippet , you m ay respect ively use %0 and %1. To obt ain a clearer pict ure of inline assem bly t ranslat ion, let 's look at t he assem bly code generat ed by t he com piler corresponding t o t he preceding inline assem bly snippet by supplying t he -s com m and- line argum ent t o GCC. Look at t he com m ent against each generat ed code line for explanat ions: movw movb movb movw

$956, -2(%ebp) $-85, -3(%ebp) -3(%ebp), %al -2(%ebp), %dx

# # # # # # #

#APP outb %al, %dx #NO_APP

Value of data in stack set to 0x3BC Value of port in stack set to 0xAB movb 0xAB, %al movw 0x3BC, %dx Marker to note start of inline assembly Write to parallel port Marker to note end of inline assembly

You m ay use inline assem bly from user m ode program s, t oo. Here is an applicat ion writ t en using inline assem bly t hat invokes t he syslog() syst em call t o read t he last 128 byt es from t he kernel printk() ring buffer: Code View: #define READ_COMMAND

3

#define MSG_LENGTH

128

/* First argument to syslog() system call */ /* Third argument to syslog() */

int main(int argc, char *argv[]) { int syslog_command = READ_COMMAND; int bytes_to_read = MSG_LENGTH; int retval; char buffer[MSG_LENGTH]; /* Second argument to syslog() */ asm volatile( "movl %1, %%ebx\n" /* READ_COMMAND */ "movl %2, %%ecx\n" /* buffer */ "movl %3, %%edx\n" /* bytes_to_read */ "movl $103, %%eax\n" /* __NR_syslog */ "int $128\n" /* Generate System Call */ "movl %%eax, %0" /* retval */ :"=r" (retval) :"m"(syslog_command),"r"(buffer),"m"(bytes_to_read) :"%eax","%ebx","%ecx","%edx"); if (retval > 0) printf("%s\n", buffer); }

As you learned in Chapt er 4 , " Laying t he Groundwork," t he int $128 ( or int 0x80) inst ruct ion generat es a soft ware int errupt t hat t raps int o syst em calls. Because syst em calls result in t ransit ion from user m ode t o kernel m ode, t he funct ion argum ent s are not passed in user or kernel st acks, but in CPU regist ers. The syst em call num ber ( include/ asm - your- arch/ unist d.h has t he full list ) is st ored in t he EAX regist er. For t he syslog() syst em call, t his num ber is 103. I f you look at t he m an page for syslog(), you will see t hat it t akes t hree argum ent s: a com m and, t he address of a buffer t o hold ret urned dat a, and t he lengt h of t he buffer. These are passed in regist ers EBX, ECX and EDX, respect ively. The ret urn value is t ransferred from EAX t o retval. The inline assem bly invocat ion effect ively t ranslat es t o t his: retval = syslog(syslog_command, buffer, bytes_to_read);

I f you com pile and run t he code, you will see out put like t his, fet ched from t he kernel ring buffer: 0:0:0:0: Attached scsi removable disk sda sd 0:0:0:0: Attached scsi generic sg0 type 0 usb-storage: device scan complete ...

The kernel syst em call t rap in arch/ x86/ kernel/ ent ry_32.S saves all regist er cont ent s t o st ack, so t he real syst em calls see t heir argum ent s on st ack, even t hough user- space code passes t hem in CPU regist ers. To ensure t hat syst em call rout ines expect argum ent s on st ack, t hey are all t agged wit h t he GCC at t ribut e, asmlinkage. Not e t hat asmlinkage has not hing t o do wit h t he asm ( or __asm__) t hat is used t o declare inline assem bly. Let 's end t his sect ion by illust rat ing an exam ple of inline assem bly m odificat ion t o a Linux boot loader for a PowerPC- based board. Assum e t hat t he flash m em ory on t he board does not support BackGround Operat ion ( BGO) . This m eans t hat t he boot loader code cannot writ e t o flash while execut ing from flash, which is needed, for exam ple, if t he boot loader needs t o updat e a kernel im age t hat is residing in anot her part of t he flash. One solut ion is t o m odify t he boot loader so t hat t he boot code used t o writ e and erase t he flash get s execut ed ent irely from I nst ruct ion Cache ( I - cache) wit h t he dat a segm ent residing in Dat a Cache ( D- cache) . The sam ple

m acro writ t en here in GCC inline assem bly does t he j ob of pret ouching t he necessary boot loader inst ruct ions ont o I - cache. You need a working knowledge of PowerPC assem bly t o underst and t his code snippet : Code View: /* instr_length is the number of instructions to touch into I-cache. _load_i$_copy and _end_i$_copy are program labels */ #define load_into_icache_copy(instr_length) \ asm volatile("lis %%r3, 0x1@h\n \ ori %%r3, %%r3, 0x1@l\n \ mticcr %%r3\n \ isync\n \ \n \ lis %%r6, _end_i$_copy@h\n \ ori %%r6, %%r6, _end_i$_copy@l\n \ icbt %%r0, %%r6\n \ lis %%r4, %0@h\n \ ori %%r4, %%r4, %0@l\n \ mtctr %%r4\n \ _load_i$_copy: \ addis %%r6, %%r6, 32@ha\n \ addi %%r6, %%r6, 32@l\n \ icbt %%r0, %%r6\n \ bdnz _load_i$_copy\n \ _end_i$_copy: \ nop\n" \ : \ : "i"(instr_length) \ :"%r6","%r4","%r0","r8","r9");

D e bu ggin g To debug t he real m ode kernel, you cannot use debuggers such as t he Kernel Debugger ( kdb) or t he Kernel GNU Debugger ( kgdb) , which we discussed in Chapt er 21, " Debugging Device Drivers." A quick way t o debug kernel assem bly snippet s is by using t he DOS debug t ool aft er convert ing your code t o I nt el- st yle synt ax. But debug was creat ed in t he 16- bit era, so you can't , for inst ance, st ep t hrough code t hat init ializes t he EAX regist er. You can find 32- bit debug- t ype freeware t ools available for download on t he I nt ernet . JTAG debuggers, also discussed in Chapt er 21, are a kind of panacea because a single t ool can be used t o debug t he BI OS, boot loader, Linux real m ode code, and kernel- BI OS int eract ions.

Appe n dix B. Lin u x a n d t h e BI OS

Part s of t he x86 kernel, such as t he video fram e buffer driver ( vesafb) and Advanced Power Managem ent ( APM) , explicit ly use BI OS services t o accom plish cert ain funct ions. Ot her sect ions of t he kernel, such as t he serial driver, im plicit ly depend on t he BI OS t o init ialize I / O base addresses and int errupt levels. Real m ode kernel code m akes ext ensive use of BI OS calls during boot t o perform t asks such as assem bling t he syst em m em ory m ap. [ 1] Because som e device drivers depend direct ly or indirect ly on t he BI OS, let 's learn how t o int eract wit h it .

[ 1] On BI OS- less em bedded archit ect ures, sim ilar responsibilit ies ( for exam ple, waking t he kernel from suspend on ARM Linux) rest wit h t he boot loader.

Re a l M ode Ca lls Many part s of t he kernel glean inform at ion from t he BI OS in real m ode and use t he collect ed inform at ion during norm al operat ion in prot ect ed m ode. The st eps needed t o accom plish t his are as follows: 1.

Real m ode kernel code invokes BI OS services and populat es ret urned inform at ion in t he first physical m em ory page, called t he zero page. This is done by t he source files in t he arch/ x86/ boot / direct ory. The full layout of t he zero page can be found in Docum ent at ion/ i386/ zero- page.t xt.

2.

Aft er t he kernel swit ches t o prot ect ed m ode, but before it clears t he zero page, t he obt ained dat a is saved in kernel dat a st ruct ures. This is done in arch/ x86/ kernel/ set up_32.c.

3.

The prot ect ed m ode kernel m akes suit able use of t he saved inform at ion during norm al operat ion.

As an exam ple, let 's find out how t he kernel assem bles t he syst em m em ory m ap from t he BI OS. List ing B.1 is a snippet from arch/ x86/ boot / m em ory.c in t he 2.6.23.1 source t ree t hat invokes t he BI OS int 0x15 service t o obt ain t he syst em m em ory m ap.

List in g B.1 . Obt a in in g t h e Syst e m M e m or y M a p ( a r ch/ x 8 6 / boot / m e m or y.c)

static int detect_memory_e820(void) { int count = 0; u32 next = 0; u32 size, id; u8 err; /* The boot_params structure contains the zero page */ struct e820entry *desc = boot_params.e820_map; do { size = sizeof(struct e820entry); asm("int $0x15; setc %0" : "=d" (err), "+b" (next), "=a" (id), "+c" (size), "=m" (*desc) : "D" (desc), "d" (SMAP), "a" (0xe820)); /* ... */ count++; desc++; } while (next && count < E820MAX); return boot_params.e820_entries = count; }

I n t he list ing, 0xe820 is t he funct ion num ber specified in t he AX regist er before invoking int 0x15 t o procure t he m em ory m ap. I f you look at t he BI OS call definit ion for int 0x15, funct ion 0xe820 ( t he full list is available at ht t p: / / lrs.fm i.uni- passau.de/ support / doc/ int errupt - 57/ I NT.HTM) , you will see t hat t he BI OS writ es t he current elem ent of t he m em ory m ap in a buffer point ed t o by t he DI regist er. I n List ing B.1, DI point s t o t he offset in t he zero page where t he m em ory m ap is t o be st ored (boot_params.e820_map) . The code t hen loops unt il all elem ent s in t he m em ory m ap are collect ed. The num ber of elem ent s is com put ed and st ored at offset boot_params.e820_entries in t he zero page. When execut ion successfully exit s t he loop, t he m em ory m ap is available in t he zero page in t he form of struct e820map, defined in include/ asm - x86/ e820.h : struct e820entry { _u64 addr; /* start of memory segment */ _u64 size; /* size of memory segment */ _u32 type; /* type of memory segment */ } _attribute_((packed)); struct e820map { _u32 nr_map; struct e820entry map[E820MAX]; };

The kernel swit ches t o prot ect ed m ode lat er in arch/ x86/ boot / pm .c. When in prot ect ed m ode, t he kernel saves t he collect ed m em ory m ap via copy_e820_map(), defined in arch/ x86/ kernel/ e820_32.c. This is shown in List ing B.2. For sim plicit y, t he list ing scissors out error checks and folds t he add_memory_region() rout ine. List in g B.2 . Copyin g t h e M e m or y M a p ( a r ch/ x 8 6 / k e r ne l/ e 8 2 0 _ 3 2 .c)

Code View: struct e820map e820; static int __init copy_e820_map(struct e820entry *biosmap, int nr_map) { int x; /* ... */ do { /* Copy memory map information collected from the BIOS into local variables */ unsigned long long start = biosmap->addr; unsigned long long size = biosmap->size; unsigned long long end = start + size; unsigned long type = biosmap->type; /* Sanitize start and size */ /* ... */ /* Populate the kernel data structure, e820 */ x = e820.nr_map; e820.map[x].addr = start; e820.map[x].size = size; e820.map[x].type = type; e820.nr_map++; } while (biosmap++,--nr_map); /*Do for all elements in map*/ /* ... */ }

Look at arch/ x86/ m m / init _32.c t o see how t he e820 st ruct ure populat ed in List ing B.2 is used lat er on in t he boot process.

The Old i386 Boot Code St art ing wit h t he 2.6.23 kernel, t he i386 boot assem bly code has been largely rewrit t en in C. Prior t o 2.6.23, t he code in List ing B.1 lived in arch/ i386/ boot / set up.S rat her t han in arch/ x86/ boot / m em ory.c. Also, t he swit ch t o prot ect ed m ode now occurs in arch/ x86/ boot / pm .c rat her t han set up.S.

To t ake anot her exam ple, t he kernel m akes use of t he BI OS int 0x10 service t o obt ain video m ode param et ers while it 's in real m ode (arch/ x86/ boot / video* .c) . The VESA fram e buffer driver (drivers/ video/ vesafb.c) relies on t hese param et ers t o t urn on graphics m ode at boot t im e. As an exercise, use a sim ilar approach t o obt ain BI OS Power- On Self Test ( POST) error codes from t he real m ode kernel ( via int 0x15, funct ion 0x2100) and display t hem during norm al operat ion via t he / proc filesy st em . Boot loaders also m ake use of BI OS services in real m ode. I f you browse t hrough t he sources of LI LO, GRUB, or SYSLI NUX, you will see a liberal sprinkling of int 0x13 calls t o read t he kernel im age from t he boot device.

Appe n dix B. Lin u x a n d t h e BI OS

Part s of t he x86 kernel, such as t he video fram e buffer driver ( vesafb) and Advanced Power Managem ent ( APM) , explicit ly use BI OS services t o accom plish cert ain funct ions. Ot her sect ions of t he kernel, such as t he serial driver, im plicit ly depend on t he BI OS t o init ialize I / O base addresses and int errupt levels. Real m ode kernel code m akes ext ensive use of BI OS calls during boot t o perform t asks such as assem bling t he syst em m em ory m ap. [ 1] Because som e device drivers depend direct ly or indirect ly on t he BI OS, let 's learn how t o int eract wit h it .

[ 1] On BI OS- less em bedded archit ect ures, sim ilar responsibilit ies ( for exam ple, waking t he kernel from suspend on ARM Linux) rest wit h t he boot loader.

Re a l M ode Ca lls Many part s of t he kernel glean inform at ion from t he BI OS in real m ode and use t he collect ed inform at ion during norm al operat ion in prot ect ed m ode. The st eps needed t o accom plish t his are as follows: 1.

Real m ode kernel code invokes BI OS services and populat es ret urned inform at ion in t he first physical m em ory page, called t he zero page. This is done by t he source files in t he arch/ x86/ boot / direct ory. The full layout of t he zero page can be found in Docum ent at ion/ i386/ zero- page.t xt.

2.

Aft er t he kernel swit ches t o prot ect ed m ode, but before it clears t he zero page, t he obt ained dat a is saved in kernel dat a st ruct ures. This is done in arch/ x86/ kernel/ set up_32.c.

3.

The prot ect ed m ode kernel m akes suit able use of t he saved inform at ion during norm al operat ion.

As an exam ple, let 's find out how t he kernel assem bles t he syst em m em ory m ap from t he BI OS. List ing B.1 is a snippet from arch/ x86/ boot / m em ory.c in t he 2.6.23.1 source t ree t hat invokes t he BI OS int 0x15 service t o obt ain t he syst em m em ory m ap.

List in g B.1 . Obt a in in g t h e Syst e m M e m or y M a p ( a r ch/ x 8 6 / boot / m e m or y.c)

static int detect_memory_e820(void) { int count = 0; u32 next = 0; u32 size, id; u8 err; /* The boot_params structure contains the zero page */ struct e820entry *desc = boot_params.e820_map; do { size = sizeof(struct e820entry); asm("int $0x15; setc %0" : "=d" (err), "+b" (next), "=a" (id), "+c" (size), "=m" (*desc) : "D" (desc), "d" (SMAP), "a" (0xe820)); /* ... */ count++; desc++; } while (next && count < E820MAX); return boot_params.e820_entries = count; }

I n t he list ing, 0xe820 is t he funct ion num ber specified in t he AX regist er before invoking int 0x15 t o procure t he m em ory m ap. I f you look at t he BI OS call definit ion for int 0x15, funct ion 0xe820 ( t he full list is available at ht t p: / / lrs.fm i.uni- passau.de/ support / doc/ int errupt - 57/ I NT.HTM) , you will see t hat t he BI OS writ es t he current elem ent of t he m em ory m ap in a buffer point ed t o by t he DI regist er. I n List ing B.1, DI point s t o t he offset in t he zero page where t he m em ory m ap is t o be st ored (boot_params.e820_map) . The code t hen loops unt il all elem ent s in t he m em ory m ap are collect ed. The num ber of elem ent s is com put ed and st ored at offset boot_params.e820_entries in t he zero page. When execut ion successfully exit s t he loop, t he m em ory m ap is available in t he zero page in t he form of struct e820map, defined in include/ asm - x86/ e820.h : struct e820entry { _u64 addr; /* start of memory segment */ _u64 size; /* size of memory segment */ _u32 type; /* type of memory segment */ } _attribute_((packed)); struct e820map { _u32 nr_map; struct e820entry map[E820MAX]; };

The kernel swit ches t o prot ect ed m ode lat er in arch/ x86/ boot / pm .c. When in prot ect ed m ode, t he kernel saves t he collect ed m em ory m ap via copy_e820_map(), defined in arch/ x86/ kernel/ e820_32.c. This is shown in List ing B.2. For sim plicit y, t he list ing scissors out error checks and folds t he add_memory_region() rout ine. List in g B.2 . Copyin g t h e M e m or y M a p ( a r ch/ x 8 6 / k e r ne l/ e 8 2 0 _ 3 2 .c)

Code View: struct e820map e820; static int __init copy_e820_map(struct e820entry *biosmap, int nr_map) { int x; /* ... */ do { /* Copy memory map information collected from the BIOS into local variables */ unsigned long long start = biosmap->addr; unsigned long long size = biosmap->size; unsigned long long end = start + size; unsigned long type = biosmap->type; /* Sanitize start and size */ /* ... */ /* Populate the kernel data structure, e820 */ x = e820.nr_map; e820.map[x].addr = start; e820.map[x].size = size; e820.map[x].type = type; e820.nr_map++; } while (biosmap++,--nr_map); /*Do for all elements in map*/ /* ... */ }

Look at arch/ x86/ m m / init _32.c t o see how t he e820 st ruct ure populat ed in List ing B.2 is used lat er on in t he boot process.

The Old i386 Boot Code St art ing wit h t he 2.6.23 kernel, t he i386 boot assem bly code has been largely rewrit t en in C. Prior t o 2.6.23, t he code in List ing B.1 lived in arch/ i386/ boot / set up.S rat her t han in arch/ x86/ boot / m em ory.c. Also, t he swit ch t o prot ect ed m ode now occurs in arch/ x86/ boot / pm .c rat her t han set up.S.

To t ake anot her exam ple, t he kernel m akes use of t he BI OS int 0x10 service t o obt ain video m ode param et ers while it 's in real m ode (arch/ x86/ boot / video* .c) . The VESA fram e buffer driver (drivers/ video/ vesafb.c) relies on t hese param et ers t o t urn on graphics m ode at boot t im e. As an exercise, use a sim ilar approach t o obt ain BI OS Power- On Self Test ( POST) error codes from t he real m ode kernel ( via int 0x15, funct ion 0x2100) and display t hem during norm al operat ion via t he / proc filesy st em . Boot loaders also m ake use of BI OS services in real m ode. I f you browse t hrough t he sources of LI LO, GRUB, or SYSLI NUX, you will see a liberal sprinkling of int 0x13 calls t o read t he kernel im age from t he boot device.

Pr ot e ct e d M ode Ca lls To see how t he kernel m akes prot ect ed m ode BI OS calls, let 's look at t he APM im plem ent at ion. APM is a BI OS int erface specificat ion, which is now alm ost obsolet e ( see t he sect ion "Power Managem ent " in Chapt er 4 , " Laying t he Groundwork" ) . Power m anagem ent policies are defined in t he BI OS, and a kernel t hread called kapm d polls it every second t o figure out t he course of act ion. The polling is done using prot ect ed m ode BI OS calls. To do t his, kapm d needs t o know t he prot ect ed m ode ent ry segm ent address and offset . These are obt ained from t he real m ode kernel during boot using t he int 0x15, funct ion 0x5303 BI OS service. The act ual prot ect ed m ode BI OS call is invoked using inline assem bly from apm_bios_call_simple_asm(), defined in include/ asm - x86/ m ach- default / apm .h: __asm__ __volatile__(APM_DO_ZERO_SEGS "pushl %%edi\n\t" "pushl %%ebp\n\t" "lcall *%%cs:apm_bios_entry\n\t" "setc %%bl\n\t" "popl %%ebp\n\t" "popl %%edi\n\t" APM_DO_POP_SEGS : "=a" (*eax), "=b" (error), "=c" (cx), "=d" (dx), "=S" (si) : "a" (func), "b" (ebx_in), "c" (ecx_in) : "memory", "cc");

APM_DO_ZERO_SEGS zeros out segm ent regist ers. apm_bios_entry cont ains t he prot ect ed m ode ent ry address. The input const raint "a"(func) copies t he desired BI OS funct ion num ber t o t he EAX regist er before invocat ion. For exam ple, funct ion num ber APM_FUNC_GET_EVENT ( 0x530b) elicit s an APM event from t he BI OS, and funct ion num ber APM_FUNC_IDLE ( 0x5305) not ifies t he BI OS t hat t he processor is idle. Result s are ret urned by t he BI OS in regist ers EAX, EBX, ECX, and EDX. As per t he previous out put operand const raint s, t hese are propagat ed t o t he caller in variables *eax, error, cx, and dx, respect ively. I n t he assem bly body, regist ers are saved ont o t he kernel st ack before t he BI OS call and rest ored aft erward t o prevent t he BI OS from t ram pling on t hem .

BI OS a n d Le ga cy D r ive r s The BI OS provides a degree of hardware abst ract ion t o som e Linux drivers. Let 's t ake t he PC serial port driver ( discussed in Chapt er 6 , " Serial Drivers" ) as an exam ple. The BI OS probes t he Super I / O chipset and assigns I / O base addresses and I RQs t o t he respect ive serial ( and I nfrared) port s. The serial driver needs t o be t old about t he resources assigned by t he BI OS eit her via hard- coded values in a header file ( include/ asm x86/ serial.h) or via user- space com m ands. As an exercise, dig int o t he dat a sheet of your Super I / O chipset and add support in t he serial driver t o probe for t he resource values set by t he BI OS. To t ake anot her exam ple, even if you disable USB support in t he kernel, you can use USB keyboards and m ice on PC syst em s wit h help from t he BI OS. The BI OS t urns on an em ulat ion m ode in t he Sout h Bridge t hat rout es USB keyboard and m ouse input from t he USB cont roller t o t he keyboard cont roller. This t ricks t he operat ing syst em int o t hinking t hat you are using a legacy keyboard or m ouse. The kernel used t o rely on t he BI OS t o walk t he PCI bus and configure det ect ed devices. This is now obsolet e, but t ake a look at arch/ x86/ pci/ pcbios.c t o see how PCI BI OS can be accessed from t he kernel. Chapt er 10, " Peripheral Com ponent I nt erconnect ," discussed PCI drivers.

Appe n dix C. Se q File s

Monit oring and t rending dat a point s offered by procfs m ight help diagnose device driver problem s when t he cause of a sym pt om looks fuzzy. But som et im es, especially when t he am ount of dat a is large, t he corresponding procfs read() im plem ent at ions becom e com plex. The seq file int erface is a kernel helper m echanism designed t o sim plify such im plem ent at ions. Seq files render procfs operat ions cleaner and easier. Let 's gradually int roduce com plexit ies t o a procfs read() rout ine and see how t he seq file int erface t ransform s t he labored rout ine int o a graceful one. We'll also updat e one of t he few rem aining 2.6 drivers t hat does not yet leverage seq files.

Th e Se q File Adva n t a ge Let 's discover t he advant ages offered by seq files wit h t he help of an exam ple. As is com m on wit h m any device drivers, assum e t hat you have a linked list of dat a st ruct ures and t hat each node in t he list cont ains a st ring field ( called info) . The exam ple code in List ing C.1 uses a procfs file nam ed / proc/ readm e t o export t hese st rings t o user space. When a user reads t his file, t he procfs read() m et hod, readme_proc(), get s invoked. This rout ine t raverses t he linked list and appends t he info field of each node t o t he filesyst em buffer passed down t o it . List in g C.1 . Re a din g via Pr ocfs Code View: /* Private Data structure */ struct _mydrv_struct { /* ... */ struct list_head list; /* Link to the next node */ char info[10]; /* Info to pass via the procfs file */ /* ... */ }; static LIST_HEAD(mydrv_list);

/* List Head */

/* Initialization */ static int __init mydrv_init(void) { int i; static struct proc_dir_entry *entry = NULL ; struct _mydrv_struct *mydrv_new; /* ... */ /* Create /proc/readme */ entry = create_proc_entry("readme", S_IWUSR, NULL); /* Attach it to readme_proc() */ if (entry) { entry->read_proc = readme_proc; }

/* Handcraft mydrv_list for testing purpose. In the real world, device driver logic maintains the list and populates the 'info' field */ for (i=0;iinfo, "Node No: %d\n", i); list_add_tail(&mydrv_new->list, &mydrv_list); } return 0; } /* The procfs read entry point */ static int readme_proc(char *page, char **start, off_t offset, int count, int *eof, void *data) { int i = 0; off_t thischunk_len = 0; struct _mydrv_struct *p; /* Traverse the list and copy info into the supplied buffer */ list_for_each_entry(p, &mydrv_list, list) { thischunk_len += sprintf(page+thischunk_len, p->info); } *eof = 1; /* Indicate completion */ return thischunk_len; }

Boot t he kernel wit h t hese changes and peek inside / proc/ readm e: bash> cat /proc/readme Node No: 0 Node No: 1 ... Node No: 99

When procfs read() m et hods are invoked, t hey are supplied one page of m em ory t hat t hey can use t o pass inform at ion t o user space. As you can see in List ing C.1, t he first argum ent passed t o readme_proc() is a point er t o t his page- sized buffer. The second argum ent , start, is used t o aid t he im plem ent at ion of procfs files larger t han a page. The use of t his param et er will get clear when we look at t he exam ple in List ing C.2. The next t wo argum ent s respect ively specify t he offset from where t he read operat ion is request ed and t he num ber of byt es t o be read. The eof argum ent is used t o t ell t he caller whet her t here is m ore dat a t o be read. I f *eof is not set before ret urning, t he procfs read ent ry point is called again for m ore dat a. I n List ing C.1, if you com m ent out t he line t hat set s *eof, readme_proc() get s called again wit h t he offset argum ent set t o 1190 ( which is t he num ber of ASCI I byt es cont ained in t he st rings, Node No: 0 t o Node No: 99) . readme_proc() ret urns t he num ber of byt es copied t o t he supplied buffer. The size of dat a generat ed by t he procfs read rout ine in List ing C.1 falls wit hin t he one- page lim it . However, if you increase t he num ber of nodes in t he linked list from 100 t o 500 in mydrv_init(), t he am ount of dat a generat ed while reading / proc/ readm e crosses a page and t riggers t he following out put : bash> cat /proc/readme Node No: 0

Node No: 1 ... Node No: 322 proc_file_read: Apparent buffer overflow!

As you can see, an overflow occurs aft er one page ( 4,096 in t his case) wort h of ASCI I charact ers have been produced. To handle such large procfs files, you need t o refashion t he code in List ing C.1 using t he start param et er alluded t o earlier. This m akes t he funct ion som ewhat com plicat ed and is shown in List ing C.2. The sem ant ics of t his m odified im plem ent at ion is as follows:

readme_proc() is called m ult iple t im es, each invocat ion yielding a m axim um of count byt es st art ing at offset. The count request ed during each call is less t han t he size of a page.

During each invocat ion, t he kernel increm ent s offset by t he num ber of byt es ret urned by t he previous invocat ion.

readme_proc() signals eof only if t he am ount of dat a produced is less t han or equal t o t he request ed count plus t he current offset. I f eof is not set , t he funct ion is called again wit h offset advanced by t he num ber of byt es ret urned previously.

Aft er each invocat ion, only t hose byt es st art ing from *start are collect ed and ret urned t o t he caller.

Print t he values of *start, offset, count, and page, and look at t he out put generat ed during each invocat ion t o bet t er underst and t he operat ion sequence. Wit h t his hack, your procfs file can supply large am ount s of dat a t o user space wit hout size lim it at ions: bash> cat /proc/readme Node No: 0 Node No: 1 ... Node No: 499 List in g C.2 . La r ge Pr ocfs Re a ds

Code View: static int readme_proc(char *page, char **start, off_t offset, int count, int *eof, void *data) { int i = 0; off_t thischunk_start = 0; off_t thischunk_len = 0; struct _mydrv_struct *p; /* Loop thru the list collecting device info */ list_for_each_entry(p, &mydrv_list, list) { thischunk_len += sprintf(page+thischunk_len, p->info); /* Advance thischunk_start only to the extent that the next * read will not result in total bytes more than (offset+count) */ if (thischunk_start + thischunk_len < offset) { thischunk_start += thischunk_len; thischunk_len = 0; } else if (thischunk_start + thischunk_len > offset+count) { break; } else { continue; } } /* Actual start */ *start = page + (offset - thischunk_start); /* Calculate number of written bytes */ thischunk_len -= (offset - thischunk_start); if (thischunk_len > count) { thischunk_len = count; } else { *eof = 1; } return thischunk_len; }

The seq file int erface com es t o t he rescue when you are faced wit h t he prospect of awkwardly im plem ent ing large procfs files as in List ing C.2. As t he nam e im plies, t he seq file int erface views t he cont ent s of procfs files as a sequence of obj ect s. Program m ing int erfaces are provided t o it erat e t hrough t his obj ect sequence. Your code has t o supply t he following it erat or m et hods expect ed by t he seq int erface:

1 . start(), which is called first by t he seq int erface. This init ializes t he posit ion wit hin t he it erat or sequence and ret urns t he first it erat or obj ect of int erest .

2 . next(), which increm ent s t he it erat or posit ion and ret urns a point er t o t he next it erat or. This funct ion is agnost ic t o t he int ernal st ruct ure of t he it erat or and considers it an opaque obj ect .

3 . show(), which int erpret s t he it erat or passed t o it and generat es out put st rings t o be displayed when a user reads t he corresponding procfs file. This m et hod m akes use of helpers such as seq_printf(), seq_putc(), and seq_puts() t o form at t he out put .

4 . stop(), which is called at t he end for cleanup.

The seq file int erface aut om at ically invokes t hese it erat or m et hods t o produce out put in response t o user operat ions on relat ed procfs files. You no longer need t o worry about page- sized buffers and signaling t he end of dat a. Let 's rewrit e List ing C.2 m aking use of seq files. This is done in List ing C.3 by viewing t he linked list as a sequence of nodes. The basic it erat or obj ect is t he node, and each invocat ion of t he next() m et hod ret urns t he next node in t he list . List in g C.3 . Usin g Se q File s t o Sim plify List in g C.2 Code View: #include /* start() method */ static void * mydrv_seq_start(struct seq_file *seq, loff_t *pos) { struct _mydrv_struct *p; loff_t off = 0; /* The iterator at the requested offset */ list_for_each_entry(p, &mydrv_list, list) { if (*pos == off++) return p; } return NULL; } /* next() method */ static void * mydrv_seq_next(struct seq_file *seq, void *v, loff_t *pos) { /* 'v' is a pointer to the iterator returned by start() or by the previous invocation of next() */ struct list_head *n = ((struct _mydrv_struct *)v)->list.next; ++*pos; /* Advance position */ /* Return the next iterator, which is the next node in the list */ return(n != &mydrv_list) ? list_entry(n, struct _mydrv_struct, list) : NULL; } /* show() method */ static int mydrv_seq_show(struct seq_file *seq, void *v) { const struct _mydrv_struct *p = v; /* Interpret the iterator, 'v' */ seq_printf(seq, p->info);

return 0; } /* stop() method */ static void mydrv_seq_stop(struct seq_file *seq, void *v) { /* No cleanup needed in this example */ } /* Define iterator operations */ static struct seq_operations mydrv_seq_ops = { .start = mydrv_seq_start, .next = mydrv_seq_next, .stop = mydrv_seq_stop, .show = mydrv_seq_show, }; static int mydrv_seq_open(struct inode *inode, struct file *file) { /* Register the operators */ return seq_open(file, &mydrv_seq_ops); } static struct file_operations mydrv_proc_fops = { .owner = THIS_MODULE, .open = mydrv_seq_open, /* User supplied */ .read = seq_read, /* Built-in helper function */ .llseek = seq_lseek, /* Built-in helper function */ .release = seq_release, /* Built-in helper funciton */ }; static int __init mydrv_init(void) { /* ... */ /* Replace the assignment to entry->read_proc in Listing C.1, with a more fundamental assignment to entry->proc_fops. So instead of doing "entry->read_proc = readme_proc;", do the following: */ entry->proc_fops = &mydrv_proc_fops; /* ... */ }

Appe n dix C. Se q File s

Monit oring and t rending dat a point s offered by procfs m ight help diagnose device driver problem s when t he cause of a sym pt om looks fuzzy. But som et im es, especially when t he am ount of dat a is large, t he corresponding procfs read() im plem ent at ions becom e com plex. The seq file int erface is a kernel helper m echanism designed t o sim plify such im plem ent at ions. Seq files render procfs operat ions cleaner and easier. Let 's gradually int roduce com plexit ies t o a procfs read() rout ine and see how t he seq file int erface t ransform s t he labored rout ine int o a graceful one. We'll also updat e one of t he few rem aining 2.6 drivers t hat does not yet leverage seq files.

Th e Se q File Adva n t a ge Let 's discover t he advant ages offered by seq files wit h t he help of an exam ple. As is com m on wit h m any device drivers, assum e t hat you have a linked list of dat a st ruct ures and t hat each node in t he list cont ains a st ring field ( called info) . The exam ple code in List ing C.1 uses a procfs file nam ed / proc/ readm e t o export t hese st rings t o user space. When a user reads t his file, t he procfs read() m et hod, readme_proc(), get s invoked. This rout ine t raverses t he linked list and appends t he info field of each node t o t he filesyst em buffer passed down t o it . List in g C.1 . Re a din g via Pr ocfs Code View: /* Private Data structure */ struct _mydrv_struct { /* ... */ struct list_head list; /* Link to the next node */ char info[10]; /* Info to pass via the procfs file */ /* ... */ }; static LIST_HEAD(mydrv_list);

/* List Head */

/* Initialization */ static int __init mydrv_init(void) { int i; static struct proc_dir_entry *entry = NULL ; struct _mydrv_struct *mydrv_new; /* ... */ /* Create /proc/readme */ entry = create_proc_entry("readme", S_IWUSR, NULL); /* Attach it to readme_proc() */ if (entry) { entry->read_proc = readme_proc; }

/* Handcraft mydrv_list for testing purpose. In the real world, device driver logic maintains the list and populates the 'info' field */ for (i=0;iinfo, "Node No: %d\n", i); list_add_tail(&mydrv_new->list, &mydrv_list); } return 0; } /* The procfs read entry point */ static int readme_proc(char *page, char **start, off_t offset, int count, int *eof, void *data) { int i = 0; off_t thischunk_len = 0; struct _mydrv_struct *p; /* Traverse the list and copy info into the supplied buffer */ list_for_each_entry(p, &mydrv_list, list) { thischunk_len += sprintf(page+thischunk_len, p->info); } *eof = 1; /* Indicate completion */ return thischunk_len; }

Boot t he kernel wit h t hese changes and peek inside / proc/ readm e: bash> cat /proc/readme Node No: 0 Node No: 1 ... Node No: 99

When procfs read() m et hods are invoked, t hey are supplied one page of m em ory t hat t hey can use t o pass inform at ion t o user space. As you can see in List ing C.1, t he first argum ent passed t o readme_proc() is a point er t o t his page- sized buffer. The second argum ent , start, is used t o aid t he im plem ent at ion of procfs files larger t han a page. The use of t his param et er will get clear when we look at t he exam ple in List ing C.2. The next t wo argum ent s respect ively specify t he offset from where t he read operat ion is request ed and t he num ber of byt es t o be read. The eof argum ent is used t o t ell t he caller whet her t here is m ore dat a t o be read. I f *eof is not set before ret urning, t he procfs read ent ry point is called again for m ore dat a. I n List ing C.1, if you com m ent out t he line t hat set s *eof, readme_proc() get s called again wit h t he offset argum ent set t o 1190 ( which is t he num ber of ASCI I byt es cont ained in t he st rings, Node No: 0 t o Node No: 99) . readme_proc() ret urns t he num ber of byt es copied t o t he supplied buffer. The size of dat a generat ed by t he procfs read rout ine in List ing C.1 falls wit hin t he one- page lim it . However, if you increase t he num ber of nodes in t he linked list from 100 t o 500 in mydrv_init(), t he am ount of dat a generat ed while reading / proc/ readm e crosses a page and t riggers t he following out put : bash> cat /proc/readme Node No: 0

Node No: 1 ... Node No: 322 proc_file_read: Apparent buffer overflow!

As you can see, an overflow occurs aft er one page ( 4,096 in t his case) wort h of ASCI I charact ers have been produced. To handle such large procfs files, you need t o refashion t he code in List ing C.1 using t he start param et er alluded t o earlier. This m akes t he funct ion som ewhat com plicat ed and is shown in List ing C.2. The sem ant ics of t his m odified im plem ent at ion is as follows:

readme_proc() is called m ult iple t im es, each invocat ion yielding a m axim um of count byt es st art ing at offset. The count request ed during each call is less t han t he size of a page.

During each invocat ion, t he kernel increm ent s offset by t he num ber of byt es ret urned by t he previous invocat ion.

readme_proc() signals eof only if t he am ount of dat a produced is less t han or equal t o t he request ed count plus t he current offset. I f eof is not set , t he funct ion is called again wit h offset advanced by t he num ber of byt es ret urned previously.

Aft er each invocat ion, only t hose byt es st art ing from *start are collect ed and ret urned t o t he caller.

Print t he values of *start, offset, count, and page, and look at t he out put generat ed during each invocat ion t o bet t er underst and t he operat ion sequence. Wit h t his hack, your procfs file can supply large am ount s of dat a t o user space wit hout size lim it at ions: bash> cat /proc/readme Node No: 0 Node No: 1 ... Node No: 499 List in g C.2 . La r ge Pr ocfs Re a ds

Code View: static int readme_proc(char *page, char **start, off_t offset, int count, int *eof, void *data) { int i = 0; off_t thischunk_start = 0; off_t thischunk_len = 0; struct _mydrv_struct *p; /* Loop thru the list collecting device info */ list_for_each_entry(p, &mydrv_list, list) { thischunk_len += sprintf(page+thischunk_len, p->info); /* Advance thischunk_start only to the extent that the next * read will not result in total bytes more than (offset+count) */ if (thischunk_start + thischunk_len < offset) { thischunk_start += thischunk_len; thischunk_len = 0; } else if (thischunk_start + thischunk_len > offset+count) { break; } else { continue; } } /* Actual start */ *start = page + (offset - thischunk_start); /* Calculate number of written bytes */ thischunk_len -= (offset - thischunk_start); if (thischunk_len > count) { thischunk_len = count; } else { *eof = 1; } return thischunk_len; }

The seq file int erface com es t o t he rescue when you are faced wit h t he prospect of awkwardly im plem ent ing large procfs files as in List ing C.2. As t he nam e im plies, t he seq file int erface views t he cont ent s of procfs files as a sequence of obj ect s. Program m ing int erfaces are provided t o it erat e t hrough t his obj ect sequence. Your code has t o supply t he following it erat or m et hods expect ed by t he seq int erface:

1 . start(), which is called first by t he seq int erface. This init ializes t he posit ion wit hin t he it erat or sequence and ret urns t he first it erat or obj ect of int erest .

2 . next(), which increm ent s t he it erat or posit ion and ret urns a point er t o t he next it erat or. This funct ion is agnost ic t o t he int ernal st ruct ure of t he it erat or and considers it an opaque obj ect .

3 . show(), which int erpret s t he it erat or passed t o it and generat es out put st rings t o be displayed when a user reads t he corresponding procfs file. This m et hod m akes use of helpers such as seq_printf(), seq_putc(), and seq_puts() t o form at t he out put .

4 . stop(), which is called at t he end for cleanup.

The seq file int erface aut om at ically invokes t hese it erat or m et hods t o produce out put in response t o user operat ions on relat ed procfs files. You no longer need t o worry about page- sized buffers and signaling t he end of dat a. Let 's rewrit e List ing C.2 m aking use of seq files. This is done in List ing C.3 by viewing t he linked list as a sequence of nodes. The basic it erat or obj ect is t he node, and each invocat ion of t he next() m et hod ret urns t he next node in t he list . List in g C.3 . Usin g Se q File s t o Sim plify List in g C.2 Code View: #include /* start() method */ static void * mydrv_seq_start(struct seq_file *seq, loff_t *pos) { struct _mydrv_struct *p; loff_t off = 0; /* The iterator at the requested offset */ list_for_each_entry(p, &mydrv_list, list) { if (*pos == off++) return p; } return NULL; } /* next() method */ static void * mydrv_seq_next(struct seq_file *seq, void *v, loff_t *pos) { /* 'v' is a pointer to the iterator returned by start() or by the previous invocation of next() */ struct list_head *n = ((struct _mydrv_struct *)v)->list.next; ++*pos; /* Advance position */ /* Return the next iterator, which is the next node in the list */ return(n != &mydrv_list) ? list_entry(n, struct _mydrv_struct, list) : NULL; } /* show() method */ static int mydrv_seq_show(struct seq_file *seq, void *v) { const struct _mydrv_struct *p = v; /* Interpret the iterator, 'v' */ seq_printf(seq, p->info);

return 0; } /* stop() method */ static void mydrv_seq_stop(struct seq_file *seq, void *v) { /* No cleanup needed in this example */ } /* Define iterator operations */ static struct seq_operations mydrv_seq_ops = { .start = mydrv_seq_start, .next = mydrv_seq_next, .stop = mydrv_seq_stop, .show = mydrv_seq_show, }; static int mydrv_seq_open(struct inode *inode, struct file *file) { /* Register the operators */ return seq_open(file, &mydrv_seq_ops); } static struct file_operations mydrv_proc_fops = { .owner = THIS_MODULE, .open = mydrv_seq_open, /* User supplied */ .read = seq_read, /* Built-in helper function */ .llseek = seq_lseek, /* Built-in helper function */ .release = seq_release, /* Built-in helper funciton */ }; static int __init mydrv_init(void) { /* ... */ /* Replace the assignment to entry->read_proc in Listing C.1, with a more fundamental assignment to entry->proc_fops. So instead of doing "entry->read_proc = readme_proc;", do the following: */ entry->proc_fops = &mydrv_proc_fops; /* ... */ }

Upda t in g t h e N VRAM D r ive r The seq file int erface has been around since t he lat t er versions of t he 2.4 kernel, but it s use has becom e widespread only wit h 2.6. Let 's updat e t he NVRAM driver ( drivers/ char/ nvram .c) , one of t he few rem aining drivers t hat hasn't swit ched over t o use seq files. ( As usual, + and - show t he differences from t he original source file.) To do t his, you m ay use an ext ra- sim ple flavor of seq files t hat uses only t he show() it erat or m et hod. Use single_open() t o regist er t his m et hod. List ing C.4 cont ains t he updat ed NVRAM driver. Because t he seq int erface won't sleep bet ween calls t o it erat or m et hods, you m ay hold locks inside t he m et hods. List in g C.4 . Upda t e t h e N VRAM D r ive r Usin g Se q File s Code View: +static struct file_operations nvram_proc_fops = { + .owner = THIS_MODULE, + .open = nvram_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; -static struct file_operations nvram_fops = { - .owner = THIS_MODULE, - .llseek = nvram_llseek, - .read = nvram_read, - .write = nvram_write, - .ioctl = nvram_ioctl, - .open = nvram_open, - .release = nvram_release, -}; +static int nvram_seq_open(struct inode *inode, struct file *file) +{ + return single_open(file, nvram_show, NULL); +} +static int nvram_show(struct seq_file *seq, void *v) +{ + unsigned char contents[NVRAM_BYTES]; + int i; + + spin_lock_irq(&rtc_lock); + for (i = 0; i < NVRAM_BYTES; ++i) + contents[i] = __nvram_read_byte(i); + spin_unlock_irq(&rtc_lock); + + mach_proc_infos(seq, contents); + return 0; +} static int __init nvram_init(void) { +

ent = create_proc_entry("driver/nvram", 0, NULL);

+ + + + + + -

if (!ent) { printk(KERN_ERR "nvram: can't create /proc/driver/nvram\n"); ret = -ENOMEM; goto outmisc; } ent->proc_fops = &nvram_proc_fops; if (!create_proc_read_entry("driver/nvram", 0, NULL, nvram_read_proc, NULL)) { printk(KERN_ERR "nvram: can't create /proc/driver/nvram\n"); ret = -ENOMEM; goto outmisc; } /* ... */

} -#define PRINT_PROC(fmt,args...) -/* ... */

\

-static int -nvram_read_proc(char *buffer, char **start, off_t offset, int size, int *eof, void *data) -{ - /* ... */ -}

I n addit ion t o t he m odificat ions in List ing C.4, change all references t o PRINT_PROC() in t he original driver t o seq_printf(). The original driver and t he one in List ing C.4 produce t he sam e out put if you read from / proc/ driver/ nvram .

Look in g a t t h e Sou r ce s Look at Docum ent at ion/ filesyst em s/ proc.t xt for m ore inform at ion about procfs. The fs/ proc/ direct ory cont ains code t hat im plem ent s t he procfs core. The seq file int erface lives in fs/ seq_file.c. Users of procfs and seq files are sprinkled all over t he kernel sources.

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ]

I n de x [ SYM BOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] $ ( dolla r sign ) % ( pe r ce n t sign ) 1 - w ir e pr ot ocol 4 G n e t w or k in g 7 - bit a ddr e ssin g 8 0 2 .1 1 st a ck 8 5 5 GM E ED AC dr ive r 8 2 5 0 .c dr ive r 1 6 5 5 0 - t ype UART

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] AAL ( ATM Ada pt a t ion La ye r ) AC'9 7 a c9 7 _ bu s m odu le a cce le r a t e d m e t h ods a cce le r om e t e r s a cce ssin g char drivers EEPROM device I / O regions m em ory regions from user space PCI regions configurat ion space I / O and m em ory regions regist ers a cce ss poin t n a m e s ( APN s) Accle r a t e d Gr a ph ics Por t ( AGP) ACPI ( Adva n ce d Con figu r a t ion a n d Pow e r I n t e r fa ce ) 2nd acpid daem on AML ( ACPI Machine Language I nt erpret er) devices drivers kacpid spaces user- space t ools a cpid da e m on a cpit ool com m a n d a ct iva t ion net _device st ruct ure NI Cs ( net work int erface cards) a ct ive qu e u e s a d- h oc m ode ( W LAN ) AD C ( An a log- t o- D igit a l Con ve r t e r ) 2nd a dd_ disk ( ) fu n ct ion 2nd a dd_ m e m or y_ r e gion ( ) fu n ct ion a dd_ m t d_ pa r t it ion s( ) fu n ct ion a dd- sym bol- file com m a n d a dd_ t im e r ( ) fu n ct ion 2nd a dd_ w a it _ qu e u e ( ) fu n ct ion 2nd a ddr e sse s ARP ( Address Resolut ion Prot ocol) bus addresses endpoint addresses LBA ( logical block addressing) logical addresses MAC ( Media Access Cont rol) addresses PCI slave addresses USB ( universal serial bus) virt ual addresses Addr e ss Re solu t ion Pr ot ocol ( ARP) a dj u st ch e ck su m com m a n d ( ioct l) a dj u st _ cm os_ cr c( ) fu n ct ion Adva n ce d Con figu r a t ion a n d Pow e r I n t e r fa ce [ See ACPI ( Adva n ce d Con figu r a t ion a n d Pow e r I n t e r fa ce ) .] Adva n ce d H ost Con t r olle r I n t e r fa ce ( AH CI ) Adva n ce d Lin u x Sou n d Ar ch it e ct u r e [ See ALSA ( Adva n ce d Lin u x Sou n d Ar ch it e ct u r e ) .]

Adva n ce d Pow e r M a n a ge m e n t ( APM ) 2nd [ See also BI OS ( ba sic in pu t / ou t pu t syst e m ) .] Adva n ce d Te ch n ology At t a ch m e n t ( ATA) AF_ I N ET pr ot ocol fa m ily AF_ N ETLI N K pr ot ocol fa m ily AF_ UN I X pr ot ocol fa m ily Affix AGP ( Accle r a t e d Gr a ph ics Por t ) AH CI ( Adva n ce d H ost Con t r olle r I n t e r fa ce ) AI O ( Asyn ch r on ou s I / O) a io_ r e a d( ) fu n ct ion a io_ w r it e ( ) fu n ct ion a lloc_ ch r de v_ r e gion ( ) fu n ct ion 2nd 3rd a lloc_ disk ( ) fu n ct ion 2nd a lloc_ e t h e r de v( ) fu n ct ion 2nd a lloc_ ie e e 8 0 2 1 1 ( ) fu n ct ion 2nd a lloc_ ir da de v( ) fu n ct ion 2nd a lloc_ n e t de v( ) fu n ct ion 2nd a lloca t in g m e m or y a llow _ sign a l( ) fu n ct ion 2nd ALSA ( Adva n ce d Lin u x Sou n d Ar ch it e ct u r e ) ALSA driver for MP3 player ALSA program m ing a lsa - de ve l m a ilin g list a lsa - lib libr a r y a lsa - u t ils pa ck a ge a lsa ct l com m a n d a lsa m ix e r com m a n d a m a t e u r r a dio a m d_ fla sh _ in fo st r u ct u r e a m ix e r com m a n d AM L ( ACPI M a ch in e La n gu a ge I n t e r pr e t e r ) An a log- t o- D igit a l Con ve r t e r ( AD C) 2nd a n t icipa t or y I / O sch e du le r 2nd a pla y com m a n d APM ( Adva n ce d Pow e r M a n a ge m e n t ) 2nd [ See also BI OS ( ba sic in pu t / ou t pu t syst e m ) .] a pm _ bios_ ca ll_ sim ple _ a sm ( ) fu n ct ion APM _ D O_ ZERO_ SEGS APM _ FUN C_ GET_ EVEN T APM _ FUN C_ I D LE APN s ( a cce ss poin t n a m e s) a pplyin g pa t ch e s a r ch dir e ct or y arch/ x86/ boot / direct ory arch/ x86/ boot / m em ory.c file arch/ x86/ kernel/ e820_32.c file ARM boot loa de r s ARP ( Addr e ss Re solu t ion Pr ot ocol) a sk e d_ t o_ die ( ) fu n ct ion a sm con st r u ct a sm lin k a ge a t t r ibu t e a sse m bly boot sequence debugging GNU Assem bler ( GAS) i386 boot assem bly code inline assem bly Microsoft Macro Assem bler ( MASM) Net wide Assem bler ( NASM) a ssign in g I RQs ( in t e r r u pt r e qu e st s) a syn ch r on ou s D M A Asyn ch r on ou s I / O ( AI O) a syn ch r on ou s in t e r r u pt s a syn ch r on ou s t r a n sfe r m ode ( ATM ) ATA ( Adva n ce d Te ch n ology At t a ch m e n t )

ATAGs ATAPI ( ATA Pa ck e t I n t e r fa ce ) ATM ( a syn ch r on ou s t r a n sfe r m ode ) ATM Ada pt a t ion La ye r ( AAL) a t om ic_ de c( ) fu n ct ion a t om ic_ de c_ a n d_ t e st ( ) fu n ct ion a t om ic_ in c( ) fu n ct ion a t om ic_ in c_ a n d_ t e st ( ) fu n ct ion a t om ic_ n ot ifie r _ ch a in _ r e gist e r ( ) fu n ct ion 2nd ATOM I C_ N OTI FI ER_ H EAD ( ) m a cr o 2nd a t om ic ope r a t or s At t r ibu t e m e m or y ( PCM CI A) a u dio code cs a u dio dr ive r s ALSA ( Advanced Linux Sound Archit ect ure) ALSA driver for MP3 player ALSA program m ing audio archit ect ure audio codecs Bluet oot h dat a st ruct ures debugging em bedded drivers kernel program m ing int erfaces, t able of MP3 player exam ple ALSA driver code list ing ALSA program m ing codec_writ e_reg( ) funct ion MP3 decoding com plexit y m ycard_audio_probe( ) funct ion m ycard_audio_rem ove( ) funct ions m ycard_hw_param s( ) funct ion m ycard_pb_t rigger( ) funct ion m ycard_playback_open( ) funct ion ov er v iew regist er layout of audio hardware snd_card_free( ) funct ion snd_card_new( ) funct ion snd_card_proc_new( ) funct ion snd_card_regist er( ) funct ion snd_ct l_add( ) funct ion snd_ct l_new1( ) funct ion snd_device_new( ) funct ion snd_kcont rol st ruct ure snd_pcm _hardware st ruct ure snd_pcm _lib_m alloc_pages( ) funct ion snd_pcm _lib_preallocat e_pages_for_all( ) funct ion snd_pcm _new( ) funct ion snd_pcm _ops st ruct ure snd_pcm _set _ops( ) funct ion user program s OSS ( Open Sound Syst em ) overview sound direct ory sound m ixing ( fn) sources a u dio pla ye r s [ See M P3 pla ye r e x a m ple .] a u t oloa din g m odu le s AX.2 5 pr ot ocol

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] Ba ck Gr ou n d Ope r a t ion ( BGO) ba ck ligh t _ de vice _ r e gist e r ( ) ba r r ie r s ( m e m or y) BCD ( Bin a r y Code d D e cim a l) BCD 2 BI N ( ) m a cr o BCSP ( Blu e Cor e Se r ia l Pr ot ocol) bdflu sh k e r n e l t h r e a d be n ch m a r k in g BGO ( Ba ck Gr ou n d Ope r a t ion ) BH ( bot t om h a lf) fla vor s Bin a r y Code d D e cim a l ( BCD ) Bin u t ils bio_ for _ e a ch _ se gm e n t ( ) fu n ct ion 2nd bio st r u ct u r e 2nd bio_ ve c st r u ct u r e BI OS ( ba sic in pu t / ou t pu t syst e m ) BI OS- provided physical RAM m ap legacy drivers prot ect ed m ode calls real m ode calls updat ing bit - ba n gin g dr ive r s blk _ cle a n u p_ qu e u e ( ) fu n ct ion blk _ fs_ r e qu e st ( ) fu n ct ion blk _ in it _ qu e u e ( ) fu n ct ion 2nd blk _ qu e u e _ h a r dse ct _ size ( ) fu n ct ion 2nd blk _ qu e u e _ m a k e _ r e qu e st ( ) fu n ct ion 2nd blk _ qu e u e _ m a x _ se ct or s( ) fu n ct ion 2nd blk _ r q_ m a p_ sg( ) fu n ct ion 2nd BLOBs ( Boot Loa de r Obj e ct s) block de vice e m u la t ion block dir e ct or y block dr ive r s block_device_operat ions st ruct ure block I / O layer dat a st ruct ures 2nd debugging DMA dat a t ransfer ent ry point s int errupt handlers I / O schedulers kernel program m ing int erfaces, t able of m yblkdev st orage cont roller block device operat ions disk access init ializat ion ov er v iew regist er layout sources st orage t echnologies ATAPI ( ATA Packet I nt erface) I DE ( I nt egrat ed Drive Elect ronics) libATA MMC ( Mult iMediaCard) RAI D ( redundant array of inexpensive disks)

SATA ( Serial ATA) SCSI ( Sm all Com put er Syst em I nt erface) SD ( Secure Digit al) cards sum m ary of block I / O la ye r block in g_ n ot ifie r _ ca ll_ ch a in ( ) fu n ct ion 2nd block in g_ n ot ifie r _ ch a in _ r e gist e r ( ) fu n ct ion BLOCKI N G_ N OTI FI ER_ H EAD ( ) m a cr o 2nd b lock s Blu e Cor e Se r ia l Pr ot ocol ( BCSP) Blu e t oot h 2nd audio Bluet oot h Host Cont rol I nt erface Bluet oot h Net work Encapsulat ion Prot ocol ( BNEP) Bluet oot h Special I nt erest Group ( SI G) BlueZ CF cards RFCOMM USB adapt ers debugging keyboards m ice net working profiles USB blu e t oot h .k o Blu e t oot h H ost Con t r ol I n t e r fa ce Blu e t oot h N e t w or k En ca psu la t ion Pr ot ocol ( BN EP) Blu e t oot h Spe cia l I n t e r e st Gr ou p ( SI G) Blu e Z CF cards RFCOMM USB adapt ers blu e z- u t ils pa ck a ge BN EP ( Blu e t oot h N e t w or k En ca psu la t ion Pr ot ocol) bn e p.k o boa r d r e w or k Bog oM I PS Boot Loa de r Obj e ct s ( BLOBs) boot loa de r s definit ion em bedded boot loaders BLOB ( Boot Loader Obj ect ) boot st r apping GRUB LI LO ( Linux Loader) ov er v iew RedBoot SYSLI NUX t able of Redboot boot loader boot logo ( con sole dr ive r s) boot pr oce ss 2nd [ See also boot loa de r s.] BI OS- provided physical RAM m ap delay- loop calibrat ion EXT3 filesyst em HLT inst ruct ion I / O scheduler init process init rd m em ory kernel com m and line Linux boot sequence low m em ory/ high m em ory PCI resource configurat ion

regist ered prot ocol fam ilies st art _kernel( ) funct ion boot st r a ppin g bot t om h a lf ( BH ) fla vor s BREAKPOI N T m a cr o br e a k poin t s br ow n ou t s bu ffe r s DMA 2nd NI C buffer m anagem ent socket buffers BUG( ) fu n ct ion bu ild scr ipt s bu ildin g k e r n e ls bu ilt - in k e r n e l t h r e a ds bu lk e n dpoin t s bu lk URBs bu s a ddr e sse s bu s- de vice - dr ive r pr ogr a m m in g in t e r fa ce bu s_ r e gist e r ( ) fu n ct ion bu se s bus addresses I 2 C bus t ransact ions LPC ( Low Pin Count ) bus SMBus 2nd SPI ( Serial Peripheral I nt erface) bus USB [ See USB ( u n ive r sa l se r ia l bu s) .] user space I 2 C/ SMBus driver w1 bus Bu syBox

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] ca ch e cache m isses, count ing coherency [ See D M A ( D ir e ct M e m or y Acce ss) .] ca libr a t e _ de la y( ) fu n ct ion ca libr a t in g t ou ch con t r olle r s ca ll_ u se r m ode h e lpe r ( ) fu n ct ion 2nd Ca m br idge Silicon Ra dio ( CSR) CAN ( con t r olle r a r e a n e t w or k ) ca pa cit y of disk s, obt a in in g via SCSI Ge n e r ic Ca r dBu s 2nd Ca r d I n for m a t ion St r u ct u r e ( CI S) ca r dm gr da e m on Ca r d Se r vice s 2nd Ca r r ie r Gr a de Lin u x ( CGL) ca t h ode r a y t u be ( CRT) cde v_ a dd( ) fu n ct ion 2nd 3rd cde v_ de l( ) fu n ct ion cde v_ in it ( ) fu n ct ion 2nd cde v st r u ct u r e 2nd CD M A ( code division m u lt iple a cce ss) 2nd cdr e cor d cdr t ools CELF ( Con su m e r Ele ct r on ics Lin u x For u m ) ce ll ph on e de vice s claim ing/ freeing m em ory console drivers CPLD ( Com plex Program m able Logic Device) overview plat form drivers SoC ( Syst em - on- Chip) USB_UART driver USB_UART port s USB_UART regist er layout ce llu la r n e t w or k in g CDMA GPRS CEs ( cor r e ct a ble e r r or s) CF ( Com pa ct Fla sh ) [ See also PCM CI A ( Pe r son a l Com pu t e r M e m or y Ca r d I n t e r n a t ion a l Associa t ion ) .] BlueZ debugging definit ion em bedded drivers st orage cfb_ fillr e ct ( ) CFI ( Com m on Fla sh I n t e r fa ce ) cfi_ pr iva t e st r u ct u r e cfi_ pr obe _ ch ip( ) fu n ct ion CFQ ( Com ple t e Fa ir Qu e u in g) 2nd CFS ( Com ple t e ly Fa ir Sch e du le r ) CGL ( Ca r r ie r Gr a de Lin u x ) ch a n ge m a r k e r s ch a n g in g line disciplines MTU size ch a r a ct e r dr ive r s [ See ch a r dr ive r s.]

ch a r de vice e m u la t ion ch a r dr ive r s accessing char device em ulat ion CMOS driver I / O Cont rol init ializat ion int ernal file point er, set t ing wit h cm os_llseek( ) opening ov er v iew reading/ writ ing dat a regist er layout releasing code flow com m on problem s dat a st ruct ures m isc drivers overview parallel port com m unicat ion parallel port LED board cont rolling wit h sysfs led.c driver pseudo char drivers RTC subsyst em sensing dat a availabilit y fasync( ) funct ion ov er v iew select ( ) / poll( ) m echanism sources UART drivers wat chdog t im er ch e ck _ bu gs( ) fu n ct ion ch e ck list for n e w de vice s ch e ck su m s ch ip dr ive r s [ See N OR ch ip dr ive r s.] Ch ip Se le ct ( CS) ch oosin g peripherals processors Cir r u s Logic EP7 2 1 1 con t r olle r CI S ( Ca r d I n for m a t ion St r u ct u r e ) cispa r se _ t st r u ct u r e 2nd cist pl.h file cist pl_ cft a ble _ e n t r y_ t st r u ct u r e 2nd cla ss_ cr e a t e ( ) fu n ct ion 2nd cla ss_ de st r oy( ) fu n ct ion 2nd cla ss_ de vice _ a dd_ a t t r s( ) fu n ct ion cla ss_ de vice _ cr e a t e ( ) fu n ct ion 2nd cla ss_ de vice _ cr e a t e _ file ( ) fu n ct ion cla ss_ de vice _ de st r oy( ) fu n ct ion 2nd cla ss_ de vice _ r e gist e r ( ) fu n ct ion cla ss dr ive r s Bluet oot h HI Ds ( hum an int erface devices) m ass st orage overview USB- Serial cla sse s device classes input class st ruct ure cle a n m a r k e r s cle a r _ bit ( ) fu n ct ion Cle a r To Se n d ( CTS)

clie n t s client cont rollers EEPROM device exam ple PCMCI A client drivers, regist ering clock _ ge t t im e ( ) fu n ct ion CLOCK_ I N PUT_ REGI STER clock _ se t t im e ( ) fu n ct ion close ( ) fu n ct ion CLUT ( Color Look Up Ta ble ) CLu t 2 2 4 CM OS_ BAN K0 _ D ATA_ PORT r e gist e r CM OS_ BAN K0 _ I N D EX_ PORT r e gist e r CM OS_ BAN K1 _ D ATA_ PORT r e gist e r CM OS_ BAN K1 _ I N D EX_ PORT r e gist e r cm os_ de v st r u ct u r e CM OS dr ive r s I / O Cont rol init ializat ion int ernal file point er, set t ing wit h cm os_llseek( ) opening overview reading/ writ ing dat a regist er layout releasing cm os_ fops st r u ct u r e cm os_ in it ( ) fu n ct ion cm os_ ioct l( ) fu n ct ion cm os_ llse e k ( ) fu n ct ion cm os_ ope n ( ) fu n ct ion cm os_ r e a d( ) fu n ct ion cm os_ r e le a se ( ) fu n ct ion cm os_ w r it e ( ) fu n ct ion code division m u lt iple a cce ss ( CD M A) 2nd code por t a bilit y code c_ w r it e _ r e g( ) fu n ct ion codin g st yle s cold p lu g colle ct _ da t a ( ) fu n ct ion color m ode s com m a n d- lin e u t ilit ie s [ See spe cific u t ilit ie s.] com m a n d- se t 0 0 0 1 com m a n d- se t 0 0 0 2 com m a n d- se t 0 0 2 0 COM M AN D _ REGI STER com m a n ds [ See spe cific com m a n ds.] Com m on Fla sh I n t e r fa ce ( CFI ) Com m on m e m or y ( PCM CI A) Com m on UN I X Pr in t in g Syst e m ( CUPS) Com pa ct Fla sh [ See CF ( Com pa ct Fla sh ) .] com pa ct m iddle w a r e com pila t ion GCC com piler line disciplines com ple t e ( ) fu n ct ion 2nd com ple t e _ a ll( ) fu n ct ion com ple t e _ a n d_ e x it ( ) fu n ct ion 2nd Com ple t e Fa ir Qu e u in g ( CFQ) 2nd Com ple t e ly Fa ir Sch e du le r ( CFS) com ple t ion in t e r fa ce com ple t ion st r u ct u r e Com ple x Pr ogr a m m a ble Logic D e vice s ( CPLD s) 2nd con cu r r e n cy at om ic operat ors CVS ( Concurrent Versioning Syst em ) 2nd

debugging NI Cs ( net work int erface cards) overview reader- writ er locks spinlocks and m ut exes Con cu r r e n t Ve r sion in g Syst e m ( CVS) 2nd CON FI G_ 4 KSTACKS con figu r a t ion opt ion CON FI G_ D EBUG_ BUGVERBOSE con figu r a t ion opt ion CON FI G_ D EBUG_ H I M EM con figu r a t ion opt ion CON FI G_ D EBUG_ PAGE_ ALLOC con figu r a t ion opt ion CON FI G_ D EBUG_ SLAB con figu r a t ion opt ion CON FI G_ D EBUG_ SPI N LOCK con figu r a t ion opt ion CON FI G_ D EBUG_ STACK_ USAGE con figu r a t ion opt ion CON FI G_ D EBUG_ STACKOVERFLOW con figu r a t ion opt ion CON FI G_ D ETECT_ SOFTLOCKUP con figu r a t ion opt ion CON FI G_ I KCON FI G_ PROC con figu r a t ion opt ion CON FI G_ M AGI C_ SYSRQ con figu r a t ion opt ion CON FI G_ M YPROJECT_ FASTBOOT m a r k e r CON FI G_ M YPROJECT m a r k e r CON FI G_ PCM CI A_ D EBUG( ) m a cr o con fig_ por t ( ) fu n ct ion CON FI G_ PREEM PT_ RT pa t ch - se t CON FI G_ PREEM PT con figu r a t ion opt ion CON FI G_ PRI N TK_ TI M E con figu r a t ion opt ion CON FI G_ RTC_ CLASS con figu r a t ion opt ion CON FI G_ SYSCTL con figu r a t ion opt ion con figu r a t ion kernel hacking configurat ion opt ions MTD NAND chip drivers net _device st ruct ure NI Cs PCI resources Wireless Ext ensions con figu r a t ion spa ce ( PCI ) , a cce ssin g con n e ct ivit y of e m be dde d dr ive r s con se r va t ive gove r n or con sist e n cy of ch e ck su m s con sist e n t D M A a cce ss m e t h ods con sole dr ive r s boot logo cell phones con sole s Con su m e r Ele ct r on ics Lin u x For u m ( CELF) con t a in e r _ of( ) fu n ct ion 2nd 3rd con t e x t s, in t e r r u pt con t r a st a n d ba ck ligh t CON TROL_ REGI STER 2nd con t r olle r a r e a n e t w or k ( CAN ) con t r olle r s CAN ( cont roller area net work) CS8900 cont roller DRAM cont rollers ECC- aware m em ory cont roller EHCI cont roller host cont rollers NAND flash cont rollers OTG ( On- The- Go) cont rollers USB device cont rollers USB host cont rollers coor d.c a pplica t ion copy_ e 8 2 0 _ m a p( ) fu n ct ion copy_ fr om _ u se r ( ) fu n ct ion 2nd copy_ t o_ u se r ( ) fu n ct ion

copyin g syst e m m e m or y m a ps copyle ft ( GN U) cor r e ct a ble e r r or s ( CEs) cou n t e r s preem pt ion count ers TSC ( Tim e St am p Count er) CPLD s ( Com ple x Pr ogr a m m a ble Logic D e vice s) 2nd cpqa r r a y dr ive r cpu fr e q_ r e gist e r _ gove r n or ( ) fu n ct ion CPU fr e qu e n cy ( cpu fr e q) dr ive r su bsyst e m CPU fr e qu e n cy n ot ifica t ion cpu spe e d da e m on cr a sh com m a n d cr a sh du m ps cr e a t e _ sin gle t h r e a d_ w or k qu e u e ( ) fu n ct ion cr e a t e _ w or k qu e u e ( ) fu n ct ion CRT ( ca t h ode r a y t u be ) cr ypt o dir e ct or y CS ( Ch ip Se le ct ) CS8 9 0 0 con t r olle r cs8 9 x 0 _ pr obe 1 ( ) fu n ct ion cscope com m a n d CSR ( Ca m br idge Silicon Ra dio) ct a gs com m a n d CTS ( Cle a r To Se n d) CUPS ( Com m on UN I X Pr in t in g Syst e m ) CVS ( Con cu r r e n t Ve r sion in g Syst e m ) 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] D - ca ch e ( D a t a Ca ch e ) da e m on ize ( ) fu n ct ion 2nd da e m on s acpid cardm gr cpuspeed iscsid oprofiled pppd t race D ATA_ REGI STER da t a a va ila bilit y, se n sin g fasync( ) funct ion overview select ( ) / poll( ) m echanism D a t a Ca ch e ( D - ca ch e ) da t a fie ld ( sk _ bu ff st r u ct u r e ) da t a flow , Lin u x - PCM CI A su bsyst e m da t a m ix in g ( fn ) da t a st r u ct u r e s [ See spe cific st r u ct u r e s.] da t a t r a n sfe r DMA dat a t ransfer net _device st ruct ure NI Cs ( net work int erface cards) PCI DMA descript ors and buffers receiving and t ransm it t ing dat a regist er layout of net work funct ions t elem et ry card exam ple USB D D W G ( D igit a l D ispla y W or k in g Gr ou p) de a dlin e I / O sch e du le r 2nd de a d st a t e ( t h r e a ds) de bu gfs de bu gge r s [ See k e r n e l de bu gge r s.] de bu ggin g [ See also ECC ( e r r or cor r e ct in g code ) r e por t in g .] audio drivers block drivers Bluet oot h breakpoint s concur r ency crash dum ps diagnost ic t ools em bedded Linux board rework debuggers I 2C input drivers JTAG debuggers kdum p exam ple kexec wit h kdum p set up sources kernel debuggers

dow nloads ent ering gdb ( GNU debugger) JTAG debuggers kdb ( kernel debugger) kgdb ( kernel GNU debugger) ov er v iew kernel hacking configurat ion opt ions kexec inv ok ing preparat ion sources wit h kdum p kprobes exam ple fault - handlers insert ing inside kernel funct ions j probes kret probes lim it at ions post - handlers pr e- handler s sources Linux assem bly LTP ( Linux Test Proj ect ) MTD ( Mem ory Technology Devices) overview 2nd 3rd PCI PCMCI A profiling gprof OProfile ov er v iew RAS ( reliabilit y, availabilit y, serviceabilit y) t est equipm ent t racing UDB ( universal serial bus) UML ( User Mode Linux) wat chpoint s de bu g t ool D ECLARE_ COM PLETI ON ( ) m a cr o D ECLARE_ M UTEX( ) fu n ct ion D ECLARE_ W AI TQUEUE( ) m a cr o D EFI N E_ M UTEX( ) fu n ct ion D EFI N E_ TI M ER( ) fu n ct ion D EFI N E_ TI M ER( ) m a cr o de l_ ge n disk ( ) fu n ct ion de l_ t im e r ( ) fu n ct ion de la y- loop ca libr a t ion de la ys long delays short delays de live r y build script s change m arkers checksum consist ency code port abilit y coding st yles version cont rol de pm od u t ilit y de scr ipt or s ( USB) de t e ct _ m e m or y_ e 8 2 0 ( ) fu n ct ion de v_ a lloc_ sk b( ) fu n ct ion 2nd / de v dir e ct or y

/ dev nam es, adding t o usbfs / dev/ full driver / dev/ km em driver / dev/ m em driver / dev/ null char device / dev/ port driver / dev/ random driver / dev/ urandom driver 2nd / dev/ zero driver de v_ k fr e e _ sk b( ) fu n ct ion de v_ t st r u ct u r e de vfs de vice ch e ck list de vice cla sse s de vice con t r olle r s de vice _ dr ive r st r u ct u r e de vice _ r e gist e r ( ) fu n ct ion d e v ice s [ See also spe cific de vice s.] ACPI ( Advanced Configurat ion and Power I nt erface) devices int errupt handling [ See in t e r r u pt h a n dlin g.] Linux device m odel device classes hot plug/ coldplug kobj ect s m icrocode download m odule aut oload ov er v iew sysfs udev 2nd m em ory barriers power m anagem ent dia gn ost ic t ools dia lu p n e t w or k in g ( D UN ) die _ ch a in st r u ct u r e die n ot ifica t ion s 2nd diff com m a n d D igit a l D ispla y W or k in g Gr ou p ( D D W G) D igit a l Visu a l I n t e r fa ce ( D VI ) dir e ct - t o- h om e ( D TH ) in t e r fa ce D ir e ct M e m or y Acce ss [ See D M A ( D ir e ct M e m or y Acce ss) .] dir e ct or ie s [ See also spe cific dir e ct or ie s.] disa ble _ ir q( ) fu n ct ion 2nd disa ble _ ir q_ n osyn c( ) fu n ct ion 2nd disa blin g I RQs ( in t e r r u pt r e qu e st s) discon n e ct in g t e le m e t r y dr ive r s D isk - On - M odu le s ( D OM s) disk ca pa cit y, obt a in in g via SCSI Ge n e r ic disk m ir r or in g displa y a r ch it e ct u r e displa yin g im a ge s w it h m m a p( ) displa y pa r a m e t e r s dist r ibu t ion s dm a _ a ddr _ t st r u ct u r e D M A_ AD D RESS_ REGI STER D M A ( D ir e ct M e m or y Acce ss) [ See also Et h e r n e t - M ode m ca r d e x a m ple .] buffers consist ent DMA access m et hods definit ion descript ors and buffers I OMMU ( I / O m em ory m anagem ent unit ) m ast ers navigat ion syst em s scat t er- gat her st ream ing DMA access m et hods

synchronous versus asynchronous dm a _ m a p_ sin gle ( ) fu n ct ion D M A_ RX _ REGI STER dm a _ se t _ m a sk ( ) fu n ct ion D M A_ SI ZE_ REGI STER D M A_ TX _ REGI STER D M A da t a t r a n sfe r dm ix ( fn ) do_ ge t t im e ofda y( ) fu n ct ion 2nd do_ ida _ in t r ( ) fu n ct ion do_ I RQ( ) fu n ct ion do_ m a p_ pr obe ( ) fu n ct ion docu m e n t a t ion Docum ent at ion direct ory procfs seq files dolla r sign ( $ ) dom a in - spe cific e le ct r on ics D OM s ( D isk - On - M odu le s) don gle s, I n fr a r e d door be lls D OS de bu g t ool dow n ( ) fu n ct ion dow n _ r e a d( ) fu n ct ion dow n _ w r it e ( ) fu n ct ion D RAM con t r olle r s D RD s ( du a l- r ole de vice s) dr ive r _ r e gist e r ( ) fu n ct ion dr ive r s dir e ct or y D r ive r Se r vice s ds ( dr ive r se r vice s) m odu le D TH ( dir e ct - t o- h om e ) in t e r fa ce du a l- r ole de vice s ( D RD s) du m p_ por t ( ) fu n ct ion D UN ( dia lu p n e t w or k in g) dv1 3 9 4 dr ive r D VI ( D igit a l Visu a l I n t e r fa ce )

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] e 8 2 0 .c file e 8 2 0 .h file e 1 0 0 0 PCI - X Giga bit Et h e r n e t dr ive r ECC ( e r r or cor r e ct in g code ) r e por t in g 2nd correct able errors ( CEs) ECC- aware m em ory cont roller ECC- relat ed regist ers on DRAM cont roller edac_m c m odule em bedded drivers m ult ibit errors ( MBEs) single- bit errors ( SBEs) / sys/ devices/ syst em / edac/ direct ory uncorrect able errors ( UEs) ECs ( e m be dde d con t r olle r s) ED AC ( Er r or D e t e ct ion a n d Cor r e ct ion ) 2nd correct able errors ( CEs) ECC- aware m em ory cont roller ECC- relat ed regist ers on DRAM cont roller edac_m c m odule em bedded drivers error- handling aids m ult ibit errors ( MBEs) single- bit errors ( SBEs) / sys/ devices/ syst em / edac/ direct ory uncorrect able errors ( UEs) e da c_ m c m odu le e dge - se n sit ive de vice s e e p_ a t t a ch ( ) fu n ct ion e e p_ pr obe ( ) fu n ct ion e e p_ r e a d( ) fu n ct ion EEPROM de vice e x a m ple accessing adapt er capabilit ies, checking client s, at t aching i2c_del_driver( ) funct ion init ializing ioct l( ) funct ion llseek( ) m et hod m em ory banks opening overview probing RFI D ( Radio Frequency I dent ificat ion) t ransm it t ers EH CI ( En h a n ce d H ost Con t r olle r I n t e r fa ce ) 2nd EI SA ( Ex t e n de d I n du st r y St a n da r d Ar ch it e ct u r e ) e lv_ n e x t _ r e qu e st ( ) fu n ct ion 2nd e m be dde d boot loa de r s BLOB ( Boot Loader Obj ect ) boot st r apping GRUB LI LO ( Linux Loader) overview RedBoot SYSLI NUX t able of

e m be dde d con t r olle r s ( ECs) e m be dde d dr ive r s audio brownout s but t ons and wheels connect iv it y CPLDs ( Com plex Program m able Logic Devices) dom ain- specific elect ronics ECC capabilit ies flash m em ory FPGAs ( Field Program m able Gat e Arrays) overview PCMCI A/ CF 2nd PWM ( pulse- widt h m odulat or) unit s RTC SD/ MMC t ouch screens UARTs udev USB video e m be dde d Lin u x challenges com ponent select ion debugging board rework debuggers em bedded boot loaders BLOB ( Boot Loader Obj ect ) boot st r apping GRUB LI LO ( Linux Loader) ov er v iew RedBoot SYSLI NUX t able of em bedded drivers audio brownout s but t ons and wheels connect ivit y CPLDs ( Com plex Program m able Logic Devices) dom ain- specific elect ronics ECC capabilit ies flash m em ory FPGAs ( Field Program m able Gat e Arrays) ov er v iew PCMCI A/ CF PWM ( pulse- widt h m odulat or) unit s RTC SD/ MMC t ouch screens UARTs USB video hardware block diagram kernel port ing m em ory layout overview root filesyst em com pact m iddleware NFS- m ount ed root ov er v iew t est infrast ruct ure

t ool chains USB ( universal serial bus) e m u la t ion block device em ulat ion char device em ulat ion e n a ble _ ir q( ) fu n ct ion 2nd e n a blin g I RQs ( in t e r r u pt r e qu e st s) e n d fie ld ( sk _ bu ff st r u ct u r e ) e n d_ r e qu e st ( ) fu n ct ion e n dpoin t a ddr e sse s e n dpoin t s ( USB) En h a n ce d H ost Con t r olle r I n t e r fa ce ( EH CI ) 2nd e n u m e r a t ion EP7 2 1 1 con t r olle r e poll( ) fu n ct ion e r a se _ in fo_ u se r st r u ct u r e e r a se _ in fo st r u ct u r e e r r or cor r e ct in g code s ( ECCs) [ See ECC ( e r r or cor r e ct in g code ) r e por t in g .] Er r or D e t e ct ion An d Cor r e ct ion [ See ED AC ( Er r or D e t e ct ion a n d Cor r e ct ion ) .] / e t c/ in it t a b file / e t c/ r c.sy sin it e t a gs com m a n d e t h 1 3 9 4 dr ive r Et h e r n e t - M ode m ca r d e x a m ple dat a t ransfer DMA descript ors and buffers receiving and t ransm it t ing dat a regist er layout of net work funct ions m odem funct ions probing regist ering MODULE_DEVI CE_TABLE( ) m acro net work funct ions probing regist ering PCI _DEVI CE( ) m acro pci_device_id st ruct ures Et h e r n e t N I C dr ive r e t h t ool e t h t ool_ ops st r u ct u r e 2nd e vbu g m odu le Evde v in t e r fa ce e ve n t s input event drivers Evdev int erface ov er v iew virt ual m ouse device exam ple writ ing LTT event s not ifier event handlers e ve n t s/ n t h r e a ds e volu t ion of Lin u x e Xe cu t e I n Pla ce ( XI P) EXI T_ D EAD st a t e EXI T_ ZOM BI E st a t e e x pir e d qu e u e s Ex pr e ssCa r ds 2nd EXT3 file syst e m EXT4 file syst e m e Xt e n de d Gr a ph ics Ar r a y ( XGA) Ex t e n de d I n du st r y St a n da r d Ar ch it e ct u r e ( EI SA) e x t e r n a l w a t ch dogs

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] fa syn c( ) fu n ct ion fa syn c_ h e lpe r ( ) fu n ct ion 2nd fa u lt - h a n dle r s ( k pr obe s) fb_ bla n k ( ) m e t h od fb_ ch e ck _ va r ( ) m e t h od fb_ fillr e ct ( ) fb_ va r _ scr e e n in fo st r u ct u r e FCC ( Fe de r a l Com m u n ica t ion s Com m ission ) fcn t l( ) fu n ct ion Fe de r a l Com m u n ica t ion s Com m ission ( FCC) Fibr e Ch a n n e l Fie ld Pr ogr a m m a ble Ga t e Ar r a ys ( FPGAs) FI FO ( fir st - in fir st - ou t ) m e m or y file _ ope r a t ion s st r u ct u r e 2nd 3rd file st r u ct u r e file sy st e m s debugfs EXT3 EXT4 JFFS ( Journaling Flash File Syst em ) NFS ( Net work File Syst em ) procfs [ See pr ocfs.] root fs com pact m iddleware NFS- m ount ed root obt aining ov er v iew sysfs usbfs virt ual filesyst em VFS ( Virt ual File Syst em ) 2nd YAFFS ( Yet Anot her Flash File Syst em ) File Tr a n sla t ion La ye r ( FTL) Fin it e St a t e M a ch in e ( FSM ) Fir e W ir e Fir m w a r e H u b ( FW H ) fir st - in fir st - ou t ( FI FO) m e m or y fla sh _ e r a se a ll com m a n d fla sh m e m or y [ See also M TD ( M e m or y Te ch n ology D e vice s) .] CFI - com pliant flash, querying definit ion em bedded drivers NAND NOR sect or s floppy st or a ge flow con t r ol ( N I Cs) flu sh _ bu ffe r ( ) fu n ct ion flu sh in g da t a for u m s FPGAs ( Fie ld Pr ogr a m m a ble Ga t e Ar r a ys) fr a m e bu ffe r API fr a m e bu ffe r dr ive r s accelerat ed m et hods color m odes cont rast and backlight

dat a st ruct ures DMA param et ers screen blanking fr e e _ ir q( ) fu n ct ion 2nd fr e e _ n e t de v( ) fu n ct ion fr e e in g I RQs ( int errupt request s) m em ory Fr e e sca le M C1 3 7 8 3 Pow e r M a n a ge m e n t a n d Au dio Com pon e n t ( PM AC) Fr e e sca le M PC8 5 4 0 Fr e e Soft w a r e Fou n da t ion fr e qu e n cy sca lin g Fr on t Side Bu s ( FSB) fs dir e ct or y FSM ( Fin it e St a t e M a ch in e ) fsyn c( ) fu n ct ion FTD I dr ive r FTL ( File Tr a n sla t ion La ye r ) fu ll ch a r de vice fu ll- spe e d USB fu n ct ion con t r olle r s f u n ct ion s [ See spe cific fu n ct ion s.] FW H ( Fir m w a r e H u b)

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] ga dge t dr ive r s ga r ba ge colle ct or ( GC) GAS ( GN U Asse m ble r ) GC ( ga r ba ge colle ct or ) GCC com pile r GCC I n lin e Asse m bly H OW TO gdb ( GN U de bu gge r ) ge n disk st r u ct u r e 2nd ge n e r a l- pu r pose m ou se ( gpm ) Ge n e r a l Obj e ct Ex ch a n ge Pr ofile ( GOEP) Ge n e r a l Pa ck e t Ra dio Se r vice ( GPRS) 2nd Ge n e r a l Pu r pose I / O ( GPI O) ge n e r a t in g pat ches preprocessed source code GET_ D EVI CE_ I D com m a n d ge t _ r a n dom _ byt e s( ) fu n ct ion ge t _ st a t s( ) m e t h od ge t _ w ir e le ss_ st a t s( ) fu n ct ion ge t it im e r ( ) fu n ct ion ge t t im e ofda y( ) fu n ct ion Glibc libr a r ie s Globa l Syst e m for M obile Com m u n ica t ion ( GSM ) 2nd glow _ sh ow _ le d( ) fu n ct ion 2nd GM CH ( Gr a ph ics a n d M e m or y Con t r olle r H u b) GN U copyleft GAS ( GNU Assem bler) gdb ( GNU debugger) LGPL ( Lesser General Public License) GPL ( GNU Public License) GOEP ( Ge n e r a l Obj e ct Ex ch a n ge Pr ofile ) gove r n or s GPI O ( Ge n e r a l Pu r pose I / O) GPL ( GN U Pu blic Lice n se ) gpm ( ge n e r a l- pu r pose m ou se ) gpr of GPRS ( Ge n e r a l Pa ck e t Ra dio Se r vice ) 2nd Gr a ph ics a n d M e m or y Con t r olle r H u b ( GM CH ) GRUB GSM ( Globa l Syst e m for M obile Com m u n ica t ion ) 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] H A ( H igh Ava ila bilit y) pr oj e ct H AL ( H a r dw a r e Acce ss La ye r ) h a lt ( H LT) in st r u ct ion h a m r a dio h a n dle _ I RQ_ e ve n t ( ) fu n ct ion h a n dlin g in t e r r u pt s [ See in t e r r u pt h a n dlin g.] h a r d- spe cific m odu le s ( H D M s) h a r d_ st a r t _ x m it ( ) fu n ct ion h a r d_ st a r t _ x m it m e t h od H a r d D r ive Act ive Pr ot e ct ion Syst e m ( H D APS) H a r dw a r e Acce ss La ye r ( H AL) h a r dw a r e block dia gr a m s em bedded syst em PC- com pat ible syst em h a r dw a r e RAI D h a sh list s H CI ( H ost Con t r ol I n t e r fa ce ) 2nd h ci_ u a r t .k o H D ( H igh D e fin it ion ) Au dio H D APS ( H a r d D r ive Act ive Pr ot e ct ion Syst e m ) H D LC ( H igh - le ve l D a t a Lin k Con t r ol) H D M I ( H igh - D e fin it ion M u lt im e dia I n t e r fa ce ) H D M s ( h a r d- spe cific m odu le s) h dpa r m u t ilit y H D TV ( H igh - D e fin it ion Te le vision ) h e a d fie ld ( sk _ bu ff st r u ct u r e ) h e lpe r in t e r fa ce s com plet ion int erface error- handling aids hash list s kt hread helpers linked list s creat ing dat a st ruct ures, init ializing funct ions work subm ission worker t hread not ifier chains overview work queues h idp dr ive r H I D s ( h u m a n in t e r fa ce de vice s) 2nd 3rd 4t h H igh - D e fin it ion M u lt im e dia I n t e r fa ce ( H D M I ) H igh - D e fin it ion Te le vision ( H D TV) H igh - le ve l D a t a Lin k Con t r ol ( H D LC) h igh - spe e d in t e r con n e ct s I nfiniBand RapidI O Fibre Channel iSCSI ( I nt ernet SCSI ) USB H igh Ava ila bilit y ( H A) pr oj e ct H igh D e fin it ion ( H D ) Au dio h igh m e m or y h ist or y of Lin u x

h list _ h e a d st r u ct u r e 2nd h list _ n ode s st r u ct u r e h list s ( h a sh list s) H LT in st r u ct ion H N P ( H ost N e got ia t ion Pr ot ocol) h ost a da pt e r s H ost Con t r ol I n t e r fa ce ( H CI ) 2nd H ost N e got ia t ion Pr ot ocol ( H N P) h ot plu g h u bs, r oot h u m a n in t e r fa ce de vice s ( H I D s) 2nd 3rd 4t h h w clock com m a n d H Z 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] I - ca ch e ( I n st r u ct ion Ca ch e ) I / O Con t r ol CMOS driver t ouch cont roller I / O m e m or y m a n a ge m e n t u n it ( I OM M U) I / O r e gion s accessing dum ping byt es from I / O sch e du le r s 2nd I 2 C [ See also SM Bu s.] 1- wire prot ocol bus t ransact ions com pared t o USB core debugging definit ion EEPROM device exam ple accessing adapt er capabilit ies, checking client s, at t aching i2c_del_driver( ) funct ion init ializing ioct l( ) funct ion llseek( ) m et hod m em ory banks opening ov er v iew probing RFI D ( Radio Frequency I dent ificat ion) t ransm it t ers i2c- dev LM- Sensors overview RTC ( Real Tim e Clock) sources SPI ( Serial Peripheral I nt erface) bus sum m ary of dat a st ruct ures sum m ary of kernel program m ing int erfaces user m ode I 2 C i2 c- de v m odu le 2nd i2 c_ a dd_ a da pt e r ( ) fu n ct ion i2 c_ a dd_ dr ive r ( ) fu n ct ion 2nd i2 c_ a t t a ch _ clie n t ( ) fu n ct ion i2 c_ ch e ck _ fu n ct ion a lit y( ) fu n ct ion 2nd i2 c_ clie n t _ a ddr e ss_ da t a st r u ct u r e 2nd i2 c_ clie n t st r u ct u r e i2 c_ de l_ a da pt e r ( ) fu n ct ion i2 c_ de l_ dr ive r ( ) fu n ct ion 2nd i2 c_ de t a ch _ clie n t ( ) fu n ct ion i2 c_ dr ive r st r u ct u r e i2 c_ ge t _ fu n ct ion a lit y( ) fu n ct ion 2nd i2 c_ m sg st r u ct u r e i2 c_ pr obe ( ) fu n ct ion i2 c_ sm bu s_ r e a d_ block _ da t a ( ) fu n ct ion i2 c_ sm bu s_ r e a d_ byt e ( ) fu n ct ion i2 c_ sm bu s_ r e a d_ byt e _ da t a ( ) fu n ct ion 2nd

i2 c_ sm bu s_ r e a d_ w or d_ da t a ( ) fu n ct ion i2 c_ sm bu s_ w r it e _ block _ da t a ( ) fu n ct ion i2 c_ sm bu s_ w r it e _ byt e ( ) fu n ct ion i2 c_ sm bu s_ w r it e _ byt e _ da t a ( ) fu n ct ion i2 c_ sm bu s_ w r it e _ qu ick ( ) fu n ct ion i2 c_ sm bu s_ w r it e _ w or d_ da t a ( ) fu n ct ion 2nd i2 c_ t r a n sfe r ( ) fu n ct ion 2nd I 2 O ( I n t e llige n t I n pu t / Ou t pu t ) I 2 O SI G ( I 2 O Spe cia l I n t e r e st Gr ou p) I 2 S ( I n t e r - I C Sou n d) bu s i3 8 6 boot a sse m bly code I 8 5 5 _ EAP_ REGI STER r e gist e r I 8 5 5 _ ERRSTS_ REGI STER r e gist e r I D E ( I n t e gr a t e d D r ive Ele ct r on ics) I EEE 1 3 9 4 im a ge s displaying wit h m m ap( ) init ram fs im x .c dr ive r in [ b| w | l| sn | sl] ( ) fu n ct ion in _ in t e r r u pt ( ) fu n ct ion in b( ) fu n ct ion 2nd 3rd in clu de / a sm - x 8 6 / e 8 2 0 .h file in clu de / pcm cia / cist pl.h file in clu de dir e ct or y I n du st r ie s St a n da r d Ar ch it e ct u r e ( I SA) I n fin iBa n d I n fr a r e d 2nd 3rd dat a st ruct ures dongles I rCOMM I rDA socket s kernel program m ing int erfaces Linux- I rDA LI RC net working sources Super I / O chip in fr a st r u ct u r e m ode ( W LAN ) in it ( ) fu n ct ion char drivers CMOS driver EEPROM device exam ple in it _ com ple t ion ( ) fu n ct ion 2nd in it dir e ct or y I N I T_ LI ST_ H EAD ( ) fu n ct ion in it _ M UTEX( ) fu n ct ion in it _ t im e r ( ) fu n ct ion 2nd in it ia liza t ion CMOS driver EEPROM device exam ple m yblkdev st orage cont roller t elem et ry configurat ion regist er t elem et ry driver in it ia t or s ( iSCSI ) in it pr oce ss in it r a m fs r oot file syst e m in it r d m e m or y in it t a b file in l( ) fu n ct ion 2nd in lin e a sse m bly in pu t _ a lloca t e _ de vice ( ) fu n ct ion in pu t _ de v st r u ct u r e in pu t _ e ve n t ( ) fu n ct ion

in pu t _ e ve n t st r u ct u r e in pu t _ h a n dle r st r u ct u r e 2nd in pu t _ r e gist e r _ de vice ( ) fu n ct ion 2nd 3rd 4t h in pu t _ r e gist e r _ h a n dle r ( ) fu n ct ion in pu t _ r e por t _ a bs( ) fu n ct ion 2nd in pu t _ r e por t _ k e y( ) fu n ct ion in pu t _ r e por t _ r e l( ) fu n ct ion in pu t _ syn c( ) fu n ct ion 2nd in pu t _ u n r e gist e r _ de vice ( ) fu n ct ion in pu t cla ss in pu t dr ive r s debugging input device drivers accelerom et ers Bluet oot h keyboards Bluet oot h m ice out put event s PC keyboards PS/ 2 m ouse roller m ouse device exam ple serio t ouch cont rollers t ouchpads t rackpoint s USB keyboards USB m ice input event drivers Evdev int erface ov er v iew virt ual m ouse device exam ple writ ing input subsyst em sources sum m ary of dat a st ruct ures in pu t su bsyst e m in sm od com m a n d I n st r u ct ion Ca ch e ( I - ca ch e ) in t 0 x 1 5 se r vice 2nd I n t e gr a t e d D r ive Ele ct r on ics ( I D E) I n t e llige n t I n pu t / Ou t pu t ( I 2 O) I n t e r - I C Sou n d ( I 2 S) bu s I n t e r - I n t e gr a t e d Cir cu it [ See I 2 C.] in t e r n a l file poin t e r , se t t in g w it h cm os_ llse e k ( ) I n t e r n e t a ddr e ss n ot ifica t ion I n t e r n e t Pr ot ocol ( I P) I n t e r n e t SCSI ( iSCSI ) in t e r r u pt con t e x t s 2nd in t e r r u pt h a n dlin g asynchronous int errupt s block drivers int errupt cont ext s I RQs ( int errupt request s) assigning definit ion enabling/ disabling fr eeing request ing overview roller wheel device exam ple edge sensit ivit y free_irq( ) funct ion request _irq( ) funct ion roller int errupt handler soft irqs

t asklet s wave form s generat ed by soft irqs synchronous int errupt s t asklet s in t e r r u pt ible st a t e ( t h r e a ds) in t e r r u pt r e qu e st s [ See I RQs ( in t e r r u pt r e qu e st s) .] in t e r r u pt s in t e r r u pt se r vice r ou t in e ( I SR) in vok in g k e x e c in w ( ) fu n ct ion 2nd ioct l( ) fu n ct ion 2nd 3rd 4t h 5t h I OM M U ( I / O m e m or y m a n a ge m e n t u n it ) iope r m ( ) fu n ct ion 2nd iopl( ) fu n ct ion 2nd ior e m a p( ) fu n ct ion ior e m a p_ n oca ch e ( ) fu n ct ion 2nd iove c st r u ct u r e I P ( I n t e r n e t Pr ot ocol) ipc dir e ct or y ipx _ r ou t e s_ lock I r COM M ir da - u t ils pa ck a ge I r D A sock e t ( I r Sock ) 2nd I r LAP ( I R Lin k Acce ss Pr ot ocol) I r LM P ( I R Lin k M a n a ge m e n t Pr ot ocol) ir q com m a n d I RQ_ H AN D LED fla g I RQF_ D I SABLED fla g I RQF_ SAM PLE_ RAN D OM fla g I RQF_ SH ARED fla g I RQF_ TRI GGER_ H I GH fla g I RQF_ TRI GGER_ RI SI N G fla g I RQs ( in t e r r u pt r e qu e st s) assigning cell phone device exam ple definit ion enabling/ disabling freeing request ing roller wheel device exam ple I r Sock ( I r D A sock e t ) I S_ ERR( ) fu n ct ion 2nd I SA ( I n du st r ie s St a n da r d Ar ch it e ct u r e ) I SA N I Cs iSCSI ( I n t e r n e t SCSI ) iscsi_ t cp.c dr ive r iscsid da e m on I SR ( in t e r r u pt se r vice r ou t in e ) it e r a t or m e t h ods next ( ) show( ) st art ( ) st op( )

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] JFFS ( Jou r n a lin g Fla sh File Syst e m ) j iffie s 2nd Jou r n a lin g Fla sh File Syst e m ( JFFS) j pr in t k ( ) fu n ct ion j pr obe _ r e t u r n ( ) fu n ct ion j pr obe s JTAG ( Join t Te st Act ion Gr ou p) debuggers 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] k a cpid t h r e a d k a llsym s_ look u p_ n a m e ( ) fu n ct ion k a pm d t h r e a d k bn e pd k db ( k e r n e l de bu gge r ) k du m p exam ple kexec wit h kdum p set up sources k e r n e l.or g k e r n e l_ t h r e a d( ) fu n ct ion 2nd k e r n e l de bu gge r s dow nloads ent ering gdb ( GNU debugger) JTAG debuggers kdb ( kernel debugger) kgdb ( kernel GNU debugger) overview k e r n e l dir e ct or y k e r n e l h a ck in g con figu r a t ion opt ion s k e r n e l m ode k e r n e l m odu le s [ See m odu le s.] k e r n e l pr obe s [ See k pr obe s.] k e r n e l pr oce sse s [ See k e r n e l t h r e a ds.] k e r n e l pr ogr a m m in g in t e r fa ce s [ See spe cific fu n ct ion s.] k e r n e ls boot process BI OS- provided physical RAM m ap delay- loop calibrat ion EXT3 filesyst em HLT inst ruct ion I / O scheduler init process init rd m em ory kernel com m and line Linux boot sequence low m em ory/ high m em ory PCI resource configurat ion regist ered prot ocol fam ilies st art _kernel( ) funct ion building concurrency at om ic operat ors debugging ov er v iew reader- writ er locks spinlocks and m ut exes dat a st ruct ures, t able of debuggers dow nloads ent ering gdb ( GNU debugger) JTAG debuggers

kdb ( kernel debugger) kgdb ( kernel GNU debugger) ov er v iew helper int erfaces com plet ion int erface error- handling aids hash list s kt hread helpers linked list s not ifier chains ov er v iew work queues int errupt cont ext s kernel.org reposit ory kernel hacking configurat ion opt ions kernel m ode kernel program m ing int erfaces, t able of m em ory allocat ion m odules edac_m c loading por t ing process cont ext s sources 2nd source t ree layout direct ories 2nd navigat ing t hreads bdflush creat ing definit ion event s/ n t hreads kacpid kapm d kj ournald ksoft irqd/ 0 kt hreadd kt hread helpers kupdat ed list ing act ive t hreads n f sd pccardd pdflush process st at es user m ode helpers wait queues t im ers HZ j iffies long delays ov er v iew RTC ( Real Tim e Clock) short delays TSC ( Tim e St am p Count er) uClinux user m ode k e r n e l t h r e a ds bdflush creat ing definit ion event s/ n t hreads kacpid kapm d kj ournald

ksoft irqd/ 0 kt hreadd kt hread helpers kupdat ed list ing act ive t hreads n f sd pccardd pdflush process st at es user m ode helpers wait queues k e r n e l t im e r s HZ j iffies long delays overview RTC ( Real Tim e Clock) short delays TSC ( Tim e St am p Count er) k e r n e lt r a p.or g kexec invoking preparat ion sources wit h kdum p k e x e c- t ools pa ck a ge k e yboa r ds Bluet oot h keyboards overview PC keyboards USB keyboards k e y cod e s k e ypa ds k fr e e ( ) fu n ct ion k gdb ( k e r n e l GN U de bu gge r ) k gdbw a it com m a n d k h u bd k ill_ fa syn c( ) fu n ct ion 2nd k j ou r n a ld t h r e a d k m a lloc( ) fu n ct ion 2nd 3rd 4t h k m e m ch a r de vice k obj _ t ype st r u ct u r e 2nd k obj e ct _ a dd( ) fu n ct ion k obj e ct _ r e gist e r ( ) fu n ct ion 2nd k obj e ct _ u e ve n t ( ) fu n ct ion k obj e ct _ u n r e gist e r ( ) fu n ct ion 2nd k obj e ct s 2nd k pr obe s exam ple kprobe handlers, regist ering m ydrv.c file pat ches, insert ing fault - handlers insert ing inside kernel funct ions j probes k r et pr obes lim it at ions post - handlers pr e- handler s sources k r e f_ ge t ( ) fu n ct ion k r e f_ in it ( ) fu n ct ion k r e f_ pu t ( ) fu n ct ion k r e f obj e ct

k r e t _ t t y_ ope n ( ) fu n ct ion k r e t pr obe s k se t st r u ct u r e k soft ir qd/ 0 k e r n e l t h r e a d k t h r e a d_ cr e a t e ( ) fu n ct ion 2nd k t h r e a d_ r u n ( ) fu n ct ion k t h r e a d_ sh ou ld_ st op( ) fu n ct ion 2nd k t h r e a d_ st op( ) fu n ct ion k t h r e a dd k e r n e l t h r e a d k t h r e a d h e lpe r s k t ype _ le d st r u ct u r e k u pda t e d k e r n e l t h r e a d k za lloc( ) fu n ct ion 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] L2 CAP ( Logica l Lin k Con t r ol a n d Ada pt a t ion Pr ot ocol) l2 ca p.k o LAD ( Lin u x Au dio D e ve lope r s) list LAN Em u la t ion ( LAN E) LAN s ( loca l a r e a n e t w or k s) 2nd la pt ops la r ge pr ocfs r e a ds la ye r e d a r ch it e ct u r e ( se r ia l dr ive r s) LBA ( logica l block a ddr e ssin g) LCD C ( Liqu id Cr yst a l D ispla y Con t r olle r ) LCD con t r olle r s ldisc.r e a d( ) fu n ct ion ldisc.r e ce ive _ bu f( ) fu n ct ion le d.c dr ive r le d_ in it ( ) fu n ct ion le d_ w r it e ( ) fu n ct ion LED boa r d [ See pa r a lle l por t LED boa r ds.] le ga cy dr ive r s BI OS RTC driver le n fie ld ( sk _ bu ff st r u ct u r e ) le ve l- se n sit ive de vice s LGPL ( GN U Le sse r Ge n e r a l Pu blic Lice n se ) libATA lib dir e ct or y libr a r ie s alsa- lib Glibc libraw1394 libr a w 1 3 9 4 libr a r y libu sb pr ogr a m m in g t e m pla t e lik e ly( ) fu n ct ion 2nd LI LO ( Lin u x Loa de r ) lin e disciplin e s ( t ou ch con t r olle r de vice e x a m ple ) changing com piling connect ion diagram flushing dat a I / O Cont rol open/ close operat ions opening overview read pat hs unregist ering writ e pat hs lin k e d list s creat ing dat a st ruct ures, init ializing funct ions worker t hread work subm ission lin k s ( PCI e ) lin u x .con f.a u Lin u x Am a t e u r Ra dio AX.2 5 H OW TO Lin u x a sse m bly

boot sequence debugging GNU Assem bler ( GAS) i386 boot assem bly code inline assem bly Microsoft Macro Assem bler ( MASM) Net wide Assem bler ( NASM) Lin u x Asyn ch r on ou s I / O ( AI O) lin u x - a u dio- de v m a ilin g list Lin u x Au dio D e ve lope r s ( LAD ) list Lin u x de vice m ode l device classes hot plug/ coldplug kobj ect s m icrocode download m odule aut oload overview sysfs udev 2nd Lin u x dist r ibu t ion s Lin u x h ist or y a n d de ve lopm e n t lin u x - ide m a ilin g list Lin u x - I r D A Lin u x Ke r n e l Cr a sh D u m p ( LKCD ) Lin u x Ke r n e l M a ilin g List ( LKM L) Lin u x Kon gr e ss Lin u x Loa de r ( LI LO) Lin u x - M TD JFFS H OW TO lin u x - m t d m a ilin g list Lin u x - M TD su bsyst e m [ See M TD ( M e m or y Te ch n ology D e vice s) .] Lin u x - PCM CI A su bsyst e m [ See PCM CI A ( Pe r son a l Com pu t e r M e m or y Ca r d I n t e r n a t ion a l Associa t ion ) .] lin u x - scsi m a ilin g list Lin u x Sym posiu m Lin u x Te st Pr oj e ct ( LTP) 2nd Lin u x Tr a ce Toolk it [ See LTT ( Lin u x Tr a ce Toolk it ) .] Lin u x Tr a ce Toolk it Vie w e r ( LTTV) lin u x - u sb- de ve l m a ilin g list 2nd Lin u x - USB su bsyst e m [ See USB ( u n ive r sa l se r ia l bu s) .] Lin u x - vide o su bsyst e m Lin u x W or ld Con fe r e n ce a n d Ex po Liqu id Cr yst a l D ispla y Con t r olle r ( LCD C) LI RC ( Lin u x I n fr a r e d Re m ot e Con t r ol) list _ a dd( ) fu n ct ion list _ a dd_ t a il( ) fu n ct ion list _ de l( ) fu n ct ion 2nd list _ e m pt y( ) fu n ct ion list _ e n t r y( ) fu n ct ion 2nd list _ for _ e a ch _ e n t r y( ) fu n ct ion 2nd list _ for _ e a ch _ e n t r y_ sa fe ( ) fu n ct ion 2nd list _ h e a d st r u ct u r e 2nd list _ r e pla ce ( ) fu n ct ion list _ splice ( ) fu n ct ion list s hash list s linked list s creat ing dat a st ruct ures, init ializing funct ions worker t hread work subm ission LKCD ( Lin u x Ke r n e l Cr a sh D u m p) LKM L ( Lin u x Ke r n e l M a ilin g List ) llse e k ( ) fu n ct ion 2nd LM - Se n sor s

loa din g m odu le s loa dk e ys loca l_ ir q_ disa ble ( ) fu n ct ion loca l_ ir q_ e n a ble ( ) fu n ct ion 2nd loca l_ ir q_ r e st or e ( ) fu n ct ion loca l_ ir q_ sa ve ( ) fu n ct ion loca l a r e a n e t w or k s ( LAN s) 2nd loca lt im e ( ) fu n ct ion lock s lock u ps, soft log com m a n d logica l a ddr e sse s logica l block a ddr e ssin g ( LBA) Logica l Lin k Con t r ol a n d Ada pt a t ion Pr ot ocol ( L2 CAP) lon g de la ys loopba ck de vice s loops_ pe r _ j iffy va r ia ble 2nd 3rd low - spe e d USB low - volt a ge diffe r e n t ia l sign a lin g ( LVD S) low m e m or y Low Pin Cou n t ( LPC) bu s lp.c dr ive r lp_ w r it e ( ) fu n ct ion LPC ( Low Pin Cou n t ) bu s lse e k ( ) fu n ct ion lsm od com m a n d lspci com m a n d lsvpd u t ilit y LTP ( Lin u x Te st Pr oj e ct ) 2nd LTT ( Lin u x Tr a ce Toolk it ) com ponent s event s LTTng LTTV ( Linux Trace Toolkit Viewer) t race dum ps LTTn g LTTV ( Lin u x Tr a ce Toolk it Vie w e r ) LVD S ( low - volt a ge diffe r e n t ia l sign a lin g) lw n .n e t lx r com m a n d

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] M AC ( M e dia Acce ss Con t r ol) a ddr e sse s m a cr os [ See spe cific m a cr os.] M a d p la y 2nd m a ilbox e s ( Ra pidI O) m a ilin g list s 2nd m a in t e n a n ce build script s change m arkers checksum consist ency code port abilit y coding st yles version cont rol m a j or n u m be r s ( ch a r dr ive r s) m a k e com m a n d M AN ( m e t r opolit a n a r e a n e t w or k ) m a p_ in fo st r u ct u r e 2nd m a p dr ive r s definit ion MTD part it ion m aps, creat ing probe m et hod regist ering m a ppin g m e m or y m a ps, syst e m m e m or y m a p copying obt aining m a r k e r s, cle a n M ASM ( M icr osoft M a cr o Asse m ble r ) m a ss st or a ge de vice s ( USB) M a st e r Boot Re cor d ( M BR) M a st e r I n Sla ve Ou t ( M I SO) M a st e r Ou t Sla ve I n ( M OSI ) m a st e r s ( D M A) m a x im u m t r a n sm ission u n it ( M TU) 2nd m b( ) fu n ct ion M BEs ( m u lt ibit e r r or s) M BR ( M a st e r Boot Re cor d) M CA ( M icr o- Ch a n n e l Ar ch it e ct u r e ) M CH ( M e m or y Con t r olle r H u b) m d com m a n d m de la y( ) fu n ct ion m e dia _ ch a n ge d( ) m e t h od M e dia Acce ss Con t r ol ( M AC) a ddr e sse s M e dia I n de pe n de n t I n t e r fa ce ( M I I ) m e m ch a r de vice M EM ERASE com m a n d M EM LOCK com m a n d m e m or y accessing from user space allocat ing cache m isses, count ing claim ing/ fr eeing CMOS ( com plem ent ary m et al oxide sem iconduct or) DMA ( Direct Mem ory Access) [ See also Et h e r n e t - M ode m ca r d e x a m ple .] buffers consist ent DMA access m et hods

definit ion I OMMU ( I / O m em ory m anagem ent unit ) m ast ers scat t er- gat her st ream ing DMA access m et hods synchronous versus asynchronous em bedded Linux m em ory layout FI FO ( first - in first - out ) m em ory flash m em ory CFI - com pliant flash, querying definit ion em bedded drivers NAND NOR sect or s high m em ory init rd m em ory low m em ory m apping m em ory barriers m em ory zones MTD ( Mem ory Technology Devices) flash m em ory illust rat ion of Linux- MTD subsyst em m ap drivers MTD core NAND drivers NOR Chip drivers ov er v iew part it ion m aps, creat ing User Modules pages syst em m em ory m ap copying obt aining zero page ZONE_DMA ZONE_HI GH ZONE_NORMAL m e m or y.c file m e m or y ba n k s ( EEPROM ) M e m or y Con t r olle r H u b ( M CH ) m e m or y_ cs Ca r d Se r vice s dr ive r M e m or y Te ch n ology D e vice s [ See M TD ( M e m or y Te ch n ology D e vice s) .] m e m or y zon e s M EM UN LOCK com m a n d m e m w a lk d( ) fu n ct ion m e t h ods [ See spe cific m e t h ods.] m e t r opolit a n a r e a n e t w or k ( M AN ) m ice Bluet oot h m ice PS/ 2 m ouse roller m ouse device exam ple t ouchpads t r ack point s USB m ice virt ual m ouse device exam ple gpm ( general- purpose m ouse) vm s.c input driver M icr o- Ch a n n e l Ar ch it e ct u r e ( M CA) m icr ocode dow n loa d m icr odr ive s M icr osoft M a cr o Asse m ble r ( M ASM ) m iddle w a r e

M I I ( M e dia I n de pe n de n t I n t e r fa ce ) m illion in st r u ct ion s pe r se con d ( M I PS) M I M O ( M u lt iple I n M u lt iple Ou t ) m in icom M in i PCI m in or n u m be r s ( ch a r dr ive r s) M I PS ( m illion in st r u ct ion s pe r se con d) m ir r or in g disk s m isc_ de r e gist e r ( ) fu n ct ion m isc_ r e gist e r ( ) fu n ct ion 2nd 3rd M iscde vice st r u ct u r e m isc ( m isce lla n e ou s) dr ive r s [ See also w a t ch dog t im e r.] M I SO ( M a st e r I n Sla ve Ou t ) m ix e r s m k in it r a m fs com m a n d m k in it r d com m a n d m k t im e ( ) fu n ct ion m lock a ll( ) fu n ct ion 2nd - m m pa t ch m m a p( ) fu n ct ion 2nd 3rd m m a ppin g M M C ( M u lt iM e dia Ca r d) m m dir e ct or y m od_ t im e r ( ) fu n ct ion 2nd m ode m fu n ct ion s probing regist ering m ode s kernel m ode prot ect ed m ode real m ode user m ode m odin fo com m a n d m odpr obe com m a n d M OD ULE_ D EVI CE_ TABLE( ) m a cr o 2nd 3rd m odu le s aut oloading edac_m c loading M oln a r , I n go M or t on , An dr e w M OSI ( M a st e r I n Sla ve Ou t ) m ost sign ifica n t bit ( M SB) m ou se _ poll( ) fu n ct ion m ou se de v M ovin g Pict u r e Ex pe r t s Gr ou p ( M PEG) 2nd M P3 pla ye r e x a m ple ALSA driver code list ing ALSA program m ing codec_writ e_reg( ) funct ion MP3 decoding com plexit y m ycard_audio_probe( ) funct ion m ycard_audio_rem ove( ) funct ions m ycard_hw_param s( ) funct ion m ycard_pb_prepare( ) funct ion m ycard_pb_t rigger( ) funct ion m ycard_playback_open( ) funct ion overview regist er layout of audio hardware snd_card_free( ) funct ion snd_card_new( ) funct ion snd_card_proc_new( ) funct ion snd_card_regist er( ) funct ion snd_ct l_add( ) funct ion

snd_ct l_new1( ) funct ion snd_device_new( ) funct ion snd_kcont rol st ruct ure snd_pcm _hardware st ruct ure snd_pcm _lib_m alloc_pages( ) funct ion snd_pcm _lib_preallocat e_pages_for_all( ) funct ion snd_pcm _new( ) funct ion snd_pcm _ops st ruct ure snd_pcm _set _ops( ) funct ion user program s M PC8 5 4 0 ( Fr e e sca le ) M PEG ( M ovin g Pict u r e Ex pe r t s Gr ou p) 2nd M PLS ( M u lt ipr ot ocol La be l Sw it ch in g) M PoA ( M u lt i Pr ot ocol ove r ATM ) M SB ( m ost sign ifica n t bit ) m sle e p( ) fu n ct ion m syn c( ) fu n ct ion M TD ( M e m or y Te ch n ology D e vice s) configur at ion dat a st ruct ures debugging flash m em ory FWH ( Firm ware Hub) illust rat ion of Linux- MTD subsyst em kernel program m ing int erfaces m ap drivers definit ion MTD part it ion m aps, creat ing ov er v iew probe m et hod regist ering MTD core NAND chip drivers block size configuring definit ion layout NAND flash cont rollers OOB ( out - of- band) inform at ion page size spare area NOR chip drivers definit ion querying CFI - com pliant flash part it ion m aps, creat ing sources User Modules block device em ulat ion char device em ulat ion definit ion JFFS ( Journaling Flash File Syst em ) MTD- ut ils ov er v iew YAFFS ( Yet Anot her Flash File Syst em ) XI P ( eXecut e I n Place) m t d_ in fo st r u ct u r e m t d_ pa r t it ion st r u ct u r e 2nd M TD - u t ils m t dblock dr ive r m t dch a r dr ive r M TU ( m a x im u m t r a n sm ission u n it ) 2nd m u lt ibit e r r or s ( M BEs) M u lt iM e dia Ca r d ( M M C) m u lt im e t e r s

M u lt iple I n M u lt iple Ou t ( M I M O) M u lt ipr ot ocol La be l Sw it ch in g ( M PLS) M u lt i Pr ot ocol ove r ATM ( M PoA) m u n m a p( ) fu n ct ion m u t e x _ in it ( ) fu n ct ion m u t e x _ lock ( ) fu n ct ion m u t e x _ u n lock ( ) fu n ct ion m u t e x e s 2nd m u t u a l e x clu sion ( m u t e x e s) m y_ de v_ e ve n t _ h a n dle r ( ) fu n ct ion m y_ de vice _ x m it ( ) fu n ct ion m y_ die _ e ve n t _ h a n dle r ( ) fu n ct ion m y_ n ot i_ ch a in st r u ct u r e m y_ r e le a se ( ) fu n ct ion m yblk de v_ in it ( ) fu n ct ion m yblk de v_ ioct l( ) fu n ct ion m yblk de v_ r e qu e st ( ) fu n ct ion m yblk de v st or a ge con t r olle r block device operat ions disk access init ializat ion overview regist er layout m yca r d_ a u dio_ pr obe ( ) fu n ct ion m yca r d_ a u dio_ r e m ove ( ) fu n ct ion m yca r d_ ch a n ge _ m t u ( ) fu n ct ion m yca r d_ ge t _ e e pr om ( ) fu n ct ion m yca r d_ ge t _ st a t s( ) fu n ct ion m yca r d_ h w _ pa r a m s( ) fu n ct ion m yca r d_ pb_ pr e pa r e ( ) fu n ct ion m yca r d_ pb_ t r igge r ( ) fu n ct ion m yca r d_ pb_ vol_ in fo( ) fu n ct ion m yca r d_ pla yba ck _ ope n ( ) fu n ct ion m ydr v.c file m ydr v_ de v st r u ct u r e m ydr v_ in it ( ) fu n ct ion m ydr v_ w or k e r ( ) fu n ct ion m ydr v_ w or k it e m st r u ct u r e m ydr v_ w q st r u ct u r e m ye ve n t _ id st r u ct u r e m ye ve n t _ w a it qu e u e st r u ct u r e m yr t c_ a t t a ch ( ) fu n ct ion m yr t c_ ge t t im e ( ) fu n ct ion

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] N _ TCH lin e disciplin e 2nd n _ t ou ch _ ch a r s_ in _ bu ffe r ( ) fu n ct ion n _ t ou ch _ ope n ( ) fu n ct ion n _ t ou ch _ r e ce ive _ bu f( ) fu n ct ion n _ t ou ch _ r e ce ive _ r oom ( ) fu n ct ion n _ t ou ch _ w r it e ( ) fu n ct ion n _ t ou ch _ w r it e _ w a k e u p( ) fu n ct ion n a n d_ e ccla you t st r u ct u r e 2nd n a n d_ fla sh _ ids[ ] t a ble N AN D ch ip dr ive r s block size configuring definit ion layout NAND flash cont rollers OOB ( out - of- band) inform at ion page size spare area N AN D File Tr a n sla t ion La ye r ( N FTL) N AN D fla sh con t r olle r s N AN D fla sh m e m or y N AN D st or a ge n a n osle e p( ) fu n ct ion N API ( N e w API ) 2nd N ASM ( N e t w ide Asse m ble r ) n a viga t ion fram e buffer drivers accelerat ed m et hods color m odes cont rast and backlight dat a st ruct ures DMA param et ers screen blanking source t ree layout N CP ( N e t w or k Con t r ol Pr ot ocol) n de la y( ) fu n ct ion n e t _ de vice _ st a t s st r u ct u r e 2nd n e t _ de vice m e t h od n e t de vice n ot ifica t ion n e t _ de vice st r u ct u r e act ivat ion bus- specific m et hods configur at ion dat a t ransfer overview st at ist ics wat chdog t im eout n e t dir e ct or y n e t de v_ ch a in st r u ct u r e n e t if_ de vice _ a t t a ch ( ) fu n ct ion n e t if_ de vice _ de t a ch ( ) fu n ct ion n e t if_ qu e u e _ st oppe d( ) fu n ct ion 2nd n e t if_ r e ce ive _ sk b( ) fu n ct ion n e t if_ r x ( ) fu n ct ion 2nd 3rd

n e t if_ r x _ com ple t e ( ) fu n ct ion 2nd n e t if_ r x _ sch e du le ( ) fu n ct ion n e t if_ r x _ sch e du le _ pr e p( ) fu n ct ion n e t if_ st a r t _ qu e u e ( ) fu n ct ion 2nd n e t if_ st op_ qu e u e ( ) fu n ct ion 2nd n e t if_ w a k e _ qu e u e ( ) fu n ct ion 2nd N e t lin k sock e t s n e t pe r f N e t r om N e t w ide Asse m ble r ( N ASM ) N e t w or k Con t r ol Pr ot ocol ( N CP) N e t w or k File Syst e m ( N FS) n e t w or k in t e r fa ce ca r ds [ See N I Cs ( n e t w or k in t e r fa ce ca r ds) .] n e t w or k s Bluet oot h 2nd 3rd I nfrared LANs ( local area net works) net work funct ions probing regist ering NI Cs ( net work int erface cards) [ See N I Cs ( n e t w or k in t e r fa ce ca r ds) .] t hroughput driver perform ance ov er v iew prot ocol perform ance N e w API ( N API ) n e w de vice ch e ck list n e x t ( ) fu n ct ion N FS ( N e t w or k File Syst e m ) 2nd n fs_ u n lock _ r e qu e st ( ) fu n ct ion n fsd k e r n e l t h r e a d N FTL ( N AN D File Tr a n sla t ion La ye r ) n ice va lu e s N I Cs ( n e t w or k in t e r fa ce ca r ds) act ivat ion ATM ( asynchronous t ransfer m ode) buffer m anagem ent concurrency cont rol configur at ion dat a st ruct ures dat a t ransfer Et hernet NI C driver I SA NI Cs MTU size, changing net device int erface [ See n e t _ de vice st r u ct u r e.] net work t hroughput driver perform ance ov er v iew prot ocol perform ance overview prot ocol layers flow cont rol receive pat h t ransm it pat h socket buffers sources st at ist ics sum m ary of kernel program m ing int erfaces wat chdog t im eout N oop 2nd N OR ch ip dr ive r s definit ion querying CFI - com pliant flash N OR fla sh m e m or y

N or t h Br idge n ot e book s n ot ifica t ion s CPU frequency not ificat ion die not ificat ion I nt ernet address not ificat ion net device not ificat ion not ifier chains n ot ifie r _ block st r u ct u r e n ot ifie r ch a in s n u ll sin k N VRAM dr ive r s, u pda t in g w it h se q file s

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] O( 1 ) sch e du le r OBEX ( OBj e ct EXch a n ge ) obj du m p com m a n d OBj e ct EXch a n ge ( OBEX) obj e ct s, k obj e ct s obt a in in g syst e m m e m or y m a p OEM s ( or igin a l e qu ipm e n t m a n u fa ct u r e r s) off- t h e - sh e lf ( OTS) m odu le s OH CI ( Ope n H ost Con t r olle r I n t e r fa ce ) oh ci1 3 9 4 dr ive r On - Th e - Go ( OTG) con t r olle r s on de m a n d gove r n or OOB ( ou t - of- ba n d) in for m a t ion opcon t r ol ope n ( ) m e t h od block drivers CMOS driver EEPROM driver net _device st ruct ure ope n _ soft ir q( ) fu n ct ion Ope n H ost Con t r olle r I n t e r fa ce ( OH CI ) ope n in g CMOS driver EEPROM driver t ouch cont rollers Ope n Sou n d Syst e m ( OSS) Ope n Sou r ce D e ve lopm e n t La b ( OSD L) Ope n Syst e m s I n t e r con n e ct ( OSI ) ope r a t or s, a t om ic opr e por t OPr ofile 2nd cache m isses, count ing opcont rol opreport opr ofile d da e m on or igin a l e qu ipm e n t m a n u fa ct u r e r s ( OEM s) OS- spe cific m odu le s ( OSM s) oscilloscope s OSD L ( Ope n Sou r ce D e ve lopm e n t La b) OSI ( Ope n Syst e m Con n e ct ) OSM s ( OS- spe cific m odu le s) OSS ( Ope n Sou n d Syst e m ) OTG ( On - Th e - Go) con t r olle r s ou t - of- ba n d ( OOB) in for m a t ion ou t b( ) fu n ct ion 2nd 3rd ou t l( ) fu n ct ion 2nd ou t sl( ) fu n ct ion ou t sn ( ) fu n ct ion ou t pu t e ve n t s ( in pu t de vice dr ive r s) ou t w ( ) fu n ct ion 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] pa ck a ge s alsa- ut ils k ex ec- t ools MTD- ut ils pcm cia- cs pcm ciaut ils sysfsut ils pa ge s ( m e m or y) PAN ( pe r son a l a r e a n e t w or k ) Pa r a lle l ATA ( PATA) pa r a lle l por t com m u n ica t ion pa r a lle l por t LED boa r ds cont rolling from user space cont rolling wit h sysfs led.c driver pa r a lle l pr in t e r dr ive r s Pa r de vice st r u ct u r e pa r por t pa r por t _ cla im _ or _ block ( ) fu n ct ion pa r por t _ r e a d_ da t a ( ) fu n ct ion pa r por t _ r e gist e r _ de vice ( ) fu n ct ion 2nd pa r por t _ r e gist e r _ dr ive r ( ) fu n ct ion 2nd pa r por t _ r e le a se ( ) fu n ct ion pa r por t _ u n r e gist e r _ de vice ( ) fu n ct ion pa r por t _ u n r e gist e r _ dr ive r ( ) fu n ct ion pa r por t _ w r it e _ da t a ( ) fu n ct ion pa r t it ion s MTD part it ion m aps, creat ing swap space PATA ( Pa r a lle l ATA) pa t ch e s applying CONFI G_PREEMPT_RT pat ch- set creat ing definit ion kernel.org reposit ory pa t ch u t ilit y PC- com pa t ible syst e m h a r dw a r e block dia gr a m PCBs ( pr in t e d cir cu it boa r ds) pcca r dct l com m a n d pcca r dd t h r e a d PC Ca r ds PC k e yboa r ds PCI ( Pe r iph e r a l Com pon e n t I n t e r con n e ct ) accessing PCI regions configurat ion space I / O and m em ory regions addressing and ident ificat ion CardBus 2nd com pared t o USB dat a st ruct ures debugging definit ion DMA ( Direct Mem ory Access) buffers

consist ent DMA access m et hods definit ion descript ors and buffers I OMMU ( I / O m em ory m anagem ent unit ) m ast ers scat t er- gat her st ream ing DMA access m et hods synchronous versus asynchronous Et hernet - Modem card exam ple dat a t ransfer m odem funct ions, probing m odem funct ions, regist ering MODULE_DEVI CE_TABLE( ) m acro net work funct ions, probing net work funct ions, regist ering PCI _DEVI CE( ) m acro pci_device_id st ruct ures Express Cards kernel program m ing int erfaces Mini PCI PCI - based solut ions PCI Express PCI Express Mini Card PCI Ext ended ( PCI - X) PCI inside Sout h Bridge syst em resources, configuring serial com m unicat ion sources pci_ a lloc_ con sist e n t ( ) fu n ct ion 2nd PCI _ D EVI CE( ) m a cr o 2nd pci_ de vice _ id st r u ct u r e 2nd 3rd pci_ de v st r u ct u r e 2nd pci_ disa ble _ de vice ( ) fu n ct ion pci_ dm a _ syn c_ sg( ) fu n ct ion pci_ dm a _ syn c_ sin gle ( ) fu n ct ion pci_ dr ive r st r u ct u r e pci_ e n a ble _ de vice ( ) fu n ct ion PCI Ex pr e ss 2nd PCI Ex pr e ss M in i Ca r d PCI Ex t e n de d ( PCI - X) pci_ fr e e _ con sist e n t ( ) fu n ct ion pci_ iom a p( ) fu n ct ion 2nd pci_ m a p_ pa ge ( ) fu n ct ion pci_ m a p_ sg( ) fu n ct ion 2nd pci_ m a p_ sin gle ( ) fu n ct ion 2nd pci_ r e a d_ con fig_ byt e ( ) fu n ct ion 2nd pci_ r e a d_ con fig_ dw or d( ) fu n ct ion 2nd pci_ r e a d_ con fig_ w or d( ) fu n ct ion 2nd pci_ r e gist e r _ dr ive r ( ) fu n ct ion 2nd pci_ r e qu e st _ r e gion ( ) fu n ct ion 2nd pci_ r e sou r ce _ e n d( ) fu n ct ion 2nd pci_ r e sou r ce _ fla gs( ) fu n ct ion 2nd pci_ r e sou r ce _ le n ( ) fu n ct ion 2nd pci_ r e sou r ce _ st a r t ( ) fu n ct ion 2nd pci_ se t _ dm a _ m a sk ( ) fu n ct ion pci_ se t _ m a st e r ( ) fu n ct ion pci_ u n m a p_ sg( ) fu n ct ion pci_ u n m a p_ sin gle ( ) fu n ct ion pci_ u n r e gist e r _ dr ive r ( ) fu n ct ion pci_ w r it e _ con fig_ byt e ( ) fu n ct ion 2nd pci_ w r it e _ con fig_ dw or d( ) fu n ct ion 2nd pci_ w r it e _ con fig_ w or d( ) fu n ct ion 2nd PCI - X ( PCI Ex t e n de d) PCI e ( PCI Ex pr e ss)

PCM ( pu lse code m odu la t ion ) PCM CI A ( Pe r son a l Com pu t e r M e m or y Ca r d I n t e r n a t ion a l Associa t ion ) At t ribut e m em ory CardBus devices Card Services CI S ( Card I nform at ion St ruct ure) client drivers, regist ering Com m on m em ory dat a- flow pat h bet ween com ponent s dat a st ruct ures cisparse_t cist pl_cft able_ent ry_t pcm cia_device pcm cia_device_id pcm cia_driver st ruct ure sum m ary of t uple_t debugging definit ion device I Ds and hot plug m et hods Driver Services driver services m odule ( ds) em bedded drivers ExpressCards kernel program m ing int erfaces Linux- PCMCI A subsyst em int eract ion m ailing list on em bedded syst em s on lapt ops pcm ciaut ils package serial PCMCI A sources st orage udev pcm cia - cs pa ck a ge pcm cia _ de vice _ id st r u ct u r e 2nd PCM CI A_ D EVI CE_ M AN F_ CARD ( ) m a cr o pcm cia _ de vice st r u ct u r e 2nd pcm cia _ dr ive r st r u ct u r e 2nd pcm cia _ ge t _ fir st _ t u ple ( ) fu n ct ion pcm cia _ ge t _ t u ple _ da t a ( ) fu n ct ion pcm cia _ pa r se _ t u ple ( ) fu n ct ion pcm cia _ r e gist e r _ dr ive r ( ) fu n ct ion 2nd pcm cia _ r e qu e st _ ir q( ) fu n ct ion pcm cia _ u n r e gist e r _ dr ive r ( ) fu n ct ion pcm cia u t ils pa ck a ge pcspk r _ e ve n t ( ) fu n ct ion pda _ m t d_ pr obe ( ) fu n ct ion pdflu sh k e r n e l t h r e a d Pe n t iu m TSC ( Tim e St a m p Cou n t e r ) pe r ce n t sign ( % ) pe r for m a n ce net work t hroughput driver perform ance ov er v iew prot ocol perform ance perform ance governor Pe r iph e r a l Com pon e n t I n t e r con n e ct [ See PCI ( Pe r iph e r a l Com pon e n t I n t e r con n e ct ) .] pe r iph e r a ls choosing peripheral cont rollers pe r m a n e n t vir t u a l cir cu it s ( PVCs) pe r son a l a r e a n e t w or k ( PAN ) pe r son a l ide n t ifica t ion n u m be r s ( PI N s)

PH Y ( ph ysica l la ye r ) t r a n sce ive r s PI BS boot loa de r Pico- I r D A PI N s ( pe r son a l ide n t ifica t ion n u m be r s) PI O ( pr ogr a m m e d I / O) pipe s 2nd pla ce m e n t plot s pla t for m _ a dd_ de vice s( ) fu n ct ion 2nd pla t for m _ de vice _ r e gist e r ( ) fu n ct ion pla t for m _ de vice _ r e gist e r _ sim ple ( ) fu n ct ion 2nd 3rd pla t for m _ de vice _ u n r e gist e r ( ) fu n ct ion 2nd pla t for m _ de vice r e gist e r ( ) fu n ct ion pla t for m _ de vice st r u ct u r e 2nd pla t for m _ dr ive r _ r e gist e r ( ) fu n ct ion 2nd pla t for m _ dr ive r _ u n r e gist e r ( ) fu n ct ion pla t for m _ dr ive r st r u ct u r e 2nd pla t for m dr ive r s Plu g- a n d- Pla y ( Pn P) PM AC ( Pow e r M a n a ge m e n t a n d Au dio Com pon e n t ) Pn P ( Plu g- a n d- Pla y) PoE ( Pow e r ove r Et h e r n e t ) poin t - of- sa le ( POS) Poin t - t o- Poin t Pr ot ocol ( PPP) 2nd poin t e r s poll( ) m e t h od 2nd 3rd poll_ t a ble st r u ct u r e 2nd poll_ w a it ( ) fu n ct ion 2nd pollin g in ch a r dr ive r s popu la t in g URBs por t _ da t a _ in ( ) fu n ct ion por t _ da t a _ ou t ( ) fu n ct ion por t a bilit y of code por t ch a r de vice por t in g k e r n e ls por t s kgdb port s parallel port com m unicat ion parallel port LED board cont rolling wit h sysfs led.c driver serial port s USB_UART port s POS ( poin t - of- sa le ) post - h a n dle r s ( k pr obe s) pow e r m a n a ge m e n t Pow e r M a n a ge m e n t a n d Au dio Com pon e n t ( PM AC) Pow e r ove r Et h e r n e t ( PoE) Pow e r PC boot loa de r s pow e r sa ve gove r n or ppde v dr ive r 2nd PPP ( Poin t - t o- Poin t Pr ot ocol) 2nd pppd da e m on pr e - h a n dle r s ( k pr obe s) pr e e m pt _ disa ble ( ) fu n ct ion pr e e m pt _ e n a ble ( ) fu n ct ion pr e e m pt ion cou n t e r s pr e pr oce sse d sou r ce code , ge n e r a t in g pr in t e d cir cu it boa r ds ( PCBs) pr in t k ( ) fu n ct ion 2nd 3rd pr obe ( ) fu n ct ion 2nd 3rd 4t h pr obe s [ See k pr obe s.] pr obin g EEPROM driver kprobes [ See k pr obe s.]

net work funct ions t elem et ry card exam ple pr oce sse s cont ext s init kernel processes [ See k e r n e l t h r e a ds.] st at es zom bie processes pr oce ss file syst e m [ See pr ocfs.] pr oce ssor s, ch oosin g pr oce ss sch e du lin g ( u se r m ode dr ive r s) CFS ( Com plet ely Fair Scheduler) O( 1) scheduler original scheduler overview pr ocfs docum ent at ion reading wit h exam ple large procfs reads seq files pr ofilin g Bluet oot h gprof OProfile cache m isses, count ing opcont rol opreport overview pr ogr a m m e d I / O ( PI O) pr ot e ct e d m ode 2nd PS/ 2 m ou se ps com m a n d pse u do ch a r dr ive r s pse u do t e r m in a ls ( PTYs) psm ou se _ pr ot ocol st r u ct u r e 2nd psm ou se st r u ct u r e PTR_ ERR( ) fu n ct ion pt r a ce u t ilit y pt y.c dr ive r PTYs ( pse u do t e r m in a ls) pu blic dom a in soft w a r e pu lse code m odu la t ion ( PCM ) pu lse - w idt h m odu la t or ( PW M ) u n it s PVCs ( pe r m a n e n t vir t u a l cir cu it s) PW M ( pu lse - w idt h m odu la t or ) u n it s

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] QoS ( qu a lit y of se r vice ) Qt r on ix in fr a r e d k e yboa r d dr ive r qu a lit y of se r vice ( QoS) Qu a r t e r VGA ( QVGA) qu e r ie s, CFI - com plia n t fla sh queues act ive queues expired queues overview run queues work queues 2nd 3rd QVGA ( Qu a r t e r VGA)

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] r a ce con dit ion s r a dio am at eur radio RF ( Radio Frequency) chips RFCOMM ( Radio Frequency Com m unicat ion) RFI D ( Radio Frequency I dent ificat ion) t ransm it t ers RAI D ( r e du n da n t a r r a y of in e x pe n sive disk s) r a ise _ soft ir q( ) fu n ct ion 2nd r a n dom ch a r de vice r a n dom n u m be r ge n e r a t or Ra p id I O Fibre Channel iSCSI ( I nt ernet SCSI ) RAS ( r e lia bilit y, a va ila bilit y, se r vice a bilit y) r c.sysin it file RCU ( Re a d- Copy Upda t e ) RD M A ( Re m ot e D M A) r dt sc( ) fu n ct ion r e a d( ) m e t h od READ _ CAPACI TY com m a n d Re a d- Copy Upda t e ( RCU) r e a d_ lock ( ) fu n ct ion r e a d_ lock _ ir qr e st or e ( ) fu n ct ion 2nd r e a d_ lock _ ir qsa ve ( ) fu n ct ion 2nd r e a d_ se qbe gin ( ) fu n ct ion r e a d_ se qlock ( ) fu n ct ion r e a d_ se qr e t r y( ) fu n ct ion r e a d_ se qu n lock ( ) fu n ct ion r e a d_ u n lock ( ) fu n ct ion r e a de r - w r it e r lock s r e a din g da t a CMOS driver wit h procfs exam ple large procfs reads seq files r e a dm e _ pr oc( ) fu n ct ion ar gum ent s exam ple large procfs reads large proc reads seq files r e a d pa t h s r e a dv( ) fu n ct ion r e a l m ode 2nd r e a l t im e ( - r t ) pa t ch 2nd Re a l Tim e Clock ( RTC) 2nd Re a l Tim e Tr a n spor t Pr ot ocol ( RTP) r e ce ive _ bu f( ) fu n ct ion r e ce ive pa t h ( N I Cs) r e ce pt a cle s ( USB) Re dBoot 2nd r e du n da n t a r r a y of in e x pe n sive disk s ( RAI D ) r e fe r e n ce de sign a t or s r e gist e r _ blk de v( ) fu n ct ion 2nd

r e gist e r _ ch r de v( ) fu n ct ion r e gist e r _ die _ n ot ifie r ( ) fu n ct ion r e gist e r _ in e t a ddr _ n ot ifie r ( ) fu n ct ion r e gist e r _ j pr obe s( ) fu n ct ion r e gist e r _ k r e t pr obe s( ) fu n ct ion r e gist e r _ n e t de v( ) fu n ct ion 2nd r e gist e r _ n e t de vice _ n ot ifie r ( ) fu n ct ion r e gist e r e d pr ot ocol fa m ilie s r e gist e r in g j probe handlers kprobe handlers m ap drivers m odem funct ions net work funct ions PCMCI A client drivers plat form drivers ret urn probe handlers UART drivers user m ode helpers r e gist e r la you t audio hardware char drivers m yblkdev st orage cont roller USB_UART r e le a se ( ) m e t h od 2nd r e le a se _ fir m w a r e ( ) fu n ct ion r e le a se _ r e gion ( ) fu n ct ion 2nd r e lia bilit y, a va ila bilit y, se r vice a bilit y ( RAS) Re m ot e D M A ( RD M A) r e m ove ( ) fu n ct ion r e m ove _ w a it _ qu e u e ( ) fu n ct ion 2nd r e por t in g ( ECC) [ See ECC ( e r r or cor r e ct in g code ) r e por t in g .] r e qu e st ( ) m e t h od r e qu e st _ fir m w a r e ( ) fu n ct ion 2nd r e qu e st _ ir q( ) fu n ct ion 2nd 3rd 4t h r e qu e st _ m e m _ r e gion ( ) fu n ct ion 2nd r e qu e st _ qu e u e st r u ct u r e 2nd r e qu e st _ r e gion ( ) fu n ct ion 2nd 3rd r e qu e st s, in t e r r u pt [ See I RQs ( in t e r r u pt r e qu e st s) .] r e qu e st st r u ct u r e 2nd Re qu e st To Se n d ( RTS) r e spon se t im e s ( u se r m ode dr ive r s) r e su m e ( ) fu n ct ion r e t u r n pr obe s ( k r e t pr obe s) RF ( Ra dio Fr e qu e n cy) ch ips RFCOM M ( Ra dio Fr e qu e n cy Com m u n ica t ion ) 2nd RFI D ( Ra dio Fr e qu e n cy I de n t ifica t ion ) t r a n sm it t e r s r j com m .k o r m b( ) fu n ct ion r m m od com m a n d r olle r _ a n a lyze ( ) fu n ct ion r olle r _ ca pt u r e ( ) fu n ct ion r olle r _ in t e r r u pt ( ) fu n ct ion r olle r m ou se de vice e x a m ple r olle r _ m ou se _ in it ( ) fu n ct ion r olle r w h e e l de vice e x a m ple edge sensit ivit y free_irq( ) funct ion overview request _irq( ) funct ion roller int errupt handler soft irqs t asklet s wave form s generat ed by

r oot fs com pact m iddleware NFS- m ount ed root obt aining overview r oot h u bs Rose r q_ for _ e a ch _ bio( ) fu n ct ion 2nd RS- 4 8 5 r s_ ope n ( ) fu n ct ion – r t ( r e a l t im e ) pa t ch 2nd RTC ( Re a l Tim e Clock ) 2nd 3rd 4t h r t c.c dr ive r r t c_ cla ss_ ops st r u ct u r e 2nd r t c_ de vice _ r e gist e r ( ) fu n ct ion 2nd r t c_ de vice _ u n r e gist e r ( ) fu n ct ion 2nd r t c_ in t e r r u pt ( ) fu n ct ion RTP ( Re a l Tim e Tr a n spor t Pr ot ocol) RTS ( Re qu e st To Se n d) r u n _ u m ode _ h a n dle r ( ) fu n ct ion r u n lt p scr ipt r u n n in g st a t e ( t h r e a ds) r u n qu e u e s r w lock _ t st r u ct u r e

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] SAM PLI N G_ RATE_ REGI STER SAN s ( st or a ge a r e a n e t w or k s) SAP ( SI M Acce ss Pr ofile ) SAS ( Se r ia l At t a ch e d SCSI ) SATA ( Se r ia l ATA) SBEs ( sin gle - bit e r r or s) SBP2 ( Se r ia l Bu s Pr ot ocol 2 ) sca t t e r - ga t h e r Sca t t e r list st r u ct u r e sch e d_ ge t pa r a m ( ) fu n ct ion sch e d_ pa r a m st r u ct u r e sch e d_ se t sch e du le r ( ) fu n ct ion 2nd sch e du le ( ) fu n ct ion sch e du le _ t im e ou t ( ) fu n ct ion 2nd 3rd sch e du le r s, I / O sch e du lin g pr oce sse s [ See pr oce ss sch e du lin g.] SCI s ( syst e m con t r ol in t e r r u pt s) SCLK ( Se r ia l CLocK) 2nd SCO ( Syn ch r on ou s Con n e ct ion Or ie n t e d) sco. k o scr e e n bla n k in g scr ip t s build script s runlt p script s direct ory sensor s- det ect SCSI ( Sm a ll Com pu t e r Syst e m I n t e r fa ce ) 2nd scsi_ a dd_ h ost ( ) fu n ct ion SCSI Ge n e r ic ( sg) SD ( Se cu r e D igit a l) ca r ds SD / M M C SD A ( Se r ia l D a t a ) SD P ( Se r vice D iscove r y Pr ot ocol) SECTOR_ COUN T_ REGI STER SECTOR_ N UM BER_ REGI STER se ct or s 2nd Se cu r e D igit a l ( SD ) ca r ds se cu r it y dir e ct or y SEEK_ CUR com m a n d SEEK_ EN D com m a n d SEEK_ SET com m a n d se e k ope r a t ion ( CM OS dr ive r ) se e k t im e s se le ct ( ) m e t h od Se lf- M on it or in g, An a lysis, a n d Re por t in g Te ch n ology ( SM ART) se m a ph or e st r u ct u r e 2nd se n sin g da t a a va ila bilit y ( ch a r dr ive r s) fasync( ) funct ion overview select ( ) / poll( ) m echanism se n sor s- de t e ct scr ipt se q file s advant ages docum ent at ion large procfs reads

NVRAM drivers, updat ing overview se qlock s se qu e n ce lock s se r ia l_ cs Ca r d Se r vice s dr ive r se r ia l8 2 5 0 _ r e gist e r _ por t ( ) fu n ct ion Se r ia l ATA ( SATA) Se r ia l At t a ch e d SCSI ( SAS) Se r ia l Bu s Pr ot ocol 2 ( SBP2 ) Se r ia l CLocK ( SCLK) 2nd se r ia l com m u n ica t ion Se r ia l D a t a ( SD A) se r ia l dr ive r s cell phone device exam ple claim ing/ freeing m em ory CPLD ( Com plex Program m able Logic Device) ov er v iew plat form drivers SoC ( Syst em - on- Chip) USB_UART driver USB_UART port s USB_UART regist er layout dat a st ruct ures layered archit ect ure line disciplines ( t ouch cont roller device exam ple) changing com piling connect ion diagram flushing dat a I / O Cont rol open/ close operat ions opening ov er v iew read pat hs unregist ering writ e pat hs overview sources sum m ary of kernel program m ing int erfaces TTY drivers UART drivers regist ering uart _driver st ruct ure uart _ops st ruct ure uart _port st ruct ure Se r ia l Lin e I n t e r n e t Pr ot ocol ( SLI P) se r ia l PCM CI A Se r ia l Pe r iph e r a l I n t e r fa ce ( SPI ) 2nd se r ia l por t s se r io se r io_ r e gist e r _ por t ( ) fu n ct ion se r por t Se r vice D iscove r y Pr ot ocol ( SD P) se r vice se t ide n t ifie r s ( SSI D s) Se ssion I n it ia t ion Pr ot ocol ( SI P) se t _ bit ( ) fu n ct ion se t _ ca pa cit y( ) fu n ct ion 2nd se t _ cu r r e n t _ st a t e ( ) fu n ct ion 2nd se t _ t e r m ios( ) fu n ct ion se t - t op box ( STB) se t it im e r ( ) fu n ct ion sg ( SCSI Ge n e r ic) SG_ I O com m a n d sg_ io_ h dr _ t st r u ct u r e 2nd

sg3 _ u t ils pa ck a ge sh or t de la ys sh ow k e y u t ilit y SI G ( Blu e t oot h Spe cia l I n t e r e st Gr ou p) siga ct ion ( ) fu n ct ion sign a l_ pe n din g( ) fu n ct ion 2nd SI Gs ( Spe cia l I n t e r e st Gr ou ps) silk scr e e n s SI M Acce ss Pr ofile ( SAP) sim ple _ m a p_ in it ( ) fu n ct ion sim ple _ m a p_ w r it e ( ) fu n ct ion sim u la t in g m ou se m ove m e n t s sin gle - bit e r r or s ( SBEs) sin gle _ ope n ( ) fu n ct ion SI P ( Se ssion I n it ia t ion Pr ot ocol) sk _ bu ff st r u ct u r e 2nd 3rd sk b_ clon e ( ) fu n ct ion 2nd sk b_ pu t ( ) fu n ct ion 2nd sk b_ r e le a se _ da t a ( ) fu n ct ion sk b_ r e se r ve ( ) fu n ct ion 2nd sk bu ff_ clon e ( ) fu n ct ion sla ve a ddr e sse s sla ve s SLI P ( Se r ia l Lin e I n t e r n e t Pr ot ocol) SLOF boot loa de r Sm a ll Com pu t e r Syst e m I n t e r fa ce ( SCSI ) 2nd SM ART ( Se lf- M on it or in g, An a lysis, a n d Re por t in g Te ch n ology) SM Bu s [ See also I 2 C.] dat a access funct ions definit ion overview SM I s ( syst e m m a n a ge m e n t in t e r r u pt s) SM P ( Sym m e t r ic M u lt i Pr oce ssin g) 2nd sn d_ a c9 7 _ code c m odu le sn d_ ca r d_ fr e e ( ) fu n ct ion 2nd sn d_ ca r d_ n e w ( ) fu n ct ion 2nd sn d_ ca r d_ pr oc_ n e w ( ) fu n ct ion 2nd sn d_ ca r d_ r e gist e r ( ) fu n ct ion 2nd sn d_ ca r d st r u ct u r e sn d_ ct l_ a dd( ) fu n ct ion 2nd sn d_ ct l_ e le m _ id_ se t _ in t e r fa ce ( ) fu n ct ion sn d_ ct l_ e le m _ id_ se t _ n u m id( ) fu n ct ion sn d_ ct l_ e le m _ in fo st r u ct u r e sn d_ ct l_ e le m _ w r it e ( ) fu n ct ion sn d_ ct l_ n e w 1 ( ) fu n ct ion 2nd sn d_ ct l_ ope n ( ) fu n ct ion sn d_ de vice _ n e w ( ) fu n ct ion sn d_ in t e l8 x 0 dr ive r sn d_ k con t r ol_ n e w st r u ct u r e sn d_ k con t r ol st r u ct u r e sn d_ pcm _ h a r dw a r e st r u ct u r e sn d_ pcm _ lib_ m a lloc_ pa ge s( ) fu n ct ion 2nd sn d_ pcm _ lib_ pr e a lloca t e _ pa ge s_ for _ a ll( ) fu n ct ion 2nd sn d_ pcm _ n e w ( ) fu n ct ion 2nd sn d_ pcm _ ops st r u ct u r e 2nd sn d_ pcm _ r u n t im e st r u ct u r e sn d_ pcm _ se t _ ops( ) fu n ct ion 2nd sn d_ pcm _ su bst r e a m st r u ct u r e sn d_ pcm st r u ct u r e SoC ( Syst e m - on - Ch ip) sock e t s buffers Net link socket s UNI X- dom ain socket s

soft dogs soft ir qs com pared t o t asklet s definit ion ksoft irqd/ 0 kernel t hread soft lock u p_ t ick ( ) fu n ct ion soft lock u ps soft w a r e RAI D sou n d [ See a u dio.] sou r ce s audio drivers block drivers char drivers input drivers I nt er- I nt egrat ed Circuit Prot ocol kdum p kernels kexec kprobes MTD 2nd NI Cs ( net work int erface cards) PCI PCMCI A serial drivers source t ree layout USB ( universal serial bus) user m ode drivers sou r ce t r e e la you t direct ories navigat ing Sou t h Br idge syst e m spa ce s ( ACPI ) spa r e a r e a ( N AN D ch ip dr ive r s) Spe cia l I n t e r e st Gr ou ps ( SI Gs) spe e ds ( USB) SPI ( Se r ia l Pe r iph e r a l I n t e r fa ce ) 2nd spi_ a sa yn c( ) fu n ct ion spi_ a syn c( ) fu n ct ion 2nd spi_ bu t t e r fly dr ive r spi_ de vice st r u ct u r e 2nd spi_ dr ive r st r u ct u r e spi_ m e ssa ge _ a dd_ t a il( ) fu n ct ion s spi_ m e ssa ge _ in it ( ) fu n ct ion s spi_ m e ssa ge st r u ct u r e spi_ r e gist e r _ dr ive r ( ) fu n ct ion 2nd spi_ syn c( ) fu n ct ion 2nd spi_ t r a n sfe r st r u ct u r e spi_ u n r e gist e r _ dr ive r ( ) fu n ct ion spin _ lock ( ) fu n ct ion 2nd 3rd spin _ lock _ bh ( ) fu n ct ion spin _ lock _ in it ( ) fu n ct ion spin _ lock _ ir qsa ve ( ) fu n ct ion spin _ u n lock ( ) fu n ct ion 2nd spin _ u n lock _ bh ( ) fu n ct ion spin _ u n lock _ ir qsa ve ( ) fu n ct ion spin lock _ t st r u ct u r e spin lock s SSI D ( se r vice se t ide n t ifie r ) ssize _ t a io_ r e a d( ) fu n ct ion ssize _ t a io_ w r it e ( ) fu n ct ion st a r t ( ) fu n ct ion st a r t _ k e r n e l( ) fu n ct ion 2nd st a r t _ t x ( ) fu n ct ion st a t e s of k e r n e l t h r e a ds

STATUS_ REGI STER 2nd STB ( se t - t op box ) st op( ) fu n ct ion st oppe d st a t e ( t h r e a ds) st or a ge a r e a n e t w or k s ( SAN s) st or a ge con t r olle r [ See m yblk de v st or a ge con t r olle r.] st or a ge _ pr obe ( ) fu n ct ion st or a ge t e ch n ologie s ATAPI ( ATA Packet I nt erface) I DE ( I nt egrat ed Drive Elect ronics) libATA MMC ( Mult iMediaCard) PCMCI A/ CF RAI D ( redundant array of inexpensive disks) SATA ( Serial ATA) SCSI ( Sm all Com put er Syst em I nt erface) SD ( Secure Digit al) cards sum m ary of st r a ce u t ilit y st r e a m in g D M A a cce ss m e t h ods st r u ct e 8 2 0 m a p st r u ct u r e s [ See spe cific st r u ct u r e s.] su bm it _ w or k ( ) fu n ct ion su bm it t in g URBs for dat a t ransfer work t o be execut ed lat er su bve r sion Su pe r I / O ch ips Su pe r Vide o Gr a ph ics Ar r a y ( SVGA) su spe n d( ) fu n ct ion SVCs ( sw it ch e d vir t u a l cir cu it s) SVGA ( Su pe r Vide o Gr a ph ics Ar r a y) SVGAlib sw a p spa ce sw it ch e d vir t u a l cir cu it s ( SVCs) Sym m e t r ic M u lt i Pr oce ssin g ( SM P) 2nd syn a pt ics_ in it ( ) fu n ct ion syn a pt ics_ pr oce ss_ byt e ( ) fu n ct ion s syn ch r on iza t ion com plet ion funct ions kt hread helpers SCO ( Synchronous Connect ion Orient ed) synchronous DMA synchronous int errupt s / sys/ de vice s/ syst e m / e da c/ dir e ct or y sysdia g u t ilit y sy sf s 2nd sysfs_ cr e a t e _ dir ( ) fu n ct ion sysfs_ cr e a t e _ file ( ) fu n ct ion sysfs_ cr e a t e _ gr ou p( ) fu n ct ion sysfs_ r e m ove _ gr ou p( ) fu n ct ion sysfsu t ils pa ck a ge SYSLI N U X syslog( ) fu n ct ion Syst e m - on - Ch ip ( SoC) syst e m con t r ol in t e r r u pt s ( SCI s) Syst e m M a n a ge m e n t Bu s [ See SM Bu s.] syst e m m a n a ge m e n t in t e r r u pt s ( SM I s) syst e m m e m or y m a p copying obt aining Syst e m Ta p

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T ] [ U] [ V] [ W] [ X] [ Y] [ Z ] t a ble s, n a n d_ fla sh _ ids[ ] t a il fie ld ( sk _ bu ff st r u ct u r e ) TASK_ I N TERRUPTI BLE st a t e TASK_ RUN N I N G st a t e TASK_ STOPPED st a t e TASK_ TRACED st a t e TASK_ UN I N TERRUPTI BLE st a t e t a sk le t _ disa ble ( ) fu n ct ion 2nd t a sk le t _ disa ble _ n osyn c( ) fu n ct ion 2nd t a sk le t _ e n a ble ( ) fu n ct ion 2nd t a sk le t _ in it ( ) fu n ct ion 2nd t a sk le t _ sch e du le ( ) fu n ct ion 2nd t a sk le t _ st r u ct st r u ct u r e t a sk le t s t e le _ de vice _ t st r u ct u r e t e le _ discon n e ct ( ) fu n ct ion t e le _ ope n ( ) fu n ct ion t e le _ pr obe ( ) fu n ct ion t e le _ r e a d( ) fu n ct ion t e le _ w r it e ( ) fu n ct ion t e le _ w r it e _ ca llba ck ( ) fu n ct ion t e le m e t r y ca r d e x a m ple dat a t ransfer driver init ializat ion pci_device_id st ruct ure probing and disconnect ing regist er access regist er space t e m pla t e s, libu sb pr ogr a m m in g t e m pla t e t e st _ a n d_ se t _ bit ( ) fu n ct ion t e st _ bit ( ) fu n ct ion t e st in g LTP ( Linux Test Proj ect ) t est equipm ent t est infrast ruct ure TFT ( Th in Film Tr a n sist or ) TFTP e m be dde d de vice s Th in Film Tr a n sist or ( TFT) t h r e a ds [ See k e r n e l t h r e a ds.] t h r ou gh pu t driver perform ance overview prot ocol perform ance Th t t pd t im e ( ) fu n ct ion t im e _ a ft e r ( ) fu n ct ion t im e _ a ft e r _ e q( ) fu n ct ion t im e _ be for e ( ) fu n ct ion t im e _ be for e _ e q( ) fu n ct ion t im e r _ fu n c( ) fu n ct ion s t im e r _ list st r u ct u r e t im e r _ pe n din g( ) fu n ct ion 2nd t im e r s HZ j iffies

long delays overview RTC ( Real Tim e Clock) short delays TSC ( Tim e St am p Count er) wat chdog t im er Tim e St a m p Cou n t e r ( TSC) 2nd t im e va l st r u ct u r e Tin yTP ( Tin y Tr a n spor t Pr ot ocol) Tin yX t ool ch a in s Tor va lds, Lin u s t ou ch con t r olle r com piling connect ion diagram flushing dat a I / O Cont rol open/ close operat ions opening read pat hs writ e pat hs t ou ch pa ds t ou ch scr e e n s t r a ce da e m on t r a ce d st a t e ( t h r e a ds) t r a ce r e a de r t r a ce v isu a liz e r t r a cin g LTT ( Linux Trace Toolkit ) com ponent s event s LTTng LTTV ( Linux Trace Toolkit Viewer) t race dum ps overview t r a ck poin t s t r a n sa ct ion s ( I 2 C) t r a n sce ive r s ( USB) t r a n sfe r [ See da t a t r a n sfe r .] Tr a n sist or - Tr a n sist or Logic ( TTL) t r a n sm it pa t h s ( N I Cs) t r oj a n _ fu n ct ion ( ) fu n ct ion TROUBLED _ D S e n vir on m e n t a l va r ia ble TSC ( Tim e St a m p Cou n t e r ) 2nd t sde v dr ive r TTL ( Tr a n sist or - Tr a n sist or Logic) t t y.c dr ive r t t y_ bu ffe r st r u ct u r e 2nd t t y_ bu fh e a d st r u ct u r e 2nd t t y_ dr ive r st r u ct u r e 2nd TTY dr ive r s t t y_ flip_ bu ffe r _ pu sh ( ) fu n ct ion 2nd t t y_ flip_ bu ffe r st r u ct u r e t t y_ in se r t _ flip_ ch a r ( ) fu n ct ion 2nd 3rd t t y_ ldisc st r u ct u r e 2nd t t y_ ope n ( ) fu n ct ion t t y_ r e gist e r _ de vice ( ) fu n ct ion t t y_ r e gist e r _ dr ive r ( ) fu n ct ion 2nd t t y_ r e gist e r _ ldisc( ) fu n ct ion t t y_ st r u ct st r u ct u r e 2nd t t y_ u n r e gist e r _ dr ive r ( ) fu n ct ion t t y_ u n r e gist e r _ ldisc( ) fu n ct ion TUN / TAP de vice dr ive r TUN n e t w or k dr ive r

t u ple _ t st r u ct u r e 2nd

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] U- Boot u a r t _ a dd_ on e _ por t ( ) fu n ct ion 2nd 3rd u a r t _ dr ive r st r u ct u r e 2nd UART ( Un ive r sa l Asyn ch r on ou s Re ce ive r Tr a n sm it t e r ) dr ive r s 2nd cell phone device exam ple claim ing/ freeing m em ory CPLD ( Com plex Program m able Logic Device) ov er v iew plat form drivers SoC ( Syst em - on- Chip) USB_UART driver USB_UART port s USB_UART regist er layout regist ering RS- 485 uart _driver st ruct ure uart _ops st ruct ure uart _port st ruct ure u a r t _ ops st r u ct u r e 2nd u a r t _ por t st r u ct u r e 2nd u a r t _ r e gist e r _ dr ive r ( ) fu n ct ion 2nd 3rd u a r t _ u n r e gist e r _ dr ive r ( ) fu n ct ion UCEs ( u n cor r e ct a ble e r r or s) u Clib c u Clin u x UDB class drivers debugging u de la y( ) fu n ct ion 2nd u de v on em bedded devices PCMCI A u de vm on it or u d e v se n d UH CI ( Un ive r sa l H ost Con t r olle r I n t e r fa ce ) UI O ( Use r spa ce I O) uI P UM L ( Use r M ode Lin u x ) u n cor r e ct a ble e r r or s ( UCEs) u n in t e r r u pt ible st a t e ( t h r e a ds) Un ive r sa l Asyn ch r on ou s Re ce ive r Tr a n sm it t e r [ See UART ( Un ive r sa l Asyn ch r on ou s Re ce ive r Tr a n sm it t e r ) d r iv e r s.] Un ive r sa l H ost Con t r olle r I n t e r fa ce ( UH CI ) u n ive r sa l se r ia l bu s [ See USB ( u n ive r sa l se r ia l bu s) .] UN I X- dom a in sock e t s u n lik e ly( ) fu n ct ion 2nd u n r e gist e r _ blk de v( ) fu n ct ion u n r e gist e r _ ch r de v_ r e gion ( ) fu n ct ion u n r e gist e r _ n e t de v( ) fu n ct ion u n r e gist e r _ n e t de vice _ n ot ifie r ( ) fu n ct ion u p( ) fu n ct ion u p_ r e a d( ) fu n ct ion u p_ w r it e ( ) fu n ct ion u pda t in g BI OS

NVRAM drivers u r a n dom ch a r de vice URBs ( USB Re qu e st Block s) u r b st r u ct u r e 2nd USB ( u n ive r sa l se r ia l bu s) addressing Bluet oot h 2nd bus speeds class drivers HI Ds ( hum an int erface devices) m ass st orage ov er v iew USB- Serial com pared t o I 2 C and PCI dat a st ruct ures descript ors pipes t ables of URBs ( USB Request Blocks) usb_device st ruct ure em bedded drivers on em bedded syst em s endpoint s enum erat ion gadget drivers host cont rollers illust rat ion of Linux- USB subsyst em kernel program m ing int erfaces, t able of Linux- USB subsyst em archit ect ure m ice OTG cont rollers overview recept acles sources t elem et ry card exam ple dat a t ransfer driver init ializat ion pci_device_id st ruct ure probing and disconnect ing regist er access regist er space t ransceivers t ransfer t ypes URBs ( USB Request Blocks) usbfs virt ual filesyst em USB Gadget proj ect USB- Serial u sb- se r ia l.c dr ive r u sb_ [ con t r ol| in t e r r u pt | bu lk ] _ m sg( ) fu n ct ion u sb_ [ r cv| sn d] [ ct r l| in t | bu lk | isoc] pipe ( ) fu n ct ion 2nd u sb_ a lloc_ u r b( ) fu n ct ion 2nd u sb_ bu ffe r _ a lloc( ) fu n ct ion u sb_ bu ffe r _ fr e e ( ) fu n ct ion u sb_ bu lk _ m sg( ) fu n ct ion u sb_ bu s st r u ct u r e u sb_ close ( ) fu n ct ion u sb_ con fig_ de scr ipt or st r u ct u r e 2nd u sb_ con t r ol_ m sg( ) fu n ct ion 2nd 3rd u sb_ ct r lr e qu e st st r u ct u r e u sb_ de r e gist e r ( ) fu n ct ion u sb_ de r e gist e r _ de v( ) fu n ct ion u sb_ de v_ h a n dle st r u ct u r e USB_ D EVI CE( ) m a cr o u sb_ de vice _ de scr ipt or st r u ct u r e 2nd

u sb_ de vice _ id st r u ct u r e u sb_ de vice st r u ct u r e 2nd 3rd u sb_ dr ive r st r u ct u r e u sb_ e n dpoin t _ de scr ipt or st r u ct u r e 2nd u sb_ fill_ bu lk _ u r b( ) fu n ct ion 2nd u sb_ fill_ con t r ol_ u r b( ) fu n ct ion 2nd 3rd u sb_ fill_ in t _ u r b( ) fu n ct ion 2nd u sb_ fin d_ bu se s( ) fu n ct ion u sb_ fin d_ de vice s( ) fu n ct ion u sb_ fin d_ in t e r fa ce ( ) fu n ct ion u sb_ fr e e _ u r b( ) fu n ct ion 2nd u sb_ ga dge t _ dr ive r st r u ct u r e 2nd u sb_ ga dge t _ r e gist e r _ dr ive r ( ) fu n ct ion 2nd u sb_ ge t _ in t fda t a ( ) fu n ct ion 2nd u sb_ in it ( ) fu n ct ion u sb_ in t e r fa ce _ de scr ipt or st r u ct u r e 2nd u sb_ ope n ( ) fu n ct ion u sb_ r e gist e r ( ) fu n ct ion 2nd u sb_ r e gist e r _ de v( ) fu n ct ion u sb_ se r ia l_ de r e gist e r ( ) fu n ct ion u sb_ se r ia l_ dr ive r st r u ct u r e u sb_ se r ia l_ r e gist e r ( ) fu n ct ion 2nd u sb_ se t _ in t fda t a ( ) fu n ct ion 2nd u sb_ su bm it _ u r b( ) fu n ct ion 2nd u sb_ t e le _ in it ( ) fu n ct ion USB_ UART USB_ UART dr ive r code list ing regist er layout USB_ UART por t s u sb_ u a r t _ pr obe ( ) fu n ct ion u sb_ u a r t _ r x in t ( ) fu n ct ion u sb_ u a r t _ st a r t _ t x ( ) fu n ct ion u sb_ u n lin k _ u r b( ) fu n ct ion 2nd u sbfs vir t u a l file syst e m 2nd USB Ga dge t pr oj e ct u sbh id dr ive r u sbh id USB clie n t dr ive r USB k e yboa r ds u sbm on com m a n d USB Re qu e st Block s ( URBs) u sbse r ia l dr ive r s u se r m ode dr ive r s dat a st ruct ures I / O regions accessing dum ping byt es from m em ory regions, accessing parallel port LED boards, cont rolling process scheduling CFS ( Com plet ely Fair Scheduler) O( 1) scheduler original scheduler ov er v iew response t im es sg ( SCSI Generic) sources UI O ( Userspace I O) usbfs virt ual filesyst em user m ode I 2 C user space library funct ions when t o use u se r m ode h e lpe r s Use r M ode Lin u x ( UM L)

Use r M odu le s block device em ulat ion char device em ulat ion definit ion JFFS ( Journaling Flash File Syst em ) MTD- ut ils overview YAFFS ( Yet Anot her Flash File Syst em ) u se r spa ce dr ive r s [ See u se r m ode dr ive r s.] u se r spa ce gove r n or Use r spa ce I O ( UI O) u se r spa ce libr a r y fu n ct ion s u sr dir e ct or y UU_ READ _ D ATA_ REGI STER UU_ STATUS_ REGI STER UU_ W RI TE_ D ATA_ REGI STER

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] V2 I P ( Vide o- a n d- Voice ove r I P) v a r ia ble s loops_per_j iffy 2nd 3rd xt im e VCI ( vir t u a l cir cu it ide n t ifie r ) ve r ify ch e ck su m com m a n d ( ioct l) ve r sion con t r ol Ve r y h igh spe e d in t e gr a t e d cir cu it H a r dw a r e D e scr ipt ion La n gu a ge ( VH D L) ve sa fb ( vide o fr a m e bu ffe r dr ive r ) VFS ( Vir t u a l File Syst e m ) 2nd vfs_ r e a ddir ( ) fu n ct ion VGA ( Vide o Gr a ph ics Ar r a y) VH D L ( Ve r y h igh spe e d in t e gr a t e d cir cu it H a r dw a r e D e scr ipt ion La n gu a ge v id e o cabling st andards cont rollers em bedded drivers VGA ( Video Graphics Array) video fram e buffer driver [ See ve sa fb ( vide o fr a m e bu ffe r dr ive r ) .] Vide o- a n d- Voice ove r I P ( V2 I P) vide o1 3 9 4 dr ive r vir t u a l a ddr e sse s vir t u a l cir cu it ide n t ifie r ( VCI ) Vir t u a l File Syst e m ( VFS) 2nd vir t u a l m ou se de vice e x a m ple gpm ( general- purpose m ouse) sim ulat ing m ouse m ovem ent s vm s.c input driver Vir t u a l N e t w or k Com pu t in g ( VN C) vir t u a l pa t h ide n t ifie r ( VPI ) vir t u a l t e r m in a ls ( VTs) Vit a l Pr odu ct D a t a ( VPD ) vm a lloc( ) fu n ct ion 2nd vm lin u x k e r n e l im a ge vm s.c a pplica t ion vm s_ in it ( ) fu n ct ion VN C ( Vir t u a l N e t w or k Com pu t in g) VoI P ( Voice ove r I n t e r n e t Pr ot ocol) VOLUM E_ REGI STER VPD ( Vit a l Pr odu ct D a t a ) VPI ( vir t u a l pa t h ide n t ifie r ) vt .c dr ive r VTs ( vir t u a l t e r m in a ls)

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W ] [ X] [ Y] [ Z ] w 1 bu s w 1 _ fa m ily_ ops st r u ct u r e w 1 _ fa m ily st r u ct u r e w a it _ e ve n t _ t im e ou t ( ) fu n ct ion 2nd w a it _ for _ com ple t ion ( ) fu n ct ion 2nd w a it _ qu e u e _ t st r u ct u r e w a it qu e u e s [ See q u e u e s.] w a k e _ u p_ in t e r r u pt ible ( ) fu n ct ion 2nd 3rd w a ll t im e w a t ch dog t im e ou t w a t ch dog t im e r w a t ch poin t s w d3 3 c9 3 _ in it ( ) fu n ct ion w e a r le ve lin g W iFi 2nd 3rd W iM a x w ir e le ss t rade- offs for WiFi 2nd 3rd Wireless Ext ensions w m b( ) fu n ct ion 2nd w or k , su bm it t in g t o be e x e cu t e d la t e r w or k _ st r u ct st r u ct u r e 2nd w or k e r t h r e a d w or k qu e u e _ st r u ct st r u ct u r e w or k qu e u e s 2nd 3rd w r it e ( ) m e t h od w r it e _ lock ( ) fu n ct ion w r it e _ lock _ ir qr e st or e ( ) fu n ct ion 2nd w r it e _ lock _ ir qsa ve ( ) fu n ct ion 2nd w r it e _ se qlock ( ) fu n ct ion w r it e _ se qu n lock ( ) fu n ct ion w r it e _ u n lock ( ) fu n ct ion w r it e _ vm s( ) fu n ct ion w r it e _ w a k e u p( ) fu n ct ion w r it e v( ) fu n ct ion w r it in g CMOS driver input event drivers

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] x 8 6 boot loa de r s x f8 6 SI GI O( ) fu n ct ion Xf8 6 W a it For I n pu t ( ) fu n ct ion XGA ( e Xt e n de d Gr a ph ics Ar r a y) XI P ( e Xe cu t e I n Pla ce ) x t im e va r ia ble X W in dow s

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] YAFFS ( Ye t An ot h e r Fla sh File Syst e m )

I n de x [ SYMBOL] [ A] [ B] [ C] [ D] [ E] [ F] [ G] [ H] [ I ] [ J] [ K] [ L] [ M] [ N] [ O] [ P] [ Q] [ R] [ S] [ T] [ U] [ V] [ W] [ X] [ Y] [ Z ] ze r o- pa ge .t x t file ze r o ch a r de vice ze r o pa ge Zig b e e zom bie pr oce sse s zom bie st a t e ( t h r e a ds) ZON E_ D M A ZON E_ H I GH ZON E_ N ORM AL