331 41 4MB
English Pages 513 Year 2005
HardwareandComputerOrganization
HardwareandComputerOrganization TheSoftwarePerspective By
ArnoldS.Berger
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Newnes is an imprint of Elsevier
NewnesisanimprintofElsevier 30CorporateDrive,Suite400,Burlington,MA01803,USA LinacreHouse,JordanHill,OxfordOX28DP,UK Copyright©2005,ElsevierInc.Allrightsreserved. Nopartofthispublicationmaybereproduced,storedinaretrievalsystem,or transmittedinanyformorbyanymeans,electronic,mechanical,photocopying, recording,orotherwise,withoutthepriorwrittenpermissionofthepublisher. PermissionsmaybesoughtdirectlyfromElsevier’sScience&TechnologyRights DepartmentinOxford,UK:phone:(+44)1865843830,fax:(+44)1865853333, e-mail:[email protected].Youmayalsocompleteyourrequeston-linevia theElsevierhomepage(http://elsevier.com),byselecting“CustomerSupport”andthen “ObtainingPermissions.” Recognizingtheimportanceofpreservingwhathasbeenwritten, Elsevierprintsitsbooksonacid-freepaperwheneverpossible. LibraryofCongressCataloging-in-PublicationData Berger,ArnoldS. Hardwareandcomputerorganization:aguideforsoftwareprofessionals/byArnoldS.Berger. p.cm. ISBN0-7506-7886-0 1.Computerorganization.2.Computerengineering.3.Computerinterfaces.I.Title. QA76.9.C643B472005 004.2'2--dc22 BritishLibraryCataloguing-in-PublicationData AcataloguerecordforthisbookisavailablefromtheBritishLibrary. ForinformationonallNewnespublications visitourWebsiteatwww.books.elsevier.com 04050607080910987654321 PrintedintheUnitedStatesofAmerica
2005040553
ForVivianandAndrea
Contents PrefacetotheFirstEdition.................................................................................. xi Acknowledgments............................................................................................. xvi What’sontheDVD-ROM?................................................................................ xvii CHAPTER1:IntroductionandOverviewofHardwareArchitecture................. 1 Introduction................................................................................................................................ 1 ABriefHistoryofComputing...................................................................................................... 1 NumberSystems....................................................................................................................... 12 ConvertingDecimalstoBases.................................................................................................... 25 EngineeringNotation............................................................................................................... 26 SummaryofChapter1.............................................................................................................. 27 ExercisesforChapter1.............................................................................................................. 28
CHAPTER2:IntroductiontoDigitalLogic......................................................... 29 ElectronicGateDescription........................................................................................................ 39 TruthTables............................................................................................................................... 44 SummaryofChapter2.............................................................................................................. 46 ExercisesforChapter2.............................................................................................................. 47
CHAPTER3:IntroductiontoAsynchronousLogic............................................. 49 Introduction.............................................................................................................................. 49 LawsofBooleanAlgebra........................................................................................................... 51 TheKarnaughMap................................................................................................................... 55 ClocksandPulses...................................................................................................................... 62 SummaryofChapter3.............................................................................................................. 67 ExercisesforChapter3.............................................................................................................. 68
CHAPTER4:IntroductiontoSynchronousLogic............................................... 71 Flip-Flops................................................................................................................................... 72 StorageRegister........................................................................................................................ 83 SummaryofChapter4.............................................................................................................. 90 ExercisesforChapter4.............................................................................................................. 91
CHAPTER5:IntroductiontoStateMachines..................................................... 95 ModernHardwareDesignMethodologies................................................................................ 115 SummaryofChapter5............................................................................................................ 119 ExercisesforChapter5............................................................................................................ 120 vii
Contents
CHAPTER6:BusOrganizationandMemoryDesign....................................... 123 BusOrganization..................................................................................................................... 123 AddressSpace......................................................................................................................... 136 DirectMemoryAccess(DMA).................................................................................................. 152 SummaryofChapter6............................................................................................................ 153 ExercisesforChapter6............................................................................................................ 155
CHAPTER7:MemoryOrganizationandAssemblyLanguageProgramming.. 159 Introduction............................................................................................................................ 159 Label....................................................................................................................................... 170 EffectiveAddresses.................................................................................................................. 174 PseudoOpcodes...................................................................................................................... 183 DataStorageDirectives............................................................................................................ 184 AnalysisofanAssemblyLanguageProgram............................................................................. 186 SummaryofChapter7............................................................................................................ 188 ExercisesforChapter7............................................................................................................ 189
CHAPTER8:ProgramminginAssemblyLanguage......................................... 193 Introduction............................................................................................................................ 193 AssemblyLanguageandC++.................................................................................................. 209 StacksandSubroutines........................................................................................................... 216 SummaryofChapter8............................................................................................................ 222 ExercisesforChapter8............................................................................................................ 223
CHAPTER9:AdvancedAssemblyLanguageProgrammingConcepts........... 229 Introduction............................................................................................................................ 229 AdvancedAddressingModes................................................................................................... 230 68000Instructions.................................................................................................................. 232 MOVEInstructions................................................................................................................... 233 LogicalInstructions.................................................................................................................. 233 OtherLogicalInstructions........................................................................................................ 234 Summaryofthe68KInstructions............................................................................................. 238 SimulatedI/OUsingtheTRAP#15Instruction....................................................................... 240 CompilersandAssemblers....................................................................................................... 242 SummaryofChapter9............................................................................................................ 259 ExercisesforChapter9............................................................................................................ 260
CHAPTER10:TheIntelx86Architecture......................................................... 265 Introduction............................................................................................................................ 265 TheArchitectureofthe8086CPU........................................................................................... 267 Data,IndexandPointerRegisters............................................................................................ 269 FlagRegisters.......................................................................................................................... 272 SegmentRegisters................................................................................................................... 273 InstructionPointer(IP)............................................................................................................. 273 MemoryAddressingModes..................................................................................................... 275 X86InstructionFormat............................................................................................................ 278
viii
Contents 8086InstructionSetSummary................................................................................................. 282 DataTransferInstructions........................................................................................................ 282 ArithmeticInstructions............................................................................................................ 283 LogicInstructions.................................................................................................................... 284 StringManipulation................................................................................................................. 285 ControlTransfer...................................................................................................................... 286 AssemblyLanguageProgrammingthe8086Architecture....................................................... 289 SystemVectors........................................................................................................................ 291 SystemStartup........................................................................................................................ 292 Wrap-Up................................................................................................................................. 292 SummaryofChapter10.......................................................................................................... 292 ExercisesforChapter10.......................................................................................................... 294
CHAPTER11:TheARMArchitecture................................................................ 295 Introduction............................................................................................................................ 295 ARMArchitecture.................................................................................................................... 296 ConditionalExecution............................................................................................................ 301 BarrelShifter........................................................................................................................... 302 OperandSize........................................................................................................................... 303 AddressingModes................................................................................................................... 304 StackOperations..................................................................................................................... 306 ARMInstructionSet................................................................................................................ 309 ARMSystemVectors............................................................................................................... 319 SummaryandConclusions...................................................................................................... 320 SummaryofChapter11.......................................................................................................... 320 ExercisesforChapter11.......................................................................................................... 321
CHAPTER12:InterfacingwiththeRealWorld................................................ 322 Introduction............................................................................................................................ 322 Interrupts................................................................................................................................ 323 Exceptions............................................................................................................................... 327 Motorola68KInterrupts.......................................................................................................... 327 Analog-to-Digital(A/D)andDigital-to-Analog(D/A)Conversion............................................... 332 TheResolutionofA/DandD/AConverters.............................................................................. 346 SummaryofChapter12.......................................................................................................... 348 ExercisesforChapter12.......................................................................................................... 349
CHAPTER13:IntroductiontoModernComputerArchitectures.................... 353 ProcessorArchitectures,CISC,RISCandDSP........................................................................... 354 AnOverviewofPipelining....................................................................................................... 358 SummaryofChapter13.......................................................................................................... 369 ExercisesforChapter13.......................................................................................................... 370
CHAPTER14:MemoryRevisited,CachesandVirtualMemory...................... 372 IntroductiontoCaches............................................................................................................ 372 VirtualMemory....................................................................................................................... 387 Pages...................................................................................................................................... 389
ix
Contents TranslationLookasideBuffer(TLB)............................................................................................ 391 Protection................................................................................................................................ 392 SummaryofChapter14.......................................................................................................... 393 ExercisesforChapter14.......................................................................................................... 395
CHAPTER15:PerformanceIssuesinComputerArchitecture......................... 397 Introduction............................................................................................................................ 397 HardwareandPerformance..................................................................................................... 398 BestPractices.......................................................................................................................... 414 SummaryofChapter15.......................................................................................................... 416 ExercisesforChapter15.......................................................................................................... 417
CHAPTER16:FutureTrendsandReconfigurableHardware.......................... 419 Introduction............................................................................................................................ 419 ReconfigurableHardware........................................................................................................ 419 MolecularComputing............................................................................................................. 430 Localclocks............................................................................................................................. 432 SummaryofChapter16.......................................................................................................... 434 ExercisesforChapter16.......................................................................................................... 436
APPENDIXA:SolutionsforOdd-NumberedExercises................................... 437 [Solutionstotheeven-numberedproblemsareavailablethroughtheinstructor’sresourcewebsiteathttp://www.elsevier.com/0750678860.]
AbouttheAuthor............................................................................................. 483 Index.................................................................................................................. 485
x
PrefacetotheFirstEdition Thankyouforbuyingmybook.Iknowthatmayringhollowifyouareapoorstudentandyourinstructormadeittherequiredtextforyourcourse,butIthankyounevertheless.Ihopethatyoufind itinformativeandeasytoread.AtleastthatwasoneofmygoalswhenIsetouttowritethisbook. ThistextisanoutgrowthofacoursethatI’vebeenteachingintheComputingandSoftware SystemsDepartmentoftheUniversityofWashington-Bothell.Thecourse,CSS422,Hardware andComputerOrganization,isoneoftherequiredcorecoursesforourundergraduatestudents. Also,itistheonlyrequiredarchitecturecourseinourcurriculum.Whileourstudentslearnabout algorithmsanddatastructures,comparativelanguages,numericmethodsandoperatingsystems, thisistheironlyexposureto“what’sunderthehood.”SincetheUniversityofWashingtonison thequartersystem,I’mfacedwiththeuphillbattletoteachasmuchaboutthearchitectureof computersasIcaninabout10weeks. Thematerialthatformsthecoreofthisbookwasdevelopedoveraperiodof5yearsintheform ofabout500MicrosoftPowerPoint®slides.Later,IconvertedthematerialintheslidestoHTML sothatIcouldalsoteachthecourseviaadistancelearning(DL)format.Sincefirstteachingthis courseinthefallof1999,I’vetaughtit3or4timeseachacademicyear.I’vealsotaughtit3times viaDL,withexcellentresults.Infact,theDLstudentsasawholehavedoneequallywellasthe studentsattendinglecturesinclass.So,ifyouthinkthatattendingclassisahighlyoverratedpart ofthecollegeexperience,thenthisbookisforyou. Thetextisappropriateforafirstcourseincomputerarchitectureatthesophomorethroughsenior level.Itisreasonablyself-containedsothatitshouldbeabletoserveastheonlyhardwarecourse thatCSstudentsneedtotakeinordertounderstandtheimplicationsofthecodethattheyare writing.AttheUniversityofWashington-Bothell(UWB),thiscourseispredominantlytaught toseniors.Asafaculty,we’vefoundthatthelevelofsophisticationachievedthroughlearning programmingconceptsinotherclassesmakesforaneasiertransitiontolow-levelprogramming. Ifthebookistobeusedwithlowerdivisionstudents,thenadditionaltimeshouldbeallottedfor gainingfluencywithassemblylanguageprogrammingconcepts.Forexample,inintroducing certainassemblylanguagebranchingandloopconstructs,anadvancedstudentwilleasilygrasp thesimilaritytoWHILE,DO-WHILE,FORandIF-THEN-ELSEconstructs.Alesssophisticated studentmayneedmoreconcreteexamplesinordertoseethesimilarities. WhywriteabookonComputerArchitecture?I’mgladyouasked.Inthe5+yearsthatItaughtthe course,Ichangedthetextbook4times.Attheendofthequarter,whenIheldaninformalcourse xi
Preface debriefingwiththestudents,theyuniversallypannedeverybookthatIused.The“goldstandard” textbooks,thetextsthatalmosteveryComputerSciencestudentusesintheirarchitectureclass, werejustnotrelevanttotheirneeds.Forthemajorityofthesestudents,theywerenotgoingtogo onandstudyarchitectureingraduateschools,ordesigncomputersforIntelorAMD.Theyneeded tounderstandthearchitectureofacomputeranditssupportinghardwareinordertowriteefficient anddefect-freecodetorunonthemachines.Recently,Ididfindatextthatatleastapproachedthe subjectmatterinthesamewaythatIthoughtitshouldbedone,butIalsofoundthattextlacking inseveralkeyareas.Ontheplusside,switchingtothenewtexteliminatedthecomplaintsfrom mystudentsanditalsoreinforcedmyopinionthatIwasn’taloneinseeinganeedforatextwith adifferentperspective.Unfortunately,thistext,eventhoughitwasagreatimprovement,stilldid notcoverseveralareasthatIconsideredtobeveryimportant,soIresolvedtowriteonethatdid, withoutlosingtheessenceofwhatIthinkthenewperspectivecorrectlyaccomplished. It’snotsurprisingthat,giventheUWBcampusislessthan10milesfromMicrosoft’smain campusinRedmond,WA,wearestronglyinfluencedbytheMicrosoftculture.Thevastmajority ofmystudentshaveonlywrittensoftwareforWindowsandtheIntelarchitecture.Thedesigners ofthisarchitecturewouldhaveyoubelievethatthesecomputersareinfinitelyfastmachineswith unlimitedresources.Howdoyoucounterthisviewoftheworld? Often,mystudentswillcryoutinfrustration,“Whyareyoumakingmelearnthis(deleted)?”(Actuallyitismoreofawhimper.)Thisusuallyhappensrightaroundthemid-termexamination.Since ourcampusisalsoapproximatelyequidistantfromwhereBoeingbuildsthe737and757aircraft inRenton,WAandthewidebody767and777aircraftinEverett,WA,analogiestotheaircraft industryareusuallyveryeffective.Isimplyanswertheirquestionthisway,“Wouldyouflyonan airplanethatwasdesignedbysomeonewhoiscluelessaboutwhatkeepsanairplaneintheair?” Sometimesitworks. Thebookisdividedintofourmajortopicareas: 1. Introductiontohardwareandasynchronouslogic. 2. Synchronouslogic,statemachinesandmemoryorganization. 3. Moderncomputerarchitecturesandassemblylanguageprogramming. 4. I/O,computerperformance,thehierarchyofmemoryandfuturedirectionsofcomputer organization. Thereisnosharplineofdemarcationbetweenthesubjectareas,andthesubjectmatterbuildsupon theknowledgebasefrompriorchapters.However,I’vetriedtolimittheinterdependenciessolater chaptersmaybeskipped,dependingupontheavailabletimeanddesiredsyllabus. Eachchapterendswithsomeexercises.Thesolutionstotheodd-numberedproblemsarelocated inAppendixA,andthesolutionstotheeven-numberedproblemsareavailablethroughtheinstructor’sresourcewebsiteathttp://www.elsevier.com/0750678860. Thetextapproachthatwe’lltakeistodescribethehardwarefromthegroundup.JustasaGeneticistcandescribethemostcomplexoforganicbeingsintermsofaDNAmoleculethatcontains onlyfournucleotides,adenine,cytosine,guanine,andthymine,abbreviated,A,C,GandT,wecan describethemostcomplexcomputerormemorysystemintermsoffourlogicalbuildingblocks, xii
Preface AND,OR,NOTandTRI-STATE.Strictlyspeaking,TRI-STATEisn’talogicalbuildingblocklike AND,itismorelikethe“glue”thatenablesustointerconnecttheelementsofacomputerinsuch awaythatthecomplexitydoesn’toverwhelmus.Also,IreallyliketheDNAanalogy,sowe’ll needtohave4electronicbuildingblockstokeepupwiththeA,C,GandTidea. Ioncegaveatalktoagroupofmiddleschoolteacherswhoweretryingtoearnsomein-service creditsduringtheirsummerbreak.IwasavolunteerwiththeAirAcademySchoolDistrictin ColoradoSprings,ColoradowhileIworkedfortheLogicSystemsDivisionofHewlett-Packard. NoneoftheteacherswerecomputerliterateandIhadtwohourstogivethemsomeappreciation ofthetechnology.IdecidedtostartwithAristotleandconceptofthelogicaloperatorsasabranch ofphilosophyandthenproceededwiththeDNAanalogyupthroughtheconceptofregisters.I seemedtobegettingthroughtothem,buttheymayhavebeenstrokingmyegosoIwouldsignoffontheirattendancesheets.Anyway,Ithinkthereisvalueindemonstratingthateventhemost complexcomputerfunctionalitycanbedescribedintermsofthelogicprimitivesthatwestudyin thefirstpartofthetext. WewilltaketheDNAorbuilding-blockapproachthroughmostofthefirsthalfofthetext.We willstartwiththesimplestofgatesandbuildcompoundgates.Fromthesecompoundgateswe’ll progresstotheposingandsolutionofasynchronouslogicalequations.We’lllearnthemethods oftruthtabledesignandsimplificationusingBooleanalgebraandthenKarnaughMap(K-map) methodology.Theexercisesandexampleswillstressthestatementoftheproblemasasetof specificationswhicharethentranslatedintoatruthtable,andfromtheretoK-mapsandfinallyto thegatedesign.Atthispointthestudentisencouragedtoactually”build”thecircuitinsimulation usingtheDigitalWorks®softwaresimulator(seethefollowing)includedontheDVD-ROMthat accompaniesthetext.Ihavefoundthiscombinationoftheabstractdesignandtheactualsimulationtobeanextremelypowerfulteachingandlearningcombination. Oneofthebenefitsoftakingthisapproachisthatthestudentsbecomeaccustomedtodealingwith variablesatthebitlevel.WhilemoststudentsarefamiliarwiththeC/C++Booleanconstructs,the conceptofasinglewirecarryingthestateofavariableseemstobequitenew. Oncetheideaofcreatingarbitrarilycomplex,asynchronousalgebraicfunctionsisundercontrol, weaddthedimensionoftheclockandofsynchronouslogic.Synchronouslogictakesustoflipflops,counters,shifters,registersandstatemachines.Weactuallyspendalotofeffortinthisarea, andtheconceptsarereintroducedseveraltimesaswelookatmicro-codeandinstructiondecompositionlateron. Themiddlepartofthebookfocusesonthearchitectureofacomputersystem.Inparticular,the memorytoCPUinterfacewillgetagreatdealofattention.We’lldesignsimplememorysystemsand decodingcircuitsusingourknowledgegainedintheprecedingchapters.We’llalsotakeabrieflook atmemorytiminginordertobetterunderstandsomeofthemoreglobalissuesofsystemdesign. We’llthenmakethetransitiontolookingatthearchitectureofthe68K,ARMandX86processor families.Thiswillbeourintroductiontoassemblylanguageprogramming. Eachoftheprocessorarchitectureswillbehandledseparatelysothatitmaybeskippedwithout creatingtoomuchdiscontinuityinthetext. xiii
Preface Thistextdoesemphasizeassemblylanguageprogramminginthethreearchitectures.Thereason forthisistwofold:First,assemblylanguagemayormaynotbetaughtasapartofaCSstudent’s curriculum,anditmaybetheironlyexposuretoprogrammingatthemachinelevel.Eventhough youasaCSstudentmayneverhavetowriteanassemblylanguageprogram,there’sahighprobabilitythatyou’llhavetodebugsomepartsofyourC++programattheassemblylanguagelevel, sothisisasgoodatimetolearnitasany.Also,bylookingatthreeverydifferentinstructionsets wewillactuallyreinforcetheconceptthatonceyouunderstandthearchitectureofaprocessor,you canprogramitinassemblylanguage.Thisleadstothesecondreasontostudyassemblylanguage. Assemblylanguageisametaphorforstudyingcomputerarchitecturefromthesoftwaredeveloper’spointofview. I’mabigfanof“Dr.Science.”He’softenonNationalPublicRadioanddoestoursofcollege campuses.Hisfamoustaglineis,“IhaveaMaster’sdegree...inscience.”Anyway,IwasataDr. Sciencelecturewhenhesaid,“Iliketoscancolumnsofrandomnumbers,lookingforpatterns.” IrememberedthatlineandIoftenuseitinmylecturestodescribehowyoucanbegintoseethe architectureofthecomputeremergingthroughtheseemingrandomnessofthemachinelanguage instructionset.Icouldjustseeinmymind’seyeabunchofMotorolaCPUarchitectsandengineerssittingaroundatableinarestaurant,pizzatraysscatteredhitherandyon,tryingtofigureout thecorrectbitpatternsforthelastfewinstructionssothattheydon’thaveabloatedandinefficient microcodeROMtable.Ifyouareastudentreadingthisanditdoesn’tmakeanysensetoyounow, don’tworry...yet. Thelastpartsofthetextstepsbackandlooksatgeneralissuesofcomputerarchitecture.We’ll lookatCISCversusRISC,moderntechniques,suchaspipelinesandcaches,virtualmemoryand memorymanagement.However,theoverridingthemewillbecomputerperformance.Wewillkeep returningtotheissuesassociatedwiththesoftware-to-hardwareinterfaceandtheimplicationsof codingmethodsonthehardwareandofthehardwareonthecodingmethods. OneuniqueaspectofthetextisthematerialincludedontheaccompanyingDVD-ROM.I’veincludedthefollowingprogramstousewiththematerialinthetext: • DigitalWorks(freeware):Ahardwaredesignandsimulationtool • Easy68K:Afreewareassembler/simulator/debuggerpackagefortheMotorola(Now Freescale)68,000architecture. • X86emul:Asharewareassembler/simulator/debuggerpackagefortheX86architecture. • GNUARMTools:TheARMdeveloperstoolsetwithInstructionSetSimulatorfromthe FreeSoftwareFoundation. TheARMcompanyhasanexcellenttoolsuitethatyoucanobtaindirectlyfromARM.Itcomes withafree45-dayevaluationlicense.Thisshouldbelongenoughtouseinyourcourse.Unfortunately,IwasunabletonegotiatealicenseagreementwithARMthatwouldenablemetoinclude theARMtoolsontheDVD-ROMthataccompaniesthistext.Thistoolsuiteisexcellentandeasy touse.Ifyouwanttospendsomeadditionaltimeexaminingtheworld’smostpopularRISC architecture,thencontactARMdirectlyandaskthemnicelyforacopyoftheARMtoolssuite. TellthemArniesentyou.
xiv
Preface IhavealsousedtheEasy68Kassembler/simulatorextensivelyinmyCSS422class.Itworkswell andhasextensivedebuggingcapabilitiesassociatedwithit.Also,sinceitisfreeware,thelogistical problemsoflicensesandevaluationperiodsneednotbedealtwith.However,wewillbemakingsomereferencestotheothertoolsinthetext,soitisprobablyagoodideatoinstallthemjust beforeyouintendtousethem,ratherthanatthebeginningofyourclass. AlsoincludedontheDVD-ROMare10shortlecturesonvarioustopicsofinterestinthistext byexpertsinthefieldofcomputerarchitecture.Thesevideosweremadeunderagrantbythe WorthingtonTechnologyEndowmentfor2004atUWB.Eachlectureisaninformal15to30 minute“chalktalk.”Ihopeyoutakethetimetoviewthemandintegratethemintoyourlearning experienceforthissubjectmatter. Eventhoughtheeditors,mystudentsandIhavereadthroughthismaterialseveraltimes,Murphy’s Lawpredictsthatthereisstillahighprobabilityoferrorsinthetext.Afterall,it’ssoftware.So ifyoucomeacrossany“bugs”inthetext,pleaseletmeknowaboutit.Sendyourcorrectionsto [email protected].I’llseetoitthatthecorrectionswillgetspostedonmywebsiteatUW (http://faculty.uwb.edu/aberger). ThelastpointIwantedtomakeisthattextbookscanjustgosofar.Whetheryouareastudentor aninstructorreadingthis,pleasetrytoseekoutexpertsandoriginalsourcesifyoucan.Professor JamesPatterson,writingintheJuly,2004issueofPhysicsToday,writes, Whenwewanttoknowsomething,thereisatendencytoseekaquickanswer inatextbook.Thisoftenworks,butweneedtogetinthehabitoflookingat originalpapers.Textbooksareoftenabbreviatedsecond-orthird-handdistortionsofthefacts...1 Let’sgetstarted. ArnoldS.Berger Sammamish,Washington
JamesD.Patterson,AnOpenLettertotheNextGeneration,PhysicsToday,Vol.57,No.7,July,2004,p.56.
1
xv
Acknowledgments First,andforemost,IwouldliketoacknowledgethesponsorshipandsupportofProfessorWilliam Erdly,formerDirectoroftheDepartmentofComputingandSoftwareSystemsattheUniversityof Washington-Bothell.ProfessorErdlyfirsthiredmeasanAdjunctProfessorinthefallof1999and askedmetoteachacoursecalledHardwareandComputerOrganization,eventhoughthecourseI wantedtoteachwasonEmbeddedSystemDesign. ProfessorErdlythenprovidedmewithfinancialsupporttoconvertmyHardwareandComputer OrganizationcoursefromaseriesoflecturesonPowerPointslidestoaseriesofonlinelessonsin HTML.Theselessonsbecamethecoreofthematerialthatledtothisbook. ProfessorsCharlesJackelsandFrankCioch,intheircapacitiesasActingDirectors,bothsupported myworkinperfectingtheon-linematerialandbringingmultimediaintothedistancelearning experience.Theirsupporthelpedmetoseetheeducationalvalueoftechnologyintheclassroom. IwouldalsoliketothanktheRichardP.andLoisM.WorthingtonFoundationfortheir2004 TechnologyGrantAwardwhichenabledmetotravelaroundtheUnitedStatesandvideotapeshort talksinComputerArchitecture.Also,Iwouldliketothankthe13speakerswhogaveoftheirtime toparticipateinthisproject. CarolLewis,myAcquisitionsEditoratNewnesBookssawthevalueofmyapproachandIthank herforit.I’msorrythatshehadtoleaveNewnesbeforeshecouldseethisbookfinallycompleted. TiffanyGasbarriniofElsevierPressandKellyJohnsonofBorregoPublishingbroughtthebookto life.Thankyouboth. Inlargemeasure,thisbookwasdesignedbythestudentswhohavetakenmyCSS422course. Theirend-of-Quarterevaluationsandfeedbackwereinvaluableinhelpingmetoseehowthisbook shouldbestructured. Finally,andmostimportant,IwanttothankmywifeVivianforhersupportandunderstanding.I couldnothavewritten500+pageswithoutit.
xvi
What’sontheDVD-ROM? OneuniqueaspectofthetextisthematerialincludedontheaccompanyingDVD-ROM.I’veincludedthefollowingprogramstousewiththematerialinthetext: • DigitalWorks(freeware):Ahardwaredesignandsimulationtool. • Easy68K:Afreewareassembler/simulator/debuggerpackagefortheMotorola(now Freescale)68,000architecture. • X86emul:Asharewareassembler/simulator/debuggerpackagefortheX86architecture. • GNUARMTools:TheARMdeveloperstoolsetwithInstructionSetSimulatorfromthe FreeSoftwareFoundation. • Tenindustryexpertvideolecturesonsignificanthardwaredesignanddevelopmenttopics.
xvii
1
CHAPTER
IntroductionandOverview ofHardwareArchitecture LearningObjectives Whenyou’vefinishedthislesson,youwillbeableto: Describetheevolutionofcomputingdevicesandthewaymostcomputer-baseddevices areorganized; Makesimpleconversionsbetweenthebinary,octalandhexadecimalnumbersystems,and explaintheimportanceofthesesystemstocomputingdevices; Demonstratethewaythattheatomicelementsofcomputerhardwareandlogicgatesare used,anddetailtherulesthatgoverntheiroperation.
Introduction Today,weoftentakeforgrantedtheimpressivearrayofcomputingmachinerythatsurroundsus andhelpsusmanageourdailylives.Becauseyouarestudyingcomputerarchitectureanddigital hardware,younodoubthaveagoodunderstandingofthesemachines,andyou’veprobablywrittencountlessprogramsonyourPCsandworkstations.However,itisveryeasytobecomejaded andforgettheevolutionofthetechnologythathasledustothepointwhereeveryNintendoGame Boy®has100timesthecomputingpowerofthecomputersystemsonthefirstMercuryspacemissions.
ABriefHistoryofComputing Computingmachineshavebeenaroundforalongtime,hundredsofyears.TheChineseabacus,the calculatorswithgearsandwheelsandthefirstanalogcomputersareallexamplesofcomputingmachinery;insomecasesquitecomplex,thatpredatestheintroductionofdigitalcomputingsystems. Thecomputingmachinesthatwe’reinterestedincameaboutinthe1940sbecauseWorldWarII artilleryneededamoreaccuratewaytocalculatethetrajectoriesoftheshellsfiredfrombattleships. Today,theprimaryreasonthatcomputershavebecomesopervasiveistheadvancesmadein integratedcircuitmanufacturingtechnology.WhatwasonceprimarilyorangegrovesinCalifornia, northofSanJoseandsouthofPaloAlto,istodaytheregionknownasSiliconValley.SiliconValleyisthehometomanyofthecompaniesthatarethelocomotivesofthistechnology.Intel,AMD, Cypress,CirrusLogicandsoonarehouseholdnames(ifyouliveinageek-speakhousehold) anywhereintheworld.
1
Chapter1 About30yearsago,GordonMoore,oneofthefoundersofIntel,observedthatthedensityof transistorsbeingplacedonindividualsiliconchipswasdoublingabouteveryeighteenmonths. ThisobservationhasbeenremarkablyaccuratesinceMoorefirststatedit,andithassincebecomeknownasMoore’sLaw.Moore’sLawhasbeenremarkablyaccuratesinceGordonMoore firstarticulatedit.Memorycapacity,morethenanythingelse,hasbeenanexcellentexampleof theaccuracyofMoore’sLaw.Figure1.1containsasemi-logarithmicgraphofmemorycapacity versustime.Manycircuitdesignersanddevicephysicistsarearguingaboutthecontinuedviability ofMoore’sLaw.Transistorscannotcontinuetoshrinkindefinitely,norcanmanufacturerseasily affordthecostofthemanufacturingequipmentrequiredtoproducesiliconwafersofsuchminute dimensions.Atsomepoint,thelawsofquantumphysicswillbegintoalterthebehaviorofthese tinytransistorsinveryprofoundways. 1000M ~2004 1000M ~2004 1000
256M
100
512M
Mbit capacity
64M 10
16M 4M
1M
1
64K
0.1
256K
16K 0.01
0.001 1976
1978
1980
1982
1984
1986
1988
1990
1992
1994
1996
1998 2000
Figure1.1:Thegrowthinthecapacityofdynamicrandomaccessmemories(DRAM) withtime.Notethesemi-logarithmicbehavior,characteristicofMoore’sLaw.
Today,wearecapableofplacinghundredsofmillionsoftransistors(the“active”switchingdevice fromwhichwecreatelogicgates)ontoasingleperfectpieceofsilicon,perhaps2cmonaside. Fromthepointofviewofdesigningcomputerchips,thebigbreakthroughcameaboutwhenMead andConway1describedamethodofcreatinghardwaredesignsbywritingsoftware.Calledsilicon compilation,ithasledtothecreationofhardwaredescriptionlanguage,orHDL.HardwaredescriptionlanguagessuchasVerilog®andVHDL®enablehardwaredesignerstowriteaprogram thatlooksremarkablyliketheCprogramminglanguage,andthentocompilethatprogramtoa recipethatasemiconductormanufacturercanusetobuildachip. Backtothebeginning;thefirstgenerationofcomputingengineswascomprisedofthemechanical devices.Theabacus,theaddingmachine,thepunchcardreaderfortextilemachinesfitintothis category.Thenextgenerationspannedtheperiodfrom1940–1960.Hereelectronicdevices— vacuumtubes—wereusedastheactivedeviceorswitchingelement.Evenaminiaturevacuum
2
IntroductionandOverviewofHardwareArchitecture tubeismillionsoftimeslargerthenthetransistoronasiliconwafer.Itconsumesmillionsoftimes thepowerofthetransistor,anditsusefullifetimeishundredsorthousandsoftimeslessthena transistor.Althoughthevacuumtubecomputersweremuchfasterthenthemechanicalcomputers oftheprecedinggeneration,theyarethousandsoftimesslowerthenthecomputersoftoday.Ifyou areafanofthegradeBsciencefictionmoviesofthe1950’s,thesecomputersweretheonesthat filledtheroomwithlightsflashingandmetersbouncing. Thethirdgenerationcoveredroughlytheperiodoftimefrom1960to1968.Herethetransistorreplacedthevacuumtube,andsuddenlythecomputersbegantobeabletodorealwork.Companies suchasIBM®,Burroughs®andUnivac®builtlargemainframecomputers.TheIBM360family isarepresentativeexampleofthemainframecomputeroftheday.Alsoatthistime,Xerox®was carryingoutsomepioneeringworkonthehuman/computerinterfaceattheirPaloAltoResearch Center,XeroxPARC.Heretheystudiedwhatlaterwouldbecomecomputernetworks,Windows® operatingsystemandtheubiquitousmouse.ProgrammersstoppedprogramminginmachinelanguageandassemblylanguageandbegantouseFORTRAN,COBOLandBASIC. Thefourthgeneration,roughly1969–1977wastheageoftheminicomputer.Theminicomputer wasthecomputerofthemasses.Itwasn’tquitethePC,butitmovedthecomputeroutofthe sterileenvironmentofthe“computerroom,”protectedbytechniciansinwhitecoats,toacomputer inyourlab.Theminicomputeralsorepresentedthereplacementofindividualelectronicparts, suchastransistorsandresistors,mountedonprintedcircuitboards(calleddiscretedevices),with integratedcircuits,orcollectionsoflogicfunctionsinasinglepackage.Herewastheintroduction ofthesmallandmediumscaleintegratedcircuits.CompaniessuchasDigitalEquipmentCompany (DEC),DataGeneralandHewlett-Packardallbuiltthisgenerationofminicomputer.2Alsowithin thistimeframe,simpleintegrated-circuitmicroprocessorswereintroducedandcommerciallyproducedbycompanieslikeIntel,TexasInstruments,Motorola,MOSTechnologyandZilog.Early microcomputerdevicesthatbestrepresentthisgenerationarethe4004,8008and8080fromIntel, the9900fromTexasInstrumentsandthe6800fromMotorola.Thecomputerlanguagesofthe fourthgenerationwere:assembly,C,Pascal,Modula,SmalltalkandMicrosoftBASIC. Wearecurrentlyinthefifthgeneration,althoughitcouldbearguedthatthefifthgenerationended withtheIntel®80486microprocessorandtheintroductionofthePentium®representsthesixth generation.We’llignorethatdistinctionuntilitismorewidelyaccepted.Theadvancesmadein semiconductormanufacturingtechnologybestcharacterizethefifthgenerationofcomputers. Today’ssemiconductorprocessestypifywhatisreferredtoasVeryLargeScaleIntegration,or VLSItechnology.Thenextstep,UltraLargeScaleIntegration,orULSIiseitherheretodayor rightaroundthecorner.Dr.DanielMann3,anAMDFellow,recentlytoldmethatamodernAMD AthlonXPprocessorcontainsapproximately60milliontransistors. Thefifthgenerationalsosawthegrowthofthepersonalcomputerandtheoperatingsystemasthe primaryfocusofthemachine.Standardhardwareplatformscontrolledbystandardoperatingsystemsenabledthousandsofdeveloperstocreateprogramsforthesesystems.Intermsofsoftware, thedominantlanguagesbecameADA,C++,JAVA,HTMLandXML.Inaddition,graphicaldesign language,basedupontheuniversalmodelinglanguage(UML),begantoappear.
3
Chapter1 Computer TwoViewsofToday’sComputer CentralProcessing Input/ Memory Themoderncomputerhasbecome Unit(CPU) Output fasterandmorepowerfulbutthebasic BIOS DISPLAY architectureofacomputingmachinehas PROGRAM DRIVERS CONTROL KEYBOARD essentiallystayedthesameformany NETWORK years.Todaywecantaketwoequivalent APPLICATIONS DATA viewsofthemachine:Thehardware DISK MANIPULATION OPERATING DRIVES SYSTEM viewandthesoftwareview.Thehardwareview,notsurprisingly,focuseson themachineanddoesallowforthefact Figure 1.2: Abstract view of a computer. The three thatthesoftwarehassomethingtodo main elements are the control and data processor, withitsreasontoexist.From50,000 feet,ourcomputerlookslikeFigure1.2. the input and output, or I/O devices, and the program
Inthiscourse,we’llfocusprimarilyon theCPUandmemorysystems,withsome considerationofthesoftwarethatdrives thishardware.We’lltouchbrieflyonI/O, sinceacomputerisn’tmuchgoodwithout it.
thatisexecutingonthemachine.
UserInterface(Command)
Application
OperatingSystem Thesoftwaredeveloper’sviewisroughly equivalent,buttheperspectivedoeschange somewhat.Figure1.3showsthecomputer SoftwareDrivers,FirmwareandBIOS fromthesoftwaredesigner’spointofview. Notethattheviewofthesystemshown PhysicalHardware inthisfigureissomewhatproblematic becauseitisn’talwaysclearthattheuser Figure 1.3: Representing the computer in terms of interfacecommunicatesdirectlywiththe theabstractionlevelsofthesystem. applicationprogram.Inmanycases,the userinterfacefirstcommunicateswiththeoperatingsystem.However,let’slookatthisdiagram somewhatlooselyandconsiderittobetheflowofinformation,ratherthantheflowofcontrol.
AbstractionLevels Oneofthemoremodernconceptsofcomputerdesignistheideaofabstractionlevels.Eachlevel providesanabstractionofthelevelbelowit.Atthelowestlevelisthehardware.Inordertocontrol thehardwareitisnecessarytocreatesmallprograms,calleddrivers,whichactuallymanipulate theindividualcontrolbitsofthehardware.
4
IntroductionandOverviewofHardwareArchitecture Sittingabovethedriversistheoperatingsystemandothersystemprograms.Theoperatingsystem, orOS,communicateswiththedriversthroughastandardsetofapplicationprogramminginterfaces,orAPIs.TheAPIsprovideastructurebywhichthenextlevelofupintheabstractionlevel cancommunicatewiththelayerbelowit.Thus,inordertoreadacharacterfromthekeyboardof yourcomputer,thereisalow-leveldriverprogramthatbecomesactivewhenakeyisstruck.The operatingsystemcommunicateswiththisdriverthroughitsAPI. Atthenextlevel,theapplicationsoftwarecommunicateswiththeOSthroughsystemAPI’sthatonce again,abstractthelowerlevelssothattheindividualdifferencesinbehaviorofthehardwareandthe driversmaybeignored.However,weneedtobeabitcarefulabouttakingtheviewpointofFigure 1.3tooliterally.WecouldalsoarguethattheApplicationandOperatingSystemlayersshouldbe reversedbecausetheuserinteractswiththeapplicationthroughtheOperatingSystemlayeraswell. Thus,mouseandkeyboardinputsarereallypassedtotheapplicationthroughtheOperatingSystem anddonotgodirectlyfromtheUsertotheApplication.Anyway,yougetthepicture. Thecomputerhardware,asrepresentedbyadesktopPC,canbethoughtofasbeingcomprisedof fourbasicparts: 1. Inputdevicescanincludecomponentssuchasthemouse,keyboard,microphone, disks,modemandthenetwork. 2. Outputdevicesarecomponentssuchasthedisplay,disk,modem,soundcardand speakers,andthenetwork. 3. Thememorysystemiscomprisedofinternalandexternalcaches,mainmemory, videomemoryanddisk. 4. Thecentralprocessingunit,orCPU,iscomprisedofthearithmeticandlogicunit (ALU),controlsystemandbusses. Busses Thebussesarethenervoussystemofthecomputer.Theyconnectthevariousfunctionalblockofthe computerbothinternallyandexternally.Withinacomputer,abusisagroupingofsimilarsignals. Thus,yourPentiumprocessorhasa32-bitaddressbusanda32-bitdatabus.Intermsofthebus structure,thismeansthattherearetwobundlesof32wireswitheachbundlecontaining32individual signals,butwithacommonfunction.We’lldiscussbussesmuchmorethoroughlyinalaterlesson. Thetypicalcomputerhasthreebusses:Oneformemoryaddresses,onefordataandoneforstatus (housekeepingandcontrol).TherearealsoindustrystandardbussessuchasPCI,ISA,AGP, PC-105,VXIandsoforth.Sincethesignaldefinitionsandtimingrequirementsfortheseindustrystandardbussesarecarefullycontrolledbythestandardsassociationthatmaintainsthem,hardware devicesfromdifferentmanufacturerscangenerallybeexpectedtoworkproperlyandinterchangeably.Somebussesarequitesimple—onlyonewire—butthesignalssentdownthatwirearequite complexandrequirespecialhardwareandstandardprotocolstounderstandit.Examplesofthese typesofbussesaretheuniversalserialbus(USB),thesmallcomputersysteminterfacebus(SCSI), EthernetandFirewire.
5
Chapter1 Memory Fromthepointofviewofasoftwaredeveloper,thememorysystemisthemostvisiblepartofthe computer.Ifwedidn’thavememory,we’dneverhaveaproblemwithanerrantpointer.Butthat’s anotherstory.Thecomputermemoryistheplacewhereprogramcode(instructions)andvariables (data)arestored.Wecanmakeasimpleanalogyaboutinstructionsanddata.Considerarecipeto bakeacake.Therecipeitselfisthecollectionofinstructionsthattellushowtocreatethecake. Thedatarepresentstheingredientsweneedthattheinstructionsmanipulate.Itdoesn’tmakemuch sensetosifttheflourifyoudon’thaveflourtosift. Wemayalsodescribememoryasahierarchy,baseduponspeed.Inthiscase,speedmeanshow fastthedatacanberetrievedfromthememorywhenthecomputerrequestsit.Thefastestmemory isalsothemostexpensivesoasthememoryaccesstimesbecomeslower,thecostperbitdecreases,sowecanhavemoreofit.Thefastestmemoryisalsothememorythat’sclosesttotheCPU. Thus,ourCPUmighthaveasmallnumberofon-chipdataregisters,orstoragelocations,several thousandlocationsofoff-chipcachememory,severalmillionlocationsofmainmemoryand severalbillionlocationsofdiskstorage.Theratiooftheaccesstimeofthefasteston-chipmemory totheslowestmemory,theharddisk,isabout10,000toone.Theratioofthecostofthetwo memoriesissomewhatmoredifficulttocalculatebecausethefastestsemiconductormemoryisthe on-chipcachememory,andyoucannotbuythatseparatelyfromthemicroprocessoritself.However,ifweestimatetheratioofthecostpergigabyteofthemainmemoryinyourPCtothecost pergigabyteofharddiskstorage(andtakingintoaccountthemail-inrebates)thenwefindthatthe fastersemiconductorstoragewithanaverageaccesstimeof20–40nanosecondsis300timesmore costlythenharddiskstorage,withanaverageaccesstimeof1millisecond. Today,becauseoftheeconomyofscaleprovidedbythePCindustry,memoryisincredibly inexpensive.Astandardmemorymodule(SIMM)withacapacityof512millionstoragelocations costsabout$60.PCmemoryisdominatedbyamemorytechnologycalleddynamicrandomaccess memory,orDRAM.ThereareseveralvariationsofDRAM,andwe’llcoverthemingreaterdepth lateron.DRAMischaracterizedbythefactthatitmustbeconstantlyaccessedoritwillloseits storeddata.Thisforcesustocreatehighlyspecializedandcomplexsupporthardwaretointerface thememorysystemstotheCPU.ThesedevicesarecontainedinsupportchipsetsthathavebecomeasimportanttothemodernPCastheCPU.Whyusethesecomplexmemories?DRAM’sare inherentlyverydenseandcanholdupwardsof512millionbitsofinformation.Inordertoachieve thesedensities,thecomplexityofaccessingandcontrollingthemwasmovedtothechipset. StaticRAM(SRAM) Thememorythatwe’llfocusoniscalledstaticrandomaccessmemoryorSRAM.Eachmemory cellofanSRAMdeviceismorecomplicatedthentheDRAM,buttheoveralloperationofthe deviceiseasiertounderstand,sowe’llfocusonthistypeofmemoryinourdiscussions.Theterm, staticrandomaccessmemory,orSRAM,referstothefactthat: 1. wemayreadfromthechiporwritedatatoit,and 2. anymemorycellinthechipmaybeaccessedatanytime,oncetheappropriateaddressof thecellispresentedtothechip.
6
IntroductionandOverviewofHardwareArchitecture 3. Aslongaspowerisappliedtothememory,weonlyarequiredtoprovideanaddressto theSRAMcell,togetherwithaREADoraWRITEsignal,inordertoaccess,ormodify, thedatainthecell.Thisisquiteabitdifferentthentheeffortrequiredtomaintaindata integrityinDRAMcells.We’lldiscussthispointingreaterdetailinalaterchapter. WithaRAMmemorythereisnoneedtosearchthroughalltheprecedingmemorycellsinorderto gettotheoneyouwanttoread.Inotherwords,datacanbereadfromthelastcelloftheRAMas quicklyasitcouldbereadfromthefirstcell.Incontrast,atapebackupdevicemuststreamthrough theentiretapebeforethelastpieceofdatacanberetrieved.That’swhytheterm“randomaccess”is used.AlsonotethatwhenarediscussingSRAMorDRAMinthegeneralsense,aslistedinitems1 and2,we’lljustusethetermRAM,withouttheSRAMorDRAMdistinction. MemoryHierarchy Figure1.4showsthememoryhierarchyinrealterms,whereweseecanseehowthevarious memoryelementsgrowexponentiallyinsizeanddecreaseexponentiallyinaccesstime.Closest memorytotheCPUisthesmallest,typicallyintherangeof1Kbyteofdata(1,0008-bitcharacters)to1Mbyteofdata(1,000,000characters).Theseon-chipcachememoriescanhaveaccess timesasshortas½to1nanosecondorone-billionthofasecond.Asabenchmark,lightcantravel aboutonefootthroughtheairinonenanosecond.WecallthiscachetheLevel1,orL1cache. BelowtheL1cacheistheLevel2,orL2cache.Intoday’sPentium-classprocessors,theL2cache isusuallyontheprocessorchipitself.Infact,ifyoucouldliftthelidofaPentiumorAthlon processorandlookatthesilicondieitselfunderamicroscopeyoumightbesurprisedtofindthat thebiggestpercentageofchipareawastakenupbythecachememories. Thesecondarycachesitsyourcomputer’smainmemory. PrimaryCache(L1) CPU 64K–1M(1ns) Thesearethe“memorysticks”thatyoucanpurchase inacomputershoptoincreaseyourcomputer’sperformance.Thismemoryisconsiderablyslowerthanon-chip SecondaryCache(L2) cachememory,butyouusuallyhaveamuchlargermain 256K–4M(10ns) memorythentheon-chipmemory.Youmightwonderwhy MainMemory addingmorememorywillgiveyoubetterperformance.To 64M–1024M(50ns) seethis,considerthenextleveldownfrommainmemory, theharddiskdrive.Yourharddrivehaslotsofcapacity, HardDrive butitcomesataprice,speed.Theharddiskisanelectro50G–250G(100,000ns) mechanicaldevice.Thedataisstoredonarotatingplatter and,evenatarotationalspeedof7200revolutionsper TapeBackup 10G–10T(seconds) minute,ittakesprecioustimeforthecorrectdatatocome underthediskreadheads.Also,thedataisorganizedas Internet individualtracksoneachsideofmultipleplatters.The Unlimited(minutes) headshavetomovefromtrack-to-trackinordertoaccess Figure 1.4: The memory hierarchy. thecorrectdata.Thistakestimeaswell. Now,youroperatingsystem,whetheritsMACO/S, LinuxoroneoftheflavorsofWindows,usesthehard 7
Noticetheinverserelationshipbetween thesizeofthememoryandtheaccess time.
Chapter1 diskasahandyplacetoswapoutprogramsordatathatcan’tfitinmainmemoryataparticular pointintime.So,ifyouhaveseveralwindowsopenonyourcomputer,andonly64megabytesof mainmemory,youmightseethehourglassformofthecursorappearingquiteoftenbecausethe operatingsystemisconstantlyswappingthedifferentapplicationsinandoutofmainmemory. FromFigure1.4,weseethattheratioofaccesstimesbetweentheharddiskandmainmemorycan be10,000toone,soanytimethatwegototheharddisk,wewillhavetowait.Themoralhereis thatthebestperformanceboostyoucangivetoyourcomputeristoaddasmuchmemoryasitcan hold. Finally,thereareseveralsymbolsusedinFigure1.4thatmightbestrangetoyou.We’lldiscuss themindetailinduecourse,butincaseyouwerewonderingwhatitmeant,here’salittlepreview: Symbol
Name
ns K M G T
nanosecond kilobytes megabytes gigabytes terabytes
Meaning
billionthofasecond 210or10248-bitcharacters(bytes) 220or1,048,576bytes 230or1,073,741,824bytes 240or1,099,511,627,776bytes
Hopefully,thesenumberswillsoonbecomequitefamiliartoyoubecausetheyarethedimensions ofmoderncomputertechnology.However,onenoteofcaution:Thetermskilo,mega,gigaand teraareoftenoverloaded.Sometimestheyareusedinthestrictlyscientificsenseofashorthand notationforamultiplicationfactorof103,106,109and1012,respectively.So,howdoyouknowthe computer-speakversions,210,220,230,240arebeingused,orthetraditionalscienceandengineering meaningisbeingused?That’sagoodquestion.Sometimesitisn’tsoobviousandmistakescould bemade.Forexample,wheneverwearedealingwithmemorysizeandmemoryissues,weare almostalwaysusingthebase2senseoftheterms.However,thisisn’talwaysthecase.Harddisk drives,eventhoughtheyarestoragedevices,usethetermsintheengineeringsense.Therefore,an old1gigabyteharddrivedoesnotholdthesameamountofdataas1gigabyteofmemorybecause the“giga”termisbeingusedintwodifferentsenses.Inanycase,youaregenerallysafetoassume thebase2versionsoftheterminthistext,unlessIspecificallystateotherwise.Intherealworld, caveatemptor. HardDiskDrive Let’slookabitfurtherintothedynamicsoftheharddiskdrive.Diskdrivesareamarvelofengineeringtechnology.Foryears,electronicindustryanalystswerepredictingthedemiseofthehard drive.Economicalsemiconductormemoriesthatwereascosteffectiveasaharddiskwerealways “afewyearsaway.”However,thediskdrivemanufacturersignoredthepunditsandjustcontinued toincreasethecapacityandperformance,improvethereliabilityandreducethecostofthedisk drives.Today,anaveragediskdrivecostsabout60centspergigabyteofstoragecapacity. Consideramodern,high-performancediskdrive.Specifically,let’slookattheModelST3146807LC fromSeagateTechnology®4.Followingaretherelevantspecificationsforthisdrive:
8
IntroductionandOverviewofHardwareArchitecture • • • • • • • • • •
RotationalSpeed:10,000rpm Interface:Ultra320SCSI Discs/Heads:4/8 FormattedCapacity(512bytes/sector)Gbytes:146.8 Cylinders:49,855 Sectorsperdrive:286,749,488 Externaltransferrate:320Mbytespersecond Track-to-trackSeekRead/Write(msec):0.35/0.55 AverageSeekRead/Write(msec):4.7/5.3 AverageLatency(msec):2.99
Whatdoesallthismean?ConsiderFigure1.5.Hereweseeahighlysimplifiedschematicdiagram oftheSeagateCheetah®diskdrivediscussedabove.Theharddiskiscomprisedof4aluminum platters.Eachsideoftheplatteriscoatedwithmagneticmaterialthatrecordsthestoreddata. Aboveeachplatterisatinymagneticpick-up,orhead,thatfloatsabovethediskplatter(disc)on acushionofair.Inaveryrealsense,theheadisflyingoverthesurfaceoftheplatter,adistance muchlessthanthethicknessofahumanhair.Thus,whenyougetadiskcrash,theresultsarequite analogoustowhenanairplanecrashes.Ineithercase,theairplaneortheread/writeheadloses liftandhitsthegroundorthesurfaceofthedisk.Whenthathappens,themagneticmaterialis scrappedawayandthediskisnolongerusable. TRACK49854
10,000RPM
TRACK0
HEAD0
HEAD7
CYLINDER
.35/.55track-to-track
Figure1.5:SchematicdiagramoftheSeagateCheetah®harddiskdrive.
Eachsurfacecontains49855concentrictracks.Eachtrackisdiscreet.Itisnotconnectedtothe adjacenttrack,asaspiral.Thus,inordertomovefromonetracktoanadjacenttrack,thehead mustphysicallymoveaslightamount.Allheadsareconnectedtoacommonshaftthatcanquickly andaccuratelyrotatetheheadstoanewposition.Sincethetracksonallthesurfacesareinvertical alignment,wecallthisacylinder.Thus,cylinder0contains8tracksfromthe4platters. Now,howdowegeta146.8Gbytedrive?First,eachplattercontainstwosurfacesandwehave fourplatters,sothatthetotalcapacityofthedrivewillbe8timesthecapacityofonesurface.Each 9
Chapter1 sectorholds512bytesofdata.Bydividingthetotalnumberofsectorsby8,weseethatthereare 35,843,686sectorspersurface.Dividingagainby49855,weseethatthereareapproximately719 sectorspertrack.Itisinterestingthattheactualnumberis718.96sectorspertrack.Whyisn’tthis valueawholenumber?Inotherwords,howcanwehaveafractionalnumberofsectorspertrack? Thereareanumberofpossibilitiesandweneednotdwellonit.However,onepossibilityisthat thenumberofsectorspertrackisnotuniformacrossthedisksurfacebecausethesectorspacing changesaswemovefromthecenterofthedisktotheoutside.InaCD-ROMorDVDdrivethis iscorrectedbychangingtherotationalspeedofthedriveasthelasermovesinandout.But,since theharddiskrotatesatafixedrate,changingtherecordingdensityaswemovefrominnertoouter tracksmakesthemostsense. Anyway,backtoourcalculation.Ifeachtrackholds719sectorsandeachsectoris512bytes, theneachtrackholds368,128bytesofdata.Sincethereare8trackspercylinder,eachcylinder holds2,945,024bytes.Now,here’swherethebignumberscomein.Sincewehave49855cylinders,ourtotalcapacityis146,824,171,520bytes,or146.8Gbytes. Beforeweleavethesubjectofdiskdrives,let’sconsideronemoreissue.Consideringthehard drivespecifications,above,weseethataccesstimesaremeasuredinunitsofmilliseconds(msec), orthousandthsofasecond.Thus,ifyourdatasectorsarespreadoutoverthedisk,thenaccessing eachblockof512bytescaneasilytakesecondsoftime.Comparingthistothetimerequiredto accessdatastoredinmainmemory,itiseasytoseewhytheharddriveis10,000timesslowerthan mainmemory. ComplexInstructionSetArchitecture,andRISC orReducedInstructionSetComputer Todaywehavetwodominantcomputerarchitectures,complexinstructionsetarchitectureorCISC andreducedinstructionsetcomputer,orRISC.CISCistypifiedbywhatisreferredtoasthevon NeumannArchitecture,inventedbyJohnvonNeumannofPrincetonUniversity.InthevonNeumannarchitecture,boththeinstructionmemoryandthedatamemorysharethesamephysical memoryspace.ThiscanleadtoaconditioncalledthevonNeumannbottleneck,wherethesameexternaladdressanddatabussesmustservedoubleduty,transferringinstructionsfrommemorytothe processorforprogramexecution,andmovingdatatoandfrommemoryforthestorageandretrieval ofprogramvariables. Whentheprocessorismovingdata,itcan’tfetchthenextinstruction,andviceversa.Thesolutiontothisdilemma,asweshallsee later,istheintroductionofseparateon-chipinstructionanddata caches.Motorola’s68000processoranditssuccessors,andIntel’s 8086processoranditssuccessorsareallcharacteristicofCISC processors.We’lltakeamorein-depthlookatthedifferences betweenCISCandRISCprocessorsagainlaterwhenwestudy pipelining.
10
Note:Youwilloftenseethe processorrepresentedas80X86, or680X0.The“X”isusedas aplaceholdertorepresenta memberofafamilyofdevices. Thus,the80X86(oftenwrittenas X86)representsthe8086,80186, 80286,80386,80486and80586 (thedesignationofthefirstPentiumprocessor).TheMotorola processorsarethe68000,68010, 68020,68030,68040and68060.
IntroductionandOverviewofHardwareArchitecture HowardAikenofHarvardUniversitydesignedanalternatecomputerarchitecturethatwecommonlyassociatetodaywiththeReducedInstructionSetComputer,orRISCarchitecture.The classicHarvardarchitecturecomputerhastwoentirelyseparatememoryspaces,oneforinstructionsandonefordata.Aprocessorwiththesememoriescouldoperatemuchmoreefficiently becausedataandinstructionscouldbefetchedfromthecomputer’smemorywhenneeded,notjust whenthebusseswereavailable.ThefirstpopularmicroprocessorthatusedtheHarvardArchitecturewastheAm29000fromAMD.Thisdeviceachievedreasonablepopularityintheearly Hewlett-Packardlaserprinters,butlaterfellfromfavorbecausedesigningacomputerwithtwo separatememorieswasjustnotcosteffective.TheobviousperformancegainfromthetwomemoryspaceswasthedrivingforcetoinspiretheCPUdesignerstomovethememoryspacesontothe chips,intheformoftheinstructionanddatacachesthatarecommonintoday’shigh-performance microprocessors. ThegainsinCPUpowerarequiteevidentifwelookatadvancesinworkstationperformanceover thepastfewyears.Itisinterestingthatjustfouryearslater,ahigh-endPC,suchasa2.4GHz Pentium4fromIntelcouldeasilyoutperformtheDECAlpha21264/600workstation. Figure1.65isagraphofworkstationperformanceovertime. 1200 DEC Alpha 21254/600
1100 1000 900
Performance
800 700 600 500
DEC Alpha 5/500
400 300
DEC Alpha 5/300
200 100
DEC Alpha 4/266 IBM POWER 100 DEC AXP/500 HP 9000/750 1991 1992 1993 1994 1995 1996 Year
SUN-4/ MIPS MIPS IBM 260 M/120 M2000 RS6000
0 1987
1988
1989
1990
1997
Figure1.6:Improvementinworkstationperformanceovertime.(FromPattersonandHennessy.)
Therelativeperformanceofthesecomputerswasmeasuredusingastandardsetofprograms calledbenchmarks.Inthiscase,theindustrystandardSPECbase_int92benchmarkwasused. Itisdifficulttocompareperformancebasedonthesenumberswithmoremodernperformance measurementsbecausethebenchmarkchangedalongtheway.Today,thebenchmarkofchoiceis theSPECint95,whichdoesnotdirectlycorrelatewiththeearlierSPECint92benchmark6.How11
Chapter1 ever,accordingtoMann,aconversionfactorofabout38allowsforaroughcomparison.Thus, accordingtothepublishedresults7,a1.0GHzAMDAthlonprocessorachievedaSPECint95 benchmarkresultof42.9,whichroughlycomparestoaSPECint92resultof1630.TheDigital EquipmentCorporation(DEC)AlphaStation5/300isoneworkstationthathaspublishedresults forbothbenchmarktests.Itmeasuresabout280inthegraphofFigure1.6and7.33accordingto theSPECint95benchmark.Multiplyingby38,weget278.5,whichisinreasonableagreement withtheearlierresult.We’llreturntotheissueofperformancemeasurementsinalaterchapter.
NumberSystems Howdoyourepresentanumberincomputer?Howdoyousendthatnumber,whateveritmay be,achar,anint,afloatorperhapsadoublebetweentheprocessorandmemory,orwithin themicroprocessoritself?Thisisafairquestiontoaskandtheanswerleadsusnaturallyto anunderstandingofwhymoderndigitalcomputersarebasedonthebinary(base2)number system.Inordertoinvestigatethis,considerFigure1.7. InFigure1.7we’lldoasimple-mindedexperiment. Let’spretendthatwecanplaceanelectricalvoltage onthewirethatrepresentsthenumberwewould liketotransmitbetweentwofunctionalelementsof thecomputer.Themethodmightworkforsimple numbers,butIwouldn’twanttotouchthewireifI wassending2000.456!Infact,thismethodwould beextremelyslow,expensiveandwouldonlywork foranarrowrangeofvalues.
24.56345 RADIO SHACK
24.56345 V
Direction of signal
However,thatdoesn’timplythatthismethodisn’t usedatall.Infact,oneofthefirstfamiliesofelecZero volts troniccomputerswastheanalogcomputer.The (ground) analogcomputerisbaseduponlinearamplifiers, orthekindofelectroniccircuitrythatyoumight Figure 1.7: Representing the value of a findinyourstereoreceiverathome.Thekeypoint numberbythevoltageonawire. isthatvariables(inthiscasethevoltagesonwires) canassumeaninfiniterangeofvaluesbetweensomelimitsimposedbythenatureofthecircuitry. Inmanyoftheearlyanalogcomputersthisrangemightbebetween–25voltsand+25volts.Thus, anyquantitythatcouldberepresentedasasteady,ortimevaryingvoltagewithinthisrangecould beusedasavariablewithinananalogcomputer. Theanalogcomputertakesadvantageofthefactthatthereareelectroniccircuitsthatcandothe followingmathematicaloperations: • Add/subtract • Log/anti-log • Multiply/divide • Differentiate/integrate
12
IntroductionandOverviewofHardwareArchitecture Bycombiningthiscircuitsoneafteranotherwithintermediateamplificationandscaling,real-time systemscouldbeeasilymodeledandthesolutiontocomplexlineardifferentialequationscouldbe obtainedasthesystemwasoperating. However,theanalogcomputersuffersfromthesame limitationsasdoesyourstereosystem.Thatis,its amplificationaccuracyisnotinfinitelyperfect,sothebest accuracythatcouldbehopedforisabout0.01%,orabout 1partin10,000.Figure1.8showsananalogcomputer ofthetypeusedbytheUnitedStatessubmarinesduring WorldWarII.TheTorpedoDataComputer,orTDC, wouldtakeasitsinputsthecompassheadingandspeedof thetargetship,theheadingandspeedofthesubmarine, thedesiredfiringdistance.Thecorrectspeedandheading wasthensenttothetorpedoesandtheywouldtrackthe course,speedanddepthtransmittedtothembytheTDC. Thus,withinthelimitationsimposedbytheelectronic circuitryofthe1940’s,anentirefamilyofcomputers Figure 1.8: An analog computer from a basedupontheideaofinputsandoutputsbasedupon WWII submarine. Photo courtesy of www. continuousvariables.Inthatsense,yourstereoamplifier fleetsubmarine.com. isananalogcomputer.Anamplifieramplifies,orboostsan electricalsignal.Anamplifierwithagainof10,has anoutputvoltagethatis,ateveryinstantoftime,10 4.2 timesgreaterthantheinputvoltage.Thus,Vout=10 RADIO SHACK Vin.Herewehaveananalogcomputingblockthat happenstobeamultiplicationblockwithaconstant multiplier. 2 Anyway,let’sgetbacktodiscussingtonumbersystems.Wemightbeabletoimproveonthismethod bybreakingthenumberintomoremanageableparts andsendamorelimitedsignalrangeoverseveral wiresatthesametime(inparallel).Thus,eachwire wouldonlyneedtotransmitanarrowrangeofvalues.Figure1.9showshowthismightwork.
4 5 6 3 4 5
Zero volts (ground)
Figure1.9:Usingaparallelbundleofwiresto
transmitanumericvalueinacomputer.The Inthiscase,eachwireinthebundlerepresents wire’s position in the bundle determines its adecimaldecadeandeachnumberthatwesend wouldberepresentedbythecorrespondingvoltages digital weight. Each wire carries between 0 voltsand9volts. onthewires.Thus,insteadofneedingtotransmit potentiallylethalvoltages,suchas12,567volts, thevoltageineachwirewouldneverbecomegreaterthanthatofa9-voltbattery.Let’sstopfora momentbecausethisapproachlookspromising.Howaccuratewouldthevoltageonthewirehave tobeinsothatthecircuitryinterpretsthenumberas4,andnot3or5?InFigure1.7,ourvoltme13
Chapter1 tershowsthatthesecondwirefromthebottommeasures4.2V,not4volts.Isthatgoodenough? Shoulditreallybe4.000±.0005volts?Inallprobability,thissystemmightworkjustfineifeach voltageincrementhasa“slop”ofabout0.3volts.Sowewouldonlyneedtosend4volts±0.3 volts(3.7–4.3volts)inordertoguaranteethatthecircuitryreceivedthecorrectnumber.Whatif thecircuiterredandsent4.5voltsinstead?Thisistoolargetobea4buttoosmalltobea5.The answeristhatwedon’tknowwhatwillhappen.Thevaluerepresentedby4.5voltsisundefined. Hopefully,ourcomputerworksproperlyandthisisnotaproblem. ThemethodproposedinFigure1.9isactuallyveryclosetoreality,butitisn’tquitewhatweneed.With thespeedofmoderncomputers,itisstillfartoodifficulttodesigncircuitrythatisbothfastenoughand accurateenoughtoswitchthevoltageonawirebetween10differentvalues.However,thisideaisbeing lookedatforthenextgenerationofcomputermemorycells.Moreonthatlater,staytuned! Moderntransistorsareexcellentswitches.Theycanswitchavoltageoracurrentonoroffintrillionthsofasecond(picoseconds).Canwemakeuseofthisfact?Let’ssee.Supposeweextendthe conceptofabundleofwiresbutlet’srestrictevenfurtherthevaluesthatcanexistonanyindividualwire.Sinceeachwireiscontrolledbyaswitch,we’llswitchbetweennothing(0volts)and something(~3volts).Thisimpliesthatjusttwonumbersmaybecarriedoneachwire,0orsomething(not0).Willitwork?Let’slookatFigure1.10. 20 21 22 23 24 25 26 27 28 29 210 211 212 213 214 215
on off on off off on on on off off off on on on on off
1 0 1 0 0 1 1 1 0 0 0 1 1 1 1 0
Figure 1.10: Sending numbers as binary values. Each arrow represents a wire with the arrowheadrepresentingthedirectionofsignaltransmission.Thepositionofeachwireinthe bundlerepresentsitsnumericalweight.Eachrowrepresentsanincreasingpowerof2.
Inthisscenario,theamountofinformationthatwecancarryonasinglewireislimitedtonothing, 0,orsomething(let’scallsomething“1”or“on”),sowe’llneedalotofwiresinordertotransmit anythingofsignificance.Figure1.10shows16wires,and Note:Formanyyears,moststandarddigital asyou’llsoonsee,thislimitsustonumbersbetween0 circuitsused5voltsfora1.However,as theintegratedcircuitsbecamesmallerand and65,535ifwearedealingwithunsignednumbers,or thesignedrangeof–32,768to+32,767.Herethedecimal denser,thelogicalvoltagelevelsalsohad tobereduced.Today,thecoreofamodern number0wouldberepresentedbythebinarynumber PentiumorAthlonprocessorrunsata 00000000000000andthedecimalnumber65,535would voltageofaround1.7–1.8volts,notvery berepresentedbythebinarynumber1111111111111111. differentfromastandardAAbattery. 14
IntroductionandOverviewofHardwareArchitecture Nowwe’refinallythere.We’lltakeadvantageofthefactthatelectronicswitchingelements,or transistors,canrapidlyswitchthevoltageonawirebetweentwovalues.Themostcommonform ofthisisbetweenalmost0voltsandsomething(about3volts).Ifoursystemisworkingproperly, thenwhatwedefineas“nothing”,or0,mightneverexceedabout½ofavolt.So,wecandefine thenumber0tobeanyvoltagelessthan½volt(actually,itisusually0.4volts).Similarly,ifthe numberthatwedefineasa1,wouldneverbelessthan2.5volts,thenwehavealltheinformation weneedtodefineournumbersystem.Here,thenumber0isnevergreaterthan0.4voltsandthe number1isneverlessthan2.5volts.Anythingbetweenthesetworangesisconsideredtobeundefinedandisnotallowed. Itshouldbementionedthatwe’vebeenreferringto“thevoltageonawire.”Justwhereare thewiresinourcomputer?Strictlyspeaking, weshouldcallthewires“electricalconductors.” Theycanberealwires,suchasthewiresinthe cablethatyouconnectfromyourprintertothe parallelportonthebackofyourcomputer.They canalsobethinconductingpathsonprintedcircuitboardswithinyourcomputer.Finally,they canbetinyaluminumconductorsontheprocessorchipitself.Figure1.11showsaportionofa Figure1.11:Printedwiresonacomputercircuitboard. printedcircuitboardfromacomputerdesigned Eachwireisactualacoppertraceapproximately0.08 mmwide.Tracescanbeascloseas0.08mmapart bytheauthor. fromeachother.Thelargespotsarethesoldered
Noticethatsomeoftheintegratedcircuit’s pinsoftheintegratedcircuitscomingthroughfrom (ICs)pinsappearhavewiresconnectingthemto theothersideoftheboard. anotherdevicewhileothersseemtobeunconnected.Thereasonforthisisthatthisprintedcircuitboardisactuallyasandwichmadeupoffive thinnerlayerswithwiresprintedoneitherside,givingatotaloftenlayers.Theeightinnerlayers alsohaveathininsulatinglayerbetweenthentopreventelectricshortcircuits.Duringthemanufacturingprocess,thefiveconductinglayersandthefourinsulatinglayersarecarefullyalignedand bondedtogether.Theresultant,ten-layerprintedcircuitboardisapproximately2.5mmthick. Withoutthismultilayermanufacturingtechnique,itwouldbeimpossibletobuildcomplexcomputersystemsbecauseitwouldnotbepossibletoconnectthewiresbetweencomponentswithout havingtocrossaseparatewirewithadifferentpurpose. [NOTE:AcolorversionofthefollowingfigureisincludedontheDVD-ROM.]Figure1.12shows usjustwhat’sgoingonwiththeinnerlayers.HereisanX-rayviewofanothercomputersystem hardwarecircuit.ThisisaboutthesamelevelofcomplexitythatyoumightfindonthemotherboardofyourPC.Theviewislookingthroughthelayersoftheboardandtheconductivetraceson eachlayerareshowninadifferentcolor. Whilethismayappearquiteimposing,mostofthelayoutwasdoneusingcomputer-aideddesign (CAD)software.Itwouldtakealtogethertoomuchtimeforevenaskilleddesignertocomplete thelayoutofthisboard.Figure1.13isamagnificationofasmallerportionofFigure1.12.Here 15
Chapter1 youcanclearlyseethevarioustracesonthedifferentlayers.Eachprintedwireisapproximately 0.03mmwide. [NOTE:Acolorversionofthefollowingfigureis includedontheDVD-ROM.]Ifyoulookcarefully atFigure1.13,you’llnoticethatcertaincolored wirestouchablackdotandthenseemtogooffin anotherdirectionasawireofadifferentcolor.The blackdotsarecalledvias,andtheyrepresentplaces inthecircuitwhereawireleavesitslayerandtraversestoanotherlayer.Viasareverticalconductors thatallowsignalstocrossbetweenlayers.Without vias,wirescouldn’tcrosseachotherontheboard withoutshortcircuitingtoeachother.Thus,when youseeagreenwire(forpurposesofthegrayscale imageonthispage,thegreenwireappearsasadottedline)crossingaredwire,thetwowiresarenot inphysicalcontactwithother,butarepassingover eachotherondifferentlayersoftheboard.Thisis animportantconcepttokeepinmindbecausewe’ll soonbelookingat,anddrawingourownelectronic Figure 1.12: An X-ray view of a portion of a circuitdiagrams,calledschematicdiagrams,and computersystemsboard. we’llneedtokeepinmindhowtorepresentwires thatappeartocrosseachotherwithoutbeingphysicallyconnected,andthosewiresthatareconnected toeachother. Let’sreviewwhatwe’vejustdiscussed.Moderndigitalcomputersusethebinary(base2)numbersystem. Theydosobecauseanumbersystemthathasonly twodigitsinitsnaturalsequenceofnumberslends itselftoahardwaresystemwhichutilizesswitches toindicateifacircuitisina“1”state(on)ora“0” state(off).Also,thefundamentalcircuitelements thatareusedtocreatecomplexdigitalnetworks arealsobasedontheseprinciplesaslogicalexpressions.Thus,justaswemightsay,logically,thatan expressionisTRUEorFALSE,wecanjustaseasily Figure1.13:Amagnifiedviewofaportionofthe describeitasa“1”(TRUE)or“0”(FALSE).As you’llsoonsee,theassociationof1withTRUEand boardshowninFigure1.12. 0withFALSEiscompletelyarbitrary,andwemayreversethedesignationswithlittleornoilleffects. However,fornow,let’sadopttheconventionthatabinary1representsaTRUEorONcondition,anda binary0representsaFALSEorOFFcondition.Wecansummarizethisinthefollowingtable: 16
IntroductionandOverviewofHardwareArchitecture BinaryValue ElectricalCircuitValue
0 1
OFF ON
LogicalValue
FALSE TRUE
ASimpleBinaryExample Sinceyouhaveprobablyneverbeenexposedtoelectricalcircuitdiagrams,let’sdiverightin. Figure1.14isasimpleschematicdiagramofacircuitcontainingabattery,twoswitches,labeled AandB,andalightbulb,C.Thepositiveterminalonthebatteryislabeledwiththeplus(+)sign andthenegativebatteryterminalislabeledwiththeminus(–)sign.ThinkofatypicalAAbattery thatyoumightuseinyourportableMP3player.Thelittlebumpontheendisthepositiveterminal andtheflatportionontheoppositeendisthenegativeterminal.ReferringtoFigure1.11,itmight seemcuriousthatthepositiveterminalisdrawnasawideline,andthenegativeterminalisdrawn asanarrowline.There’sareasonforit,butwewon’tdiscussthathere.ElectricalEngineering studentsaretaughtthereasonforthisduringtheirinitiationceremony,butI’msworntosecrecy. +
A
B
C C = A and B
-
Battery Symbol
Lightbulb (load)
Figure1.14:AsimplecircuitusingtwoswitchesinseriestorepresenttheANDfunction.
Thelightbulb,C,willilluminatewhenenoughcurrentflowsthroughittoheatthefilament.We assumethatinelectricalcircuitssuchasthisone,thatcurrentflowsfrompositivetonegative. Thus,currentexitsthebatteryatthe+terminalandflowsthroughtheclosedswitches(AandB), thenthroughthelamp,andfinallytothe–terminalofthebattery.Now,youmightwonderabout thisbecause,asweallknowfromourhighschoolscienceclasses,thatelectricalcurrentisactuallymadeupofelectronsandelectrons,beingnegativelycharged,actuallyflowfromthenegative terminalofthebatterytothepositiveterminal;thereversedirection. Theanswertothisapparentparadoxishistoricalprecedent.Aslongaswethinkofthecurrentas beingpositivelycharged,theneverythingworksoutjustfine. Anyway,inorderforcurrenttoflowthroughthefilament,twothingsmusthappen:switchAmust beclosed(ON)andswitchBmustbeclosed(ON).Whenthisconditionismet,theoutputvariable,C,willbeON(illuminated).Thus,wecantalkaboutourfirstexampleofalogicalequation: C=AANDB Thisisaveryinterestingresult.We’veseentwoapparentlyverydifferentconsequencesofusing switchestobuildcomputersystems.Thefirstisthatweareleadtohavingtodealwithnumbers asbinary(base2)valuesandthesecondisthattheseswitchesalsoallowustocreatelogical equations.Fornow,let’skeepitemtwoasaninterestingconsequence.We’lldealwithitmore thoroughlyinthenextchapter.BeforeweleaveFigure1.14,weshouldpointoutthattheswitches, AandB,areactuatedmechanically.Someoneflipstheswitchtoturnitonoroff.Ingeneral,a 17
Chapter1 switchisathree-terminaldevice.Thereisacontrolinputthatdeterminesthesignalpropagation betweentheothertwoterminals. Bases Let’sreturntoourdiscussionofthebinarynumbersystem.Weareaccustomedtousingthedecimal(base10)numbersystembecausewehadtenfingersbeforewehadaniMAC®.Thebase(or radix)ofanumbersystemisjustthenumberofdistinctdigitsinthatnumbersystem.Considerthe followingtable: Base2 Base8 Base10 Base16
0,1 0,1,2,3,4,5,6,7 0,1,2,3,4,5,6,7,8,9 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
Binary Octal Decimal Hexadecimal
Lookatthehexadecimalnumbersinthetableabove.Thereare16,distinctdigits,0through9and AthroughF,representingthedecimalnumbers0through15,butexpressingtheminthehexadecimalsystem. Now,ifyou’veeverhadyourPClockupwiththe“bluescreenofdeath,”youmightrecallseeing somefunnylookinglettersandnumbersonthescreen.Thatcrypticmessagewastryingtoshow youanaddressvalueinhexadecimalwheresomethingbadhasjusthappened.Itmaybeoflittle solacetoyouthatfromnowonthemessageonthebluescreenwillnotonlytellyouthatabad thinghashappenedandyou’vejustlostfourhoursofwork,butwithyournew-foundinsight,you willknowwhereinyourPC’smemorytheillegaleventtookplace. Whenwewriteanumberinbinary,octaldecimalorhexadecimal,wearerepresentingthenumberinexactlythesameway,althoughthenumberwilllookquitedifferenttous,dependingupon thebasewe’reusing.Let’sconsiderthedecimalnumber65,536.Thishappenstobe216.Later, we’llseethatthishasspecialsignificance,butfornow,it’sjustanumber.Figure1.15,showshow eachdigitofthenumber,65,536,representsthecolumnvaluemultipliedbythenumericalweight ofthecolumn.Theleftmost,ormostsignificantdigit,isthenumber6.Therightmost,orleast significantdigit,alsohappenstobe6.Thecolumnweightofthemostsignificantdigitis10,000 (104)sothevalueinthatcolumnis6x10,000,or60,000.Ifwemultiplyouteachcolumnvalue
••
104
103
102
101
100
6
5
5
3
6
Notice Noticehow howeach eachcolumn columnisisweighted weightedby by the value of the base raised the value of the base raisedto tothe thepower power +
6 x 100 =
6
3 x 101 =
30
5 x 102 =
500
5 x 103 =
5000
6 x 104 =
60000
=
65536
Figure1.15:Representinganumberinbase10.Goingfromrighttoleft,eachdigitmultiplies thevalueofthebase,raisedtoapower.Thenumberisjustthesumofthesemultiples. 18
IntroductionandOverviewofHardwareArchitecture andthenarrangethemasalistofnumberstobeaddedtogether,aswe’vedoneontherightsideof Figure1.15,wecanaddthemtogetherandgetthesamenumberaswestartedwith.OK,perhaps we’reoverstatingtheobvioushere,butstaytuned,becauseitdoesgetbetter.Thislittleexample shouldbeobvioustoyoubecauseyou’reaccustomedtomanipulatingdecimalnumbers.Thekey pointisthatthecolumnvaluehappenstobethebasevalueraisedtoapowerthatstartsat0inthe rightmostcolumnandincreasesby1aswemovetotheleft.Sincedecimalisbase10,thecolumn weightsmovingleftwardare1,10,100,1000,10000andsoon. Ifwecangeneralizethismethodofrepresentingnumber,thenitfollowsthatwewouldusethe samemethodtorepresentnumbersinanyotherbase. TranslatingNumbersBetweenBases Let’srepeattheaboveexercise,butthistimewe’lluseabinarynumber.Let’sconsiderthe8-bit binarynumber10101100.Becausethisnumberhas8binarynumbers,orbitsassociatedwithit, wecallitan8-bitnumber.Itiscustomarytocallan8-bitbinarynumberabyte(inCorC++this isachar).Itshouldnowbeobvioustoyouwhybinarynumbersareall1’sand0’s.Asidefromthe factthatthesehappentobethetwostatesofourswitchingcircuits(transistors)theyaretheonly numbersavailableinabase2numbersystem. Thebyteisperhapsmostnotablebecausewemeasurestoragecapacityinbyte-sizechunks(sorry). ThememoryinyourPCisprobablyatleast256Mbytes(256millionbytes)andyourharddisk hasacapacityof40Gbytes(40billionbytes),ormore. ConsiderFigure1.16.WeusethesamemethodasweusedinthedecimalexampleofFigure1.15. However,thistimethecolumnweightisamultipleofbase2,notbase10.Thecolumnweights gofrom27,or128,themostsignificantdigit,downto20,or1.Eachcolumnissmallerbyapower of2.Toseewhatthisbinarynumberisindecimal,weusethesameprocessaswedidbefore;we multiplythenumberinthecolumnbytheweightofthecolumn. Bases of Hex and Octal
1 0 1 0 1 1 0 0
x x x x x x x x
27 26 25 24 23 22 21 20
= 128 = 0 = 32 = 0 = 8 = 4 = 0 = 0
128
64
32
16
8
4
2
1
27
26
25
24
23
22
21
20
1
0
1
0
1
1
0
0
10101100 2
=
172 10
172
Figure1.16:Representingabinarynumberintermsofthepowersofthebase2.Noticethat thebasesoftheoctal(8)andhexadecimal(16)numbersystemsarealsopowersof2. 19
Chapter1 Thus,wecanconcludethatthedecimalnumber172isequaltothebinarynumber10101100.Itis alsonoteworthythatthebasesofthehexadecimal(Hex)numbersystem,16andtheoctalnumber system,8,arealso24and23,respectively.Thismightgiveyouahintastowhywecommonlyuse thehexadecimalrepresentationandthelesscommonoctalrepresentationinsteadofbinarywhen wearedealingwithourcomputersystem.Quitesimply,writingbinarynumbersgetsextremely tediousveryquicklyandishighlypronetohumanerrors. Toseethisinallofitsstarkreality,considerthebinaryequivalentofthedecimalvalue: 2,098,236,812 Inbinary,thisnumberwouldbewrittenas: 1111101000100001000110110001100 Now,binarynumbersareparticularlyeasytoconverttodecimalbythisprocessbecausethe numberiseither1or0.Thismakesthemultiplicationeasyforthoseofuswhocan’trememberthe timestablesbecauseourPDA’shaveallowedsignificantportionsofourcerebralcortextoatrophy. Sincethereseemstobeaconnectionbetweenthebases2,8and16,thenitisreasonabletoassume thatconvertingnumbersbetweenthethreebaseswouldbeeasierthanconvertingtodecimal,since base10isnotanaturalpowerofbase2.Toseehowweconvertfrombinarytooctalconsider Figure1.17. 82
26 (21
81
20 ) 23( 22
80
21
20 ) 20 ( 22
21
20)
128
64
32
16
8
4
2
1
27
26
25
24
23
22
21
20
1
0
1
0
1
1
0
0
0 thru 192
0 thru 56
4 x 80 = 4 5 x 8 1 = 40 2 x 8 2 = 128 172
0 thru 7
Figure1.17:Translatingabinarynumberintoanoctalnumber.Byfactoring outthevalueofthebase,wecancombinethebinarynumberintogroupsof threeandwritedowntheoctalnumberbyinspection.
Figure1.17takestheexampleofFigure1.16onestepfurther.Figure1.17startswiththesame binarynumber,10101100,or172inbase10.However,simplearithmeticshowsusthatwecan factoroutvariouspowersof2thathappentoalsobepowersof8.ConsiderthedarkgrayhighlightedstripeinFigure1.17.Wecanmakethefollowingsimplifications. Sinceanynumbertothe0power=1, (222120)=20×(222120)
20
IntroductionandOverviewofHardwareArchitecture Thus, 20=80=1 Wecanperformthesamesimplificationwiththenextgroupofthreebinarynumbers: (252423)=23×(222120) since23isacommonfactorofthegroup. However,23=81,whichisthecolumnweightofthenextcolumnintheoctalnumbersystem. Ifwerepeatthisexerciseonemoretimewiththefinalgroupoftwonumbers,weseethat: (2726)=26×(2120) since26isacommonfactorofthegroup.Again,26=82,whichisjustthecolumnweightofthe nextcolumnintheoctalnumbersystem. Sincethereisthisnaturalrelationshipbetweenbase8andbase2,itisveryeasytotranslatenumbers betweenthebases.Eachgroupofthreebinarynumbers,startingfromtherightside(leastsignificant digit)canbetranslatedtoanoctaldigitfrom0to7bysimplylookingatthebinaryvalueandwriting downtheequivalentoctalvalue.InFigure1.14,therightmostgroupofthreebinarynumbersis100. Referringtothecolumnweightsthisisjust1*(1*4+0*2+0*1),or4.Themiddlegroupof threegivesus8*(1*4+0*2+1*1),or8*5(40).Thetworemainingnumbersgivesus64* (1*2+0*1)or128.Thus,4+40+128=172,ourbinarynumberfromFigure1.8.Butwhere’s theoctalnumber?Simple,eachgroupofthreebinarynumbersgaveusthecolumnvalueofthe octaldigit,soourbinarynumberis254.Therefore,10101100inbinaryisequalto254inoctal, whichequals172indecimal. Neat!Thus,wecanconvertbetweenbinaryandoctalasfollows: • Ifthenumberisinoctal,writeeachoctaldigitintermsofthreebinarydigits.Forexample: 256773=10101110111111011 • Ifthenumberisinbinary,thengatherthebinarydigitsintogroupsofthree,startingfrom theleastsignificantdigitandwritedowntheoctal(0through7)equivalent.Forexample: 1100010101001101112=110001010100110111=6124678 • Ifthemostsignificantgroupingofbinarydigitshasonly1or2digitsremaining,justpad thegroupwith0’stocompleteagroupofthreeforthemostsignificantoctaldigit. Today,octalisnotascommonlyusedasitoncewas,butyouwillstillseeitusedoccasionally.For example,intheUNIX(Linux)commandchmod777,thenumber777istheoctalrepresentationoftheindividualbitsthatdefinefilestatus.Thecommandchangesthefilepermissionsforthe userswhomaythenhaveaccesstothefile. Wecannowextendourdiscussionoftherelationshipbetweenbinarynumbersandoctalnumbers toconsidertherelationshipbetweenbinaryandhexadecimal.Hexadecimal(hex)numbersareconvertedtoandfrombinaryinexactlythesamewayaswedidwithoctalnumbers,exceptthatnow weuse24,asthecommonfactorratherthan23.ReferringtoFigure1.18,weseethesameprocess forhexnumbersasweusedforoctal.
21
Chapter1 161 24(23 128 27 1
22 64 26 0
160
21
20)
20 ( 23
22
21
20 )
32
16
8
4
2
1
25
24
23
22
21
20
1
0
1
1
0
0
Figure 1.18: Converting a binary number to base 16 (hexadecimal). Binary numbersaregroupedbyfour,startingfromtheleastsignificantdigit,andthe hexadecimalequivalentiswrittendownbyinspection.
Inthiscase,wefactoroutthecommonpowerofthebase,24,andwe’releftwitharepeatinggroup ofbinarynumberswithcolumnvaluesrepresentedas 23222120 Itissimpletoseethatthebinarynumber1111=1510,sogroupsoffourbinarydigitsmaybe usedtorepresentanumberbetween0and15indecimal,or0throughFinhex.Referringbackto Figure1.16,nowthatweknowhowtodotheconversion,what’sthehexequivalentof10101100? Referringtotheleftmostgroupof4binarydigits,1010,thisisjust8+0+2+0,orA.Therightmostgroup,1100equals8+4+0+0,orC.Therefore,ournumberinhexisAC. Let’sdoa16-bitbinarynumberconversionexample. Binarynumber:
0101111111010111
Octal: Hex: Decimal:
0101111111010111=057727(groupedbythrees) 0101111111010111=5FD7(groupedbyfours) Toconverttodecimal,seebelow: OctaltoDecimal
HextoDecimal
7×80=7 2×81=16 7×82=448 7×83=3584 5×84=20480 24,535
7×160=7 13×161=208 15×162=3840 5×163=20480 24,535
Definitions Beforewegofullcircleandconsiderthereverseprocess,convertingfromdecimaltohex,octaland binary,weshoulddefinesometerms.Thesetermsareparticulartocomputernumbersandgiveus ashorthandwaytorepresentthesizeofthenumbersthatwe’llbeworkingwithlateron.Bysize, wedon’tmeanthemagnitudeofthenumber,weactuallymeanthenumberofbinarybitsthatthe numberrefersto.Youarealreadyfamiliarwiththisconceptbecausemostcompilersrequirethatyou declarethetypeofavariablebeforeyoucanuseit.Declaringthetypereallymeanstwothings: 22
IntroductionandOverviewofHardwareArchitecture 1. Howmuchstoragespacewillthisvariableoccupy,and 2. Whattypeofassemblylanguagealgorithmsmustbegeneratedtomanipulatethisnumber? Thefollowingtablesummarizesthevariousgroupingsofbinarybitsanddefinesthem. bit nibble byte
Thesimplestbinarynumberis1digitlong Anumbercomprisedoffourbinarybits.ANIBBLEisalsoonehexadecimaldigit Eightbinarybitstakentogetherformabyte.Abyteisthefundamentalunitofmeasuringstoragecapacityincomputermemoriesanddisks.Thebyteisalsoequaltoa charinCandC++. word Awordis16binarybitsinlength.Itisalso4hexdigitsinlengthor2bytesinlength. Thiswillbecomemoreimportantwhenwediscussmemoryorganization.InCorC++, awordissometimescalledashort. longword AlsocalledaLONG,thelongwordis32binarybitsor8hexdigits.Today,thisisanint inCorC++. doubleword AlsocalledDOUBLE,thedoublewordis64binarybitsinlength,or16hexdigits.
Fromthetableyoumaygetaclueastowhytheoctalnumberrepresentationhasbeenmostlysupplantedbyhexnumbers.Sinceoctalisformedbygroupsofthree,weareusuallyleftwiththose peskyremainderstodealwith.Wealwaysseemtohaveanextra1,2,or3asthemostsignificant octaldigit.Ifthecomputerdesignershadsettledon15and33bitsforthebuswidthsinsteadof 16and32bits,perhapsoctalwouldstillbealiveandkicking.Also,hexadecimalrepresentationis afarmorecompactwayofrepresentingnumber,soithasbecometoday’sstandard.Figure1.19 summarizesthevarioussizesofdataelementsastheystandtodayandasthey’llprobablybeinthe nearfuture. Bit (1)
Todaywealreadyhave D3 D0 computersthatcanmaNibble (4) nipulate64-bitnumbers inoneoperation.The D7 D0 Byte (8) Athlon64®fromAdvanced MicroDevicesCorporaD15 D0 Word (16) tionisonesuchprocessor. Anotherexampleisthe D31 D0 processorintheNintendo Long (32) N64GameCube®.Also, ifyouconsideryourselfa D63 D0 Double (64) PCGamer,andyoulike toplayfastactionvideo D127 D0 gamesonyourPC,then VLIW (128) youlikelyhaveahighperformancevideocardin Figure1.19:Sizeofthevariousdataelementsinacomputersystem. yourgamemachine.Itis likelythatyourvideocardhasavideoprocessingcomputerchiponitthatcanprocess128bitsat atime.Isa256-bitprocessorfarbehind?
23
Chapter1 FractionalNumbers Wedealwithfractionsinthesamemanneraswe dealwithwholenumbers.Forexample,consider Figure1.20.
102
101
100
5
6
7
.
10−1
10−2
10−3
10−4
4
3
2
1
Figure1.20:Representingafractionalnumber inbase10.
Weseethatforadecimalnumber,thecolumnsto therightofthedecimalpointgoinincreasingnegativepowersoften.Wewouldapplythesame methodsthatwejustlearnedforconvertingbetweenbasestofractionalnumbers.However,having saidthat,itshouldbementionedthatfractionalnumbersarenotusuallyrepresentedthiswayina computer.Anyfractionalnumberisimmediatelyconvertedtoafloatingpointnumber,orfloat.The floating-pointnumbershavetheirownrepresentation,typicallyasa64-bitvalueconsistingofa mantissaandanexponent.Wewilldiscussfloatingpointnumbersinalaterchapter. Binary-CodedDecimal There’sonelastformofbinarynumberrepresentationthatweshouldmentioninpassing,mostly forreasonsofcompleteness.Intheearlydaysofcomputerswhentherewasatransitiontaking placefrominstrumentationbasedondigitallogic,butnottrulycomputer-based,astheyaretoday, itwasconvenienttorepresentnumbersinaformcalledbinarycodeddecimal,orBCD.ABCD numberwasrepresentedas4binarydigits,justlikeahexnumber,exceptthehighestnumberin thesequenceis9,ratherthanF.DeviceslikecountersandmetersusedBCDbecauseitwasaconvenientwaytoconnectadigitalvaluetosomesortofadisplaydevice,likea7-segmentdisplay. Figure1.21showsthedigitsofaseven-segmentdisplay. Theseven-segmentdisplayconsistsof7barsandusuallyalsocontainsadecimalpointandeachof theelementsisilluminatedbyalightemittingdiode(LED).Figure1.21showshowthe7-segment displaycanbeusedtoshowthenumbers0through9.Infact,withalittlecreativity,itcanalso showthehexadecimalnumbersAthroughF.
BCDwasaneasywaytoconvertdigitalcountersandvoltmeterstoaneasytoreaddisplay. Imaginewhatyourreactionwouldbeif theRadioShack®voltmeterread7Avolts, insteadof122volts.Figure1.21shows whathappenswhenwecountinBCD.The numbersthataredisplayedmakesenseto 0100 0101 0000 0001 0010 0011 usbecausetheylooklikedecimaldigits. carry the Whenthecountreaches1001(9),thenext one incrementcausesthedisplaytorollaround to0andcarrya1,insteadofdisplayingA. 0001 0000 0110 0111 1000 1001 Manymicroprocessorstodaystillcontain vestigesofthetransitionperiodfromBCD Figure 1.21: Binary coded decimal (BCD) number tohexnumbersbycontainingspecial representation.Whenthenumberexceedsthecount instructions,suchasdecimaladdadjust,that of9,acarryoperationtakesplacetothenextmost areusedtocreatealgorithmsforimplement- significant decade and the binary digits roll around tozero. ingBCDarithmetic. 24
IntroductionandOverviewofHardwareArchitecture
ConvertingDecimalstoBases Convertingadecimalnumbertobinary,octalorhexisabitmoreinvolvedbecausebase10does nothaveanaturalrelationshiptoanyoftheotherbases.However,itisaratherstraightforward processtodescribeanalgorithmtousetotranslateanumberindecimaltoanotherbase.Forexample,let’sconvertthedecimalnumber,38,070tohexadecimal. 1. Findthelargestvalueofthebase(inthiscase16),raisedtoanintegerpowerthatisstill lessthanthenumberthatyouaretryingtoconvert.Inordertodotheconversion,we’ll needtorefertothetableofpowersof16,shownbelow.Fromthetableweseethatthe number38,070isgreaterthan4,096butlessthan65,536.Therefore,weknowthatthe largestcolumnvalueforourconversionis163. 160=1 164=65,536
161=16 165=1,048,576
162=256 166=16,777,216
163=4096 167=268,435,456
2. 3.
Performanintegerdivisiononthenumbertoconvert: a. 38,070DIV4096=9 b. 38,070MOD4096=1206 Themostsignificanthexdigitis9.Repeatstep1withtheMOD(remainder)fromstep1. 256islessthan1206and4096isgreaterthan1206. a. 1206DIV256=4 b. 1206MOD256=182 4. Thenextmostsignificantdigitis4.Repeatstep2withtheMODfromstep2.182is greaterthan16butlessthan256. a. 182DIV16=11(B) b. 4b182MOD16=6 5. ThethirdmostsignificantdigitisB.Wecanstopherebecausetheleastsignificantdigitis, byinspection, 6. Therefore:38,07010=94B616 Beforewemoveontothenexttopic,logicgates,itisworthwhiletosummarizewhywedidwhat wedid.Itwouldn’ttakeyouverylong,writing32-bitnumbersdowninbinary,torealizethat therehastobeabetterway.Thereisabetterway,hexandoctal.Hexadecimalandoctalnumbers, becausetheirbases,16and8respectively,haveanaturalrelationshiptobinarynumbers,base2. Wecansimplifyournumbermanipulationsbygatheringthebinarynumbersintogroupsofthree orfour.Thus,wecanwritea32-bitbinaryvaluesuchas10101010111101011110000010110110 inhexasAAF5E0B6. However,rememberthatwearestilldealingwithbinaryvalues.Onlythewaywechoosetorepresentthesenumbersisdifferent.Asyou’llseeshortly,thisnaturalrelationshipbetweenbinaryand hexalsoextendstoarithmeticoperationsaswell.Toprovethistoyourself,performthefollowing hexadecimaladdition: 0B+1A=25(Remember,that’s25inhex,notdecimal.25inhexis37indecimal.) Nowconvert0Band1Atobinaryandperformthesameaddition.Rememberthatinbinary 1+1=0,withacarryof1. 25
Chapter1 Sinceitisveryeasytomistakenumbersinhex,binaryandoctal,assemblersandcompilersallow youtoeasilyspecifythebaseofanumber.InCandC++werepresentahexnumber,suchas AA55,by0xAA55.Inassemblylanguage,weusethedollarsign.Sothesamenumberinassemblylanguageis$AA55.However,assemblylanguagedoesnothaveastandard,suchasANSIC, sodifferentassemblersmayusedifferentnotation.Anothercommonmethodistoprecedethehex numberwithazero,ifthemostsignificantdigitisAthroughF,andappendan“H”tothenumber. Thus$AA55couldberepresentedas0AA55Hbyadifferentvendor’sassembler.
EngineeringNotation Whilemoststudentshavelearnedthebasicsofusingscientificnotationtorepresentnumbers thatareeitherverysmallorverylarge,noteveryonehasalsolearnedtoextendscientificnotation somewhattosimplifytheexpressionofmanyofthecommonquantitiesthatwedealwithindigital systems.Therefore,let’stakeaverybriefdetourandcoverthistopicsothatwe’llhaveacommon startingpoint.Forthoseofyouwhoalreadyknowthis,youmaytakeyourbiobreakabout 10minutesearlier. Engineeringnotationisjustashorthandwayofrepresentingverylargeorverysmallnumbersina formatthatlendsitselftocommunicationsimplicityamongengineers.Let’sstartwithanexample thatIrememberfromanatureprogramaboutbatsthatIsawonTVanumberofyears.Batslocate insectsinabsoluteblacknessbyusingtheechoesfromultrasonicsoundwavethattheyemit.An insectreflectsthesoundburstsandthebatisabletolocatedinner.WhatIrememberisthenarrator sayingthatthenervoussystemofthebatissogoodatecholocationthatthebatcandiscernsound pulsesthatarrivelessthanafewmillionthsofasecondapart.Wow! Anyway,whatdoesa“fewmillionths”mean?Let’ssaythatafewmillionthsis5millionths.That’s 0.000005seconds.Inscientificnotation0.000005secondswouldbewrittenas5×10–6seconds. Inengineeringnotationitwouldbewrittenas5µs,andpronounced5microseconds.Weusethe Greeksymbol,µ(mu)torepresentthemicroportionofmicroseconds.Whatarethesymbolsthat wemightcommonlyencounter?Thetablebelowliststhecommonvalues: TERA=1012(T) GIGA=109(G) MEGA=106(M) KILO=103(K)
PICO=10-12(p) NANO=10-9(n) MICRO=10-6(µ) MILLI=10-3(m) FEMTO=10-15(f)
So,howdowechangeanumberinscientificnotationtoanequivalentoneinengineeringnotation?Here’stherecipe: 1. Adjustthemantissaandtheexponentsothattheexponentisdivisibleby3andthemantissaisnotafraction.Thus,3.05×104bytesbecomes30.5×103andnot0.03×106. 2. Replacetheexponenttermswiththeappropriatedesignation.Thus,30.5×103bytes becomes30.5Kbytes,or30.5Kilobytes. About99.99%ofthetime,theunitwillbewithintheexponentrangeof±12.However,ascomputersgeteverfaster,wewillbemeasuringtimesinthefractionsofapicosecond,soit’sappropriate 26
IntroductionandOverviewofHardwareArchitecture toincludethefemtosecondonourtable.Asanexercise,trytocalculatehowfarlightwilltravel inonefemtosecond,giventhatlighttravelsatarateofabout15cmpernanosecondonaprinted circuitboard. Althoughwediscussedthisearlier,itshouldbementionedagaininthiscontextthatwehavetobe carefulwhenweusetheengineeringtermsforkilo,megaandgiga.That’saproblemthatcomputerfolkhavecreatedbymisappropriatingstandardengineeringnotationsfortheirownuse.Since 210=1024,computer“Geekspeakers”decidedthatitwasjusttoocloseto1000toletitgo,sothe overloadedtheK,MandGsymbolstomean1024,1048576and1073741824,respectively,rather than1000,1000000or1000000000,respectively. Fortunately,werarelygetmixedupbecausethecomputerdefinitionisusuallyconfinedtomeasurementsinvolvingmemorysize,orbytecapacities.Anytimethatwemeasureanythingelse, suchasclockspeedortime,weusetheconventionalmeaningoftheunits.
SummaryofChapter1 • Thegrowthmoderndigitalcomputerprogressedrapidly,drivenbytheimprovements madeinthemanufacturingofmicroprocessorsmadefromintegratedcircuits. • Thespeedofcomputermemoryhasaninverserelationshiptoitscapacity.Thefastera memory,thecloseritistothecomputercore. • Moderncomputersarebasedupontwobasicdesigns,CISCandRISC. • Sinceanelectroniccircuitcanswitchonandoffveryquickly,wecanusethebinarynumberssystem,orbase2,toasthenaturalnumbersystemforourcomputer. • Binary,octalandhexadecimalarethenaturalnumberbasesofcomputersandthereare simplewaystoconvertbetweenthemanddecimal.
Chapter1:Endnotes CarverMead,LynnConway,IntroductiontoVLSISysyems,ISBN0-2010-4358-0,Addison-Wesley, Reading,MA,1980.
1
Foranexcellentandhighlyreadabledescriptionofthecreationofanewsuperminicomputer,seeTheSoulofaNew Machine,byTracyKidder,ISBN0-3164-9170-5,Little,BrownandCompany,1981.
2
DanielMann,PrivateCommunication.
3
http://www.seagate.com/cda/products/discsales/enterprise/family/0,1086,530,00.html.
4
DavidA.PattersonandJohnL.Hennessy,ComputerOrganizationandDesign,SecondEdition,ISBN1-5586-0428-6, MorganKaufmann,SanFrancisco,CA,p.30.
5
DanielMann,PrivateCommunication.
6
http://www.specbench.org/cgi-bin/osgresults?conf=cint95.
7
27
ExercisesforChapter1 1. DefineMoore’sLaw.WhatistheimplicationofMoore’sLawinunderstandingtrendsincomputerperformance?Limityouranswertonomorethantwoparagraphs. 2. SupposethatinJanuary,2004,AMDannouncesanewmicroprocessorwith100million transistors.AccordingtoMoore’sLaw,whenwillAMDintroduceamicroprocessorwith200 milliontransistors? 3. Describeanadvantageandadisadvantageoftheorganizationofacomputeraroundabstractionlayers. 4. WhataretheindustrystandardbussesinatypicalPC? 5. Supposethattheaveragememoryaccesstimeis35nanoseconds(ns)andtheaverageaccess timeforaharddiskdriveis12milliseconds(ms).Howmuchfasterissemiconductormemory thanthememoryontheharddrive? 6. Whatisthedecimalnumber357inbase9? 7. Convertthefollowinghexadecimalnumberstodecimal:
(a)0xFE57 (b)0xA3011 (c)0xDE01 (d)0x3AB2
8. Convertthefollowingdecimalnumberstobinary:
(e)510 (f)64,200 (g)4,001 (h)255
9. Supposethatyouweretravelingat14furlongsperfortnight.Howfastareyougoinginfeet persecond?Expressyouranswerinengineeringnotation.
28
2
CHAPTER
Introductionto DigitalLogic Objectives
Learntheelectroniccircuitbasisfordigitallogicgates; UnderstandhowmodernCMOSlogicworks; Becomefamiliarwiththebasicsoflogicgates.
RememberthesimplebatteryandflashlightcircuitwesawinFigure1.14?Thetwoon/off switches,wiredastheywereinseries,implementedthelogicalANDfunction.Wecanexpressthat exampleas,“IfswitchAisclosedANDswitchBisclosed,THENlampCwillbeilluminated.” Admittedly,thisisprettyfarremovedfromyourPC,butthelogicalfunctionimplementedbythe twoswitchesinthiscircuitisoneofthefourkeyelementsofamoderncomputer. Itmaysurpriseyoutoknowthatalloftheprimarydigitalelementsofamoderncomputer,the centralprocessingunit(CPU),memoryandI/Ocanbeconstructedfromfourprimarylogical functions:AND,OR,NOTandTri-State(TS).NowTSisnotalogicalfunction,itisactuallycloser toanelectroniccircuitimplementationtool.However,withouttri-statelogic,moderncomputers wouldbeimpossibletobuild. Aswe’llsoonsee,tri-statelogicintroducesathirdlogicalconditioncalledHi-Z.“Z”istheelectronicsymbolforimpedance,ameasureoftheeasybywhichelectricalcurrentcanflowina circuit.So,Hi-Z,seemstoimplyalotofresistancetocurrentflow.Asyou’llsoonsee,thisiscriticalforbuildingoursystem. Havingsaidthatwecouldbuildacomputerfromthegroundupusingthefourfundamentallogicgates: AND,OR,NOTandTS,itdoesn’tnecessarilyfollowthatwewouldbuilditthatway.Thisisbecause engineerswillusuallytakeimplementationshortcutsindesigningmorecomplexfunctionsandthese designefficiencieswilltendtoblurthedistinctionsbetweenthefundamentallogicelements.However, itdoesnotdiminishtheconceptualimportanceofthesefourfundamentallogicalfunctions. Inwritingthis,IwasstruckbythesimilaritybetweentheDNAmoleculeanditsfournucleotides: adenine,cytosine,guanine,andthymine;abbreviated,A,C,GandT,andthefactthatacomputer canalsobedescribedbyfour“electronicnucleotides.”Weshouldn’tlooktoodeeplyintothiscoincidencebecausethedifferencescertainlyfaroutweighthesimilarities.Anyway,it’sfuntoimagine an“electronicDNAmolecule”ofthefuturethatcanbeusedasablueprintforreplicatingitself. Couldtherebeascience-fictionnovelinthissomewhere?Anyway,Figure2.1 29
Chapter2 EnoughoftheDNAanalogy!Let’smoveon.Let memakeonelastpointbeforewedoleave.The keypointofmakingtheanalogyisthateverything thatwewillbedoingfromnowonintherealmof digitalhardwareisbaseduponthesefourfundamentallogicfunctions.Thatmaynotmakeitany easierforyou,butthat’swherewe’reheaded.
AND TS OR AND TS NOT
OR TS NOT OR
C
G
G
C
A
T
AT G
AND NOT
C G
C
TS AND
C
NOT OR OR TS
T
AND NOT NOT TS
G
A T A
A T
Figure2.2isaschematicdiagramofadigital TA logicgate,whichcanexecutethelogical“AND” OR TS G C AND NOT C G function.Thesymbol,representedbythelabel OR AND A T F(A,B)isthestandardsymbolthatyouwoulduse Figure2.1:Thinkingaboutacomputerbuiltfrom torepresentanANDgateintheschematicdiagramforadigitalcircuitdesign.Theoutput,C,is “logicalDNA”ontheleftandschematicpictureof afunctionofthetwobinaryinputvariablesAand aportionofarealDNAmoleculeontheright.(DNA B.AandBarebinaryvariables,butinthiscircuit moleculepicturefromtheDolanLearningCenter, ColdSpringsHarborLaboratory1.) theyarerepresentedbyvaluesofvoltage.Inthe GATE FUNCTION caseofpositivelogic,wherethemore A positive(higher)voltageisa“1”andthe F(A,B) C = F(A,B) lesspositive(lower)voltageisa“0”,this B circuitelementismostcommonlyused incircuitswherealogic“0”isrepreOutput signal Input signals to the gate from the gate sentedbyavoltageintherangeof0to 0.8voltsanda“1”isavoltagebetween Figure 2.2: Logical AND gate. The gate is an electronic approximately3.0and5.0volts.*From switchingcircuitthatimplementsthelogicalANDfunction 0.8voltsto3.0voltsis“noman’sland.” forthetwoinputvaluesAandB. Signalstravelthroughthisregionveryquicklyontheirwayupordown,butdon’tdwellthere.Ifa logicvalueweretobemeasuredinthisregion,therewouldbeanelectricalfaultinthecircuit. G
C
ThelogicalfunctionofFigure2.2canbedescribedinseveralequivalentways: • IFAisTRUEANDBisTRUETHENCisTRUE • IFAisHIGHANDBisHIGHTHENCisHIGH • IFAis1ANDBis1THENCis1 • IFAisONANDBisONTHENCisON • IFAis5voltsANDBis5voltsTHENCis5volts Thelastbulletisalittleiffybecauseweallowthevaluestoexistinranges,ratherthanabsolute numbers.It’sjusteasiertosay“5volts”insteadof“arangeof3to5volts.” Aswe’vediscussedinthelastchapter,inarealcircuit,AandBaresignalsonindividualwires. Thesewiresmaybeactualwiresthatareusedtoformacircuit,ortheymaybeextremelyfinecopperpaths(traces)onaprintedcircuit(PC)board,asinFigure1.11,ortheymaybemicroscopic pathsonanintegratedcircuit.Inallcases,itisasinglewireconductingadigitalsignalthatmay takeontwodigitalvalues:“0”or“1”. *Assumingthatweareusingtheolder5-voltlogicfamilies,ratherthanthenewer3.3-voltfamilies.
30
IntroductiontoDigitalLogic Thenextpointthatwewanttoconsideriswhywecallthiscircuitelementa“gate.”Youcan imaginearealgateinfrontofyourhouse.Someonehastoopenthegateinordertogainentrance. TheANDgatecanbethoughtofinexactlythesameway.Itisagateforthepassageofthelogical signal.ThepreviousbulletedstatementsdescribetheANDgateasalogicstatement.However,we canalsodescribeitthisway: • IFAis1THENCEQUALSB • IFAis0THENCEQUALS0 Ofcourse,wecouldexchangeAandBbecausetheseinputsareequivalent.WecanseethisgraphicallyinFigure2.3.NoticethestrangelineatinputsBandoutputC.Thisisawaytorepresenta digitalsignalthatisvaryingwithtime.Thisiscalledawaveformrepresentation,oratimingdiagram.Theideaisthatthewaveformisonsomekindofgraphpaper,suchasastripchartrecorder, andtimeischangingonthehorizontalaxis.Theverticalaxisrepresentsthelogicalvalueapointin thecircuit,inthiscasepointsBandC. WhenA=1,thesamesignalappearsatoutputCthatisinputatB.ThechangethatoccursatCas aresultofthechangeatBisnotinstantaneousbecausethespeedoflightisfinite,andcircuitsare notinfinitelyfast.However,intermsofahumanscale,it’sprettyfast.Inatypicalcircuit,ifinput Bwentfroma0valuetoa1,thesamechangewouldoccuratoutputC,delayedbyabout5billionthsofasecond(5nanoseconds). You’veactuallyseenthesetimingdiagramsbeforeinthecontextofanEKG(electrocardiogram) whenaphysicianchecksyourheart.Eachofthesignalsonthechartrepresentstheelectricalvoltageatvariouspartsofyourheartmusclesovertime.Timeistravelingalongthelongaxisofthe chartandthevoltageatanypointintimeisrepresentedbytheverticaldisplacementoftheink. Sincetypicaldigitalsignalschangemuchmorerapidlythanwecouldseeonastripchartrecorder, weneedspecializedequipment,suchasoscilloscopesandlogicanalyzerstorecordthewaveforms anddisplaythemforusinawaythatwecancomprehend.We’lldiscusswaveformsandtiming diagramsintheupcomingchapters. A=1
1
AND
0
C
B
Timing Diagram A=0 1 0
AND
C
B
Figure2.3:ThelogicalANDcircuitrepresentedasagatingdevice.WhentheinputA=1,outputC followsinputB(thegateisopen).WheninputA=0,outputC=0,independentofhowinputBvaries withtime(thegateisclosed).
Thus,Figure2.3showsusaninputwaveformatB.Ifwecouldgetreallysmall,andwehadavery faststopwatchandafastvoltmeter,wecouldimaginethatwearesittingonthewireconnectedto 31
Chapter2 thegateatpointB.AsthevalueofthevoltageatBchanges,wecheckourwatchandplotthevoltageversustimeonthegraph.That’sthewaveformthatisshowninFigure2.3. Alsonoticethattheverticallinethatrepresentsthetransitionfromthelogiclevel0tothelogiclevel 1.Thisiscalledtherisingedge.Likewise,theverticallinethatrepresentsthetransitionfromlogic level1tologiclevel0iscalledthefallingedge.Wetypicallywantthetimedurationsoftherising edgeandfallingedgetobeverysmall,afewbillionthsofasecond(nanoseconds)becauseindigital systems,thespacebetween0and1isnotdefined,andwewanttogetthroughthereasquicklyas possible.That’snottosaythattheseedgesareuselesstous.Quitethecontrary,theyareprettyimportant.Infact,you’llseethevalueoftheseedgeswhenwestudysystemclockslateron. If,inFigure2.3,A=1(upperfigure),theoutputatCfollowstheinputatAwithaslighttime delay,calledthepropagationdelay,becauseitrepresentsthetimerequiredfortheinputchangeto workitsway,orpropagate,throughthecircuitelement.WhenthecontrolinputatA=0,outputC willalwaysequal0,independentofwhateverinputBdoes.Thus,inputBisgatedbyinputA.For now,we’llleavethetimingdiagraminthewonderfulworldofhardwaredesignandreturntoour studiesoflogicgates. Earlier,wetouchedonthefactthattheANDgateisoneofthreelogicgates.We’llkeepthetristategateseparatefornowandstudyitinmoredepthwhenwediscussbusorganizationinalater chapter.Intermsof“atomic,”oruniqueelements,thereareactuallythreetypesoflogicalgates: AND,ORandNOT.Thesearethefundamentalbuildingblocksofallthecomplexdigitallogic circuitstofollow.ConsiderFigure2.4. A AND
C=A*B
C
C is TRUE if A is TRUE AND B is TRUE
B
A OR B
C=A+B
C
C is TRUE if A is TRUE OR B is TRUE
NEGATION SYMBOL
A
NOT
B
B=A
B is TRUE if A is FALSE
Figure2.4:Thethree“atomic”logicgates:AND,ORandNOT.
ThesymbolfortheANDfunctionisthesameasthemultiplicationsymbolthatweuseinalgebra. Aswellseelateron,theANDoperationissimilartomultiplyingtwobinarynumbers,since 1×1=1and1×0=0.Theasteriskisaconvenientsymboltousefor“ANDing”twovariables. ThesymbolfortheORfunctionistheplussign,anditis“sortof”likeaddition,because0+0=0, 1+0=1.Theanalogyfailswith1+1,becauseifwe’readding,then1+1=0(carrythe1)butif weareOR’ing,then1+1=1.OK,sotheyoverloadedthe+sign.Don’tgetangrywithme,I’m onlythemessenger. 32
IntroductiontoDigitalLogic Thenegationsymboltakesmanyforms,mostlybecauseitisdifficulttodrawalineoveravariable usingonlyASCIItextcharacters.InFigure2.4,weusethebaroverthevariableAtoindicatethat theoutputBisthenegation,oroppositeoftheinputA.IfA=1,thenB=0.IfA=0,thenB=1. UsingonlyASCIItextcharacters,youmightseetheNOTsymbolwrittenas,B=~A,orB=/A withtheforwardslashorthetilderepresentingnegation.Negationisalsocalledthecomplement. TheNOTgatealsousesasmallopencircleontheoutputtoindicatenegation.Agatewithasingle inputandasingleoutput,butwithoutthenegationsymboliscalledabuffer.Theoutputwaveform ofabufferalwaysfollowstheinputwaveform,minusthepropagationdelay.Logicallythereisno obviousneedforabuffergate,butelectrically(thosepeskyhardwareengineersagain!)thebuffer isanimportantcircuitelement. We’llactuallyreturntotheconceptofthebuffergatewhenwestudyanalogtodigitalconversion inafuturelesson.UnliketheANDgateandtheORgate,theNOTgatealwayshasasingleinput andasingleoutput.Furthermore,forallofitssimplicity,thereisnoobviouswaytoshowtheNOT gateintermsofasimpleflashlightbulbcircuit.Sowe’llturnourattentiontotheORgate. WecanlookattheORgateinthesamewaywefirstexaminedtheANDgate.We’lluseoursimple flashlightcircuitfromFigure1.14,butthistimewe’llrearrangetheswitchestocreatethelogical ORfunction.Figure2.5isourflashlightcircuit. Connection between two wires indicated with a dot
+
B C A C = A OR B
–
Battery Symbol Lightbulb (load)
Figure2.5:ThelogicalORfunctionimplementedastwoswitchesin parallel.ClosingeitherswitchAorBturnsonthelightbulb,C.
ThecircuitinFigure2.5showsthetwoswitches,AandBwiredinparallel.Closingeitherswitch willallowthecurrentfromthebatterytoflowthroughtheswitchintothebulbandturniton. Closingbothswitchesdoesn’tchangethefactthatthelampisilluminated.Thelampwon’tshine anybrighterwithbothswitchesclosed.Theonlywaytoturnitoffistoopenbothswitchesand interrupttheflowofcurrenttothebulb.Finally,thereisasmalldotinFigure2.5thatindicatethat twowirescometogetherandactuallymakeelectricalcontact.Aswe’vediscussedearlier,since ourschematicdiagramisonlytwodimensional,andprintedcircuitboardsoftenhavetenlayersof electricalconductorsinsulatedfromeachother,itissometimesconfusingtoseetwowirescross eachotherandnotbephysicallyconnectedtogether.Usuallywhentwowirescrosseachother,we getlotsofsparksandthehousegoesdark.Inourschematicdiagrams,we’llrepresenttwowires thatactuallytoucheachotherwithasmalldot.Anyotherwiresthatcrosseachotherwe’llconsidertobeinsulatedfromeachother. 33
Chapter2 Westillhaven’tlookedatthetri-statelogicgate.Perhapsweshould,eventhoughthereasonfor thegate’sexistencewon’tbeobvioustoyounow.So,attheriskoflettingthecatoutofthebag, let’slookatthefourthmemberofouratomicgroupoflogicelements,thetri-statelogicgate shownschematicallyinFigure2.6. INPUT
Truth table for bus interface gate
Tri-state logic gate
Logic Gate
OUTPUT ENABLE OUTPUT
1
0
1
0
0
0
1
1
Hi-Z
0
1
Hi-Z
1 or 0 or Hi-Z
1 or 0
Output Enable
Figure2.6:Thetri-state(TS)logicgate.Theoutputofthegatefollowstheinputalongas theOutputEnable(OE)inputislow.WhentheOEinputgoeshigh,theoutputofthegate enterstheHi-Zstate.
Thereareseveralimportantconceptsheresoweshouldspendsometimeonthisfigure.First,the schematicdiagramforthegateappearstobethesameasaninverter,exceptthereisnonegation bubbleontheoutput.Thatmeansthatthelogicsenseoftheoutputfollowsthelogicsenseofthe input,afterthepropagationdelay.Thatmakesthisgateabuffergate.Thisdoesn’timplythatwe couldn’thaveatri-stategatewithabuilt-inNOTgate,butthatwouldnotbeatomic.Thatwouldbe acompoundgate. TheTSgatehasathirdinput,labeledOutputEnable.Theinputonthegatealsohasanegation bubble,indicatingthatthegateisactivelow.We’veintroducedanewtermhere.Whatdoes“active low”mean?Inthepreviouschapter,wemadeapassingreferencetothefactthatlogiclevels weresomewhatarbitrary.Forconvenience,weweregoingmakethemorepositivevoltagea1,or TRUE,andthelesspositivevoltagea0,orFALSE.Wecallthisconventionpositivelogic.There’s nothingspecialaboutpositivelogic,itissimplytheconventionthatwe’veadopted. However,therearetimeswhenwewillwanttoassignthe“TRUENESS”ofalogicstatetobe low,or0.Also,thereisamorefundamentalissuehere.Eventhoughwearedealingwithalogical condition,themeaningofthesignalisnotsoclear-cut.Therearemanyinstanceswherethesignal isusedasacontrolling,orenablingdevice.Undertheseconditions,TRUEandFALSEdon’treally applyinthesamewayastheywouldifthesignalwaspartofacomplexlogicalequation.Thisis thesituationwearefacedwithinthecaseofatri-statebuffer. TheOutputEnable(OE)inputtothetri-statebufferisactivewhenthesignalisinitslowstate.In thecaseoftheTSbuffer,whentheOEinputislow,thebufferisactive,andtheoutputwillfollow 34
IntroductiontoDigitalLogic theinput.Now,atthispointyoumightbegettingreadytosay,“WhoathereBucko,that’sanAND gate.Isn’tit?”andyou’dalmostbecorrect.InFigure2.3,weintroducedtheideaoftheANDlogic functionasagateandwhentheAinputwas0,theoutputofthegatewasalso0,independentof thelogicstateoftheBinput.TheTSbufferisdifferentinacriticalway.WhenOEishigh,froman electricalcircuitpointofview,theoutputofthegateceasestoexist.Itisjustasifthegatewasn’t there.ThisistheHi-Zlogicstate.So,inFigure2.3wehavetheuniquesituationthattheTSbuffer actsbehaveslikeaclosedswitchwhenOEislow,anditactslikeanopenswitchwhenOEishigh. Inotherwords,Hi-Zisnota1or0logicstate,itisauniquestateofitown,andhaslesstodowith digitallogicthenwiththeelectronicrealitiesofbuildingcomputers.We’llreturntotri-statebuffers inalaterchapter,staytuned! There’sonelastnewconceptthatwe’veintroducedinFigure2.6.Noticethetruthtableinthe upperrightofthefigure.Atruthtableisashorthandwayofdescribingallofthepossiblestates ofalogicalsystem.Inthiscase,theinputtotheTSbuffercanhavetwostatesandtheOEcontrol inputcanhavetwostates,sowehaveatotaloffourpossiblecombinationsforthisgate.WhenOE islow,theoutputagreeswiththeinput,whenOEishigh,theoutputisintheHi-Zlogicstateand theinputcannotbeseen.Thus,we’vedescribedinatidylittlechartallofthepossibleoperational statesofthisdevice. WenowhaveinourvocabularyoflogicalelementsAND,OR,NOTandTS.JustlikethebuildingblocklifeinDNA,thesearethebuildingblocksofdigitalsystems.Inactuality,thesethree gatesaremostoftencombinedtoformslightlydifferentgatescalledNAND,NORandXOR.The NAND,NORandXORgatesarecompoundgates,becausetheyareconstructedbycombining theAND,ORandNOTgates.Electrically,thesecompoundcircuitsarejustasfastastheprimary circuitsbecausethecompoundfunctioniseasilyimplementedbyitself.Itisonlyfromalogical perspectivedowedrawadistinctionbetweenthem. Figure2.7shows A NAND C thecompoundgates A NOT C NOT AND B B NANDandNOR. TheNANDgateis C is FALSE if A is TRUE AND B is TRUE C=A*B anANDgatefollowedbyaNOT A A NOT C OR NOR C gate.Thelogical B B functionofthe NANDgatemaybe C is FALSE if A is TRUE OR B is TRUE C=A+B statedas: Figure 2.7: A schematic representation of the NAND and NOR gates as a • OUTPUTC combinationoftheANDgatewiththeNOTgate,andtheORgatewiththe goesLOWif NOTgate,respectively. andonlyif inputAisHIGHANDinputBisHIGH. ThelogicalfunctionoftheNORgatemaybestatedas: • OUTPUTCgoesLOWifinputAisHIGH,orinputBisHIGH,orifbothinputsAandB areHIGH. 35
Chapter2 Finally,wewanttostudyonemorecompoundgateconstruct,theXORgate.XORisashorthand notationfortheexclusiveORgate(pronouncedas“exor”).TheXORgateisalmostlikeanOR gateexceptthattheconditionwhenbothinputsAandBequals1willcausetheoutputCtobe0, ratherthan1. Figure2.8illustratesthecircuitdiagramfortheXORcompoundgate.Sincethisisalotmore complexthananythingwe’veseensofar,let’stakeourtimeandwalkthroughit.TheXORgate hastwoinputs,AandB,andasingleoutput,C.InputAgoestoANDgate#3andtoNOTgate#1, whereitisinverted.Likewise,inputBgoestoANDgate#4anditscomplement(negation)goes toANDgate#3.Thus,eachoftheANDgateshasasitsinputoneofthevariablesAorB,andthe complement,ornegationoftheothervariable,AorB,respectively.Asanaside,youshouldnow appreciatethevalueoftheblackdotontheschematicdiagram.Withoutit,wewouldnotbeableto discernwiresthatareconnecttoeachotherfromwiresthataresimplycrossingovereachother. A
A
3 1
A* B A
B
XOR
B
C=A⊕B 5
2
Physical Connection
C
C = A *B + A * B A
C=A⊕B 4
B
C
B
A*B
C is TRUE if A is TRUE OR B is TRUE, but not if A is TRUE AND B is TRUE
Figure2.8:SchematiccircuitdiagramforanexclusiveOR(XOR)gate.
Thus,theoutputofANDgate#3canberepresentedasthelogicalexpressionA*Bandsimilarly, theoutputofANDgate#4isB*A.Finally,ORgate#5isusedtocombinethetwoexpressions andallowustoexpresstheoutputvariableC,asafunctionofthetwoinputvariables,AandBas, C=A*B+B*A.ThesymbolforthecompoundXORgateisshowninFigure2.8astheORgate withanaddedlineontheinput.TheXORsymbolistheplussignwithacirclearoundit. Let’swalkthroughthecircuittoverifythatitdoes,indeed,dowhatwethinkitdoes.Supposethat AandBareboth0.ThismeansthatthetwoANDgatesseeoneinputasa0,sotheiroutputsmust bezeroaswell.Gate#5,theORgate,hasbothinputsequalto0,soitsoutputisalso0.IfAandB areboth1,thenthetwoNOTgates,#1and#2,negatethevalue,andwehavethesamesituationas before,eachANDgatehasoneinputequalto0. Inthethirdsituation,eitherAis0andBis1,orviceversa.Ineithercase,oneoftheANDgates willhavebothofitsinputsequalto1sotheoutputofthegatewillalsobe1.Thismeansthat atleastoneinputtotheORgatewillbe1,sotheoutputoftheORgatewillalsobe1.Whew! AnotherwaytodescribetheXORgateistosaythattheoutputisTRUEifeitherinputAisTRUE ORinputBisTRUE,butnotifbothinputsAandBareTRUE. 36
IntroductiontoDigitalLogic Youmightbeaskingyourself,“What’sallthefussabout?”Asyou’llsoonsee,theXORgate formsoneofthekeycircuitelementsofacomputer,theadditioncircuit.Inordertounderstand this,let’ssupposethatweareaddingtwosingle-bitbinarynumberstogether.Wecanhavethe followingpossibilitiesforAandB: • A=0andB=0:A+B=0 • A=1andB=0:A+B=1 • A=0andB=1:A+B=1 • A=1andB=1:A+B=0,carrythe1 TheseconditionslooksuspiciouslylikeAXORB,providedthatwecansomehowfigureoutwhat todoaboutthe“carry”situation. Also,noticeinFigure2.8thatwe’veshownthattheoutputoftheXORgate,C,canbeexpressed asalogicalequation.Inthiscase,
C=A*B+B*A
ThisisanotherreasonwhywecalltheXORgateacompoundgate.Wecandescribeitintermsof alogicalequationofthemoreatomicgatesthatwe’vepreviouslydiscussed. NowsupposethatwemodifytheXORgateslightlyandaddaNOTgatetoitaftertheoutput.In effect,we’llconstructanXNORgate.Inthissituationtheoutputwillbe1,ifbothinputsarethe same,andtheoutputwillbe0iftheinputsaredifferent.Thus,we’vejustconstructedacircuit elementthatdetectsequality.With32XNORgates,wecanimmediatelytell(aftertheappropriate propagationdelay)iftwo32-bitnumbersareequaltoeachotheroraredifferentfromeachother. Justaswemightuse32XNORgatestocomparethevaluesoftwo32-bitnumbers,let’sseehow thatmightalsoworkifwe 0x55 0xAA 0x00 wantedtodoalogicalAND 1 0 AND 0 operationontwo32-bit 0 numbers.Youalreadyknow 0 AND 1 fromyourotherprogramming 1 AND 0 0 coursesthatyoucandoa 0 logicalANDoperationon AND 0 1 variablesthatevaluateto 1 AND 0 TRUEorFALSE(Booleans), 0 0 butwhatdoesANDingtwo 0 AND 1 32-bitnumbersmean?In 1 0 AND thiscase,theANDoperation 0 0 iscalledabitwiseAND, AND 0 1 becauseitANDstogetherthe Logical Equation: 0xAA AND 0x55 = 0x00 correspondingbitpairsofthe twonumbers.RefertoFigure Figure 2.9: Bitwise AND operation on two 8-bit (byte) values. The 2.9.ForthesakeofsimplicoperationperformediscalledabitwiseANDbecauseitoperateson ity,we’llshowthebitwise thecorrespondingbitpairsofthetwonumbers.IntheCandC++ ANDoperationontwo8-bit language,ahexadecimalnumberisrepresentedwiththeprefix0x. 37
Chapter2 numbers,ratherthantwo32-bitnumbers.Theonlydifferenceinthetwosituationsisthenumber ofANDgatesusedintheoperation. InFigure2.9,weareperformingthebitwiseANDoperationonthetwobytevalues0xAAand 0x55.SinceeachANDgatehasoneinputequalto1andtheotherequalto0,everyANDgatehas a0foritsoutput.InC/C++,theampersand,&,isthebitwiseANDoperator,sowecanwritethe codesnippet: CodeExample charinA=0xAA; charinB=0x55; charoutC=inA&inB; cout output disconnected outputfromthebus.Ifallofthe outputsofthevariousfunctional Figure6.5:Mechanicalswitchrepresentationofbusinterfacelogic. blocksaredisconnectedexcept one,thentheoneoutputthatisconnectedtothebuscan“drivethebus”eitherhighorlow.Since itistheonlytalkerandeveryotherdeviceisa“listener”(onetoN)therewillnotbeanyconflicts fromotheroutputdevices.Alloftheotheroutputs,whichremaindisconnectedbetheirelectronic switches,havenoimpactonthestateofthebussignal. Asyouknow,withthetri-statebuffers,wearen’treallybreakingtheelectricalcontinuityofthe gatestothebus;wearesimplymakingarapidchangeintheelectricalconductivityofasmall portionofthecircuitthatresidesbetweenthelogicaloutputofthegateandthebussignalthat itisconnectedtothecircuit.InFigure6.5,wesimplifythisbitofcircuitmagicbydrawingthe connect/disconnectportionofthecircuitasifitwasaphysicalswitch,likethelightswitchonthe wallofaroom.However,keepinmindthatwecannotopenandclosearealmechanicalswitchas nearlyasquicklyorascleanlyaswecangofromhighimpedance(nosignalflow)tolowimpedance(connecttothebus). Itisgenerallytruethatthebuscontrolsignalwillbeactivelow.Whateverlogicstate(1or0)we wanttoplaceonthebus,wemustfirstbringthebuscontrolforthatgate,functionblock,memory cellandsoforth,lowtoconnectthegatetothebussotheoutputofthelogicaldevicecanbeconnectedtothebus.Thissignalhasseveralnames.Sometimesitiscalledoutputenable(OE),chip enable(CE)orchipselect(CS).Thesesignalsarenotallthesame,and,aswe’llsoonsee,may performdifferenttaskswithinthedevice,buttheyallhavethecommonpropertyofdisablingthe outputofthedeviceanddisconnectingitfromthebus. ReferringbacktoFigure2.6,thetri-statelogicgate,weseethatwhentheOE,orbuscontrol signal,isLOW,theoutputofthebusinterface(BUSI/F)portionofthecircuitfollowstheinput. 126
BusOrganizationandMemoryDesign WhentheCSsignalisHIGH,theoutputoftheBUSI/FgoestoHi-Z,effectivelyremovingthe gatefromthecircuit.Onefinalpointneedstobeemphasized.Thetri-statepartofthecircuit doesn’tchangethelogicalstateofthegate,ormemorydevice,orwhateverelseisconnectedtoit. Itsonlyfunctionistoisolatetheoutputofthegatefromthebussothatanotherdevicemaytake controlandsendasignalonthebus. DB0 Let’snowlookat howwecanbuildour DB1 DB2 singlebitbusintoa DB3 Connection Connection Symbol realdatabus.Refer DB4 Symbol toFigure6.6.Here DB5 weseethatthebusis DB6 actuallyagroupingof DB7 eightsimilarsignals. Inthiscase,itrepresents8databits.So,if Bus Bus Interface Interface weweretotakeapart CE ourVCRandlookat Actual Actual Logic inputs themicroprocessor Logic inputs insideofit,wemight DB7 DB6 DB5 DB4 DB3 DB2 DB1 DB0 findan8-bitmicroprocessorinside.The Figure6.6:Tri-statebusorganizationforan8-bitwidedatabus. numberofbitsofthe databusreferstothesizeofanumberthatwecanfetchinoneoperation.Recallthat8-bitsallow ustorepresentnumbersfrom0to255,or28.
Thebusisstillmadeupofeightindividualsignalwiresthatareelectricallyisolatedfromeach other.Thisisimportanttorememberbecause,inordertokeepourdrawingssimple,we’llusuallydrawabusasoneline,ratherthan8,16,or32lines.Eachwireofthebusislabeledwiththe correspondingdatabit,DB0throughDB7,withDB0representingthenumber20,or1,andDB7 representingthenumber27,or128. Insidethedottedlineisourdevicethatisconnectedtothebus.Eachdatabitinsidethedevice connectswiththecorrespondingdatalineonthebusthroughthetri-statebuffer.Noticehowthe CEsignalisconnectedtoallofthetri-statebuffersatonce.BybringingCELOW,all8individual databitsaresimultaneouslyconnectedtothedatabus.Alsonotethattheeighttri-statebuffers (drawnastriangleswiththecircleontheside)innowaychangethevalueofthedata.Theysimply connectthesignalsontheeightdatabits,DB0...DB7,insideofthefunctionalblock,tothecorrespondingdatalinesoutsideofthefunctionalblock.Don’tconfusethetri-statebufferwiththe NOTgate.TheNOTgateinvertstheinputsignalwhilethetri-statebuffercontrolswhetherornot thesignalisallowedtopropagatethroughthebuffer. Figure6.7takesthisconceptonestepfurther.Herewehavefour32-bitstorageregistersconnected toa32-bitdatabus.Noticehowthedatabits,D0...D31aredrawntoshowthattheindividual bitsbecomethe32-bitbus.Figure3.36alsoshowsanewlogicelement,theblocklabeled 127
Chapter6
Table6.1isthetruthtableforthe2:4decodercircuitshowninFigure6.7.
32-bit register
32-bit register
32-bit register
32-bit register
“2:4Address Data Bus D0..D31 Decoder.”Recall D0 D0 D0 D0 thatweneedsome waytodecidewhich A0 devicecanputits A1 dataontothebus becauseonlyone D31 D31 D31 D31 outputmaybeonat atime.Thedecoder CS2 CS3 CS1 CS0 2:4 Address Decoder circuitdoesjust that.Imaginethat Figure6.7:Four32-bitstorageregistersconnectedtoabus.The2:4decoder twosignals,A0 takesthe2inputvariables,A0andA1,andselectsonly1ofthe4outputs, andA1,comefrom CS0..CS3toenable. someotherpartof thecircuitandareusedtodeterminewhichofthefour,32-bitregistersshowninthefigureshould beconnectedtothebusatanygiventime.Thetwo“address”inputbits,labeledA0andA1,give usfourpossiblecombinations.Thus,wecanthencreatesomerelativelysimplelogicgatecircuittoconvertthecombinationsofourinputs,A0andA1,to4possibleoutputs,thecorrectchip selectbit,CS0,CS1,CS2orCS3.Thiscircuitdesignisabitdifferentfromwhatyou’realready accustomedtobecausewewantour“TRUE”outputtogolow,ratherthangohigh.Thisisaconsequenceofthefactthattri-statelogicisusuallyassertedwithalowgoingsignal. Table6.1:Truthtablefora2:4decoder. A0
A1
CS0
CS1
CS2
CS3
0 0 0 1 1 1 Aswe’vepreviouslydiscussed,itiscommontosee 1 0 1 0 1 1 thecircledrawnattheinputtothetri-stategate. 0 1 1 1 0 1 Thismeansthatthesignalisactivelow.Justasthe 1 1 1 1 1 0 circleontheoutputsoftheinvertergate—NAND gateandNORgate—indicateasignalinversion,the circleontheinputtoagateindicatesthatthesignal isasserted(TRUE)whenitisLOW.Thisgoesbacktoourearlierdiscussionabout1and0being ratherarbitraryintermsofwhatisTRUEandwhatisFALSE.
Beforewemoveontolookatmemoryorganization,let’srevisitthealgorithmicstatemachine fromthepreviouschapter.Withtheintroductionoftheconceptofbussesinthischapter,wenow haveabetterbasisofunderstandingtoseehowtheoperationofabus-basedsystemmightbe controlledbythestatemachine.Figure6.8showsusasimplifiedschematicdiagramofpartofthe controlsystemofacomputingmachine.Eachoftheregistersshown,RegisterA,RegisterB,TemporaryRegisterandtheOutputRegisterhaveindividualcontrolstoreaddataintotheregisterona risingedgetotheclockinputs,andanOutputEnable(OE)signaltoallowtheregistertoplacedata onthecommondatabusthatconnectsallofthefunctionalelementsinsideofthecomputer. InordertoplacedataintoRegisterAwewouldfirsthavetoenableanothersourceofthedata,perhapsmemory,andputthedataontotheDataBus.Whenthedataisstable,theclocksignal,clk_A 128
BusOrganizationandMemoryDesign DATA BUS
Temporary Register
Register B
Register A
Arithmetic and logic unit (ALU) clk_A
oe_A
clk_B
oe_B
Microsequence Controller
clk_t ALU_0 ALU_1 ALU_2
ALU Output Register
C
oe_OUT clk_OUT System Clock in
Figure6.8Schematicdiagramofpartofacomputingengine.Theregistersare interconnectedusingtri-statelogicandbusses.TheMicrosequenceController mustsequencetheOutputEnable(OE)ofeachregisterproperlysothatthere arenobuscontentions.
wouldgothroughalowtohightransitionandthedatawouldbestoredintheregister.Remember thatRegisterAisajustacollectionofD-FF’swithacommonclock.Nowsupposethatwewant toaddthecontentsofregisterAandRegisterB,respectively.Thearithmeticandlogicunit(ALU) requirestwoinputsourcestoaddthenumberstogether.Ithasatemporaryregisterandanoutput registerassociatedwithitbecausetheALUitselfisanasynchronousgatedesign.Thatis,ifthe inputsweretochange,theoutputswouldalsoimmediatelychangebecausethereisnoDregister theretosynchronizeitsbehavior. Figure6.9showsaportionofthetruthtablethatcontrolsthebehaviorofthemicrosequencecontroller.Inordertoaddtwonumberstogether,suchasthecontentsofAandB,wemightissuethe assemblylanguageinstruction: ADDB,A ThisinstructiontellsthecomputertoaddtogetherthecontentsoftheAregisterandtheBregister andplacetheresultbackintoregisterA.Ifthereisanycarrygeneratedbytheadditionoperation, itshouldbeplacedinthecarrybitposition,whichisshownasthe“C”bitattachedtotheoutput registeroftheALU.Thus,wecanbegintoseethatwewouldneedseveralstepsinordertoadd thesenumberstogether.Let’sdescribetheprocessinwords: 1. CopythedatafrommemoryintheARegister.Thiswillbeoperand#2. 2. Copyoperand#1frommemoryintotheBregister. 3. MovethedatafromtheAregisterintothetemporaryregister. 4. MovetheresultsoftheadditionintheALUintotheALUOutputRegisterandCarrybit. 5. MovethedatafromtheOutputregisterbackintoregisterA. 129
Chapter6 ReferringtotheportionofthetruthtableshowninFigure6.9,wecanfollowtheinstruction’sflow throughthecontroller: Clockpulse1:
ThereisalowtohightransitionoftheRegisterAinputclock.Thiswillstorethedata thatiscurrentlybeingheldonthedatabusintheregister. ThereisalowtohightransitionoftheRegisterBinputclock.Thiswillstorethedata thatisbeingheldonthedatabusintoRegisterB. TheinputtoclockBreturnsto0andtheOutputEnablesignalforRegisterAbecomesactive.ThisputsthedatathatwaspreviouslystoredinRegisterAbackonto theDataBus.ThereisnoeffectoftheclocksignaltoRegisterBgoinglowbecause thisisafallingedge. ThereisarisingedgeoftheclocktotheTemporaryRegister.Thisstoresthedata thatwaspreviouslystoredinRegisterAandisnowontheDataBus.TheOutput EnablesignaloftheAregisteristurnedoffandtheOutputEnableofRegisterBis turnedon.AtthispointthedatastoredinRegisterBisontheDataBusandthe DatastoredinregisterAisstoredintheTemporaryRegister.TheALUinputsignals, ALU0,ALU1andALU2aresetforaddition*,sotheoutputoftheALUisthesumof AandBplusanycarrygenerated. ThesumisstoredintheOutputRegisterontherisingedgeoftheclk_outsignal. TheOutputEnableofBregisteristurnedoff,theclockinputtoBisreturnedto0 andthedataintheTemporaryRegisterisputontheDataBus. TheDataisclockedintotheAregisterandtheOutputEnableoftheTemporary Registeristurnedoff. Thesystemreturnstotheoriginalstate.
Clockpulse2: Clockpulse3:
Clockpulse4:
Clockpulse5: Clockpulse6: Clockpulse7: Clockpulse8:
clock
clk_A
oe_A
clk_B
oe_B
clk_T
clk_OUT
oe_OUT
ALU_0
ALU_1
ALU_2
clk_A
oe_A
clk_B
oe_B
clk_T
clk_OUT
oe_OUT
ALU_0
ALU_1
ALU_2
*TheALUisacircuitthatcanperformuptoeightdifferentarithmeticorlogicaloperationsbaseduponthestate ofthethreeinputvariables,ALU_0..ALU_3.Inthisexample,weassumethattheALUcode,000,setstheALUupto performanadditionoperationonthetwoinputvariables.
1
0
1
0
1
0
0
1
0
0
0
1
1
0
1
0
0
1
0
0
0
2
1
1
0
1
0
0
1
0
0
0
0
1
1
1
0
0
1
0
0
0
3
0
1
1
1
0
0
1
0
0
0
0
0
0
1
0
0
1
0
0
0
4
0
0
0
1
0
0
1
0
0
0
0
1
0
0
1
0
1
0
0
0
5
0
1
0
0
1
0
1
0
0
0
0
1
0
0
0
1
1
0
0
0
6
0
1
0
0
0
1
1
0
0
0
0
1
0
1
0
0
0
0
0
0
7
0
1
0
1
0
0
0
0
0
0
1
1
0
1
0
0
1
0
0
0
8
1
1
0
1
0
0
1
0
0
0
0
1
0
1
0
0
1
0
0
0
Figure6.9:TruthtablefortheMicrosequenceControllerofFigure3.37showingthebeforeandafter logiclevelchangeswitheachclockpulse.Theleft-handsideofthetableshowsthestateoftheoutputs beforetheclockpulseandtherighthandsideofthetableshowsthestateoftheoutputsaftertheclock pulse.Theareasshowningrayarehighlightedtoshowthechangesthatoccuroneachclockpulse. 130
BusOrganizationandMemoryDesign Now,beforeyougooffandassumethatyouareaqualifiedCPUarchitect,letmewarnyouthat it’salotmoreinvolvedthenwhatwesawinthisexample.However,theprinciplesarethesame. Forstarters,wedidnotdiscusshowtheinstructionitself,ADDB,Aactuallygotintothecomputerandhowthecomputerfiguredoutthatthebitpatternwhichrepresentedtheinstructioncode wasanADDinstructionandnotsomethingelse.Also,wedidn’tdiscusshowweestablishedthe contentsoftheAandtheBregistersinthefirstplace.However,foratleastthatportionofthestate machinethatactuallydoestheadditionprocess,itprobablyisprettyclosetohowitreallyworks. You’venowseentwoexamplesofhowthestatemachineisusedtosequenceaseriesoflogical operationsandhowtheseoperationsforthebasisoftheexecutionofaninstructioninacomputer. Withouttheconceptofabus-basedarchitectureandthetri-statelogicalgatedesign,thiswouldbe verydifficultorimpossibletoaccomplish. MemorySystemDesign Wehavealreadybeenintroducedtotheconceptoftheflip-flop.Inparticularwesawhowthe“D” typeflip-flopcouldbeusedtostoreasinglebitofinformation.Recallthatthedatapresentatthe DinputtotheD-flopwillbestoredwithinthedeviceandthestoredvaluewillbetransferredtothe Qoutputontherisingedgeoftheclocksignal.Now,iftheclocksignalgoesaway,wewouldstill havethedatapresentontheQoutput.Inotherwords,we’vejuststored1bitofdata.OurD-flop circuitisamemorycell. Historically,therehavebeenalargenumberofdifferentdevicesusedtostoreinformationasa computer’srandomaccessmemory,orRAM.Magneticdevices,suchascorememories,were veryimportantuntilintegratedcircuits(IC)becamewidelyavailable.Today,oneICmemory devicecanhold512millionbitsofdata.Withthistypeofminiaturization,magneticdevices couldn’tkeepupwiththesemiconductormemoryintermsofspeedorcapacity.However,memoriesbasedonmagneticstoragehaveoneongoingadvantageoverICmemories—theydon’tforget theirdatawhenthepoweristurnedoff.Thus,westillhaveharddiskdrivesandtapestorageasour secondarymemorysystemsbecausetheycanholdontotheirdataevenifthepowerisremoved. Manyindustryanalystsareforetellingthedemiseoftheharddrive.ModernFLASHmemories, suchastheonesthatyoumayuseinyourPDA,digitalcameraorMP3playersarealreadyat 1gigabyteofstoragecapacity.FLASHmemoriesretaintheirinformationevenwithpower removed,andarefasterandmoreruggedthandiskdrives.However,diskdrivesandtapedrives stillwinoncapacityandpriceperbit.Atthetimeofthiswriting(Summerof2004),amoderndisk drivewithacapacityof160gigabytescanbepurchasedforabout$90ifyouarewillingtosend inalloftherebateinformationandthecompanyactuallysendsyoubackyourrebatecheck.But that’sanotherstoryforanothertime. Harddisksandtapesystemsareelectromechanicalsystems,withmotorsandothermovingparts. ThemechanicalcomponentscausethemtobefarlessreliableandmuchslowerthanICmemories, typically10,000timesslowerthananICmemorydevice.Itisforthisreasonthatwedon’tuseour harddisksforthemainmemoryinourcomputersystembecausethey’rejusttooslow.However,as you’llseeinalaterlesson,theabilityoftheharddrivetoprovidealmostlimitlessstorageenables anoperatingsystemsuchasLinuxorWindowstogivetheusertheimpressionthateveryapplicationthatisopenedonyourdesktopalwayshasasmuchmemoryasitneeds. 131
Chapter6 Inthissection,wewillstartfrom theD-flopasanindividualdevice andseehowwecaninterconnect manyofthemtoformamemory array.Inordertoseehowdata canbewrittentothememoryand readfromthememoryalongthe samesignalpath(althoughnotat thesameinstantintime),consider Figure6.10.
A single bit memory cell DATA IN/OUT
D
W
Q
CLK
Theblackboxisjustaslightly D-FF core without Tri-state buffer OE S, R and Q simplifiedversionofthebasic Dflip-flop.We’veeliminatedthe Figure6.10Schematicrepresentationofasinglebitofmemory. Thetri-statebufferontheoutputofthecellcontrolswhenthe S,RinputsandQoutput.The darkgrayboxisthetri-statebuf- Qoutputmaybeconnectedtothebus. fer,whichiscontrolledbyaseparateOE(outputenable)input.WhenOEisHIGH,thetri-state bufferisdisabled,andtheQoutputofthememorycellisisolated(Hi-Zstate)fromthedatalines (DATAI/Oline).However,theDatalineisstillconnectedtotheDinputofthecell,soitispossibletowritedatatothecell,butthenewdatawrittentothecellisnotimmediatelyvisibleto someonetryingtoreadfromthecelluntilthetri-statebufferisenabled.Whenwecombinethe basicFFcellwiththetri-statebuffer,wehaveallthatweneedtomakea1-bitmemorycell.Thisis indicatedbythelightgrayboxsurroundingthetwoelementsthatwe’vejustdiscussed. Thewritesignalisabitmisleading,soweshoulddiscussit.Weknowthatdataiswrittenintothe D-FFontherisingedgeofapulse,whichisindicatedbytheup-arrowonthewritepulse(W)in Figure6.10.Sowhyisthewritesignal,W,writtenasifitwasanactivelowsignal?Thereasonis thatwenormallykeepthewritesignalina1state.Inordertoaccomplishawriteoperation,the Wmustbebroughtlow,andthenreturnedhighagain.Itisthelow-to-hightransitionthataccomplishestheactualdatawriteoperation,butsincewemustbringthewritelinetoalowstateinorder toaccomplishtheactualwritingofthedata,weconsiderthewritesignaltobeactivelow.Also, youshouldinferfromthisdiscussionthatyouwouldneveractivatetheWlineandtheOElinesat thesametime.EitheryoubringWlowandkeepOEhigh,orviceversa.Theyneverarelowatthe sametime.Now,let’sreturntoouranalysisofthememoryarray. We’lltakeanotherstepforwardincomplexityandbuildamemoryoutoftri-statedevicesand D-flops.Figure6.11showsasimple(wellmaybenotsosimple)16-bitmemoryorganizedasfour, 4-bitnibbles.EachstoragebitisaminiatureD-flopthatalsohasatri-statebuffercircuitinsideof itsothatwecanbuildabussystemwithit. EachrowoffourD-FF’shastwocommoncontrollinesthatprovidetheclockfunction(write) andtheoutputenablefunctionforplacingdataontotheI/Obus.Noticehowthecorresponding bitpositionfromeachrowisphysicallytiedtothesamewire.Thisiswhyweneedthetri-state controlsignal,OE,oneachbitcell(D-FF).Forexample,ifwewanttowritedataintorow2of D-FF’sthedatamustbeplaceontheDB0throughDB3fromtheoutsidedeviceandtheW2signal 132
BusOrganizationandMemoryDesign DB0
DB1
DB2
DB3
A0 D CLK
A1 CS
W
Memory decoding logic
mustgohightostore thedata.Also,towrite dataintothecells,the OEsignalmustbekept intheHIGHstatein ordertopreventthedata alreadystoredinthe cellfrombeingplaced onthedatalinesand corruptingthenewdata beingwrittenintoacell.
(W0)D0..D3
Q OE
D CLK
Q
D
OE
CLK
Q OE
D CLK
Q OE
(OE0)D0..D3 D CLK
(W1)D0..D3 (OE1)D0..D3 (W2)D0..D3
D CLK
Q OE
Q OE
D CLK
D CLK
Q
D
OE
CLK
Q
D
OE
CLK
Q OE
Q OE
D CLK
D CLK
Q OE
Q OE
Thecontrolinputsto (OE2)D0..D3 Q D Q D Q D D Q the16-bitmemoryare CLK CLK CLK OE CLK OE OE OE shownontheleftof (W3)D0..D3 Figure6.11.Thedata (OE3)D0..D3 inputandoutput,orI/O, Figure6.11:16-bitmemorybuiltusingdiscrete“D”flip-flops.Wewould isshownonthetopof accessthetoprowofthefourpossiblerowsifwesettheaddressbits,A0 thedevice.Noticethat andA1to0.Inasimilarvein,(A0,A1)=(1,0),(0,1)or(1,1)wouldselect thereisonlyoneI/O rows1,2and3,respectively. lineforeachdatabit. That’sbecausedatacanflowinoroutonthesamewire.Inotherwords,we’veusedbusorganizationtosimplifythedataflowintoandoutofthedevice.Let’sdefineeachofthecontrolinputs: A0andA1
Addressinputsusedtoselectwhichrowofthememoryisbeingaddressedforinputor outputoperations.Sincewehavefourrowsinthedevice,weneedtwoaddresslines.
CS
Chipselect.Thisactivelowsignalisthemasterswitchforthedevice.Youcannotwrite intoitorreadfromitifCSisHIGH.
W
IftheWlineisHIGH,thenthedatainthechipmaybereadbytheexternaldevice, suchasthecomputerchip.IftheWlineislow,dataisgoingtobewrittenintothe memory.
ThesignalCS(chipselect)is,asyoumightsuspect,themastercontrolfortheentirechip.Without thissignal,noneoftheQoutputsfromanyofthesixteenD-FF’scouldbeenabled,sotheentire chipwouldremainintheHi-Zstate,asfarasanyexternalcircuitrywasconcerned.Thus,inorder toreadthedatainthefirstrow,notonlymust(A0,A1)=(0,0),wealsoneedCS=0.Butwait, there’smore! We’renotquitedonebecausewestillhavetodecideifwewanttoreadfromthememoryorwrite toit.Ifwewanttoreadfromit,wewouldwanttoenabletheQoutputofeachofthefourD-flops thatmakeuponerowofthememorycell.Thismeansthatinordertoreadfromanyrowofthe memory,weneedthefollowingconditionstobeTRUE: • READFROMROW0>(A0=0)AND(A1=0)AND(CS=0)AND(W=1) • READFROMROW1>(A0=1)AND(A1=0)AND(CS=0)AND(W=1) 133
Chapter6 • •
READFROMROW2>(A0=0)AND(A1=1)AND(CS=0)AND(W=1) READFROMROW3>(A0=1)AND(A1=1)AND(CS=0)AND(W=1) SupposethatwewanttowritefourbitsofdatatoROW1.Inthiscase,wedon’twanttheindividualOEinputstotheD-flopstobeenabledbecausethatwouldturnonthetri-stateoutputbuffers andcauseaconflictwiththedatawe’retryingtowriteintothememory.However,we’llstillneed themasterCSsignalbecausethatenablesthechiptobewrittento.Thus,towritefourbitsofdata toROW1,weneedthefollowingequation: WRITETOROW1>(A0=1)AND(A1=0)AND(CS=0)AND(W=0)
Row decoder
Address buffer
Figure6.12isasimplifiedschematicdiagramofacommerciallyavailablememorycircuitfrom NEC®,aglobalelectronicsandsemiconductormanufacturerheadquarteredinJapan.Thedeviceisa µPD44400814M-BitCMOSFastStaticRAM(SRAM)organizedas512K×8-bitwidewords(bytes). TheactualmemoryarrayiscomposedofanX-Ymatrix4,194,304individualmemorycells.Thisis justlikethe16-bitmemorythat wediscussedearlier,onlyquitea A0 Memory cell array bitlarger.Thecircuithas19ad| 4,194,304 bits A18 dresslinesgoingintoit,labeled A0...A18.Weneedthatmanyaddresslinesbecause219=524,288, I/0 so19addresslineswillgiveusthe Sense amplifier / Output data Input data | controller controller switching circuit I/08 rightnumberofcombinationsthat Column decoder we’llneedtoaccesseverymemory wordinthearray. Address buffer ThesignalnamedWEisthesame astheWsignalofourearlier example.It’sjustlabeleddifferently,butstillrequiredaLOWto CS HIGHtransitiontowritethedata. TheCSsignalisthesameasour CE CSintheearlierexample.One differenceisthatthecommercial WE partalsoprovidesanexplicit Vcc outputenablesignal(calledCE GND inFigure6.12)forcontrollingthe Truth Table tri-stateoutputbuffersduringa CS CE WE Mode I/O Supply current readoperation.Inourexample, H x x Not selected High impedance ICC theOEoperationisimpliedby L L H Read DOUT ICC thestateoftheWinput.Inactual L x L Write DIN use,theabilitytoindependently L H H Output Disable High Impedance controlOEmakesforamore Remark x: Don’t care flexiblepart,soitiscommonly Figure6.12:LogicaldiagramofanNECµPD4440084M-BitCMOS addedtomemorychipssuchas FastStaticRAM.DiagramcourtesyofNECCorporation.
134
BusOrganizationandMemoryDesign thisone.Thus,youcanseethatour16-bitmemoryisoperationallythesameasthecommercially availablepart. Let’sreturntoFigure6.11foramomentbeforewemoveon.NoticehoweachrowofD-flopshas twocontrolsignalsgoingtoeachofthechips.OnesignalgoestotheOEtri-statecontrolsandthe othergoestotheCLKinput.Whatwouldthecircuitinsideoftheblockontheleftactuallylook like?Rightnow,youhavealloftheknowledgeandinformationthatyouneedtodesignit. Let’sseewhatthetruthtable wouldlooklikeforthiscircuit. Figure6.13isthetruthtable. Youcanseethatthecontrollogic forarealmemorydevice,such astheµPD444008inFigure6.12 couldbecomesignificantlymore complexasthenumberofbits increasesfrom16to4million, buttheprinciplesarethesame. Also,ifyourefertoFigure6.13 youshouldseethatthedecoding logicishighlyregularandscalable.Thiswouldmakethedesign ofthehardwaremuchmore straightforward.
A0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
A1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
R/W 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
CS 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
W0 OE0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
W1 OE1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
W2 OE2 W3 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
OE3 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
Figure6.13:Truthtablefor16-bitmemorydecoder. DataBusWidthand AddressableMemory Beforewemoveontolookatmemorysystemdesignsofhighercomplexity,weneedtostopand catchourbreathforamoment,andconsidersomeadditionalinformationthatwillhelptomake theupcomingsectionsmorecomprehensible.Weneedtoputtwopiecesofinformationintotheir properperspective: 1. Databuswidth,and 2. Addressablememory. Thewidthofacomputer’sdatabusdeterminesthesizeofthenumberthatitcandealwithinone operationorinstruction.IfweconsiderembeddedsystemsaswellasdesktopPC’s,servers,workstations,andmainframecomputers,wecanseeaspectrumofdatabuswidthsgoingfrom4bitsup to128bitswide,withdatabusesof256bitsinwidthjustoverthehorizon.It’sfairtoask,“Why istheresuchavariety?”Theanswerisspeedversuscost.Acomputerwithan8-bitdatapathto memorycanbeprogrammedtodoeverythingaprocessorwitha16-bitdatapathcando,exceptit willtakelongertodoit.Considerthisexample.Supposethatwewanttoaddtwo16-bitnumbers togethertogeneratea16-bitresult.Thenumberstobeaddedarestoredinmemoryandtheresult willbestoredinmemoryaswell.Inthecaseofthe8-bitwidememory,we’llneedtostoreeach 16-bitwordastwosuccessive8-bitbytes.Anyway,here’sthealgorithmforaddingthenumbers. 135
Chapter6 Case1:8-bitWideDataBus 1. Fetchlowerbyteoffirstnumberfrommemoryandplaceinaninternalstorageregister. 2. Fetchlowerbyteofsecondnumberfrommemoryandplaceinanotherinternalstorage register. 3. Addthelowerbytestogether. 4. Writetheloworderbytetomemory. 5. Fetchupperbyteoffirstnumberfrommemoryandplaceinaninternalstorageregister. 6. Fetchupperbyteofsecondnumberfrommemoryandplaceinanotherinternalstorage register. 7. Addthetwoupperbytestogetherwiththecarry(ifpresent)fromtheprioraddoperation. 8. Writetheupperbytetothenextmemorylocationfromtheloworderbyte. 9. Writethecarry(ifpresent)tothenextmemorylocation. Case2:16-bitWideDataBus 1. Fetchthefirstnumberfrommemoryandplaceinaninternalstorageregister. 2. Fetchthesecondnumberfrommemoryandplaceinanotherinternalstorageregister. 3. Addthetwonumberstogether. 4. Writetheresulttomemory. 5. Writethecarry(ifpresent)tomemory. Asyoucansee,Case1requiredalmosttwicethenumberofstepsasCase2.Theefficiencygained bygoingtowiderdatabussesisdependentuponthealgorithmbeingexecuted.Itcanvaryfromas littleasafewpercentimprovementtoalmostfourtimesthespeed,dependinguponthealgorithm beingimplemented. Here’sasummaryofwherethevariousbuswidthsaremostcommon: • 4,8bits:appliances,modems,simpleapplications • 16bits:industrialcontrollers,automotiveapplications • 32bits:telecommunications,laserprinters,desktopPC’s • 64bits:highendPCs,UNIXworkstations,games(Nintendo64) • 128bits:highperformancevideocardsforgaming • 128,256bits:nextgeneration,verylonginstructionword(VLIW)machines Sometimeswetrytoeconomizebyusingaprocessorwithawideinternaldatabuswithanarrower memory.Forexample,theMotorola68000processorthatwe’llstudyinthisclasshasa16-bitexternaldatabusanda32-bitinternaldatabus.Ittakestwomemoryfetchestobringina32-bitquantity frommemory,butonceitisinsidetheprocessoritcanbedealtwithasasingle32-bitvalue.
AddressSpace Thenextconsiderationinourcomputerdesignishowmuchaddressablememorythecomputeris equippedtohandle.Theamountofexternallyaccessiblememoryisdefinedastheaddressspace ofthecomputer.Thisaddressspacecanvaryfrom1024bytesforasimpledevicetoover60gigabytesforahighperformancemachine.Also,theamountofmemorythataprocessorcanaddress isindependentofhowmuchmemoryyouactuallyhaveinyoursystem.ThePentiumprocessorin 136
BusOrganizationandMemoryDesign yourPCcanaddressoverfourbillionbytesofmemory,butmostusersrarelyhavemorethan1gigabyteofmemoryinsidetheircomputer.Herearesomesimpleexamplesofaddressablememory: • Asimplemicrocontroller,suchastheoneinsideofyourMr.Coffee®machine,mighthave 10addresslines,A0...A9,andisabletoaddress1024bytesofmemory(210=1024). • Ageneric8-bitmicroprocessor,suchastheoneinsideyourburglaralarm,has16address lines,A0...A15,andisabletoaddress65,536bytesofmemory(216=65,536). • TheoriginalIntel8086microprocessorthatstartedthePCrevolutionhas20addresslines, A0...A19,andisabletoaddress1,048,576bytesofmemory(220=1,048,576). • TheMotorola68000microprocessorhas24addresslines,A0...A23,andisableto address16,777,216bytesofmemory(224=16,777,216). • ThePentiummicroprocessorhas32addresslines,A0...A31,andisabletoaddress 4,294,967,296bytesofmemory(232=4,294,967,296). Asyou’llsoonsee,wegenerallyrefertoaddressablememoryintermsofbytes(8-bitvalues)even thoughthememorywidthisgreaterthanthat.Thiscreatesallsortsofmemoryaddressingambiguitiesthatwe’llsoongetinto. Paging Supposethatyou’rereadingabook.Inparticular,thisbookisaverystrangebook.Ithasexactly 100wordsoneverypageandeachwordoneachpageisnumberedfrom0to99.Thebookhas exactly100pages,alsonumberedfrom0to99.Aquickcalculationtellsyouthatthebookhas 10,000words(100words/page×100pages).Also,nexttoeverywordoneverypageistheabsolutenumberofthatwordinthebook,withthefirstnumberonpage0giventheaddress0000and thelastnumberonthelastpagegiventhenumber9,999.Thisisaverystrangebookindeed! However,wenoticesomethingquiteinteresting.Everywordonapagecanbeuniquelyidentified inthebookinoneoftwoways: 1. Givetheabsolutenumberofthewordfrom0000to9,999. 2. Givethepagenumberthatthewordison,from00to99andthengivethepositionofthe wordonthepage,from00to99. Thus,the45thwordonpage36couldbenumberedas3644inabsoluteaddressingoraspage=36, offset=44.Asyoucansee,howeverwechoosetoformtheaddress,wegettothecorrectword. Asyoumightexpect,thistypeofaddressingiscalledpaging.Pagingrequiresthatwesupplytwo numbersinordertoformthecorrectaddressofthememorylocationwe’reinterestedin. 1. Pagenumberofthepageinmemorythatcontainsthedata, 2. Pageoffsetofthememorylocationinthatpage. Figure6.14showssuchaschemeforamicroprocessor(sometimeswe’llusetheGreekletter“mu” andtheletter“P”together,µP,asashorthandnotationformicroprocessor).Themicroprocessor has20addresslines,A0...A19,soitcanaddress1,048,576bytesofmemory.Unfortunately, wedon’thaveamemorychipthatisjusttherightsizetomatchthememoryaddressspaceofthe processor.Thisisusuallythecase,sowe’llneedtoaddadditionalcircuitry(andmultiplememory devices)toprovideenoughmemorysothateverypossibleaddresscomingoutoftheprocessorhas acorrespondingmemorylocationtolinkto. 137
Chapter6 Sincethismemorysystemis builtwith64Kbytememory devices,eachofthe16memorychipshas16addresslines, A0throughA15.Therefore, eachoftheaddresslineofthe addressbus,A0throughA15, goestoeachoftheaddresspins ofeachmemorychip.
A15−A0
uP
64K Page 0
64K Page 1
CS
CS
0000 A19−A16
0001
64K Page E
64K Page F
CS
CS
1110
1111
Page select (4 to 16 decoder)
Figure6.14:Memoryorganizationfora20-bitmicroprocessor.The
Theremainingfouraddress memoryspaceisorganizedas16and64Kbytememorypages. linescomingoutoftheprocessor,A16throughA19areusedtoselectwhichofthe16memorychipswewillbeaddressing. Rememberthatthefourmostsignificantaddresslines,A16throughA19canhave16possible combinationsofvaluesfrom0000to1111,or0throughFinhexadecimal.
Let’sconsiderthemicroprocessorinFigure6.14.Let’sassumethatitputsoutthehexadecimal address9A30D.TheleastsignificantaddresslinesA0throughA15fromtheprocessorgotoeach ofthecorrespondingaddressinputsofthe16memorydevices.Thus,eachmemorydeviceseesthe hexadecimaladdressvalueA30D.AddressbitsA16throughA19gotothepageselectcircuit.So, wemightwonderifthissystemwillworkatall.Won’tthedatastoredinaddressA30Dofeachof thememorydevicesinterferewitheachotherandgiveusgarbage? Theanswerisno,thankstotheCSinputsoneachofthememorychips.Assumingthatthe processorreallywantsthebyteatmemorylocation9A30D,theremainingfouraddresslines comingoutoftheprocessor,A16throughA19areusedtoselectwhichofthe16memorychipswe willbeaddressing.Rememberthatthefourmostsignificantaddresslines,A16throughA19can have16possiblecombinationsofvaluesfrom0000to1111,or0throughFinhexadecimal. Thislookssuspiciouslylikethedecoderdesignproblemwediscussedearlier.Thismemorydesign hasa4:16decodercircuittodothepageselectionwiththemostsignificant4addressbitsselecting thepageandtheremaining16addressbitsformthepageoffsetofthedatainthememorychips. Noticethatthesameaddresslines,A0throughA15,gotoeachofthe16memorychips,soifthe processorputsoutthehexadecimaladdressE3AB0,all16memorychipswillseetheaddress 3AB0.Whyisn’tthereaproblem?AsI’msureyoucanallchantinunisonbynowitisthetri-state bufferswhichenableustoconnectthe16pagestoacommondatabus.AddressbitsA16through A19determinewhichoneofthe16CSsignalstoturnon.Theother15remainintheHIGHstate, sotheircorrespondingchipsaredisabledanddonothaveaneffectonthedatatransfer. Pagingisafundamentalconceptincomputersystems.Itwillappearoverandoveragainaswe delvefurtherintotheoperationofcomputersystems.InFigure6.14,weorganizedthe20-bit addressspaceoftheprocessoras16,64Kbytepages.Weprobablydiditthatwaybecausewe wereusing64Kmemorychips.Thiswassomewhatarbitrary,aswecouldhaveorganizedthe pagingschemeinatotallydifferentway;dependinguponthetypeofmemorydeviceswehad availabletous.Figure6.15showsotherpossiblewaystoorganizethememory.Also,wecould buildupeachpageofmemoryfrommultiplechips,sothepagesthemselvesmightneedtohave additionalhardwaredecodingonthem. 138
BusOrganizationandMemoryDesign Figure6.15: Possiblepaging schemesfora 20-bitaddress space.
Page address
Page address bits
Page offset
Offset address bits
NONE
NONE
0 to 1,048,575
A0 to A19
0 to 1
A19
0 to 524,287
A0 to A18
0 to 3
A19−A18
0 to 262,143
A0 to A17
0 to 7
A19−A17
0 to 131,071
A0 to A16
0 to 15
A19−A16
0 to 65,535
A0 to A15
0 to 31
A19−A15
0 to 32,767
A0 to A14
0 to 63
A19−A14
0 to 16,383
A0 to A13
Linear address
Our example
Itshouldbeemphasizedthatthetypeofmemoryorganizationusedinthedesignofthecomputer will,ingeneral,betransparenttothesoftwaredeveloper.Thehardwaredesignspecificationwill certainlyprovideamemorymaptothesoftwaredeveloper,providingtheaddressrangeforeach typeofmemory,suchasRAM,ROM,FLASHandsoon.However,thesoftwaredeveloperneed notworryabouthowthememorydecodingisorganized. Fromthesoftwaredesigner’spointofview,theprocessorputsoutamemoryaddressanditisupto thehardwaredesigntocorrectlyinterpretitandassignittothepropermemorydeviceordevices. Pagingisimportantbecauseitisneededtomapthelinearaddressspaceofthemicroprocessor intothephysicalcapacityofthestoragedevices.Somemicroprocessors,suchastheIntel8086 anditssuccessors,actuallyusepagingastheirprimaryaddressingmode.Theexternaladdress isformedfromapagevalueinoneregisterandanoffsetvalueinanother.Thenexttimeyour computercrashesandyouseetheinfamous“BlueScreenofDeath”lookcarefullyatthefunny hexadecimaladdressthatmightlooklike BD48:0056 Thisisa32-bitaddressinpage-offsetrepresentation. Diskdrivesusepagingastheironlyaddressingmode.Eachdiskisdividedinto512bytesectors (pages).A4gigabytediskhas8,388,608pages. DesigningaMemorySystem Youmaynotagree,butwe’rereadytoputitalltogetheranddesignarealmemorysystemfora realcomputer.OK,maybe,we’renotquiteready,butwe’reprettyclose.Closeenoughtogiveit try.Figure6.16isaschematicdiagramforacomputersystemwitha16-bitwidedatabus. First,justaquickreminderthatinbinaryarithmetic,weusetheshorthandsymbol“K”torepresent1024,andnot1000,aswedoinmostengineeringapplications.Thus,bysaying256Kyou reallymean262,144andnot256,000.Usually,thecontextwouldeliminatetheambiguity;butnot always,sobeware. ThecircuitinFigure6.16looksalotmorecomplicatedthananythingwe’veconsideredsofar, butitreallyisn’tverydifferentthanwhatwe’vealreadystudied.First,let’slookatthememory chips.Eachchiphas15addresslinesgoingintoit,implyingthatithas32Kuniquememory addressesbecause215=32,768.Also,eachchiphaseightdatainput/output(I/O)linesgoinginto 139
Chapter6 Address Bus: A0..A14 To uP
W
ADDR VAL RD WR
ADDRESS DECODE LOGIC
OE A15..A23
A0 W D0 A1 D1 A2 A3 A4 D2 A5 A6 D3 A7 A8 D4 A9 A10 D5 A11 A12 D6 A13 A14 D7
A0 W D0 A1 A2 D1 A3 A4 D2 A5 A6 D3 A7 A8 D4 A9 A10 D5 A11 A12 D6 A13 A14 D7
OE CS
OE CS
A0 W A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
CS0 CS1
To uP
D0 D1 D2 D3 D4 D5 D6 D7
OE CS
A0 W D0 A1 D1 A2 A3 A4 D2 A5 A6 D3 A7 A8 D4 A9 A10 D5 A11 A12 D6 A13 A14 D7 OE CS
Data Bus: D0..D15
Figure 6.16: Schematic diagram for a 64 K × 16 memory systembuiltfromfour32K×8memorychips.
it.However,youshouldkeepinmindthatthedatabusinFigure6.16isactually16bitswide (D0…D15)sowewouldactuallyneedtwo,8-bitwide,memorychipsinordertoprovidethe correctmemorywidthtomatchthewidthofthedatabus.We’lldiscussthispointingreaterdetail whenwediscussFigure6.17. TheinternalorganizationofthefourmemorychipsinFigure6.17isidenticaltotheorganization ofthecircuitswe’vealreadystudiedexceptthesedevicescontain256Kmemorycellsandthe memorywestudiedinFigure6.11had16memorycells.It’sabitmorecomplicated,buttheidea isthesame.Also,itwouldhavetakenmemoretimetodraw256Kmemorycellsthentodraw16, soItooktheeasywayout. Thismemorychiparrangementof32Kmemorylocationswitheachlocationbeing8-bitswide isconceptuallythesameideaasour16-bitexampleinFigure6.11intermsofhowwewouldadd moredevicestoincreasethesizeofourmemoryinbothwide(sizeofthedatabus)anddepth (numberofavailablememorylocations).InFigure6.11,wediscusseda16-bitmemoryorganized asfourmemorylocationswitheachlocationbeing4-bitswide.InFigure4.5,thereareatotalof 262,144memorycellsineachchipbecausewehave32,768rowsby8columnsineachchip. Eachchiphasthethreecontrolinputs,OE,CSandW.Inordertoreadfromamemorydevicewe mustdothefollowingsteps: 1. PlacethecorrectaddressofthememorylocationwewanttoreadonA0throughA14. 2. BringCSLOWtoturnonthechip. 3. KeepWHIGHtodisablewritingtothechip. 4. BringOELOWtoturnonthetri-stateoutputbuffers.
140
BusOrganizationandMemoryDesign Thememorychipsthenputsthedatafromthecorrespondingmemorylocationontodatalines D0throughD7fromonechip,andD8throughD15fromtheotherchip.Inordertowritetoa memorydevicewemustdothefollowingsteps: 1. PlacethecorrectaddressofthememorylocationwewanttoreadonA0throughA14. 2. BringCSLOWtoturnonthechip. 3. BringWLOWtoenablewritingtothechip. 4. KeepOEHIGHtodisablethetri-stateoutputbuffers. 5. PlacethedataondatalinesD0throughD15.WithD0throughD7goingtoonechipand D8throughD15goingtotheother. 6. BringWfromLOWtoHIGHtowritethedataintothecorrespondingmemorylocation. Nowthatweunderstandhowanindividualmemorychipworks,let’smoveontothecircuitasa whole.Inthisexampleourmicroprocessorhas24addresslines,A0throughA23.A0throughA14 arerouteddirectlytothememorychipsbecauseeachchiphasanaddressspaceof32Kbytes.The ninemostsignificantaddressbits,A15throughA23areneededtoprovidethepaginginformation forthedecodinglogicblock.Theseninebitstellsusthatthismemoryspacemaybedividedup into512pageswith32Kaddressoneachpage.However,theastutereaderwillimmediatelynote thatweonlyhaveatotaloffourmemorychipsinoursystem.Somethingisdefinitelywrong!We don’thaveenoughmemorychipstofill512pages.Ohdrat,Ihateitwhenthathappens! Actually,itisn’taproblemafterall.Itmeansthatoutofapossible512pagesofaddressablememory, ourcomputerhas2pagesofrealmemory,andspaceforanother510pages.Isthisaproblem?That’s hardtosay.Ifwecanfitallofourcodeintothetwopageswedohave,thenwhyincurtheadded costsofmemorythatisn’tbeingused?Icantellyoufrompersonalexperiencethatalotofsweathas goneintocrammingallofthecodeintofewermemorychipstosaveadollarhereandthere. Theotherquestionthatyouaskisthis.“OK,sotheaddressablememoryspaceoftheµPisnot completelyfull.Sowhere’s D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 thememorythatwedo havepositionedinthe addressspaceoftheprocessor?”That’saverygood questionbecausewedon’t haveenoughinformation rightnowtoanswerthat. However,beforeweattempt toprogramthiscomputer andmemorysystem,we mustdesignthehardware sothatthememorychips wedohavearecorrectly decodedatthepagelocationstheyaredesignedtobe 16-bit data bus, D0..D15 at.We’llseehowthatworks Figure6.17:Expandingamemorysystembywidth. inalittlewhile. A0 W A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
D0
D1 D2
D3
D4 D5 D6 D7
OE CE
A0 W A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14
D0
D1 D2
D3
D4 D5 D6 D7
OE CE
141
Chapter6 Let’sreturntoFigure6.16.It’simportanttounderstandthatwereallyneedtwomemorychipsfor eachpageofmemorybecauseourdatabusis16-bitswide,buteachmemorychipisonly8data bitswide.Thus,inordertobuilda16-bitwidememory,weneedtwochips.Wecanseethisin Figure6.17.Noticehoweachmemorydeviceconnectstoaseparategroupofeightwiresinthe databus.Ofcourse,theaddressbuspins,A0throughA14mustconnecttothesamewiresofthe addressbus,becauseweareaddressingthesameaddresslocationbothmemorychips. Nowthatyou’veseenhowthetwomemorychipsare“stacked”tocreateapageinmemorythatis 32K×16.Itshouldnotbeaproblemforyoutodesigna32K×32memoryusingfourchips. Youmayhavenoticedthatthemicroprocessor’sclockwasnowheretobeseeninthisexample memorydesign.Surely,oneofthemostimportantlinksinacomputersystem,thememoryto processor,needsaclocksignalinordertosynchronizetheprocessortothememory.Infact,many memorysystemsdonotneedaclocksignaltoinsurereliableperformance.Theonlythingthat needstobeconsideredisthetimingrelationshipbetweenthememorycircuitsandtheprocessor’s busoperation.Inthenextchapter,we’lllookataprocessorbuscycleinmoredetail,buthere’sa preview.TheNECµPD444008comesinthreeversions.Theactualpartnumbersare: • µPD444008-8 • µPD444008-10 • µPD444008-12 Thenumericalsuffixes,8,10and12,refertothemaximumaccesstimeforeachofthechips.The accesstimeisbasicallyaspecificationwhichdetermineshowquicklythechipisabletoreliably returndataoncethecontrolinputshavebeenproperlyestablished.Thus,assumingthattheaddress tothechiphasstabilized,CSandOEareasserted,thenafteradelayof8,10or12nanoseconds (dependingupontheversionofthechipbeingused),thedatawouldbeavailableforreadinginto theprocessor.Thechipmanufacturer,NEC,guaranteesthattheaccesstimewillbemetforthe entiretemperaturerangethatthechipisdesignedtooperateover.Formostelectronics,thecommercialtemperaturerangeis0degreesCelsiusto70degreesCelsius. Let’sdoasimpleexampletoseewhatthismeans.We’llactuallylookintothisinmoredetaillater on,butitcan’thurttoprepareourselvesforthingstocome.Supposethatwehaveaprocessor witha500MHzclock.Youknowthatthismeansthateachclockperiodis2nslong.Ourprocessorrequires5clockcyclestodoamemoryread,withthedatabeingreadintotheprocessoronthe fallingedgeofthe5thclockcycle.Theaddressandcontrolinformationcomesoutoftheprocessor ontherisingedgeofthefirstclockcycle.Thismeansthattheprocessorrequires4.5×2,or9ns todoamemoryreadoperation.However,we’renotquitedonewithourcalculation.Ourdecodinglogiccircuitalsointroducesatimedelay.Assumethatittakes1nsfromthetimetheprocessor assertsthecontrolandaddresssignaltothetimethatthedecodinglogictoprovidethecorrectsignalstothememorysystem.Thismeansthatweactuallyhave8ns,not9ns,togetthedataready. Thus,onlythefastestversionofthepart(generallythismeansthemostexpensiveversion)would workreliablyinthisdesign. Isthereanythingthatwecando?Wecouldslowdowntheclock.Supposethatwechangedthe clockfrequencyfrom500MHzto400MHz.Thislengthenstheperiodto2.5nsperclockcycle. Now4.5clockcyclestake11.25nsinsteadof9ns.Subtracting1nsforthepropagationdelay 142
BusOrganizationandMemoryDesign throughthedecodinglogic,wewouldneedamemorythatwas10.25nsorfastertoworkreliably. Thatlooksprettyencouraging.Wecouldslowtheclockdownevenmoresowecoulduseeven cheapermemorydevices.Won’ttheProjectManagerbepleased!Unfortunately,we’vejustmade atrade-off.Thetrade-offisthatwe’vejustslowedourprocessordownby20%.Everythingthe processordoeswillnowtake20%longer.Canwelivewiththat?Atthispoint,weprobablydon’t know.We’llneedtodosomecarefulmeasurementsofcodeexecutiontimesandperformance requirementsbeforewecananswerthequestioncompletely;andeventhenwemayhavetomake someprettyroughassumptions. Anyway,thekeytotheabovediscussionisthatthereisnoexplicitclockinthedesignofthe memorysystem.Theclockdependencyisimplicitinthetimingrequirementsofthememoryto-processorinterface,buttheclockitselfisnotrequired.Inthisparticulardesign,ourmemory systemisasynchronouslyconnectedtotheprocessor. Today,mostPCmemorydesignsaresynchronousdesigns.Theclocksignalisanintegralpartof thecontrolcircuitryoftheprocessor-to-memoryinterface.Ifyou’veeveraddedamemory“stick” toyourPCthenyou’veuppedthecapacityofyourPCusingsynchronousdynamicrandomaccess memoryorSDRAMchips.Theprintedcircuitboard(thestick)isaconvenientwaytomechanicallyconnectthememorychipstothePCmotherboard. Figure6.18isaphotographofa64Megabyte(Mbyte)SDRAMmemorymodule.Thismoduleholds64Mbytesofdataorganizedas1M×64.Thereareatotalof16memorychipsonthe module(frontandback)each chiphasacapacityof 32Mbits,organizedas 8M×4.We’lllookatthe differencesbetweenasynchronous,orstaticmemory systemsandsynchronous, Figure6.18:64MbyteSDRAMmemorymodule. dynamic,memorysystems lateroninthischapter. PaginginRealMemorySystems OurfourmemorychipsofFigure6.16willgiveustwo32K×16memorypages.Thisleavesus 510possiblememorypagesthatareempty.Howdoweknowwherewe’llhavethesetwomemory pagesandwherewewilljusthaveemptyspace?Theansweristhatitisuptoyou(orthehardware designer)tospecifywherethememorywillbe.Asyou’llsoonsee,inthe68000systemwewant nonvolatilememory,suchasROMorFLASHtoresidefromthestartofmemoryandgoupfrom there.Let’sstateforthepurposeofthisexercisethatwewanttolocateourtwoavailablepagesof realmemoryatpage0andatpage511. Let’sassumethattheprocessorhas24addressbits.Thiscorrespondstoabout16Mofaddressable memory(224addresslocations).ItiscustomarytolocateRAMmemory(read/write)atthetopof memory,butthisisn’trequired.Inmostcases,itwilldependupontheprocessorarchitecture.In anycase,inthisexampleweneedtofigureouthowtomakeoneofthetworealmemorypages 143
Chapter6 Table6.2:Pagenumbersandmemoryaddressrangesfora24-bitaddressingsystem. PageNumber(binary)
Pagenumber(hex)
Absoluteaddressrange(hex)
A23…………….A15 000000000 000000001 000000010 000000011
000 001 002 003
000000to007FFF 008000to00FFFF 010000to017FFF 018000to01FFFF
111111111
1FF
FF8000toFFFFFF
. . . .
. . .
respondtoaddressesfrom0x000000through0x007FFF.Thisisthefirst32Kofmemoryand correspondstopage0.Theother32Kwordsofmemoryshouldresideinthememoryregionfrom 0xFF8000through0xFFFFFF,orpage511.Howdoweknowthat?Simple,it’spaging.Ourtotal systemmemoryof16,777,216wordsmaybedividedupinto512pageswith32Koneachpage. Sincewehave9bitsforthepagingwecandividetheabsoluteaddressupasshowninTable6.2. WewantthetwohighlightedmemoryrangestorespondbyassertingtheCS0orCS1signalswhen thememoryaddressesarewithinthecorrectrangeandtheothermemoryrangestoremainunasserted.Thedecodercircuitforpage1FFisshowninFigure6.19.Thecircuitforpage000isleftas anexerciseforyou. NoticethatthereisnewasignalcalledADDRVAL(AddressValid).TheAddressValidsignal(or someothersimilarsignal)isissuedbytheprocessorinordertonotifytheexternalmemorythat thecurrentaddressonthebusisstable.Whyisthisnecessary?Keepinmindthattheaddresses ontheaddressbusarealwayschanging.Justexecutingoneinstructionmayinvolvefiveormore memoryaccesseswithdifferentaddressvalues.Thelongeranaddressstaysaround,theworsethe performanceoftheprocessorwillbe.Therefore,theprocessormustsignaltothememorythatthe currentvalueoftheaddressiscorrectandthememorymayrespondtoit.Also,someprocessors mayhavetwoseparatesignalsRDandWR,tosignifyreadandwriteoperations,respectively. OtherjusthaveasinglelineR/W.Thereareadvantagesanddisadvantagestoeachapproachand wewon’tneedtoconsiderthemhere.For A23 now,let’sassumethatourprocessorhastwo A22 separatesignals,oneforareadoperation A21 A20 andoneforawriteoperation. CS1 AsyoucanseefromFigure6.16andfrom thediscussionofhowthememorychips workinoursystem,itisapparentthatwe canexpressthelogicalconditionsnecessary toreadandwritetomemoryas: MEMORYREAD=OE*CS*WR MEMORYWRITE=OE*CS*WR
ADDR VAL
A19 A18 A17 A16 A15 NOT
NAND
Figure6.19:Schematicdiagramforacircuittodecode thetoppageofmemoryofFigure6.16. 144
BusOrganizationandMemoryDesign Inbothcases,weneedtoasserttheCSsignalinordertoreadorwritetomemory.Itisthecontrol ofthechipenable(orchipselect)signalthatallowsustocontrolwhereinthememoryspaceof theprocessoraparticularmemorychipwillbecomeactive. WiththeexceptionofourbriefintroductiontoSDRAMmemories,we’veconsideredonlystatic RAM(SRAM)forourmemorydevices.Asyou’veseen,staticRAMisderivedfromtheDflip-flop. Itisrelativelysimpleinterfacetotheprocessorbecauseallweneedtodoispresentanaddress andtheappropriatecontrolsignals,waitthecorrectamountoftime,andthenwecanreadorwrite tomemory.Ifwedon’taccessmemoryforlongstretchesoftimethere’snoproblembecausethe feedbackmechanismoftheflip-flopgatedesignkeepsthedatastoredproperlyaslongaspower isappliedtothecircuit.However,wehavetopayapriceforthissimplicity.AmodernSRAM memorycellrequiresfiveorsixtransistorstoimplementtheactualgatedesign.Whenyou’retalkingaboutmemorychipsthatstore256millionbitsofdata,asixtransistormemorycelltakesupa lotofvaluableroomonthesiliconchip(die). Today,mosthigh-densitymemoryincomputers,likeyourPC,usesadifferentmemorytechnologycalleddynamicRAM,orDRAM.DRAMcellsaremuchsmallerthanSRAMcells,typically takingonlyonetransistorpercell.Onetransistorisnotsufficienttocreatethefeedbackcircuitthat isneededtostorethedatainthecell,soDRAM’suseadifferentmechanismentirely.Thismechanismiscalledstoredcharge. Ifyou’veeverwalkedacrossacarpetonadrywinterdayandgottenashockwhenyoutouched somemetal,liketherefrigerator,you’refamiliarwithstoredcharge.Yourbodypickedupexcess chargeasyouwalkedacrossthecarpet(nowyourepresentalogical1state)andyoureturnedto alogical0statewhenyougotzappedasthechargeleftyourbody.DRAMcellsworkinexactly thesameway.EachDRAMcellcanstoreasmallamountofchargethatcanbedetectedasa1by theDRAMcircuitry.Storesomechargeandthecellhasa1,removethechargeandits0.(However,justlikethechargestoredonyourbody,ifyoudon’tdoanythingtoreplenishthecharge,it eventuallyleaksaway.)It’sabitmorecomplicatedthanthis,andthestoredchargemightactually representa0ratherthana1,butitwillbesufficientforourunderstandingoftheconcept. InthecaseofaDRAMcell,thewaythatwereplenishthechargeistoperiodicallyreadthecell. Thus,DRAM’sgettheirnamefrom 13 Column Address Lines CA0..CA12 thefactthatweareconstantlyreading (0,8191) (0,0) (0,1) them,evenifwedon’tactuallyneedthe datastoredinthem.Thisisthedynamic (1,0) portionoftheDRAM’sname.Theprocessofreadingfromthecelliscalleda 13 Row Address refreshcycle,andmustbecarriedoutat Lines RA0..RA12 intervals.Infact,everycellofaDRAM mustberefreshedeverfewmilliseconds orthecellwillbeindangeroflosing (8191,0) itsdata.Figure6.20showsaschematic (8191,8191) representationoftheorganizationofa Figure6.20:Organizationofa64MegabitDRAM 64MbitDRAMmemory. memory. 145
Chapter6 Thememoryisorganizedasamatrixwith8192rows×8192columns(213).Inordertouniquely addressanyoneoftheDRAMmemorycells,a26-bitaddressisrequired.Sincewe’vealready createditasamatrix,and26pinsonthepackagewouldaddalotofextracomplexity,thememory isaddressedbyprovidingaseparaterowaddressandaseparatecolumnaddresstotheXYmatrix. Fortunatelyforus,theprocessofcreatingtheseaddressesishandledbythespecialchipsetson yourPC’smotherboard.Let’sreturntotherefreshproblem.Supposethatwemustrefresheachof the64millioncellsatleastonceevery10milliseconds.Doesthatmeanthatwemustdo64million refreshcycles?Actuallyno;itissufficienttojustissuetherowaddresstothememoryandthat guaranteesthatallofthe8192cellsinthatrowgetrefreshedatonce.Nowourproblemismore tractable.If,forexamplethespecificationallowsus16.384millisecondstorefresh8192rowsin thememory,thenwemust,onaverage,refreshonerowevery16.384×10–3/8.192×103seconds, oroneroweverytwomicroseconds. Ifthisallseemsverycomplicated,itcertainlyis.DesigningaDRAMmemorysystemisnotfor thebeginninghardwaredesigner.TheDRAMintroducesseveralnewlevelsofcomplexity: • Wemustbreakthefulladdressdownintoarowaddressandacolumnaddress, • Wemuststopaccessingmemoryeverymicrosecondorsoanddoarefreshcycle, • Iftheprocessorneedstousethememorywhenarefreshalsoneedstoaccessthememory, wethenneedsomewaytosynchronizethetwocompetingprocesses. ThismakestheinterfacingDRAMtomodernprocessorsquiteacomplexoperation.Fortunately, themodernsupportchipsetshavethiscomplexitywellinhand.Also,ifthefactthatwemust doarefresheverytwomicrosecondsseemsexcessivetoyou,rememberthatyour2GHzAthlon orPentiumprocessorissues4,000clockcycleseverytwomicroseconds.Sowecandoalotof processingbeforeweneedtodoarefreshcycle. Theproblemofconflictsarisingbecauseofcompetingmemoryaccessoperations(read,write andrefresh)aremitigatedtoaverylargedegreebecausemodernPCprocessorscontainon-chip memoriescalledcaches.Cachememorieswillbediscussedinmuchmoredetailinalaterchapter, butfornow,wecanseetheeffectofthecacheonouroff-chipDRAMmemoriesbygreatlyreducingtheprocessor’sdemandsontheexternalmemorysystem. Aswe’llsee,theprobabilitythattheinstructionordatathataprocessorrequireswillbeinthe cacheisusuallygreaterthan90%,althoughtheexactprobabilityisinfluencedbythealgorithms beingrunatthetime.Thus,only10%ofthetimewilltheprocessorneedtogotoexternalmemory inordertoaccessdataorinstructionsnotinthecache.Inmodernprocessors,dataistransmitted betweentheexternalmemorysystemsandtheprocessorinbursts,ratherthanonebyteorwordat atime.Burstaccessescanbeveryefficientwaystotransferdata.Infact,youareprobablyalready veryfamiliarwiththeconceptbecausesomanyothersystemsinyourPCrelyonburstdatatransfers.Forexample,youharddrivetransfersdatatomemoryinburstsofasectorofdataatatime.If yourcomputerisconnectedtoa10Base-Tor100Base-Tnetworkthenitisprocessingpacketsof 256bytesattime.Itwouldbejusttooinefficientandwastefulofthesystemresourcestotransmit dataabyteatatime. SDRAMmemoryisalsodesigntoefficientlyinterfacetoaprocessorwithon-chipcachesand isspecificallydesignedforburstaccessesbetweenthememoryandtheon-chipcachesofthe 146
BusOrganizationandMemoryDesign T0 T1 T2 T3 T4 T5 T6 processor.Figure6.21is CLK anexcerptfromthedata sheetforanSDRAM memorydevicefrom READ NOP NOP NOP READ NOP NOP COMMAND MicronTechnology,Inc.®, X = 1 cycle asemiconductormemory BANK, BANK, ADDRESS COL n COL b manufacturerlocated inBoise,ID.ThetimD D D D D DQ ingdiagramisforthe n n+2 n+3 b n+1 2 MT48LC128MXA2 family CAS Latency = 2 ofSDRAMmemories.The Figure6.21:TimingdiagramofaburstmemoryaccessforaMicron devicesare512Mbitparts TechnologyInc.partnumberMT48LC128MXA2SDRAMmemorychip. organizedas4,8or16-bit DiagramcourtesyofMicronTechnology. widedatapaths.The‘X’isa placeholderfortheorganization(4,8or16bitwide).Thus,theMT48LC128M4A2isorganizedas 32M×4,whiletheMT48LC128M16A2isorganizedas8M×16. OUT
OUT
OUT
OUT
OUT
ThesedevicesarefarmorecomplicatedintheiroperationthenthesimpleSRAMmemorieswe’ve lookedatsofar.However,wecanseethefundamentalburstbehaviorinFigure6.21. ThefieldsmarkedCOMMAND,ADDRESSandDQarerepresentedasbandsofdata,ratherthan individualbits.Thisisasimplificationthatallowsustoshowagroupofsignals,suchas14 addressbits,withouthavingtoshowthestateofeachindividualsignal.Thebandisusedtoshow wherethesignalmustbestableandwhereitisallowedtochange.Noticehowthesignalsareall synchronizedtotherisingedgeoftheclock.OncetheREADcommandisissuedandtheaddress isprovidedforwheretheburstistooriginate,thereisatwoclockcyclelatencyandsequentially storeddatainthechipwillthenbeavailableoneverysuccessiveclockcycle.Clearly,thisisfar moreefficientthenreadingonebyteatatime. Whenweconsidercachememoriesingreaterdetail,we’llseethattheon-chipcachesarealso designedtobefilledfromexternalmemoryinburstsofdata.Thus,weincurapenaltyinhavingto set-uptheinitialconditionsforthedatatransferfromexternalmemorytotheon-chipcaches,but oncethedatatransferparametersareloaded,thememorytomemorydatatransfercantakeplace quiterapidly.Forthisfamilyofdevicesthedatatransfertakesplaceatamaximumclockrateof 133MHz. NewerSDRAMdevices,calleddoubledatarate,orDDRchips,cantransferdataonboththerisingandfallingedgesoftheclock.Thus,aDDRchipwitha133MHzclockinputcantransferdata ataspeedy266MHz.Thesepartsaredesignated,forreasonsunknown,asPC2700devices.Any SDRAMchipcapableofconformingtoa266MHzclockratearePC2700. ModernDRAMdesigntakesmanydifferentforms.We’vebeendiscussingSDRAMbecause thisisthemostcommonformofDRAMinamodernPC.Yourgraphicscardcontainsvideo DRAM.OlderPC’scontainedextendeddataout,orEDODRAM.Today,themostcommontype ofSDRAMisDDRSDRAM.Theamazingthingaboutallofthisistheincrediblylowcostofthis typeofmemory.Atthiswriting(summerof2004),youcanpurchase512MbytesofSDRAMfor 147
Chapter6 about10centspermegabyte.Amemorywiththesamecapacity,builtinstaticRAMwouldcost wellover$2,000. Memory-to-ProcessorInterface Thelasttopicthatwe’lltackleinthischapterinvolvesthedetailsofhowthememorysystemand theprocessorcommunicatewitheachother.Admittedly,wecanonlyscratchthesurfacebecause therearesomanyvariationsonathemewhenthereareover300commerciallyavailablemicroprocessorfamiliesintheworldtoday,butlet’strytotakeageneraloverviewwithoutgettingtoo deeplyenmeshedinindividualdifferences. Ingeneral,mostmicroprocessor-basedsystemscontainthreemajorbusgroupings: • Addressbus:Aunidirectionalbusfromtheprocessorouttomemory. • Databus:Abi-directionalbuscarryingdatafromthememorytotheprocessorduringread operationsandfromtheprocessortomemoryduringwriteoperations. • Statusbus:Aheterogeneousbuscomprisedofthevariouscontrolandhousekeeping signalsneedtocoordinatetheoperationoftheprocessor,itsmemoryandotherperipheral devices.Typicalstatusbussignalsinclude: a. RESET, b. interruptmanagement, c. busmanagement, d. clocksignals, e. readandwritesignals. ThisisshownschematicallyinFigure6.22fortheMotorola®∗MC68000processor.The68000 hasa24-bitaddressbusanda16-bitexternaldatabus.However,internally,bothaddressanddata canbeupto32bitsinlength.We’lldiscusstheinterruptsystemandbusmanagementsystemlater oninthissection. A1..A23
Out to Memory 16 Mbyte address space
Status Bus
D0..D15 MC68000
• RESET • INTERRUPT • BUS REQUEST • BUS ACKNOWLEDGE • CLOCK IN/OUT • READ/WRITE
Data Bus: Out to Memory Input from Memory
Figure6.22:ThreemajorbussesoftheMotorola68000processor.
∗
TheMotorolaCorporationhasrecentlyspunoffitsSemiconductorProductsSector(SPS)toformanewcompany, Freescale®,Inc.However,oldhabitsdiehard,sowe’llcontinuetorefertoprocessorsderivedfromthe68000architectureastheMotorolaMC68000.
148
BusOrganizationandMemoryDesign TheAddressBusistheaggregateofalltheindividualaddresslines.Wesaythatitisa homogeneousbusbecausealloftheindividualsignalsthatmakeupthebusareaddresslines.The addressbusisalsounidirectional.Theaddressisgeneratedbytheprocessorandgoesouttomemory.Thememorydoesnotgenerateanyaddressesandsendthemtotheprocessoroverthisbus. TheDataBusisalsohomogeneous,butitisbidirectional.Datagoesoutfrommemorytotheprocessoronareadoperationandfromtheprocessortomemoryonawriteoperation.Thus,datacan flowineitherdirection,dependingupontheinstructionbeingexecuted. TheStatusBusisheterogeneous.Itismadeupofdifferentkindsofsignals,sowecan’tgroup theminthesamewaythatwedoforaddressanddata.Also,someofthesignalsareunidirectional, somearebidirectional.TheStatusBusisthe“housekeeping”bus.Allofthesignalsthatarealso neededtocontrolsystemoperationaregroupedintotheStatusBus. Let’snowlookathowthesignalsonthesebussesworktogetherwithmemorysothatwemayread andwrite.Figure6.23showsustheprocessorsideofthememoryinterface. Memory Read Cycle T1
T2
Memory Write Cycle T3
T1
T2
T3
CLK
ADDRESS A0..AN
Address Valid
Address Valid
ADDR VAL RD WR DATA D0..DN
Data Valid
Data Valid
WAIT
Figure6.23:Timingdiagramforatypicalmicroprocessor.
Nowwecanseehowtheprocessorandtheclockworktogethertosequencetheaccessingofthe memorydata.Whileitmayseemquitebewilderingatfirst,itisactuallyverystraightforward. Figure6.23isa“simplified”timingdiagramforaprocessor.We’veomittedmanyadditional signalsthatmaypresentorabsentinvariousprocessordesignsandtriedtorestrictourdiscussion tothebareessentials. TheY-axisshowsthevarioussignalscomingfromtheprocessor.Inordertosimplifythings,we’ve groupedallthesignalsfortheaddressbusandthedatabusintoa“band”ofsignals.Thatway,at anygiventime,wecanassumethatsomeare1andsomeare0,butthekeyisthatwemustspecify whentheyarevalid.Thecrossings,orX’sintheaddressanddatabussesisasymbolicwayto representpointsintimewhentheaddressesordataonthebussesmaybechanging,suchasan addresschangingtoanewvalue,ordatacomingfromtheprocessor. 149
Chapter6 Sincethemicroprocessorisastatemachine,everythingissynchronizedwiththeedgesofthe clock.Someeventsoccuronthepositivegoingedgesandsomemaybesynchronizedwith thenegativegoingedges.Also,forconvenience,we’lldividethebuscyclesintoidentifiable timesignaturescalled“Tstates.”Notallprocessorsworkthisway,butthisisareasonable approximationofhowmanyprocessorsactuallywork.Keepinmindthattheprocessorisalways runningthesebuscycles.Theseoperationsformthefundamentalmethodofdataexchange betweentheprocessorandmemory.Therefore,wecanansweraquestionthatwasposedatthe beginningofthischapter.Recallthatthestatemachinetruthtablefortheoperation,ADDB, A,leftoutanyexplanationofhowthedatagotintotheregistersinthefirstplace,andhowthe instructionitselfgotintothecomputer. Thus,beforewelookatthetimingdiagramfortheprocessor/memoryinterface,weneedtoremind ourselvesthatthecontrolofthisinterfaceishandledbyanotherpartofourstatemachine.Inalgorithmicterms,wedoa“functioncall”totheportionofthestatemachinethathandlesthememory interface,andthedataisreadorwrittenbythatalgorithm. Let’sstartwithaREADcycle.DuringthefallingedgeoftheclockinT1theaddressbecomes stableandtheADDRVALsignalisassertedLOW.Also,theRDsignalgoesLOWtoindicatethat thisisareadoperation.DuringthefallingedgeofT3theREADandADDRESSVALIDsignals arede-assertedindicatingtomemorythatthatthecycleisendingandthedatafrommemoryis beingreadbytheprocessor.Thus,thememorymustbeabletoprovidethedatatotheprocessor withintwofullclockcycles(allofT2plushalfofT1andhalfofT3). Supposethememoryisn’tfastenoughtoguaranteethatthedatawillbereadyintime.WediscussedthissituationforthecaseoftheNECstaticRAMchipanddecidedthatapossiblesolution wouldbetoslowtheprocessorclockuntiltheaccesstimerequirementsforthememorycould beguaranteedtobewithinspecs.Nowwewillconsideranotheralternative.Inthisscenario,the memorysystemmayasserttheWAITsignalbacktotheprocessor.Theprocessorchecksthestate oftheWAITsignalontheonthefallingedgeoftheclockduringT2cycle.IftheWAITsignalis asserted,theprocessorgeneratesanotherT2cycleandchecksagain.AslongastheWAITsignal isLOW,theprocessorkeepsmarkingtimeinT2.OnlywhenWAITgoeshighwilltheprocessor completethebuscycle.Thisiscalledawaitstate,andisusedtosynchronizeslowermemoryto fasterprocessors. Thewritecycleissimilartothereadcycle.DuringthefallingedgeoftheclockinT1theaddress becomesvalid.DuringtherisingedgeoftheclockinT2thedatatobewrittenisputonthedata busandthewritesignalgoeslow,indicatingamemorywriteoperation.WAITsignalhasthesame functioninT2onthewritecycle.DuringthefallingedgeoftheclockinT3theWRsignalis de-asserted,givingthememoryarisingedgetostorethedata.ADDRVALalsoisde-assertedand thewritecycleends. Thereareseveralinterestingconceptsburiedinthepreviousdiscussionthatrequiresomeexplanationbeforewemoveon.Thefirstistheideaofastatemachinethatoperatesonbothedgesof theclock,solet’sconsiderthatfirst.Whenweinputasingleclocksignaltotheprocessorinorder tosynchronizeitsinternaloperations,wedon’treallyseewhathappenstotheinternalclock.
150
BusOrganizationandMemoryDesign CLK IN
Manyprocessorswillinternallyconvert theclocktoa2-phaseclock.Atiming diagramfora2-phaseclockisshownin Figure6.24.
Ø1
Theinputclock,whichisgeneratedby anexternaloscillator,isconvertedtoa Ø2 2-phaseclock,labeledφ1andφ2.The Figure6.24:Figure4.9:Atwo-phaseclock. twoclockphasesnow180degreesout ofphasefromeachother,sothatevery risingorfallingedgeoftheCLKINsignal CLK IN D Q Ø1 clock 4 Q 2 3 1 generatesaninternalrisingclockedge. Ø2 Howcouldwegeneratea2-phaseclock? Youactuallyalreadyknowhowtodoit, Figure6.25:Atwo-phaseclockgenerationcircuit. butthere’sapieceofinformationthatwe firstneedtoplaceincontext.Figure6.25isacircuitthatcanbeusedtogeneratea2-phaseclock. The4XORgatesareconvenienttousebecausethereisacommonintegratedcircuitpartwhich contains4XORgatesinonepackage.Thiscircuitmakesuseofthepropagationdelaysthatare inherentinalogicgate.SupposethateachXORgatehasapropagationdelayof10ns.Assume thattheclockinputisLOW.OneinputofXORgates1through3ispermanentlytiestoground (logicLOW).Sincebothinputsofgate1areLOW,itsoutputisalsoLOW.Thissituationcarries throughtogates2,3and4.Now,theCLKINinputgoestologicstateHIGH.Theoutputofgate #4goeshigh10nslaterandtogglestheD-FFtochangestate.SincetheQandQoutputsare oppositeeachother,weconvenientlyhaveasourceoftwoalternatingclockphasesbynatureof thedivide-by-twowiringoftheD-FF.
Thiscircuitworksforanyclockfrequencythathasaperiodgreaterthan 4XORgatedelays.Also,byusing bothoutputsoftheD-FF,weareguaranteedatwo-phaseclockoutputthatis exactly180degreesoutofphasefrom eachother. NowwecanrevisitFigure6.23and seetheothersubtlepointthatwas
OUTPUT OF XOR GATE #4 CLK IN
Afterapropagationdelayof30nstheoutputofgate#3alsogoesHIGH,whichcausestheoutput ofXORgate#4togoLOWagainbecausetheoutputofanXORgateisLOWifbothinputsare thesameandHIGHiftheinputsaredifferent.Atsometimelater,theclockinputgoeslowagain andwegenerateanother30nswidepositivegoingpulseattheoutputofgate#4becausefor30ns bothoutputsaredifferent.ThiscausetheD-FFtotoggleatbothedgesoftheclockandtheQand Qoutputsgiveusthealternatingphasesthatweneed.Figure6.26showsthe relevantwaveforms.
Ø1
Ø2
Figure 6.26: Waveforms for the 2-phase clock generation circuit. 151
Chapter6 WAIT buriedinthediagram.Sinceweareapparentlychangingstatesonthe risingandfallingedgesoftheclock,wenowknowthattheinternalstate T10 T20 T11 machineoftheprocessorisactuallyusinga2-phaseclockandeachof the‘T’statesis,inreality,twostates.Thus,wecanredrawthetiming T30 T31 T21 diagramforaREADcycleasastatediagram.ThiswillclearlydemonstratethewayinwhichtheWAITstatecomesintoplay.Figure6.27 Figure6.27:State showstheREADphaseofthebuscycle,representedasastatediagram. diagramforaprocessor
READcycle.
ReferringtoFigure6.27wecanclearlyseethatinstateT20theprocessorteststhestateoftheWAITinput.IftheinputisassertedLOW,theprocessorremainsin stateT20,effectivelylengtheningthetotaltimeforthebuscycle.Theadvantageofthewaitstate overdecreasingtheclockfrequencyisthatwecandesignoursystemsuchthatawaitpenaltyis incurredonlywhentheprocessoraccessescertainmemoryregions,ratherthanslowingitforall operations.WecannowsummarizetheentirebusREADcycleasfollows: • T10:READcyclebegins.ProcessoroutputsnewmemoryaddressforREADoperation. • T11:AddressisnowstableandADVALgoesLOW.RDgoeslowindicatingthataREAD cycleisbeginning. • T20:READcyclecontinues. • T21:ProcesssamplesWAITinput.IfassertedT21cyclecontinues. • T30:READcyclecontinues. • T31:READcyclesterminates.ADVALandRDarede-assertedandprocessorinputsthe datafrommemory.
DirectMemoryAccess(DMA) We’llconcludeChapter6withabriefdiscussionofanotherformofmemoryaccesscalledDMA, ordirectmemoryaccess.TheneedforaDMAsystemisaresultofthefactthatmemorysystem andtheprocessorareconnectedtoeachotherbybusses.Sincethebusistheonlypathinandout ofthesystem,conflictswillarisewhenperipheraldevices,suchasdiskdrivesornetworkcards havedatafortheprocessor,buttheprocessorisbusyexecutingprogramcode. Inmanysystems,theperipheraldevicesandmemorysharethesamebusseswiththeprocessor. Whenadevice,suchasaharddiskdriveneedstotransferdatatotheprocessor,wecouldimagine twoscenarios. Scenario#1 1 2 3 4 5
Diskdrive: Processor: Diskdrive: Processor: Diskdrive:
“Sorryfortheinterruptboss,I’vegot512bytesforyou.” “That’sabig10-4littlediskbuddy.Gimmethefirstbyte.” “Sureboss.Hereitis.” “Gotit.Gimmethenextone.” “Hereitis.”
Repeatsteps4and5for510moretimes.
152
BusOrganizationandMemoryDesign Scenario#2 1
Diskdrive:
2
Processor:
3 4 5 6 7
Diskdrive: Diskdrive: Diskdrive: Processor: Diskdrive:
“Yo,boss.Igot512bytesandthey’reburningaholeinmyplatter.Igottago,I gottago.”(BUSREQUEST) “OK,ok,pipedownlemmefinishthisinstructionandI’llgetoffthebus.OK, I’mdone,thebusisyours,anddon’tdawdle,I’mbusy.”(BUSGRANT) “Thanksboss.You’reapal.Ioweyouone.I’vegotit.”(BUSACKNOWLEDGE) “I’llputthedataintheusualspot.”(Saidtoitself) “Heyboss!Wakeup.I’moffthebus.” Thanksdisk.I’llretrievethedatafromtheusualspot.” “10-4.Theusualspot.I’moff.”
Asyoumightgatherfromthesetwoscenarios,thesecondwasmoreefficientbecausetheperipheraldevice,theharddisk,wasabletotakeovermemorycontrolfromtheprocessorandwriteall ofitsdatainasingleburstofactivity.Theprocessorhadplaceditsmemoryinterfaceinatri-state conditionandwaswaitingforthesignalfromthediskdrivethatitcouldreturntothebus.Thus, theDMAallowsotherdevicestotakeovercontrolofthebussesandimplementadatatransfer toorfrommemorywhiletheprocessoridles,orprocessesfromaseparatelycachedmemory. Also,giventhatmanymodernprocessorshavelargeon-chipcaches,theprocessorloosesalmost nothingbyturningtheexternalbusoverto Address, Data and Status Busses theperipheraldevice.Let’stakethehumorousdiscussionofthetwoscenariosandget BUSREQ seriousforamoment.Figure6.28showsthe uP MEMORY PERIPHERAL BUSGRA simplifiedDMAprocess.Youmayalsogather ARRAY DEVICE thatIshouldn’tquitmydayjobtobecome asitcomwriter,butthat’sadiscussionfor Figure 6.28: Schematic representation of a DMA anothertime. transfer. Inthesimplestform,thereisahandshakeprocessthattakesplacebetweentheprocessorandthe peripheraldevice.Ahandshakeissimplyanactionthatexpectsaresponsetoindicatetheaction wasaccepted.Theprocesscanbedescribedasfollows: • TheperipheraldevicerequestscontrolofthebusfromtheprocessorbyassertingtheBUS REQUEST(BUSREQ)signalinputontheprocessor. • Whenprocessorcompletespresentinstructioncycle,andnohigherlevelinterruptsare pending,itsendsoutaBUSGRANT(BUSGRA),givingtherequestingdevicepermission tobeginitsownmemorycycles. • Processorthenidles,orcontinuestoprocessdatainternallyincache,untilBUSREQ signalgoesaway
SummaryofChapter6 •
Welookedattheneedforbusorganizationwithinacomputersystemandhowbussesare organizedintoaddress,dataandstatusbusses. • Carryingonourdiscussionofthepreviouschapter,wesawhowthemicrocodestate machinewouldworkwiththebusorganizationtocontroltheflowofdataoninternal busses. 153
Chapter6 •
Wesawhowthetri-statebuffercircuitenablesindividualmemorycellstobeorganized intolargermemoryarrays. • Weintroducedtheconceptofpagingasawaytoformmemoryaddressesandasamethod tobuildmemorysystems. • Welookedatthedifferenttypesofmodernmemorytechnologytounderstandtheuseof staticRAMtechnologyanddynamicRAMtechnology. • Finally,weconcludedtheoverviewofmemorywithadiscussionofdirectmemoryaccess asanefficientwaytomoveblocksofdatabetweenmemoryandperipheraldevices.
Chapter6:Endnotes http://www.necel.com/memory/pdfs/M14428EJ5V0DS00.pdf.
1
http://download.micron.com/pdf/datasheets/dram/sdram/512MbSDRAM.pdf.
2
RalphTenny,SimpleGatingCircuitMarksBothPulseEdges,Designer’sCasebook,Preparedbytheeditorsof Electronics,McGrawHill,p.27.
3
154
ExercisesforChapter6 1. Designa2input,4-outputmemorydecoder,giventhetruthtableshownbelow: A
0 1 0 1
Inputs B
O1
O2
Outputs O3
O4
0 0 1 1
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
2. RefertoFigure6.11.Theexternalinputandoutput(I/O)signalsdefinedasfollows:
A0,A1:Addressinputsforselectingwhichrowofmemorycells(D-flipflops)toreadfrom,or writeto.
CE:Chipenablesignal.Whenlow,thememoryisactiveandyoureadfromitorwritetoit.
R/W:Read/Writeline.Whenhigh,theappropriaterowwithinthearraymaybereadbyan externaldevice.Whenlow,anexternaldevicemaywritedataintotheappropriaterow.The appropriaterowisdefinedbythestateofaddressbitsA0andA1.
DB0,DB1,DB2,DB3:Bidirectionaldatabits.Databeingwrittenintotheappropriaterow,or readfromtheappropriaterow,asdefinedbyA0andA1,aredefinedbythese4bits.
Thearrayworksasfollows:
A. Toreadfromaspecificaddress(row)inthearray: a. PlacetheaddressonA0andA1 b. BringCElow c. BringR/Whigh. d. ThedatawillbeavailabletoreadonD0..D3
B. Towritetoaspecificaddressinthearray: a. PlacetheaddressonA0andA1 b. BringCElow c. BringR/Wlow. d. PlacethedataonD0..D3. e. BringR/Whigh.
155
Chapter6
C. EachindividualmemorycellisastandardDflip-flopwithoneexception.Thereisa tri-stateoutputbufferoneachindividualcell.Theoutputbufferiscontrolledbythe CSsignaloneachFF.Whenthissignalislow,theoutputisconnectedtothedataline andthedatastoredintheFFisavailableforreading.Whenthisoutputishigh,theoutputoftheFFisisolatedfromthedatalinesothatdatamaybewrittenintothedevice. Considertheboxlabeled“MemoryDecodingLogic”inthediagramofthememoryarray. Designthetruthtableforthatcircuit,simplifyitusingK-mapsanddrawthegatelogicto implementthedesign.
3. Assumethatyouhaveaprocessorwitha26-bitwideaddressbusanda32-bitwidedatabus. a. Supposethatyouareusingmemorychipsorganizedas512Kdeep×8bitswide(4Mbit). Howmanymemorychipsarerequiredtobuildamemorysystemforthisprocessorthat completelyfillstheentireaddressspace,leavingnoemptyregions? b. Assumingthatweuseapagesizeof512K,completethefollowingtableforthefirstthree pagesofmemory: 4. ConsiderthememorytimingdiagramfromFigure6.23.Assumethattheclockfrequencyis 50MHzandthatyoudonotwanttoaddanywaitstatestoslowtheprocessordown.Whatis theslowestmemoryaccesstimethatwillworkforthisprocessor? 5. Definethefollowingtermsinafewsentences: a. DirectMemoryAccess b. Tri-statelogic c. Addressbus,databus,statusbus 6. Thefigureshownbelowisaschematicdiagramofamemorydevicethatwillbeusedina memorysystemforacomputerwiththefollowingspecifications: • 20-bitaddressbus • 32-bitdatabus • Memoryatpages0,1and7
a. Howmanyaddressablememorylocationsareineachmemorydevice?
b. Howmanybitsofmemoryareineachmemorydevice?
c. Whatistheaddressrange,inhex,coveredbyeach memorydeviceinthecomputer’saddressspace?You mayassumethateachpageofmemoryisthesamesize astheaddressrangeofonememorydevice.
D0 D1 D2 D3 D4 D5 D6 D7 D8
A0
WE
A1
OE
A2
CE
d. Whatisthetotalnumberofmemorydevicesrequiredin thismemorydesign?
A3
D9
A4
D10
A5
D11
e. Whywouldamemorysystemdesignbaseduponthis typeofamemorydevicenotbecapableofaddressing memorylocationsatthebyte-level?Discussthereason foryouranswerinasentenceortwo.
A6
D12
A7
D13
156
A8
D14 A9 A10 A11 A12 A13 A14 A15 A16 D15
BusOrganizationandMemoryDesign 7. AssumethatyouarethechiefhardwaredesignerfortheSouloftheCityBagelandFlight ControlSystemsCompany.Yourjobistodesignacomputertomemorysub-systemforanew, automatic,galleyandbagelmakerforthenextgenerationofcommercialairlinersnowbeing designed.ThemicroprocessoristheAB2000,ahotnewchipthatyou’vejustbeenwaitingto haveanopportunitytouseinadesign.
Theprocessorhasa16-bitaddressbusanda16-bitdatabusasshowninthefigure,below(For simplicity,thediagramonlyshowstherelevantprocessorsignalsforthisexerciseproblem). AlsoshowninthefigurearetheschematicdiagramsandpindesignationsfortheROMand SRAMchipsyouwillneedinyourdesign.
Thememorysubsystemhasthreestatussignalsthatyoumustusetodesignyourmemoryarray: a. WR:Activelow,itindicatesawritetomemorycycle b. RD:Activelow,itindicatesareadfrommemorycycle c. ADVAL:Activelow,itindicatesthattheaddressonthebusisstableandmaybeconsideredtobeavalidaddress
Thememorychipsoccupythefollowingregionsofmemory: a. ROMfrom0x0000through0x3FFF b. SRAMfrom0xC000through0xFFFF c. Otherstuff(notyourproblem)from0x4000through0xBFFF 1 2 3 4 5 6 7 8 9 D0 D1 D2 D3
D4 D5 D6 D7 D8
36
A0
WR
35 34 33 32 31 30 29 28
A1
RD
A2
ADVAL
A3
D9
U1-AB2000
A4
D10
A5
D11
A6
D12
A7
D13
A8 A9
D14 A10 A11 A12 A13 A14 A15 N/C D15
28 27 26 25 24 23 22
10 11 12 13 14 15 16 17 18
157
A4 A5 A6
N/C
A7
N/C CE OE WE
A8 A9 A10 A11 A12
D7 A13 D6 D5 D4 D3 D2 D1 D0
8 9 10 11 12 13 14
21 20 19 18 17 16 15 1 2 3 4 5 6 7
27 26 25 24 23 22 21 20 19
Assumethatthecircuitthatyoudesignwillbefabricated intoamemoryinterfacechipdesigned,U6.Sincethe printedcircuitdesignerneedstoknowthepindesignationsforalltheparts,yousupplythespecificationofthe pinfunctionsforU6shown,below.Youwillthendesign thememorydecodertothesepindesignations.Finally,for simplicity,youdecidenottoimplementbyteaddressability. Allmemoryaccesseswillbeword-widthaccesses.
A0 A1 A2 A3 N/C
28 27 26 25 24 23 22
A0 A1 A2 A3 N/C N/C N/C CE OE WE D7
A4 A5 A6 A7
U4, U5 - 16K × 8 ROM
1 2 3 4 5 6 7
U2, U3 - 16K × 8 SRAM
a. Designthememorydecoder circuitthatyouwillneed forthiscircuit.Itmustbe abletoproperlydecode thememorychipsfortheir respectiveaddressranges andalsodecodetheprocessor’sWR,RDandADVAL signalstocreateOE,CE, andWR.Hint,refertothe textsectionsonthememory systemtogettheequations forOE,CEandWR.
A8 A9 A10 A11 A12
A13 D6 D5 D4 D3 D2 D1 D0
8 9 10 11 12 13 14
21 20 19 18 17 16 15
1 2 3 4
A14 A15 ADVAL N/C
U6 Memory Decoder
CS(ROM) N/C CS(RAM) N/C
5 6 7 8
Chapter6 b. Createanetlistforyourdesign.Asamplenetlistisshowninthefigure,below.Thenetlistis justatableofcircuitinterconnectionsor“nets”.AnetisjustthecommonconnectionofallI/O pinsonthecircuitpackagesthatareconnectedtogether.Thus,inthisdesign,youmighthavea netlabeled“ADDR13”,whichiscommonto:Pin#23onchipnumberU1(shorthandU1-23), U2-14,U3-14,U4-14,U5-14.Thenetlistisusedbyprintedcircuitboardmanufacturersto actuallybuildyourPCboard.Astartforyournetlistisshownbelow: Netname addr0 addr1 addr2 addr3 • • data0
U1-36 U1-35 U1-34 U1-33
U2-1 U2-2 U2-3 U2-4
U3-1 U3-2 U3-3 U3-4
U1-14
U2-15
U4-15
U4-1 U4-2 U4-3 U4-4
U5-1 U5-2 U5-3 U5-4
8. Completetheanalysisforeachofthefollowingsituations:
a. Amicroprocessorwitha20-bitaddressrangeusingeightmemorychipswithacapacity of128Keach:Buildatableshowingtheaddressranges,inhexadecimal,coveredbyeach memorychip.
b. Amicroprocessorwitha24-bitaddressrangeandusingfourmemorychipswithacapacityof64Keach:Twoofthememorychipsoccupythefirst128Koftheaddressrangeand twochipsoccupythetop128Koftheaddressrange.Buildatableshowingtheaddress ranges,inhexadecimal,coveredbyeachofthechips.Noticethatthereisabigaddress rangeinthemiddlewithnoactivememoryinit.Thisisfairlycommon.
c. Amicroprocessorwitha32-bitaddressrangeandusingeightmemorychipswithacapacityof1Meach(1M=1,048,576):Twoofthememorychipsoccupythefirst2Mofthe addressrangeand6chipsoccupythetop6Moftheaddressrange.Buildatableshowing theaddressranges,inhexadecimal,coveredbyeachofthechips.
d. Amicroprocessorwitha20-bitaddressingrangeandusingeightmemorychipsofdifferentsizes.Fourofthememorychipshaveacapacityof128Keachandoccupythefirst 512Kconsecutiveaddressesfrom00000on.Theotherfourmemorychipshaveacapacity of32Keachandoccupythetopmost128Koftheaddressingrange.
158
7
CHAPTER
MemoryOrganizationand AssemblyLanguageProgramming Objectives Whenyouarefinishedwiththislesson,youwillbeableto: Describehowatypicalmemorysystemisorganizedintermsofmemoryaddressmodes; Describetherelationshipbetweenacomputer'sinstructionsetarchitectureandits assemblylanguageinstructionset;and Usesimpleaddressingmodestowriteasimpleassemblylanguageprogram.
Introduction Thislessonwillbeginourtransitionfromhardwaredesignersbacktosoftwareengineers.We’ll takewhatwe’velearnedsofaraboutthebehaviorofthehardwareandseehowitrelatestothe instructionsetarchitecture(ISA)ofatypicalmicroprocessor.Ourstudyofthearchitecturewill befromtheperspectiveofasoftwaredeveloperwhoneedstounderstandthearchitectureinorder touseittoitsbestadvantage.Inthatsense,ourstudyofassemblylanguagewillbeametaphorfor thestudyofthearchitectureofthecomputer.Thisperspectiveisquitedifferentfromthatofsomeonewhowantstobeabletodesigncomputerhardware.Ourfocusthroughoutthisbookhasbeen ontheunderstandingofthehardwareandarchitecturalissuesinordertotakemakeoursoftware developmentscomplementthedesignoftheunderlyinghardware. We’refirstgoingtoinvestigatethearchitectureoftheMotorola68000processor.The68KISAisa verymaturearchitecture,havingfirstcomeoutintheearly1980s.Itisafairquestiontoask,“Why amIlearningsuchanoldcomputerarchitecture?Isn’titobsolete?”Theanswerisaresounding, “No!”The68Karchitectureisoneofthemostpopularcomputerarchitecturesofalltimeandderivativesofthe68Karestillbeingdesignedintonewproductsallthetime.PalmPDA’sstillusea 68Kderivativeprocessor.Also,acompletelynewprocessorfromMotorola,theColdFire®family, usesthe68Karchitecture,andtoday,itisoneofthemostpopularprocessorsinuse.Forexample, theColdFireisusedinmanyinkjetprinters. Insubsequentchapters,wewillalsolookattwootherpopulararchitectures,theIntelX86processorfamilyandtheARMfamily.Withthesethreemicroprocessorarchitecturesunderourbelts, we’llthenbefamiliarwiththethreemostpopularcomputerarchitecturesintheworldtoday.Much ofwhatwe’llbediscussingaswebeginourinvestigationwillbecommontoallthreeprocessor families,sostudyingonearchitectureisasgoodasanotherwhenitcomestogaininganunderstandingofbasicprinciples. 159
Chapter7 Anotherreasonforstartingwiththe68000isthatthearchitecturelendsitselftothelearningprocess.Thememoryaddressingmodesarestraightforward,andfromasoftwaredeveloper’spointof view,thelinkagestohighlevellanguagesareeasytounderstand. We’llstartbylookingatthememorymodelagainandusethatasajumpingoffpointforour studyofhowthecomputeractuallyreadsandwritestomemory.Fromahardwareperspective, wealreadyknowthatbecausewejustgotfinishedstudyingit,butwe’llnowlookatitfromthe ISAperspective.Alongtheway,we’llseewhythissometimesstrangewayoflookingatmemory isactuallyveryimportantfromthepointofviewofhigherlevellanguages,suchasCandC++. Ladiesandgentlemen,startyourengines... MemoryStorageConventions Whenyoustartlookingatacomputerfromthearchitecturallevelyousoonrealizethatrelationshipbetweentheprocessorandmemoryisoneofthekeyfactorsindefiningthebehaviorand operationalcharacteristicsofthemachine.Somuchofhowacomputeractuallyaccomplishesits programmingtasksrevolvesarounditsinterfacetomemory.Asyou’llsoonseewhenwebegin toprograminassemblylanguage,mostoftheworkthatthecomputerdoesinvolvesmovingdata inandoutofmemory.Infact,ifyoucreatedabargraphoftheassemblylanguageinstructions usedinaprogram,whethertheoriginalprogramwaswritteninC++orassemblylanguage,you’ll findthatthemostusedinstructionistheMOVEinstruction.Therefore,oneofourfirstitemsof businessistobuildonourunderstandingofhowmemorysystemsareorganizedandlookatthe memorysystemfromtheprospectiveoftheprocessor.
160
0xFFFFFE
0xF00000
IO Devices
IO Devices
0x01FFFE
0x000000
Figure7.1isafairlytypicalmemorymapfora68Kprocessor.Theprocessorcanuniquelyaddress 16,777,216(16M)bytesofmemory.Ithas23externalwordaddresslinesand2-byteselectorcontrolsignalsforchoosingoneortheotherofthetwobytesstoredinawordlocation.Theexternal databusis16-bitswide,buttheinternaldatapathsareall32bitswide.Subsequentmembersofthe 68Kfamily,beginningwiththe68020,allhad32-bitexternaldatabuses.InFigure7.1weseethat thefirst128Kwords(16-bit)ofmemoryoccupytheaddressrangefrom0x000000to0x01FFFE. Thiswilltypicallybetheprogramcodeandthevectortables.Thevectortableoccupiesthefirst 256longwordsofmemory.Thisisthereasonthatnonvolatile,read-onlymemoryoccupiesthe loweraddressrange.TheupperaddressrangeusuallycontainstheRAMmemory.Thisisvolatile, read/writememorywherevariablesarestored.Therestofthememoryspacecontainstheaddress ofI/Odevices.Allthe Program codeand andinitialization initialization vectors Program code vectors restofthememoryin Stack and heap, variable storage Stack and heap, variable storage Figure7.1isempty space.Also,we’ll seeinamomentwhy RAM ROM Empty Empty Empty theaddressesforthe 512K x 16 64K x 16 Space Space Space lastwordofROM andthelastwordof RAMare0x01FFFE and0xFFFFFE, respectively. Figure7.1:Memorymapfora68K-basedcomputersystem.
MemoryOrganizationandAssemblyLanguageProgramming Since8-bits,orabyte,isthesmallestdataquantitythatweusuallydealwith,storingbyte-sized characters(chars)inmemoryisstraightforwardwhenwe’redealingwithamemorythatisalso 8-bitswide.However,ourPCsallhavememoriesthatare32bitswideandlotsofcomputersarein usetodaythathave16bitdatapathwidths.Asanextremeexample,wedon’twanttowaste24bits ofstoragespacesimplybecausewewanttostorean8-bitquantityina32-bitmemory,socomputersaredesignedtoallowbyteaddressing.Thinkofbyteaddressingina32-bitwordasanexample ofpagingwiththepageaddressbeingthe32-bitwordandtheoffsetbeingthefourpossiblebyte positionsinthepage.However,thereisonemajordifferencebetweenbyteaddressingandreal paging.Withbyteaddressing,wedonothaveaseparatewordaddressthatcorrespondstoapage address.Figure7.2showsthisimportantdistinction. Wecallthistypeofmemorystoragebytepackingbecauseweareliterallypackingthe32-bitmemoryfullofbytes.Thistypeofaddressingintroducesseveralambiguities.One,isquiteserious,and theothersaremerelynewtous.Wediscusstheseriousoneinamoment.ReferringtoFigure7.2 weseethatthebyteatmemoryaddressFFFFF0(char)andthe32-bitlongword(int)atFFFFF0 havethesameaddress.Isn’tthisadisasterwaitingtohappen?Theanswerisadefinite“maybe.”In CandC++,forexample,youmustdeclareavariableanditstypebeforeyoucanuseit.Nowyou seethereasonforit.UnlessthecompilerknowsthetypeofthevariablestoredataddressFFFFF0, itdoesn’tknowthetypeofcodeitmustgeneratetomanipulateit.Also,ifyouwanttostoreachar atFFFFF0,thenthecompilerhastoknowhowmuchstoragespacetoallocatetoit. Also,noticethatwecannotaccess32-bitwordsataddressesotherthanthosethataredivisibleby 4.Someprocessorswillallowustostorea32-bitvalueatanoddboundary,suchasbyteaddresses 000003-000006,butmanyotherswillnot.Thereasonisthattheprocessorwouldhavetomake severaladditionalmemoryoperationstoreadintheentirevalueandthendosomeextraworkto reconstructthebytesintothecorrectorder.Wecallthisanonalignedaccessanditisgenerally quitecostlyintermsofprocessorperformancetoallowittohappen.Infact,ifyoulookatthe memorymapofhowacompilerstoresobjectsandstructuresinmemory,you’lloftenseespaces Long Word Address 000000
Byte 0 – Address 000000
Byte 1 – Address 000001
Byte 2 – Address 000002
Byte 3 – Address 000003
000004
Byte 0 – Address 000004
Byte 1 – Address 000005
Byte 2 – Address 000006
Byte 3 – Address 000007
000008
Byte 0 – Address 000008
Byte 1 – Address 000009
Byte 2 – Address 00000A
Byte 3 – Address 00000B
00000C
Byte 0 – Address 00000C
Byte 1 – Address 00000D
Byte 2 – Address 00000E
Byte 3 – Address 00000F
000010
Byte 0 – Address 000010
Byte 1 – Address 000011
Byte 2 – Address 000012
Byte 3 – Address 000013
FFFFF0
Byte 0 – Address FFFFF0
Byte 1 – Address FFFFF1
Byte 2 – Address FFFFF2
Byte 3 – Address FFFFF3
FFFFF4
Byte 0 – Address FFFFF4
Byte 1 – Address FFFFF5
Byte 2 – Address FFFFF6
Byte 3 – Address FFFFF7
FFFFF8
Byte 0 – Address FFFFF8
Byte 1 – Address FFFFF9
Byte 2 – Address FFFFFA Byte 3 – Address FFFFFB
FFFFFC
Byte 0 – Address FFFFFC Byte 1 – Address FFFFFD Byte 2 – Address FFFFFE Byte 3 – Address FFFFFF
Figure7.2:Relationshipbetweenwordaddressingandbyteaddressingina32-bitwideword. 161
Chapter7 inthedata,correspondingtointentionalgapssoasnottocreateregionsofdatacontainingnonalignedaccesses. Alsonoticethatwhenwearedoing32-bitwordaccesses,addressbitsA0andA1aren’tbeing used.Thismightpromptyoutoask,“Ifwedon’tusethem,whatgoodarethey?”However,wedo needthemwhenweneedtoaccessaparticularbytewithinthe32-bitwords.A0andA1areoften calledthebyteselectoraddresslinesbecausethatistheirmainfunction.Anotherpointisthatwe reallyonlyneedbyteselectorswhenwearewritingtomemory.Readingfrommemoryisfairly harmless,butwritingchangeseverything.Therefore,youwanttobesurethatyoumodifyonlythe byteyouareinterestedinandnottheothers.Fromahardwaredesigner’sperspectivehavingbyte selectorsallowsyoutoqualifythewriteoperationtoonlythebytethatyouareinterestedin. Manyprocessorswillnotexplicitlyhavethebyteselectoraddresslinesatall.Rather,theyprovide signalsonthestatusbuswhichareusedtoqualifytheWRITEoperationstomemory.Whatabout storing16-bitquantities(ashortdatatype)in32-bitmemorylocations?Thesamerulesapply inthiscase.Theonlyvalidaddresseswouldbethoseaddressesdivisibleby2,suchas000000, 000002,000004,andsoon.Inthecaseof16-bitwordaddressing,thelowestorderaddressbit,A0, isn’tneeded.Forour68Kprocessor,whichhasa16-bitwidedatabustomemory,wecanstore twobytesineachwordofmemory,soA0isn’tusedforwordaddressingandbecomesthebyte selectorfortheprocessor.
andREADsignalshavebeenomittedforclarity.
128K × 8 RAM #4
D24 – D31
D16 – D23
128K × 8 RAM #3
D8 – D15
128K × 8 RAM #2
Address Decoder
128K × 8 RAM #1
D0 – D7
32-bit microprocessor
Figure7.3showsatypiA2 cal32-bitprocessorand A3 A4 D0 A5 memorysysteminterface. D1 A6 D2 A7 TheREADsignalfromthe D3 A8 D4 A9 processorandtheCHIP D5 A10 D0 D0 D0 D0 D6 A11 D1 D1 D1 D1 D7 SELECTsignalshavebeen A12 A0 A0 A0 A0 D2 D2 D2 D2 D8 A13 A1 A1 A1 A1 D3 D3 D3 D3 D9 A14 omittedforclarity.The A2 A2 A2 A2 D4 D4 D4 D4 D10 A15 A3 A3 A3 A3 D5 D5 D5 D5 D11 A16 A4 A4 A4 A4 processorhasa32-bitdata D6 D6 D6 D6 D12 A17 A5 A5 A5 A5 D7 D7 D7 D7 D13 A18 A6 A6 A6 A6 busanda32-bitaddress D14 A19 A7 A7 A7 A7 D15 A20 A8 A8 A8 A8 D16 bus.Thememorychips A21 A9 A9 A9 A9 D17 A22 A10 A10 A10 A10 D18 A23 representonepageofRAM A11 A11 A11 A11 D19 A24 A12 A12 A12 A12 D20 A25 A13 A13 A13 A13 somewhereintheaddress D21 A26 A14 A14 A14 A14 D22 A27 A15 A15 A15 A15 spaceoftheprocessor. D23 A28 A16 A16 A16 A16 D24 A29 D25 Theexactpageofmemory A30 D26 A31 D27 WR WR WR WR wouldbedeterminedby D28 D29 thedesignoftheAddress WE0 D30 WE1 D31 Decoderlogicblock.The WE2 WE3 RAMchipseachhavea capacityof1Mbitandare Figure7.3:Memoryorganizationfora32-bitmicroprocessor.Chipselect organizedas128Kby8. Sincewehavea32-bitwide databusandeachRAMchiphaseightdataI/Olines,weneedfourmemorychipsper128Kwide page.Chip#1isconnectedtodatalinesD0throughD7,chip#2isconnectedtodatalinesD8 162
MemoryOrganizationandAssemblyLanguageProgramming throughD15,chip#3isconnectedtodatalinesD6throughD23andchip#4isconnectedtodata linesD24toD31,respectively. Theaddressbusfromtheprocessorcontains30addresslines,whichmeansitiscapableofaddress 230longwords(32-bitwide).Theadditionaladdressingbitsneededtoaddressthefulladdress spaceof232bytesareimplicitlycontrolledbytheprocessorinternallyandexplicitlycontrolled throughthe4WRITEENABLEsignalslabeledWE0throughWE3. AddresslinesA2throughA18fromtheprocessorareconnectedtoaddressinputsA0throughA16 oftheRAMchips,withA2fromtheprocessorbeingconnectedtoA0oneachofthe4chips,and soon.Thismayseemoddatfirst,butitshouldmakesensetoyouafteryouthinkaboutit.Infact, thereisnospecialreasonthateachaddresslinefromtheprocessormustbeconnectedtothesame addressinputpinoneachofthememorydevices.Forexample,A2fromtheprocessorcouldbe connectedtoA14ofchip#1,A3ofchip#2,A8ofchip#3andA16ofchip#4.Thesameaddress fromtheprocessorwouldclearlybeaddressingdifferentbyteaddressesineachofthe4memory chips,butaslongasallofthe17addresslinesarefromtheprocessorareconnectedtoall17 addresslinesofthememorydevices,thememoryshouldworkproperly. Theupperaddressbitsfromtheprocessor,A19throughA31areusedforthepageselectionprocess.ThesesignalsareroutedtotheaddressdecodinglogicwheretheappropriateCHIPSELECT signalsaregenerated.ThesesignalshavebeenomittedfromFigure7.3.Thereare13higherorder addressbitsinthisexample.Thisgivesus213or8,192pagesofmemory.Eachpageofmemory holds128K32-bitwidewords,whichworksouttobe230longwords.Intermsofaddresses,each pageactuallycontains512Kbytes,sothebyteaddressrangeoneachpagegoesfrombyteaddress (inHEX)00000through7FFFF.Recallthatinthismemoryscheme,thereare8,192pageswith eachpageholding512Kbytes.Thus,pageaddressesgofrom0000through1FFF.Itmaynotseem obvioustoyoutoseehowthepageaddressesandtheoffsetaddressesarerelatedtoreachotherbut ifyouexpandthehexadecimaladdressestobinaryandlaythemnexttoeachotheryoushouldsee thefull32-bitaddress. Now,let’sseehowtheprocessordealswithdatasizessmallerthanlongwords.Assumethatthe processorwantstoreadalongwordfromaddressABCDEF64andthatthisaddressisdecodedto beonthepageofFigure7.3.Sincethisaddressisona32bitboundary,A0andA1=0,andarenot usedaspartoftheexternaladdressthatgoestomemory.However,iftheprocessorwantedtodo awordaccessofeitheroneofthewordslocatedataddressABCDEF64orABCDEF66,itwould stillgeneratethesameexternaladdress.Whenthedatawasreadintotheprocessor,the½ofthe longwordthatwasnotneededwouldbediscarded.SincethisisaREADoperation,thecontents ofmemoryarenotaffected. Iftheprocessorwantedtoreadanyoneofthe4byteslocatedatbyteaddressABCDEF64, ABCDEF65,ABCDEF66orABCDEF67,itwouldstillperformthesamereadoperationasbefore. Again,onlythebyteofinterestwouldberetainedandtheotherswouldbediscarded. Now,let’sconsiderawriteoperation.Inthiscase,weareconcernedaboutpossiblycorrupting memory,sowewanttobesurethatwhenwewriteaquantitysmallerthanalongwordtomemory, wedonotaccidentallywritemorethanweintendto.So,supposethatwewanttowritethebyte 163
Chapter7 atmemorylocationABCDEF65.Inthiscase,onlytheWE1signalwouldbeasserted,soonly thatbytepositioncouldbemodified.Thus,towriteabytetomemory,weonlyactivateoneofthe 4WRITEENABLEsignals.TowriteawordtomemorywewouldactiveeitherWE0andWE1 togetherorWE2andWE3.Finally,towritealongword,allfouroftheWRITEENABLElines wouldbeasserted. Whataboutthecaseofa32-bitwordstoredina16-bitmemory?Inthiscase,the32-bitword canbestoredonanyevenwordboundarybecausetheprocessormustalwaysdotwoconsecutive memoryaccessestoretrievetheentire32-bitquantity.However,mostcompilerswillstilltryto storethe32-bitwordsonnaturalboundaries(addressesdivisibleby4).Thisiswhyassemblylanguageprogrammerscanoftensavealittlespaceorspeedupanalgorithmbyoverridingwhatthe compilerdoestogeneratecodeandtweakingitforgreaterefficiency. Let’sgetbackontrack.Fora32-bitprocessor,addressbitsA2...A31areusedtoaddressthe 1,073,741,824possiblelongwords,andA0...A1addressthefourpossiblebyteswithinthelong word.Thisgivesusatotalof4,294,967,296addressablebytelocationsina32-bitprocessor.In otherwords,wehaveabyteaddressingspaceof4GB.Aprocessorwitha16-bitwidedatabus, suchasthe68K,usesaddresslinesA1–A23forwordaddressingandA0forbyteselection. Combiningallofthis,youshouldseetheproblem.Youcouldhavean8-bitbyte,a16-bitwordora 32-bitwordwiththesameaddress.Isn’tthisambiguous?Yesitis.Whenwe’reprogrammingina high-levellanguage,wedependuponthecompilertokeeptrackofthesemessydetails.Thisisone reasonwhycastingonevariabletypetoanothercanbesodangerous.Whenweareprogramming inalow-levellanguage,wedependupontheskilloftheprogrammertokeeptrackofthis. Seemseasyenough,butit’snot.Thislittlefactofcomputerlifeisoneofthemajorcausesofsoftwarebugs.Howcanasimpleconceptbesocomplex?It’snotcomplex,it’sjustambiguous.Figure 7.4illustratestheproblem.TheleftmostcolumnofFigure7.4showsastring(aptlynamed“string”) storedinan8-bitmemoryspace.EachASCIIcharacteroccupiessuccessivememorylocations. 7
Bit
S T R I N G
0
0x0000
Bit 0
87
15 0x00001
Byte Addressable memory for an 8-bit processor with a 16-bit addressing range
T I G
S R N
Bit
0x00000
87
15
0
0x000000
Byte Addressable memory for a 16-bit processor with a 20-bit addressing range Intel 80186 Little Endian
S R N
T I G
0x000001
Byte Addressable memory for a 16-bit processor with a 24-bit addressing range MC68000 Big Endian
0xFFFE 0xFFFF
0xFFFFF
0xFFFFE
0xFFFFFE
0xFFFFFF
Figure7.4:Twomethodsofpackingbytesinto16-bitmemorywords.Placingtheloworderbyteatthe loworderendofthewordiscalledLittleEndian.Placingtheloworderbyteatthehighordersideof thewordiscalledBigEndian. 164
MemoryOrganizationandAssemblyLanguageProgramming Themiddlecolumnshowsa16-bitmemorythatisorganizedsothatsuccessivebytesarestored righttoleft.ThebytecorrespondingtoA0=0isalignedwiththeloworderportion,DB0...DB7, ofthe16-bitwordandthebytecorrespondingtoA0=1isalignedwiththehighorderportion, DB8..DB15,ofthe16-bitword.ThisiscalledLittleEndianorganization.Therightmostcolumn storesthecharactersassuccessivebytesinalefttorightfashion.Thebytepositioncorrespondingto A0=0isalignedwiththehighorderportionofthe16-bitword.ThisiscalledBigEndianorganization.Asanexercise,considerhowthebytesarestoredinFigure7.2.AretheybigorlittleEndian? MotorolaandIntel,chosetousedifferentendianconventionsandPandora’sBoxwasopenedfor theprogrammingworld.Thus,CorC++codewrittenforoneconventionwouldhavesubtlebugs whenportedtotheotherconvention.Itgetsworsethanthat.Engineersworkingtogetheron projectsmisinterpretspecificationsiftheintentisoneconventionandtheyassumetheother.The ARMarchitectureallowstheprogrammertoestablishwhichtypeof“endianess”willbeusedby theprocessoratpower-up.Thus,whiletheARMprocessorcandealwitheitherbigorlittleendian, itcannotdynamicallyswitchmodesoncetheendianessisestablished.Figure7.5showsthe differencebetweenthetwo 87 8 7 0 24 23 16 15 0 31 24 23 16 15 31 conventionsfora32-bitword packedwithfourbytes. A0 0 1 0 1 A0 1 0 1 0 A1
1
1
0
0
A1
0
0
1
1
Ifyoutakeawayanything Little Endian Big Endian fromthistext,rememberthis Figure7.5:Bytepackinga32-bitwordinLittleEndianandBigEndian problembecauseyouwillsee modes. itatleastonceinyourcareer asasoftwaredeveloper. Beforeyouaccusemeofbeatingthissubjecttodeath,let’slookatitonemoretimefromthe hardwareperspective.Thewholeareaofmemoryaddressingcanbeveryconfusingfornoviceprogrammersaswellasseasonedveterans.Also,therecanbeambiguitiesintroducedbyarchitectures andmanufacturer’sterminology.So,let’slookathowMotorolahandleditforthe68Kandperhaps thiswillhelpustobetterunderstandwhat’sreallygoingon,atleastinthecaseoftheMotorola processor,eventhoughwehavealreadylookedattheproblemoncebeforeinFigure7.3.Figure 7.6summarizesthememoryaddressingschemeforthe68Kprocessor. The68Kprocessoriscapableofdirectlyaddressing16Mbytesofmemory,requiring24“effective”addressinglines.Why?Because224=16,777,216.InFigure7.6wesee23addresslines.The missingaddressline,A0,issynthesizedbytwoadditionalcontrolsignals,LDSandUDS. Fora16-bitwideexternaldatabus,wewouldnormallyaddressbitA0tobethebyteselector. WhenA0is0,wechoosetheevenbyte,andwhenA0=1,wechoosetheoddbyte.Theendiannessofthe68KisBigEndian,sothattheevenbyteisalignedwithD8throughD15ofthedata bus.Referringtofigure4.3.1weseethattherearetwostatusbussignalscomingoutoftheprocessor,designateUDS,orUpperDataStrobe,andLDS,orLowerDataStrobe. Whentheprocessorisdoingabyteaccesstomemory,theneitherLDSorUDSisassertedto indicatetothememorywhichpartofthewordisbeingaccessed.Ifthebyteattheevenaddress
165
Chapter7 0x000002
0x000001
0x000003
0xFFFFFE 0xFFFFFC
0x000000
D15
D8 D7
68000 Processor
Note: A word access on a byte boundary would Note: A word access on a byte boundary would require two memory operations to complete and require two memory operations to complete and is not allowed in the 68000 processor. is not allowed in the 68000 processor.
0xFFFFFF
Byte Access
0xFFFFFD
D0
A0=0: LDS=1, UDS=0 A0=1: UDS=1, LDS=0 Word Access A0=0: LDS=0, UDS=0 A0=1: LDS=1, UDS=1
A1..A23
Lower Data Strobe (LDS)
Upper Data Strobe (UDS)
Figure7.6:MemoryaddressingmodesfortheMotorola68Kprocessor
isbeingaccessed(A0=0),thenUDSisassertedandLDSstaysHIGH.Iftheoddbyteisbeing accessed(A0=1),thenLDSisassertedandUDSremainsintheHIGH,orOFF,state.Foraword access,bothUDSandLDSareasserted.ThisbehaviorissummarizedinthetableofFigure7.6. YouwouldnormallyuseLDSandUDSasgatingsignals tothememorycontrolsystem.Forexample,youcouldthe circuitshowninFigure7.7tocontrolwhichofthebytesare beingwrittento.
WR LDS
OR
WE1
Youmaybescratchingyourheadaboutthiscircuit.Why OR WE0 areweusingORgates?Wecananswerthequestionintwo UDS ways.First,sinceallofthesignalsareassertedLOW,weare Figure 7.7: Simple circuit to control reallydealingwiththenegativelogicequivalentofanAND thebytewritingofa68Kprocessor function.Thegatethathappenstobethenegativelogic equivalentoftheANDgateistheORgate,sincetheoutputis0ifandonlyifbothinputsare0. ThesecondwayoflookingatitisthroughtheequivalenceequationsofDeMorgan’sTheorems. Recallthat: (A*B)=A+B(1) (A+B)=A*B(2) Inthiscase,equation1showsthattheORofAandBwouldbeequivalenttousingpositivelogic AandBandthenobtainingtheNANDofthetwosignals. Now,supposethatyouattemptedtodoawordaccessatanoddaddress.Forexample,supposethat youwrotethefollowingassemblylanguageinstruction: move.wD0,$1001 *Thisisanon-alignedaccess! 166
MemoryOrganizationandAssemblyLanguageProgramming Thisinstructiontellstheprocessortomake acopyofthewordstoredininternalregister D0andstorethecopyofthatwordinexternal memorybeginningatmemoryaddress$1001. Theprocessorwouldneedtoexecutetwomemorycyclestocompletetheaccessbecausethe byteorderrequiresthatitbridgethetwomemory locationstocorrectlyplacethebytes.Some processorsarecapableofthistypeofaccess, butthe68Kisoneoftheprocessorsthatcan’t. Ifanonalignedaccessoccurs,theprocessorwill generateanexceptionandtrytobranchtosome user-definedcodethatwillcorrecttheerror,orat least,diegracefully.
EVEN BYTE 7
6
5
ODD BYTE 4
2
1
0
7
6
5
4
3
2
1
0
3
2
1
0
1 LONG WORD = 32 BITS 15
14
13
12
11
MSB
10
9
8
7
6
5
4
HIGH ORDER WORD
LONG WORD 0 LOW ORDER WORD
LSB
MSB LONG WORD 1 LSB MSB LONG WORD 2 LSB
Figure7.8:Memorystorageconventionsforthe
Sincethe68Kprocessorisa32-bitwideprocesMotorola68000processor sorinternally,butwithonlya16-bitwidedata bustoexternalmemory,weneedtoknowhowitstores32-bitquantitiesinexternalmemoryas well.Figure7.8showsustheconventionforstoringlongwordsinmemory.
AlthoughFigure7.8mayseemconfusing,itisjustarestatementofFigure7.2inasomewhatmore abbreviatedformat.Thefiguretellsusthat32-bitdataquantities,calledlongsorlongwordsare storedinmemorywiththemostsignificant16bits(D16–D31)storedinthefirstwordaddress locationandtheleastsignificant16bits(D0–D15)storedinthenexthighestwordlocation.Also, evenbyteaddressesarealignedwiththehighorderportionofthe16-bitwordaddress(BigEndian). IntroductiontoAssemblyLanguage ThePCworldisdominatedbyaninstructionsetarchitecture(ISA)firstdefinedbyIntelover twenty-fiveyearsago.Thisarchitecture,calledX86becauseofitsfamilymembers,hasthe followinglineage: 8080808680186802868038680486Pentium Theworldofembeddedmicroprocessors—thatismicroprocessorsusedforasinglepurposeinside adevice,suchasacellphone—isdominatedbytheMotorola680X0ISA: 680006801068020680306804068060ColdFire ColdFireunitesamodernprocessorarchitecture,calledRISC,withbackwardcompatibilitywith theoriginal68KISA.(We'llstudythesearchitecturesinalaterlesson.)Backwardcompatibilityis veryimportantbecausethereissomuch68Kcodestillaroundandbeingused.TheMotorola68K instructionsetisoneofthemoststudiedISAsaroundandyourcanfindanincrediblenumberof hitsifyoudoaWebsearchon“68K”or“68000.” Everycomputersystemhasafundamentalsetofoperationsthatitcanperform.Theseoperations aredefinedbytheinstructionsetoftheprocessor.Thereasonforaparticularsetofinstructions isduetothewaythecomputerisorganizedinternallyanddesignedtooperate.Thisiswhatwe 167
Chapter7 wouldcallthearchitectureofthecomputer.Thearchitectureismirroredbytheassemblylanguage instructionsthatitcanexecute,becausetheseinstructionsarethemechanismbywhichweaccess thecomputer’sresources.Theinstructionsetistheatomicelementoftheprocessor.Allofthe complexoperationsareachievedbybuildingsequencesofthesefundamentaloperations. Computersdon’treadassemblylanguage.Theytaketheirinstructionsinmachinecode.As youknow,themachinecodedefinestheentrypointintothestatemachinemicrocodetablethat startstheinstructionexecutionprocess.Assemblylanguageisthehuman-readableformofthese machinelanguageinstructions.Thereisnothingmysticalaboutthemachinelanguageinstructions, andprettysoonyou’llbeabletounderstandthemandseethepatternsthatguidetheinternalstate machinesofthemodernmicroprocessor.Fornow,we’llfocusonthetaskoflearningassembly language.ConsiderFigure7.9. Instead Insteadofofwriting writingaaprogram programinin machine language as: machine language as: 00000412 00000412307B7048 307B7048 00000416 327B704A 00000416 327B704A 0000041A 1080 0000041A 1080 0000041C 0000041CB010 B010 000041E 67000008 000041E 67000008 00000422 1600 00000422 1600 00000424 0000042461000066 61000066 00000428 5248 00000428 5248 0000042A B0C9 0000042A B0C9
We write the program in assembly language as: MOVEA.W MOVEA.W MOVE.B CMP.B BEQ MOVE.B BSR ADDQ.W CMPA.W
(TEST_S,PC,D7),A0 (TEST_E,PC,D7),A1 D0,(A0) (A0),D0 NEXT_LOCATION D0,D3 ERROR #01,A0 A1,A0
*We’ll use address indirect *Get the end address *Write the byte *Test it *OK, keep going *copy bad data *Bad byte *increment the address *are we done?
Figure7.9:Theboxontherightisasnippetof68Kcodeinassembly language.Theboxontheleftisthemachinelanguageequivalent. Note:WerepresenthexadecimalnumbersinCorC++withtheprefix‘0x’.Thisisastandardizationofthe language.Thereisnocorrespondingstandardizationinassemblylanguage,anddifferentassemblerdevelopers representhexnumbersindifferentways.Inthistextwe’lladopttheMotorolaconventionofusingthe‘$’prefixfora hexadecimalnumber.
Themachinelanguagecodeisactuallytheoutputoftheassemblerprogramthatconvertstheassemblylanguagesourcefile,whichyouwritewithatexteditor,intothecorrespondingmachine languagecodethatcanbeexecutedbya680X0processor.Thelefthandboxactuallyhastwo columns,althoughitmaybedifficulttoseethat.Theleftcolumnstartswiththehexadecimal memorylocationwherethemachinelanguageinstructionisstored.Inthiscasethememorylocation$00000412holdsthemachinelanguageinstructioncode0x307B7048.Thenextinstruction beginsatmemorylocation0x00000416andcontainstheinstructioncode0x327B704A.Thesetwo machinelanguageinstructionsaregivenbytheseassemblylanguageinstructions. MOVEA.W(TEST_S,PC,D7),A0 MOVEA.W(TEST_E,PC,D7),A1
Soonyou’llseewhattheseinstructionsactuallymean.Fornow,wecansummarizetheabove discussionthisway: • Startingatmemorylocation$00000412,andrunningthroughlocation$00000415,isthe machineinstructioncode$307B7048.Theassemblylanguageinstructionthatcorresponds tothismachinelanguagedataisMOVEA.W(TEST_S,PC,D7),A0 168
MemoryOrganizationandAssemblyLanguageProgramming •
Startingatmemorylocation$00000416,andrunningthroughlocation$00000419,is themachineinstructioncode$327B704A.TheassemblylanguageinstructionthatcorrespondstothismachinelanguagedataisMOVEA.W(TEST_E,PC,D7),A1 Also,forthe68Kinstructionset,thesmallestmachinelanguageinstructionis16-bitslong(4hex digits).Noinstructionwillbesmallerthan16-bitslong,althoughsomeinstructionsmaybeaslong as5,16-bitwordslong. Thereisa1:1correspondencebetweenassemblylanguageinstructionsandmachinelanguage instructions.Theassembly MOVE.B moveabyteofdata languageinstructionsare MOVEA.W moveawordofdatatoanaddressregister calledmnemonics.Theyare CMP.B comparethemagnitudeoftwobytesofdata designedtobeashorthand BEQ branchtoadifferentinstructioniftheresultequalszero clueastowhattheinstruction ADDQ.W add(quickly)twovalues BRA alwaysbranchtoanewlocation actuallydoes.Forexample: You’llnoticethatI’vechosenadifferentfontfortheassemblylanguageinstructions.Thisis becausefontswithfixedspacing,like“courier”,keepthecharactersincolumnalignment,which makesiteasiertoreadassemblylanguageinstructions.There’snolawthatyoumustusethisfont, theassemblerprobablydoesn’tcare,butitmightmakeiteasierforyoutoreadandunderstand yourprogramsifyoudo. Thepartoftheinstructionthattellsthecomputerwhattodo,iscalledtheopcode(shortfor“operationcode”).Thisisonlyonehalfoftheinstruction.Theotherhalftellsthecomputerhowand wherethisoperationshouldbeperformed.Theactualopcode,forexampleMOVE.B,isactually anopcodeandamodifier.TheopcodeisMOVE.Itsaysthatsomedatashouldbemovedfromone placetoanother.Themodifieristhe“.B”suffix.Thistellsittomoveabyteofdata,ratherthana wordorlongword.Inordertocompletetheinstructionwemusttelltheprocessor: 1. wheretofindthedata(thisiscalledoperand1),and 2. wheretoputtheresult(operand2). Acompleteassemblylanguageinstructionmusthaveanopcodeandmayhave0,1or2operands. Here’syourfirstassemblylanguageinstruction.It’scalledNOP(pronouncedNoop).Itmeans donothing.Youmightbequestioningthesanityofthisinstructionbutitisactuallyquiteuseful. Compilersmakeverygooduseofthem.TheNOPinstructionisanexampleofaninstructionthat takes0operands. TheinstructionCLR.LD4isanexampleofaninstructionthattakesoneoperand.Itmeansto clear,orsettozero,all32bits(the“.L”modifier)oftheinternalregister,D4. TheinstructionMOVE.WD0,D3isanexampleofaninstructionthattakestwooperands.Notethe commaseparatingthetwooperands,D0andD3.Theinstructiontellstheprocessortomove16 bitsofdata(the“.W”modifier)fromdataregisterD0todataregisterD3.ThecontentsofD0are notchangedbytheoperation.Allassemblylanguageprogramsconformtothefollowingstructure: Column1 Column2
LABEL
Column3
OPCODE OPERAND1,OPERAND2
169
Column4....
*COMMENT
Chapter7 Eachinstructionoccupiesonelineoftext,startingincolumn1andgoingupto132columns. 1. TheLABELfieldisoptional,butitmustalwaysstartinthefirstcolumnofaline.We’ll soonseehowtouselabels. 2. TheOPCODEisnext.Itmustbeseparatedfromthelabelbywhitespace,suchasaTAB characterorseveralspaces,anditmuststartincolumn2orlater. 3. Next,theoperandsareseparatedfromtheopcodebywhitespace,usuallyatabcharacter. Thetwooperandsshouldbeseparatedfromeachotherbyacomma.Thereisnowhite spacebetweentheoperandsandthecomma. 4. Thecommentisthelastfieldoftheline.Itusuallystartswithanasteriskorasemi-colon, dependinguponwhichassemblerisbeingused.Youcanalsohavecommentlines,butthen theasteriskmustbeincolumn1.
Label Althoughthelabelisoptional,itisaveryimportantpartofassemblylanguageprogramming.You alreadyknowhowtouselabelswhenyougiveasymbolicnametoavariableoraconstant.You alsouselabelstonamefunctions.Inassemblylanguagewecommonlyuselabelstorefertothe memoryaddresscorrespondingtoaninstructionordataintheprogram.Thelabelmustbedefined incolumn1ofyourprogram.Labelsmaketheprogrammuchmorereadable.Itispossibletowrite aprogramwithoutusing TEST_LOOP MOVE.B (A2),D6 *Let D6 test the patterns for done labels,butalmostnoone CMPI.B #END_TEST,D6 *Are we done? everdoesitthatway.The BEQ DONE *We've done the 4 patterns LEA ST_ADDR,A0 *Set up the starting address in A0 labelallowstheassembler LEA END_ADDR,A1 *Set up the ending address in A1 JSR DO_TEST *Go to the test programtoautomatically ADDA.W #01,A2 *Point to the next test pattern (andcorrectly!)calculatethe BRA TEST_LOOP *Go back to the next location STOP #EXIT *Test is over, stop addressesofoperandsand DONE destinations.Forexample, Figure7.10:Snippetofassemblylanguagecodedemonstratingtheuse considerthefollowingsnip- oflabels. petofcodeinFigure7.10. ThecodeexampleofFigure7.10hastwolabels,TEST_LOOPandDONE.Theselabelscorrespondtothememorylocationsoftheinstructions,“MOVE.B(A2),D6”and“BRATEST_LOOP” respectively.Astheassemblerprogramconvertstheassemblylanguageinstructionsintomachine languageinstructionsitkeepstrackofwhereinmemoryeachinstructionwillbelocated.When itencountersalabelasanoperand,itreplacesthelabeltextwiththenumericvalue.Thus,the instruction“BEQDONE”tellstheassemblertocalculatethenumericvaluenecessarytocause theprogramtojumptotheinstructionatthememorylocationcorrespondingtothelabel“DONE” ifthetestcondition,equality,ismet.We’llsoonseehowtotestthisequality.Ifthetestfails,the branchinstructionisignoredandthenextinstructionisexecuted. Comments Beforewegettoofaroffshore,weneedtomakeafewcommentsabouttheproperformfor commentingyourassemblylanguageprogram.AsyoucanseefromFigure7.10eachassembly languageinstructionhasacommentassociatedwithit.Differentassemblershandlecommentsin differentways.Someassemblersrequirethatcommentsthatareonalinebythemselveshavean 170
MemoryOrganizationandAssemblyLanguageProgramming asterisk‘*’orasemicolon‘;’asthefirstcharacterontheline.Commentsthatareassociatedwith instructionsorassemblerdirectivesmightneedthesemicolonorasterisktobeginthecomment,or theymightnotneedanyspecialcharacterbecausetheprecedingwhitespacedefinesthelocation ofthecommentblock.Theimportantpointisthatassemblylanguagecodeisnotself-documenting,anditiseasyforyoutoforget,afteradayorsogoes,exactlywhatyouweretryingtodowith thatalgorithm. Assemblycodeshouldbeprofuselycommented.Notonlyforyoursanity,butforthepeoplewho willhavetomaintainyourcodeafteryoumoveon.Thereisnoreasonthatassemblycodecannot beaseasytoreadasawell-documentC++program.Useequatesandlabelstoeliminatemagic numbersandtohelpexplainwhatthecodeisdoing.Usecommentblockstoexplainwhatsections ofanalgorithmaredoingandwhatassumptionsarebeingmade.Finallycommenteachinstruction,orsmallgroupofinstructions,inordertomakeitabsolutelyclearwhatisgoingon. Inhisbook,Hackers:HeroesoftheComputerRevolution,StevenLevy1describesthecodingstyle ofPeterSamson,anMITstudent,andearlyprogrammer, …Samson,though,wasparticularlyobscureinrefusingtoaddcommentstohissource code,explainingwhathewasdoingatagiventime.Onewell-distributedprogramSamson wrotewentonforseveralhundredsofassemblylanguageinstructions,withonlyonecommentbesideaninstructionwhichcontainedthenumber1750.ThecommentwasRIPJSB, andpeoplerackedtheirbrainsaboutitsmeaninguntilsomeonefiguredoutthat1750was theyearthatBachdied,andthatSamsonhadwrittenanabbreviationforRestInPeace JohannSebastianBach. Programmer’sModelArchitecture Inordertoprograminassemblylanguage,wemustbefamiliarwiththebasicarchitectureofthe processor.OurviewofthearchitectureiscalledtheProgrammer’sModeloftheprocessor.We mustunderstandtwoaspectsofthearchitecture: 1. theinstructionset,and 2. theaddressingmodes. Theaddressingmodesofacomputerdescribethedifferentwaysinwhichitaccessestheoperands, orretrievesthedatatobeoperatedon.Then,theaddressingmodesdescribewhattodowiththe dataaftertheoperationiscompleted.Theaddressmodesalsotelltheprocessorhowtocalculate thedestinationofanonsequentialinstructionfetch,suchasabranchorjumptoanewlocation. Addressingmodesaresoimportanttotheunderstandingofthecomputerwe’llneedtostudythem abitbeforewecancreateanassemblylanguageprogram.Thus,unlikeC,C++orJAVA,we’ll needtodevelopacertainlevelofunderstandingforthemachinethatwe’reprogrammingbefore wecanactuallywriteaprogram. UnlikeCorC++,assemblylanguageisnotportablebetweencomputers.Anassemblylanguage programwrittenforanIntel80486willnotrunonaMotorola68000.ACprogramwrittento runonanIntel80486mightbeabletorunonaMotorola68000oncetheoriginalsourcecodeis recompiledforthe68000instructionset,butdifferencesinthearchitectures,suchasbigandlittle endian,maycauseerrorstobeintroduced. 171
Chapter7 Itmightbeworthwhileandstopforamomenttoreflectonwhy,asaprogrammer,itisimportant tolearnassemblylanguage.Computerscienceandprogrammingdependsuponaworkingknowledgeoftheprocessor,itslimitationsanditsstrengths.Tounderstandassemblylanguageisto understandthecomputingenginethatyourcodeisrunningon.Eventhoughhigh-levellanguages likeC++andJAVAdoagoodjobofabstractingthelowleveldetails,itisimportanttokeepin mindthattheengineisnotinfinitelypowerfulandthatitsresourcesarenotlimitless. Assemblylanguageistightlycoupledtothedesignoftheprocessorandrepresentsthefirstlevel ofsimplificationofthebinaryinstructionsetarchitectureintoahumanreadableform.Generally thereisa1:1matchbetweentheassemblylanguageinstructionandthebinaryorhexadecimal instructionthatresults.ThisisverydifferentfromCorC++,whereoneCstatement,maygenerate hundredsoflinesofassemblycode. Itisstilltruethatyoucangenerallywritethetightestcodeinassemblylanguage.WhileCcompilershavegottenprettygood,they’renotperfect.Aprogramwritteninassemblylanguagewilloften uselessspaceandrunfasterthanaprogramwiththesamefunctionalitywritteninC.Manygames arestillcodedinassemblylanguageforexactlythatreason. Theneedforefficientcodeisespeciallytrueofinterrupthandlersandalgorithmswrittenfor specializedprocessors,suchasdigitalsignalprocessors(DSPs).Manyexperienceprogrammers willarguethatanycodethatmustbeabsolutelydeterministiccannotbewritteninC,because youcannotpredictaheadoftimetheexecutiontimeforthecodegeneratedbythecompiler.With assemblylanguageyoucancontrolyourprogramatthelevelofasingleclockcycle.Also,certain partsoftheruntimeenvironmentmustbewritteninassemblybecauseyouneedtobeableto establishtheCruntimeenvironment.Thus,boot-upcodetendstobewritteninassembly.Finally, understandingassemblylanguageiscriticallyimportantfordebuggingrealtimesystems.Ifyou’ve everbeenprograminaprogrammingenvironmentsuchasMicrosoft’sVisualC++®andwhile youaredebuggingyourcodeyouinadvertentlystepintoalibraryfunction,youwillthenfind yourselfknee-deepinx86assemblylanguage. Motorola68000MicroprocessorArchitecture Figure7.11isasimplifiedschematicdiagramofthe68Karchitecture.SinceMotorolahasalarge numberoffamilymembersthisparticulararchitectureisalsoreferredtoasCPU16.CPU16isa subsetoftheCPU32architecture,theISAforthe68020andlaterprocessors. Let’sbrieflyidentifysomeoftheimportantfunctionalblocksthatwe’lllaterbeusing. • Programcounter:Usedtoholdtheaddressofthenextinstructiontobefetchedfrom memory.Assoonasthecurrentinstructionisdecodedbytheprocessortheprogramcounter(PC)isupdatedtopointtotheaddressofthenextsequentialinstructioninmemory. • Generalregisters:The68Kprocessorhas15general-purposeregistersandtwospecialpurposeregisters.Thegeneralpurposeregistersarefurtherdividedintoeightdata registers,D0...D7andsevenaddressregisters,A0...A6.Thedataregistersareused toholdandmanipulatedatavariablesandtheaddressregistersareusedtoholdand manipulatememoryaddresses.Thetwospecialpurposeregistersareusedtoimplement twoseparatestackpointers,A7andA7'.We’lldiscussthestackabitlaterinthetext. 172
MemoryOrganizationandAssemblyLanguageProgramming Holds address of the next instruction to be executed Program Counter (PC)
If necessary holds the address of memory reads/writes
Effective Address Register (EAR) 32
32
Internal Bus Instruction Register(IR) 16 Instruction Decode and Control
General Registers D0..D7 A0..A6 A7= User Stack pointer (USP) A7’=Supervisor Stack Pointer(SSP) Temporary Register
32
External Bus
Memory and I/O Interface Control Pipeline
Holds first word of currently executing instruction
32 Holds operands or intermediate results
Performs all logical or arithmetic operations ( ADD, SHIFT, etc. )
8
Arithmetic and Logic Unit (ALU)
CCR
SR
32
Holds result of ALU Operations
Figure7.11:ArchitectureoftheMotorola68Kprocessor. •
Statusregister:Thestatusregisterisa16-bitregister.Thebitsareusedtodescribethe currentstateofthecomputeronaninstruction-by-instructionbasis.Theconditioncode register,CCR,ispartofthestatusregisterandholdsthebitsthatdirectlyrelatetothe resultofthelastinstructionexecuted. • Arithmeticandlogicunit(ALU):TheALUisthefunctionalblockwhereallofthemathematicalandlogicaldatamanipulationsareperformed. Figure7.12isthe Programmer’sModel ofthe68Karchitecture.Itdiffersfrom Figure7.11because itfocusesonlyonthe detailsofthearchitecturethatisrelevantto writingaprogram.
MSB 31
16,15
DATA REGISTERS
MSB
31
16,15
8,7
LSB 0 D0 D1 D2 D3 D4 D5 D6 D7 0
LSB A0 A1 A2 A3 A4 A5 A6
MSB 31
LSB 0
16,15
A7 (USP) USER STACK POINTER 31
0 PC PROGRAM COUNTER 7
0 CCR
CONDITION CODE REGISTER 31
16,15
0
Inthistext,we’renot ADDRESS A7’ (SSP) goingtodealwiththe REGISTERS SUPERVISOR STACK POINTER 15 8,7 0 statusregister(SR),or CCR SR thesupervisorstack STATUS REGISTER pointer(A7’).Our Figure7.12:Programmer’smodelofthe68Kprocessor worldfromnowon willfocusonD0...D7,A0...A6,A7,CCRandthePC.Fromtheseregisters,we’lllearneverythingthatweneedtoknowaboutthiscomputerarchitectureandassemblylanguageprogramming. 173
Chapter7 ConditionCodeRegister Theconditioncoderegister(CCR)deserves someadditionalexplanation.Theregisteris showninmoredetailinFigure7.13.
DB7 CCR
DB6
DB5
DB4
DB3
X
N
DB2 Z
DB1 V
DB0 C
Figure7.13:ConditionCodeRegister.Theshadedbits arenotused.
TheCCRcontainsasetoffiveconditionbits whosevaluecanchangewiththeresultof eachinstructionbeingexecuted.Theexactdefinitionofeachofthebitsissummarizedbelow. • XBIT(extendbit):usedwithmulti-precisionarithmetic • NBIT(negativebit):indicatesthattheresultisanegativenumber • ZBIT(zerobit):indicatesthattheresultisequaltozero • VBIT(overflow):indicatesthattheresultmayhaveexceededtherangeoftheoperand • CBIT(carrybit):indicatesthatacarrywasgeneratedinamathematicaloperation TheimportanceofthesebitsresideswiththefamilyoftestandbranchinstructionssuchasBEQ, BNE,BPL,BMI,andsoon.Theseinstructionstesttheconditionofanindividualflag,orCCRbit, andeithertakethebranchiftheconditionistrue,orskipthebranchandgotothenextinstruction iftheconditionisfalse.Forexample,BEQmeansbranchequal.Well,equaltowhat?TheBEQ instructionisactuallytestingthestateofthezeroflag,Z.IfZ=1,itmeansthattheresultinthe registeriszero,soweshouldtakethebranchbecausetheresultisequaltozero.Sobranchequal isaninstructionwhichtellstheprocessortotakethebranchiftherewasarecentoperationthat resultedinazeroresult.
ButBEQmeanssomethingelse.Supposewewanttoknowiftwovariablesareequaltoeachother. Howcouldwetestforequality?Simple,justsubtractonefromtheother.Ifwegetaresultofzero (Z=1),they’reequal.Ifthey’renotequaltoeachotherwe’llgetanonzeroresultand(Z=0). Thus,BEQistrueifZ=1andBNE(branchnotequal)istrueifZ=0.BPL(branchplus)istrueif N=0andBMI(branchminus)istrueif(N=1). Thereareatotalof14conditionalbranchinstructionsinthe68Kinstructionset.Sometestonly theconditionofasingleflag,otherstestlogicalcombinationsofflags.Forexample,theBLT instruction(branchlessthan)isdefineby:BLT=N*V+N*V(N⊕V)(notbybacon,lettuce, andtomato).
EffectiveAddresses Note:Eventhoughthe68Kprocessorhasonly24addresslinesexternaltotheprocessor,internallyitisstilla 32-bitprocessor.Thus,wecanrepresentaddressesas32-bitvaluesinourexamples.
Let’snowgobackandlookattheformatoftheinstructions.Perhapsthemostcommonlyused instructionistheMOVEinstruction.You’llfindthatmostofwhatyoudoinassemblylanguageis movingdataaround.TheMOVEinstructiontakestwooperands.Itlookslikethis: MOVE.W source(EA),destination(EA) Forexample: MOVE.W$4000AA00,$10003000 tellstheprocessortocopythe16-bitvaluestoredinmemorylocation0x4000AA00tomemory location0x10003000. 174
MemoryOrganizationandAssemblyLanguageProgramming Also,theMOVEmnemonicisabitmisleading.Whattheinstructiondoesistooverwritethecontentsofthedestinationoperandwiththesourceoperand.Thesourceoperandisn’tchangedbythe operation.Thus,aftertheinstructionbothmemorylocationscontainthesamedataas$4000AA00 didbeforetheinstructionwasexecuted. Inthepreviousexample,thesourceoperandandthedestinationoperandareaddressesinmemory thatareexactlyspecified.Theseareabsoluteaddresses.Theyareabsolutebecausetheinstruction specifiesexactlywheretoretrievethedatafromandwheretoplaceit.Absoluteaddressingisjust oneofthepossibleaddressingmodesofthe68K.Wecalltheseaddressingmodeseffectiveaddresses.Thus,whenwewritethegeneralformoftheinstruction: MOVE.W source(EA),destination(EA) wearesayingthatthesourceanddestinationofthedatatomovewillbeindependentlydetermined bytheeffectiveaddressingmodeforeach. Forexample: MOVE.W D0,$10003000 movesthecontentsofoneoftheprocessor’seightinternaldataregisters,inthiscasedataregister D0,tomemorylocation$10003000.Inthisexample,thesourceeffectiveaddressmodeiscalled dataregisterdirectandthedestinationeffectiveaddressmodeisabsolute. WecouldalsowritetheMOVEinstructionas MOVE.W A0,D5 whichwouldmovethe16-bitword,currentlystoredinaddressregisterA0todataregisterD5. Thesourceeffectiveaddressisaddressregisterdirectandthedestinationeffectiveaddressisdata registerdirect. SupposethatthecontentsofaddressregisterA0is$4000AA00(wecanwritethisinashorthand notationas=$4000AA00).Theinstruction MOVE.W D5,(A0) movesthecontentsofdataregisterD5tomemorylocation$4000AA00.Thisisanexampleof addressregisterindirectaddressing.Youareprobablyalreadyfamiliarwiththisaddressingmode becausethisisapointerinC++.Thecontentsoftheaddressregisterbecometheaddressofthe memoryoperation.Wecallthisanindirectaddressingmodebecausethecontentsoftheaddress registerarenotthedatawewant,butratherthememoryaddressofthedatawewant.Thus,we aren’tstoringthedatadirectlytotheregisterA0,butindirectlybyusingthecontentsofA0asa pointertoitsultimatedestination,memorylocation$4000AA00.Weindicatethattheaddressregisterisbeingusedasapointertomemorybytheparenthesesaroundtheaddressregister.Suppose that=$10003000and=$4000AA00.Theinstruction MOVE.W (A1),(A6) wouldcopythedatalocatedinmemorylocation$10003000tomemorylocation$4000AA00. Bothsourceanddestinationeffectiveaddressmodesaretheaddressregisterindirectmode.
175
Chapter7 Let’slookatonemoreexample. MOVE.W #$234A,D2 wouldplacethehexadecimalnumber,$234AdirectlyintoregisterD2.Thisisanexampleofthe immediateaddressmode.Immediateaddressingisthemechanismbywhichmemoryvariablesare initialized.Thepoundsign(#)tellstheassemblerthatthisisanumber,notamemorylocation.Of course,onlythesourceeffectiveaddresscouldbeanimmediateaddress.Thedestinationcan’tbea number—itmustbeamemorylocationoraregister. Theeffectiveaddress(EA)specifieshowtheoperandsoftheinstructionwillbeaccessed.Dependingupontheinstructionbeingexecuted,notalloftheeffectiveaddressesmaybeavailabletouse. We’lldiscussthismoreaswego.Thetypeofeffectiveaddressthatwillbeusedbytheinstruction toaccessoneormoreoftheoperandsisactuallyspecifiedaspartoftheinstruction.Recallthatthe minimumsizeforanopcodewordis16-bitsinlength.Theopcodewordprovidesalloftheinformationthattheprocessorneedstoexecutetheinstructions.However,thatdoesn’tmeanthatthe 16-bitopcodeword,byitself,containsalltheinformationintheinstruction.Itmaycontainenough informationtobetheentireinstruction,likeNOP,butusuallyitonlycontainsenoughinformation toknowwhatelseitmustretrieve,orfetch,frommemoryinordertocompletetheinstruction. Therefore,manyinstructionsmaybelongerthantheop-codewordportionoftheinstruction. Ifwethinkaboutthemicrocode-basedstatemachinethatdrivestheprocessor,thisallbeginsto makesense.Weneedtheop-codewordtogiveusourentrypointintothemicrocode.Thisiswhat theprocessordoeswhenitisdecodinganinstruction.Intheprocessofexecutingtheentireinstruction,itmayneedtogoouttomemoryagaintofetchadditionaloperandsinordertocomplete theinstruction.Itknowsthatithastodotheseadditionalmemoryfetchesbecausethepaththrough thestatemachine,asdefinedbytheop-codeword,predeterminesit. Considertheformoftheopcodeword fortheMOVEinstructionshownin Figure7.14
opcode 15
dst EA 12 11
src EA 6
5
0
Figure 7.14: Machine language instruction format for the MOVEinstruction
Theop-codefieldmaycontainthree typesofinformationabouttheMOVEinstructionwithinthefielddefinedbythefourdatabits, DB15–DB12.Thesepossibilitiesaredefinedasfollows: • 0001=MOVE.B • 0011=MOVE.W • 0010=MOVE.L ThesourceEAanddestinationEAareboth6-bitfieldsthatareeachfurthersubdividedintotwo, 3-bitfieldseach.One3-bitfield,calledtheregisterfield,cantakeonavaluefrom000to111,correspondingtooneofthedataregisters,D0throughD7,andaddressregisterA0throughA7.The other3-bitfieldisthemodefield.Thisfielddescribeswhicheffectiveaddressingmodeisbeing usedfortheoperation.We’llreturntothispointlateron.Let’slookatarealexample.Considerthe instruction MOVE.W#$0A55,D0 176
MemoryOrganizationandAssemblyLanguageProgramming Whatwouldthisinstructionlooklikeafteritisassembled?Itmayhelpyoutorefertoanassembly languageprogrammingmanual,suchastheProgrammer’sReferenceManual3tobetterunderstand what’sgoingon.Here’swhatweknow.It’saMOVEinstructionwiththesizeofthedatatransfer being16-bits,oroneword.Thus,thefourmostsignificantbitsoftheopcodeword(D15–D12) are0011.Thesourceeffectiveaddress(D5–D0)isimmediate,whichdecodesas111100.The destinationeffectiveaddress(D11–D6)isdataregisterD0,whichdecodesas000000.Putting thisalltogether,themachinelanguageopcodewordis: 0011000000111100,or$303C Thetranslationfrom0011000000111100(binary)to$303C(hex)mayhaveseemedabit confusingtoyoubecausethebitfieldsoftheinstructionusuallydon’talignthemselvesonnice 4-bitboundaries.However,ifyouproceedfromrighttoleft,groupingbyfour,you’llgetitright everytime. Isthatalltheelementsoftheinstruction?No,becauseallweknowatthispointisthatthesource isanimmediateoperand,butwestilldon’tknowwhattheimmediatenumberis.Thus,theopcode wordtellstheprocessorthatithastogoouttomemoryagainandfetchanotherpartofthesource operand.SincetheeffectiveaddressingmodeofthedestinationregisterD0,isDataRegister Direct,thereisnoneedtoretrieveanyadditionalinformationfrommemorybecausealltheinformationthatisneededtodeterminethedestinationiscontainedintheop-codeworditself.Thus, thecompleteinstructioninmemorywouldbe$303C0A55.ThisisillustratedinFigure7.15. EVEN BYTE 7
6
5
15
14
13
ODD BYTE 4
2
1
12 11 10
0 9
8
7
6
7
6
5 5
4 4
3 3
2 2
1 1
OP CODE WORD FIRST WORD SPECIFIES OPERATIONS AND MODES
0 0
• •It It isis aa MOVE.W MOVE.W instruction instruction • •The Thesource sourceoperand operandisisimmediate immediatedata data • •The destination operand The destination operandisisregister registerD0 D0
IMMEDIATE OPERAND IF ANY, ONE OR TWO WORDS • •The $0A55 Theimmediate immediate data data value, value, $0A55
SOURCE EFFECTIVE ADDRESS EXTENSION IF ANY, ONE OR TWO WORDS DESTINATION EFFECTIVE ADDRESS EXTENSION IF ANY, ONE OR TWO WORDS
Not Notused used
Figure7.15:MemorystorageoftheinstructionMOVE.W#$0A55,D0
Now,supposethattheinstructionusestwoabsoluteaddressesforthesourceeffectiveaddressand thedestinationeffectiveaddress MOVE.W$0A550000,$1000BB00 Thismachinelanguageop-wordwoulddecodeas0011001111111001,or$33F9.Arewedone? Again,theanswerisno.Thisisbecausetherearestilltwoabsoluteaddressesthatneedtobe fetchedfrommemory.Thecompleteinstructionlookslike $33F90A5500001000BB00. 177
Chapter7 Thecompleteinstructiontakesupatotaloffive16-bitwordsofmemoryandrequirestheprocessortomakefiveseparatememory-readoperationstocompletelydigesttheinstruction.Thisis illustratedinFigure7.16. EVEN BYTE 7
6
5
15
14
13
4
ODD BYTE 2
1
12 11 10
0 9
8
7
6
7
6
5 5
4 4
3 3
2 2
1 1
OP CODE WORD FIRST WORD SPECIFIES OPERATIONS AND MODES SOURCE EFFECTIVE ADDRESS HIGH ORDER WORD
0 0
• •It It is is aa MOVE.W MOVE.W instruction instruction • •The Thesource sourceoperand operandisisabsolute absoluteaddress address • •The destination operand The destination operandisisabsolute absolute address address
SOURCE EFFECTIVE ADDRESS LOW ORDER WORD
• •$0A55 $0A55
DESTINATION EFFECTIVE ADDRESS HIGH ORDER WORD DESTINATION EFFECTIVE ADDRESS LOW ORDER WORD
• •$1000 $1000
• •$0000 $0000
• •$BB00 $BB00
Figure7.16:MemorystorageoftheinstructionMOVE.W$0A550000,$1000BB00
WordAlignment Thelasttopicthatweneedtocoverinordertoprepareourselvesfortheprogrammingtasksahead ofusisthatofwordalignmentinmemory.Wetouchedonthistopicearlierinthelessonbutitis worthwhiletoreviewtheconcept.Recallthatwecan Low order byte addressindividualbytesinawordbyusingtheleast 1 0 significantaddressbit,A0,toindicatewhichbytewe’re 3 2 interestedin.However,whatwouldhappenifwetried High order byte Figure7.17:Difficultyduetononaligned toaccessawordoralongwordvalueonanoddbyte access boundary?Figure7.17illustratestheproblem. Inordertofetchawordlocatedonanodd-byteboundarytheprocessorwouldhavetomaketwo 16-bitfetchesandthendiscardpartsofbothwordstocorrectlyreadthedata.Someprocessors candothisoperation.Itisanexampleofthenonalignedaccessmodethatwediscussedearlier.In general,thistypeofaccessisverycostlyintermsofprocessorefficiency.The68Kcannotperform thisoperationanditwillgenerateaninternalexceptionifitencountersit.Theassemblerwillnot catchit.Thisisaruntimeerror.Thecureistoneverprogramawordorlongwordoperationona byte(odd)memoryboundary. ReadingtheProgrammer’sReferenceManual Themostdauntingtaskthatyou’llfacewhenyousetouttowriteanassemblylanguageprogramis tryingtocomprehendthemanufacturer’sdatabook.TheMotorola68KProgrammer’sReference Manualiswritteninwhatwecouldgenerouslycall“arathertersestyle.”Itisareferencefor peoplewhoalreadyknowhowtowriteanassemblylanguageprogramratherthanalearningtool forprogrammingstudentswhoaretryingtolearnthemethods.Fromthispointonyouwillneed theProgrammer’sReferenceManual,oragoodbookon68Kassemblylanguageprogramming, 178
MemoryOrganizationandAssemblyLanguageProgramming suchasClements2textbook.However,thetextbookisn’tparticularlyefficientwhenitcomesto writingaprogramandneedingahandyreference,suchastheProgrammer’sReferenceManual. Fortunately,itiseasytogetafreecopyofthereferencemanual,eitherfromFreescale,ortheir corporatewebsite(seethechapterreferencesfortheURL). So,assumingthatyouhavethebookinfrontofyou,let’stakeaquickcourseinunderstanding whatMotorola’stechnicalwritingstaffistryingtotellyou.Thefollowingtextissimilartothe layoutofthereferencepagesforaninstruction.
ADDI
1. Heading:
AddImmediate
ADDI
2. Operation: ImmediateData+DestinationDestination 3. Syntax:
ADDI#,
4. Attributes: Size=(Byte,Word,Long) 5. Description:Addtheimmediatedatatothedestinationoperandandstoretheresultinthedestinationlocation.Thesizeoftheoperationmaybespecifiedtobebyte,wordorlong. Thesizeoftheimmediatedatamatchestheoperationsize.
6. ConditionCodes: X
N
Z
V
C
N Setiftheresultisnegative.Clearedotherwise. Z Setiftheresultiszero.Clearedotherwise. V Setifanoverflowisgenerated.Clearedotherwise. C Setifacarryisgenerated.Clearedotherwise. X Setthesameasthecarrybit.
7. Instructionformat: 15 14 13 12 11 10 OP -CODE
0
0
0
0
0
1
9 1
8 0
7
6
Size
5
4
Word data
Immediate Field
Long data word (includes previous word)
Sizefiled–Specifiesthesizeoftheoperation: DB6 0 1 0
2
Byte data
8. Instructionfields: DB7 0 0 1
3
1
Effective Address Mode Register
OperationSize Byteoperation Wordoperation Longwordoperation
179
0
Chapter7 9. EffectiveAddressfield: Specifiesthedestinationoperand.Onlydataalterableaddressingmodesareallowedasshown: Addr.Mode Dn An (An) (An)+ -(An) (d16,An) (d8,An,Xn)
Mode 000 --010 011 100 101 110
Register reg.num:Dn notallowed reg.num:An reg.num:An reg.num:An reg.num:An reg.num:An
Addr.Mode (XXX).W (XXX).L #
Mode 111 111 ---
Register 000 001 notallowed
(d16,PC) (d8,PC,Xn)
-----
notallowed notallowed
10.Immediatefield:(Dataimmediatelyfollowingtheop-codeword) Ifsize=00,thenthedataistheloworderbyteoftheimmediateword. Ifsize=01,thenthedataistheentireimmediateword. Ifsize=10,thenthedataisthenexttwoimmediatewords.
Let’sgothroughthisstep-by-step 1. Instructionandmnemonic:ADDI—AddImmediate Operation:ImmediateData+DestinationDestination 2. Addtheimmediatedatatothecontentsofthedestinationeffectiveaddressandplacethe resultbackinthedestinationeffectiveaddress. Assemblersyntax:ADDI#(data), 3. Thisexplainshowtheinstructioniswritteninassemblylanguage.Noticethatthereisaspecial opcodeforanimmediateaddinstruction,ADDIratherthanADD,eventhoughyoustillmust insertthe#signinthesourceoperandfield. Attributes:Byte,wordorlongword(long) 4. Youmayaddabyte,wordorlongword.Ifyouomitthe.B,.Wor.Lmodifierthedefaultwill be.W. Description:Tellswhattheinstructiondoes. 5. Thisisyourbesthopeoftryingtounderstandwhattheinstructionisallaboutsinceitisthe onlyrealEnglishgrammaronthepage. Conditioncodes: 6. Liststheconditioncodes(flags)thatareaffectedbytheinstruction. Instructionformat: 7. Howthemachinelanguageinstructioniscreated.Notethatthereisonlyoneeffectiveaddress forthisinstruction. EffectiveAddressField: Theeffectiveaddressmodesthatarepermittedforthisinstruction.Italsoshowsthemodeand 8. registervaluesfortheeffectiveaddress.Notethatanaddressregisterisnotallowedasthedestinationeffectiveaddress,norisanimmediatedatavalue.Alsonoticethatthetwoaddressing modesinvolvingtheProgramCounter,PC,arealsonotallowedtobeusedwiththisinstruction. 180
MemoryOrganizationandAssemblyLanguageProgramming FlowCharting Assemblylanguageprogram,duetotheirhighlystructurednature,aregoodcandidatesforusing flowchartstohelpplanthestructureofaprogram.Ingeneral,flowchartsarenotusedtoplanprogramswritteninahigh-levellanguagesuchasCorC++,however,flowchartsarestillveryuseful forassemblylanguageprogramplanning. Set up environment
TEST = FALSE
Keyboard Input
Run Self-tests
TEST = TRUE
Scan key board
Interpret keystroke
8
Increatingaflowchart, weusearectangleto representan“operation.” Theoperationcouldbe oneinstruction,several instructions,orasubroutine(functioncall).The rectangleisassociated withdoingsomething. Weuseadiamondto representadecisionpoint (programflowcontrol),andusearrowsto representprogramflow. Figure7.18isasimple flowchartexampleofa computerstartingupa program.
Figure7.18:Flowchartforasimpleprogramtoinitializeasystem.
Theflowchartstartswith theoperation“set-upenvironment.”Thiscouldbeafewinstructionsortensofinstructions.Once theenvironmentisestablishedthenextoperationisto“runtheself-tests.”Again,thiscouldmeana fewinstructionsoralotofinstructions,wedon’tknow.Itdependsupontheapplication.Thedecisionpointiswaitingforakeyboardinput.Theprogramscansthekeyboardandthenteststosee ifakeyhasbeenstruck.Ifnot,itgoesback(loops)andtriesagain.Ifakeyhasbeenstruck,the programtheninterpretsthekeystrokeandmoveson. Flow-chartingisaverypowerfultoolforhelpingyoutoplanyourprogram.Unfortunately,most students(andmanyprofessionalprogrammers)takethe“codehacking”approachandjustjumpin andimmediatelystarttowritecode,muchlikewritinganovel.Thisprobablyexplainswhymost programmerswearsandalsratherthanbasketballsneakerswiththelacesuntied,butthat’sanother issueentirely. WritinganAssemblyLanguageProgram RememberLeonardoDiCaprio’sfamouslineinthemotionpictureTitanic,“I’mkingofthe world!”Inassemblylanguageprogrammingyouaretheabsolutemonarchofthecomputerworld. Thereisnothingthatcanstopyoufrommakingincrediblecodingblunders.Therearenotype checksorcompilerwarnings.Justyouandthemachine,manoamano.Inordertowriteaprogram 181
Chapter7 inassemblylanguage,youmustbecontinuouslyawareofthestateofthesystem.Youmustkeepin mindhowmuchmemoryyouhaveandwhereitislocated;thelocationsofyourperipheraldevices andhowtoaccessthem.Inshort,youcontroleverythingandyouareresponsibleforeverything. Forthepurposesofbeingabletowriteandexecuteprogramsin68Kassemblylanguage,wewill notactuallyusea68Kprocessor.Todothatyouwouldneedacomputerthatwasbasedona68K familyprocessor,suchastheoriginalAppleMacIntosh®We’llapproachitinadifferentway.We’ll useaprogramcalledaninstructionsetsimulator(ISS)totaketheplaceofthe68Kprocessor. CommerciallyavailableISSscancostasmuchas$10,000butoursisfree.We’refortunatethat twoverygoodsimulatorshavebeenwrittenforthe68000processor.Thefirstsimulator,developed byClements4andhiscolleaguesattheUniversityofTeesside,intheUnitedKingdom,maybe downloadedfromhiswebsite. Kelley5developedasecondsimulator,calledEasy68K.Thisisamuchnewersimulatorandhas extensivedebuggingsupportthatislackingintheEasy68Kversion.Bothsimulatorsweredesigned torunundertheWindows®operatingsystemAlso,Easy68Kwascompiledfor32-bitWindows,so ithasamuchbetterchanceofrunningunderWindows2000,XPandsoon,although,theTeesside simulatorseemstobereasonablywell-behavedunderthemoremodernversionsofWindows. Thesimulatorsareclosertointegrateddesignenvironments(IDE).Theyincludeatexteditor,assembler,simulatoranddebuggerinapackage.Thesimulatorsarelikeacomputeranddebugger combined.Youcandomanyofthedebuggeroperationsthatyouareuseto,suchas • peekandpokememory • examineandmodifyregisters • setbreakpoints • runfromortobreakpoints • single-stepandwatchtheregisters Ingeneral,thestepstocreateandrunanassemblylanguageprogramaresimpleandstraightforward. 1. UsingyourfavoriteASCII-onlytexteditor,ortheoneincludedwiththeISSpackage, createyoursourceprogramandsaveitintheoldDOS8.3fileformat.Besuretousethe extension.x68.Forexample,saveyourprogramasmyprog.x68.Theeditorthatcomes withtheprogramwillautomaticallyappendthex68suffixtoyourfile. 2. Assembletheprogramusingtheincludedassembler.Theassemblerwillcreateanabsolutemachinelanguagefilethatyoumayrunwiththesimulator.Ifyourprogramassembles withouterrors,you’llthenbeabletorunitinthesimulator.Theassembleractuallycreates twofiles.Theabsolutebinaryfileandalistfile.Thelistfileshowsyouyouroriginalsource programandthehexadecimalmachinelanguagefileitcreates.Ifthereareanyerrorsin yoursourcefile,theerrorwillbeshownonthelistfileoutput. IntheTeessideassembler,theassemblylanguageprogramrunsonasimulatedcomputerwith1M byteofmemoryoccupyingthevirtualaddressrangeof$00000...$FFFFF.TheEasy68Ksimulatorrunsinafull16Maddressspace.TheTeessidesimulatorpackageisnotwithoutsomeminor quirks,butingeneral,itiswell-behavedwhenrunningunderWindows. Toprograminassemblylanguageyoushouldalreadybeacompetentprogrammer.Youshould havealreadyhadseveralprogrammingclassesandunderstandprogrammingconstructsanddata 182
MemoryOrganizationandAssemblyLanguageProgramming structures.Assemblylanguageprogrammingmayseemverystrangeatfirst,butitisstillprogramming.WhileCandC++arefreeformlanguages,assemblylanguageisverystructured.Also,itis uptoyoutokeeptrackofyourresources.Thereisnocompileravailabletodoresourceallocation foryou. Don’tbeoverwhelmedbythenumberofopcodesandoperandsthatarepartofthe68K,orany processor’sISA.Therealpointhereisthatyoucanwritefairlyreasonableprogramsusingjusta subsetofthepossibleinstructionsandaddressingmodes.Don’tbeoverwhelmedbythenumberof instructionsthatyoumightbeabletouse.Getcomfortablewritingprogramsusingafewinstructionsandaddressingmodesthatyouunderstandandthenbegintointegrateotherinstructionsand addressingmodeswhenyouneedsomethingmoreefficient,orjustwanttoexpandyourrepertoire. Astheprogrammingproblemsthatyouwillbeworkingonbecomemoreinvolvedyou’llnaturally lookformoreefficientinstructionsandeffectiveaddressingmodesthatwillmakeyourjobeasier andresultinabetterprogram.Asyou’llsoondiscoverwhenwelookatdifferentcomputerarchitectures,veryfewprogrammersorcompilersmakeuseofalloftheinstructionsintheinstruction set.Mostofthetime,asmallsubsetoftheinstructionswillgetthejobdonequitenicely.However, everyonceinawhile,aproblemarisesthatwasjustmadeforsomeobscureinstructiontosolve. Happycoding.
PseudoOpcodes Thereareactuallytwotypesofopcodesthatyoucanuseinyourprograms.Thefirstisthesetof opcodesthataretheactualinstructionsusedbythe68000processor.Theyformthe68KISA.The secondsetiscalledpseudo-ops,orpseudoopcodes.Theseareplacedinyourprogramjustlikereal opcodes,buttheyarereallyinstructionstotheassemblertellingithowtoassembletheprogram. Thinkofpseudo-opsjustasyouwouldthinkofcompilerdirectivesand#definestatements. Pseudo-opsarealsocalledassemblerdirectives.Theyareusedtohelpmaketheprogrammore readableortoprovideadditionalinformationtotheassemblerabouthowyouwanttheprogram handled.Itisimportanttorealizethatcommerciallyavailable,industrial-strengthassemblersare everybitascomplexasanymoderncompiler.Let’slookatsomeofthesepseudo-opcodes. • ORG(setorigin):TheORGpseudo-optellstheassemblerwhereinmemorytostart assemblingtheprogram.Thisisnotnecessarilywheretheprogrammightbeloadedinto memory,itisonlytellingtheassemblerwhereyouintendforittorun.Ifyouomitthe ORGstatement,theprogramwillbeassembledtorunstartingatmemorylocation$00000. Sincethememoryrangefrom$00000..$003FFisreservedforthesystemvectors,wewill generally“ORG”ourprogramtobeginatmemoryaddress$00400.Therefore,thefirstline ofyouprogramshouldbe
ORG $400
Thenextpseudo-opdirectiveisplacedattheendofyoursourcefile.Ithastwofunctions.First,it tellstheassemblertostopassemblingatthispointand,second,ittellsthesimulatorwheretoload theprogramintomemory. • END(endofsourcefile):EverythingafterENDisignored.Format: END 183
Chapter7 NotethattheORGandENDdirectivesarecomplementary.TheORGdirectivetellstheassembler howtoresolveaddressreferences;ineffect,whereyouintendfortheprogramtorun.TheEND directiveinstructstheloaderprogram(partofthesimulator)wheretoplacetheprogramcodein memory.MostofthetimetheaddressesofORGandENDwillbethesame,buttheydon’thaveto be.Itisquitepossibleforaprogramtobeloadedinoneplaceandthenrelocatedtoanotherwhen itistimetorun.Forourpurposes,youwouldgenerallystartyourprogramwith: ORG$400andenditwith:END$400 • EQU(equatedirective):Theequatepseudo-opisidenticaltothe#defineinC.Itallows youtoprovideasymbolicnameforaconstantvalue.Theformatis EQU
Theexpressionmaybeamathematicalexpression,butmostlikelyitwilljustbeanumber.You mayalsousetheequatedirectivetocreateanewsymbolicvaluefromothersymbolicvalues. However,thevaluesoftheothersymbolsmustbeknownatthetimethenewsymbolisbeing evaluated.Thismeansthatyoucannothaveforwardreferences. Theequatedirective,likethe“#define”directivesinCandC++,areinstructiontotheCassembler tosubstitutethenumericvalueforthesymbolicnameinyoursourcefile.Forexample, Bit0_test EQU $01 *Isolatedatabit0 ANDI.B #Bit0_test,D0 *Isbit0inD0=1?
willbemoremeaningfultoyouthan: ANDI.B #$01,D0 *Magicnumbers
especiallyafteryouhaven’tlookedatthecodeforseveraldays. • SET(setsymbol):SETislikeEQUexceptthatsetmaybeusedtoredefineasymbolto anothervaluelateron.Theformatis SET
DataStorageDirectives Thenextgroupofpseudo-opsiscalleddatastoragedirectives.Theirpurposeistoinstructthe assemblertoallocateblocksofmemoryandperhapsinitializethememorywithvalues. • DC(defineconstant):Createsablockofmemorycontainingthedatavalueslistedinthe sourcefile.Theformatis: DC. ,,....
Example error_msg DC.B ‘Error99’,$0D,$0A,$00 *errormessage
Atextstringthatisplacedinsideofsinglequotationmarksisinterpretedbytheassemblerasa stringofASCIIcharacters.Thus,thisdirectiveisequivalenttowriting error_msgDC.B$45,$72,$72,$6F,$72,$20,$39,$39,$0D,$0A,$00
184
MemoryOrganizationandAssemblyLanguageProgramming Asyoucansee,usingthesinglequotesmakesyourintentionmuchmoreunderstandable.Inthis case,thememorylocationassociatedwiththelabel,error_msg,willcontainthefirstbyteofthe string,$45. • DCB(defineconstantblock):Initializeablockofmemorytothesamevalue.Thelengthis thenumberofbytes,wordorlongwords.Theformatis DCB. , •
DS(definestorage):Generatesanun-initializedblockofmemory.Usethisifyouneedto
defineastorageareathatyouwilllaterusetostoredata.Theformatis: DS. •
OPT(setoptions):Tellstheassemblerhowyouwanttheassemblertocreateyourprogram
codeinplaceswhereyouhavenotexplicitlydirecteditandonhowyouwantthelistfile formatted. TheonlyoptionthatisworthnotinghereistheCREoption.Thisoptiontellstheassemblertocreatealistofcross-referencesonthelistfile.Thisisaninvaluableaidwhenitcomestimetodebug yourprogram.Forexample,examinetheblockofcodeincase1. CASE1:ListfilewithouttheCREoption Sourcefile:EXAMPLE.X68 Defaults:ORG$0/FORMAT/OPTA,BRL,CEX,CL,FRL,MC,MD,NOMEX,NOPCO 1 2************************ 3* 4*Thisisanexampleof 5*notusingcross-references 6* 7************************ 800000400ORG$400 9 1000000400103C0000START:MOVE.B#00,D0 110000040466FATEST:BNESTART 1200000406B640COMPARE:CMP.WD0,D3 13000004086BFCWAIT:BMICOMPARE 1400000400END$400 Lines:14,Errors:0,Warnings:0.
CASE2:ListfilewiththeCREoptionset Sourcefile:EXAMPLE.X68 Defaults:ORG$0/FORMAT/OPTA,BRL,CEX,CL,FRL,MC,MD,NOMEX,NOPCO 1 2********************* 3* 4*Thisisanexampleofusingcross-
185
Chapter7 5*references 6********************** 7 8OPTCRE 900000400ORG$400 1100000400103C0000START:MOVE.B#00,D0 120000040466FATEST:BNESTART 1300000406B640COMPARE:CMP.WD0,D3 14000004086BFCWAIT:BMICOMPARE 1500000400END$400 Lines:15,Errors:0,Warnings:0. SYMBOLTABLEINFORMATION Symbol-nameTypeValueDeclCrossreferencelinenumbers COMPARELABEL000004061314. STARTLABEL000004001112. TESTLABEL0000040412**NOTUSED** WAITLABEL0000040814**NOTUSED**
NoticehowtheOPTCREdirectivecreatesasymboltableofthelabelsthatyou’vedefinedinyour program,theirvalue,thelinenumberswheretheyarefirstdefinedandallthelinenumbersthat refertothem.Asyou’llsoonsee,thiswillbecomeaninvaluableaidforyourdebuggingprocess.
AnalysisofanAssemblyLanguageProgram SupposethatwewanttoimplementthesimpleCassignmentstatement,Z=Y+24,asanassemblylanguageprogram.Whatwouldtheprogramlooklike?Let’sexaminethefollowingsimple program.AssumethatY=27.Here’stheprogram. ORG$400 *Startofcode MOVE.BY,D0*Getthefirstoperand ADDI.B#24,D0*Dotheaddition MOVE.BD0,Z*Dotheassignment STOP#$2700*Tellthesimulatortostop ORG$600*Startofdataarea YDC.B27*Storetheconstant27inmemory ZDS.B1*ReserveabyteforZ END$400
Noticethatwhenweusethenumber27withoutthe“$”precedingit,theassemblerinterpretsitas adecimalnumber.AlsonoticethemultipleORGstatements.Thisdefinesacodespaceataddress $400andadataspaceataddress$600.Here’stheassemblerlistfilethatwe’vecreated.Thecommentsareomitted. 100000400ORG$400 200000400103900000600MOVE.BY,D0 30000040606000018ADDI.B#24,D0 186
MemoryOrganizationandAssemblyLanguageProgramming 40000040A13C000000601MOVE.BD0,Z 5000004104E722700STOP#$2700 6*Commentline 700000600ORG$600 8000006001BY:DC.B27 90000060100000001Z:DS.B1 1000000400END$400
Let’sanalyzetheprogramlinebyline:
Line1:TheORGstatementdefinesthestartingpointfortheprogram. Line2:MOVEthebyteofdatalocatedataddress$600(Y)intothedataregisterD0.Thismoves thevalue27($1B)intoregisterD0.
Line3:Addsthebytenumber24($18)tothecontentsofD0andstoretheresultinD0. Line4:MovethebyteofdatafromD0tomemorylocation$601. Line5:Thisinstructionisusedbythesimulatortostopprogramexecution.Itisanartifactofthe ISSandwouldnotbeinmostrealprograms.
Line6:Comment Line7:TheORGstatementresetstheassemblerinstructioncountertobegincountingagainat address$600.Thiseffectivelydefinesthedataspacefortheprogram.IfnoORGstatementwasissued,thedataregionwouldbeginimmediatelyaftertheSTOPinstruction.
Line8:Definesmemorylocation$600withthelabel“Y”andinitializesitto$1B.Notethateven thoughweusethedirectiveDC,thereisnothingtostopusfromwritingtothismemorylocation andchangingitsvalue.
Line9:Definesmemorylocation$601withthelabel“Z”andreserves1byteofstorage.Thereis noneedtoinitializeitbecauseitwilltakeontheresultoftheaddition.
Line10:ENDdirective.Theprogramwillloadbeginningataddress$400. Noticethatouractualprogramwasonlythreeinstructionslong,buttwooftheinstructionswere MOVEinstructions.Thisisfairlytypical.You’llfindthatmostofyourprogramcodewillbeusedto movevariablesaround,ratherthandoanyoperationsonthem.
187
Chapter7
SummaryofChapter7 • Howthememoryofatypicalcomputersystemmaybeorganizedandaddressed • HowbyteaddressingisimplementedandtheambiguityoftheBigEndianversusLittle Endianmethodsofbyteaddressing • Thebasicorganizationofassemblylanguageinstructionsandhowtheopcodewordis interpretedaspartofamachinelanguageinstruction • Thefundamentalsofcreatinganassemblylanguageprogram • Theuseofpseudoopcodestocontroltheoperationoftheassemblerprogram • Analyzingasimpleassemblylanguageprogram.
Chapter7:Endnotes StevenLevy,Hackers:HeroesoftheComputerRevolution,ISBN0-385-19195—2,AnchorPress/Doubleday,Garden City,1984,p.30.
1
AlanClements,68000FamilyAssemblyLanguage,ISBN0-534-93275-4,PWSPublishingCompany,Boston,1994.
2
MotorolaCorporation,Programmer’sReferenceManual,M68000PM/ADREV1.Thisisalsoavailableon-lineat: http://e-www.motorola.com/files/archives/doc/ref_manual/M68000PRM.pdf.
3
AlanClements,http://[email protected].
4
CharlesKelley,http://www.monroeccc.edu/ckelly/tools68000.htm.
5
188
ExercisesforChapter7 1. Doestheexternalandinternalwidthsofaprocessor’sdatabushavetobethesame?Discuss whytheymightdiffer. 2. ExplainwhytheAddressbusandDatabusofamicroprocessorarecalledhomogeneousand theStatusbusisconsideredtobeaheterogeneousbus. 3. Allofthefollowinginstructionsareeitherillegalorwillcauseaprogramtocrash.Foreach instruction,brieflystatewhytheinstructionisinerror. a. MOVE.W $1000,A3 b. ADD.B D0,#$A369 c. ORI.W #$55AA007C,D4 d. MOVEA.L D6,A8 e. MOVE.L $1200F7,D3 4. Brieflyexplaininafewsentenceseachofthefollowingtermsorconcepts. a. BigEndian/LittleEndian: b. Nonalignedaccess: c. Addressbus,databus,statusbus: 5. Considerthefollowing68000assemblylanguageprogram.Whatisthebytevaluestoredin memorylocation$0000A002aftertheprogramends? *Systemequates foo bar mask start memory plus magic
EQU EQU EQU EQU EQU EQU EQU
$AAAA $5555 $FFFF $400 $0000A000 $00000001 $2700
*Programstartshere ORG
start MOVE.W
#foo,D0 189
Chapter7
MOVE.W MOVEA.L MOVEA.L MOVE.B ADDA.L MOVE.B ADDA.L MOVE.W MOVE.W MOVE.L SWAP MOVE.L MOVE.L STOP END
#bar,D7 #memory,A0 A0,A1 D0,(A0) #plus,A0 D7,(A0) #plus,A0 #mask,D3 D3,(A0) (A1),D4 D4 D4,D6 D6,(A1) #magic start
6. Examinethecodefragmentlistedbelow.Whatisthelongwordvaluestoredinmemory location$4000? Start
LEA MOVE.L MOVE.L MOVE.W SWAP SWAP MOVE.W MOVE.L
$4000,A0 #$AAAAFFFF,D7 #$55550000,D6 D7,D0 D6 D0 D6,D0 D0,(A0)
*InitializeA0 *InitializeD7andD6 *LoadD0 *Shellgame *LoadD0 *Saveit
7. ExamineeachofthefollowingMotorola68,000assemblylanguageinstructions.Indicate whichinstructionsarecorrectandwhichareincorrect.Forthosethatareincorrect,writebrief explanationofwhytheyareincorrect.
a.MOVE.L b.MOVE.B c. MOVEA.B d. MOVE.W e. AND.L
D0,D7 D2,#$4A D3,A4 A6,D8 $4000,$55AA
8.Fourbytesofdataarelocatedinsuccessivememorylocationsbeginningat$4000.Withoutusing anyadditionalmemorylocationsfortemporarystorage,reversetheorderofthebytesinmemory. 9. Examinethefollowingsnippetof68000assemblylanguagecode.Whatarethewordcontents ofD0aftertheADDQ.Binstruction?Hint,alllogicaloperationsarebitbybit. MOVE.W MOVE.W EOR.W ADDQ.B
#$FFFF,D1 #$AAAA,D0 D1,D0 #01,D0
*WhatarethewordcontentsofD0? 190
MemoryOrganizationandAssemblyLanguageProgramming 10. Considerthefollowingsnippetofassemblycode.Whatisthelongwordvaluestoredinregister D1attheconclusionofthefourthinstruction? START
MOVE.L MOVE.L LSL.W MOVE.L
#$FA865580,D0 D0,$4000 $4002 $4000,D1
*=?
11. Thisproblemisdesignedtoexposeyoutotheassemblerandsimulator.Attheendofthis introductionisasampleprogram.Youshouldcreateanassemblysourcefileandthenassemble itwithoutanyerrors.Onceitassemblesproperly,youshouldthenrunitinthesimulatorof yourchoice.Itisbesttosingle-stepthesimulatorfromthestartingpointoftheprogram.The ultimateobjectiveoftheexerciseistoanswerthisquestion,“WhatistheWORDVALUEofthe datainmemorylocation$4000whentheprogramisjustabouttoloopbacktothebeginning andstartoveragain?” ********************************************************************* * *Myfirst68000Assemblylanguageprogram * ********************************************************************* *Commentlinesbeginwithanasterisk *Labels,suchas“addr1”and“start”,ifpresent,mustbeginincolumn *1ofaline. *OPCodes,suchasMOVE.WorPseudoOPCodes,suchasEQU,mustbegin *incolumntwoorlater. *Watchoutforcomments,ifthetextspillsovertothenextlineand *youforgettouseanasterisk,you’llgetanassemblererror. ********************************************************************* * *BeginningofEQUatessection,justlike#defineinC * ********************************************************************* addr1EQU$4000 addr2EQU$4001 data2EQU$A7FF data3EQU$5555 data4EQU$0000 data5EQU4678 data6EQU%01001111 data7EQU%00010111 ********************************************************************* * *Beginningofcodesegment.Thisistheactualassemblylanguage
191
Chapter7 *instructions. * ********************************************************************* ORG$400*Programstartsat$400 startMOVE.W#data2,D0*LoadD0 MOVE.B#data6,D1*LoadD1 MOVE.B#data7,D2*loadD2 MOVE.W#data3,D3*loadD3 MOVEA.W#addr1,A0*loadaddressregister MOVE.BD1,(A0)+*transferbytetomemory MOVE.BD2,(A0)+*transfersecondbyte MOVEA.W#addr1,A1*loadaddress AND.WD3,(A1)*LogicalAND *Stophere.Thenextinstructionshowshowalabelisused JMP start *Programloopsforever END$400 *Stopassemblyhere
Commentsontheprogram: 1. The“EQU”,“END$400”and“ORG$400”instructionsarethepseudo-opinstructions. Theyareinstructionsfortheassembler,theyarenot68000machineinstructionsanddonot generateany68000code. 2. Alsonotethatthedefaultnumbersystemisdecimal.Towriteabinarynumber,suchas 00110101youwouldwriteitwiththepercentsign,%00110101.Hexadecimalnumbers,such as72CF,arewrittenwiththedollarsign,$72CF. 3. Noticethateveryinstructioniscommented. 4. Lookatthecode.Noticethatmostoftheinstructionsarejustmovingdataaround.Thisisvery commoninassemblylanguageprogramming. 5. Theinstruction,AND.WD3,(A1)isanexampleofanotheraddressmodecalledindirectaddressing.InC++wewouldcallthisapointer.TheparenthesesaroundA1meanthatthevalue containedintheinternalregisterA1shouldbeconsideredtobeamemoryaddress,andnota datavalue.Thisinstructiontellsthecomputertotakethe16-bitquantitystoredinregisterD3 anddoalogicalANDwiththe16-bitvaluestoredinmemoryattheaddresspointedtobythe A1registerandthenputtheresultsbackintothememorylocationpointedtobyA1. 6. Theinstruction,MOVE.BD1,(A0)+,isanexampleofindirectaddressingwithpost increment.Theinstructionmovesabyteofdata(8-bits)frominternaldataregisterD1tothe memorylocationpointedtobythecontentsofaddressregisterA0.Inthiswayitissimilar tothepreviousinstruction.However,oncetheinstructionisexecuted,thecontentsofA0are incrementedbyonebyte,soA0wouldbepointingtothenextbyteofmemory.
192
8
CHAPTER
Programmingin AssemblyLanguage Objectives
Whenyouarefinishedwiththislesson,youwillbeableto: Dealwithnegativeandrealnumbersastheyarerepresentedinacomputer; Writeprogramswithloopconstructs; Usethemostcommonoftheaddressingmodesofthe68000ISAtowriteassembly languageprograms; ExpressthestandardprogramcontrolconstructsoftheC++languageintermsoftheir assemblylanguageequivalents; Usetheprogramstackpointerregistertowriteprogramscontainingsubroutinecalls.
Introduction InChapter7,wewereintroducedtothebasicstructureofassemblylanguageprogramming.Inthis lessonwe’llcontinueourexaminationoftheaddressingmodesofthe68Kandthenexaminesome moreadvancedassemblylanguageprogrammingconstructs,suchasloopsandsubroutineswith theobjectiveoftryingtouseasmuchofthesenewmethodsaspossibleinordertoimproveour abilitytoprograminassemblylanguage. Sinceyoualreadyunderstandhowtowriteprogramsinhigh-levellanguages,suchasC,C++and Java,we’lltrytounderstandtheassemblylanguageanalogsoftheseprogrammingconstructsinterms ofthecorrespondingstructuresthatyou’realreadyfamiliarwith,suchasDO,WHILE,andFOR loops.Remember,anycodethatyouwriteinCorC++willultimatelybecompileddowntomachine language,sotheremustbeanassemblylanguageanalogforeachhigh-levellanguageconstruct. Oneofthebestwaystounderstandassemblylanguageprogrammingistocloselyexaminean assemblylanguageprogram.Itactuallydoesn’thelpverymuchtoworkthroughthevarious instructionsthemselves,becauseonceyou’refamiliarwiththeProgrammer’sModelofthe68K andtheaddressingmodes,youshouldbeablefindtheinstructionthatyouneedinthereference manualanddevelopyourprogramfromthere.Therefore,we’llfocusoureffortsontryingto understandhowprogramsaredesignedanddeveloped,andthenlookatseveralexampleprograms totrytounderstandhowtheywork. NumericRepresentations Beforewemovedeeperintothemysteriesofassemblylanguageweneedtopick-upafewloose ends.Inparticular,weneedtoexaminehowdifferenttypesofnumbersarestoredwithina 193
Chapter8 computer.Uptonow,we’veonlylookedatpositivenumberfromtheperspectiveoftherangeof thenumberintermsofthenumberofbinarybitswehaveavailabletorepresentit. Thelargestpositivenumberthatyoucanrepresentisgivenby: 2N–1 whereNisthenumberofbinarybits. Thus,wecanrepresentarangeofpositiveintegersasfollows: Numberofbinarybits
4 8 16 32 64
Maximumvalue
15 255 65,535 4,294,967,295 18,446,744,073,709,551,616
Name
nibble byte word long longlong
Cequivalent
---char short int ----
Obviously,therearetwomissingnumericclasses:negativenumbersandrealnumbers.Let’slook atnegativenumbersfirst;thenwe’llmoveontorealnumbers. Itmayseemstrangetoyoubuttherearenocircuitsinsideofthe68Kthatareusedtosubtractone numberfromanother.Subtraction,however,isveryimportant,notonlybecauseitisoneofthe fourcommonlyusedarithmeticoperations(add,subtract,multiply,anddivide),butitisthebasic methodtousetodetermineifonevalueisequaltoanothervalue,greaterthantheothervalue, orlessthantheothervalue.Theseareallcomparisonoperations.Thepurposeofacomparison instruction,suchasCMPorCMPA,istosettheappropriateflagsintheconditioncontrolregister (CCR),butnotchangethevalueofeitherofthequantitiesusedinthecomparison.Thus,ifwe executedtheinstruction: testCMP.WD0,D1*Does=?
thezerobitwouldbesetto1if=becausethisisequivalenttosubtractingthetwo values.Iftheresultofthesubtractionisequaltozero,thenthetwoquantitiesmustbeequal.Ifthis doesn’tclickwithyourightnow,that’sOK.We’lldiscusstheCCRinmuchmoredetaillateronin thislesson. Inordertosubtracttwonumbersinthe68Karchitectureitisnecessarytoconvertthenumber beingsubtractedtoanegativenumberandthenthetwonumbersmaybeaddedtogether.Thus: A–B=A+(–B) The68Kprocessoralwaysassumesthatthenumbersbeingaddedtogethermaybepositiveornegative.Thealgorithmthatyouwritemustutilizethecarrybit,C,thenegativebit,N,theoverflow bit,V,andtheextendedbit,X,todecideuponthecontextoftheoperation.Theprocessoralsouses negativenumberstobranchbackwards.Itdoesthisbyaddinganegativenumbertothecontentsof theprogramcounter,PC,whicheffectivelycausesabackwardsbranchintheprogramflow.Why thisissodeservesabitofadiversion.Recallthatonceanop-codewordisdecodedthecomputer knowshowmanywordsarecontainedintheinstructioncurrentlybeingexecuted.Oneofthefirst thingsitdoesistoadvancethevalueinthePCbythatamountsothatthevaluecontainedinthe PCisnowtheaddressofthenextinstructiontobeexecuted. 194
ProgramminginAssemblyLanguage Branchinstructionscausetheprocessortoexecuteaninstructionout-of-sequencebysimply addingtheoperandofthebranchinstructiontothevalueinthePC.Iftheoperand,calledadisplacement,ispositive,thebranchisintheforwarddirection.Ifthedisplacementisanegative number,thebranchisbackwards. Inordertoconvertanumberfrompositivetonegativeweconvertittoit’stwo’scomplementrepresentation.Thisisatwo-stepprocess. Step1:Complementeverybitofthebyte,wordorlongword.Tocomplementthebits,you changeevery1toa0andevery0toa1.Thus,thecomplementof$55is$AAbecauseComplement(01010101)=10101010.Thecomplementof00if$FF,andsoon. Step2:Add1tothecomplement.Thus,–$55=$AB. Two’scomplementisaversionamethodofsubtractioncalledradixcomplement.Thismethodof subtractionwillstillworkproperlyifyouareworkingbase,2,8,10,16oranyotherbasesystem. Ithastheadvantageofretainingasinglemethodofarithmetic(addition)anddealingwithsubtractionbyconvertingthesubtrahendtoanegativenumber. Wedefinetheradixcomplementofanumber,N,representedinaparticularbase(radix)containingddigitsasfollows: radixcomplement=rd–NfornonzerovaluesofNand=0ifN=0.Inotherwords,–0=0 Let’sseehowthisworksforthedecimalnumber4,934.Thus,rd–N=104–4,934=5,066 So5,066equals–4,934.Let’ssubtract4,934from8013andseeifitactuallyworks.First,we’ll justsubtractthenumbersintheoldfashionedway: 8,013–4,934=3,079 Now,we’lladdtogether8,013andtheradixcomplementof4,934: 8,013+5,066=13,079=>13079 Clearly,theleastsignificant4digitsarethesame,butwe’releftwitha1inthemostsignificant digitposition.Thisisanartifactoftheradixcomplementmethodofsubtractionandwewould discardit.Whyweareleftwiththisnumbercanbeseenifwejustmultiplyoutwhatwe’vedone. Ifx=a–b,thenx=a–(rd–b)=rd+(a–b).Thus,weareleftwithanumericresultintwo parts:theradixtothepowerofthenumberofdigitsandtheactualresultofthesubtraction.Bydiscardingtheradixtothepowerportionofthenumber,weareleftwithanswerwewant.Thus,two’s complementarithmeticisjustradixcomplementarithmetic;forthebasetwonumbersystem. Inordertoconvertanegativenumberbacktoitspositiverepresentation,performthetwo’scomplementtransformationinexactlythesameway.Themostsignificantbitofthenumberisoften calledthesignbitbecauseanegativenumberwillhavea1inthemostsignificantbitposition,but itisn’tstrictlyasignbit,becausethenegativeof+5isn’t–5. Inanycase,thisisaminorpointbecausetheNflagisalwayssetifresultoftheoperationplacesa 1inthemostsignificantbitpositionforthatbyte,wordorlongword.Examinethefollowingsnippetofcode. 195
Chapter8 CodeExample start
ORG MOVE.B MOVE.B MOVE.W MOVE.W MOVE.L MOVE.L END
$400 #00,D0 #$FF,D0 #$00FF,D0 #$FFFF,D0 #$0000FFFF,D0 #$FFFFFFFF,D0 $400
TracingthecodeinasimulatorshowsthattheNbitisseteachtimethemostsignificantbitisa 1fortheoperationbeingperformed.Thus,forabyteoperation,DB7=1.Forawordoperation, DB15=1.Foralongwordoperation,DB31=1. Belowisthesimulatortraceforthisprogram.TheNbitandthecontentsofregisterD0arehighlightedingray.Alsonoticethatthesimulatorrepresentsnumbersaspositiveornegativeaswell. Thus,$FFisshownas–1ininstruction2,butisshownintheregisteras$FF. Bycreatingtwo’scomplementnegativenumbers,allarithmeticoperationsareconvertedtoaddition.However,whenweusenegativenumbersourrangeiscutinhalf.Thus, 1. Rangeof8-bitnumberis–128to+127(zeroispositive) 2. Rangeof16-bitnumbersis–32,768to+32,767 3. Rangeof32-bitnumbersis–2,147,483,648to+2,147,483,647 PC=000400SR=2000SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Programstart D0=00000000D1=00000000D2=00000000D3=00000000V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.B#0,D0 PC=000404SR=2004SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000000A7=00A00000Z=1Afterinstruction D0=00000000D1=00000000D2=00000000D3=00000000V=0MOVE.B#0,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.B#-1,D0 PC=000408SR=2008SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=1 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Afterinstruction D0=000000FFD1=00000000D2=00000000D3=00000000V=0MOVE.B#-1,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.W#255,D0 PC=00040CSR=2000SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Afterinstruction 196
ProgramminginAssemblyLanguage D0=000000FFD1=00000000D2=00000000D3=00000000V=0MOVE.W#255,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.W#-1,D0 PC=000410SR=2008SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=1 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Afterinstruction D0=0000FFFFD1=00000000D2=00000000D3=00000000V=0MOVE.W#-1,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.L#65535,D0 PC=000416SR=2000SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Afterinstruction D0=0000FFFFD1=00000000D2=00000000D3=00000000V=0MOVE.L#65535,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.L#-1,D0 PC=00041CSR=2008SS=00A00000US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=1 A4=00000000A5=00000000A6=00000000A7=00A00000Z=0Afterinstruction D0=FFFFFFFFD1=00000000D2=00000000D3=00000000V=0MOVE.L#-1,D0 D4=00000000D5=00000000D6=00000000D7=00000000C=0
Whathappenswhenyourarithmeticaloperationexceedstherangeofthenumbersthatyouare workingwith?AssoftwaredevelopersI’msureyou’veseenthisbugbefore.Here’sasimple examplewhichillustratestheproblem.InthefollowingC++programbigNumberisa32-bitinteger,initializedtojustbelowthemaximumvaluepositivevalueforaninteger,+2,147,483,647.As weloopandincrementthenumber10times,wewilleventuallyexceedthemaximumallowable valueforthenumber.Wecanseetheresultoftheoverflowintheoutputbox. Unfortunately,unlesswewriteanerrorhandlertodetecttheoverflow,theerrorwillgoundetected. Wewillhavethesameproblemwithtwo’scomplementarithmetic. Two’scomplementnotationmakesthehardwareeasierbutdevelopingthecorrectmathematical algorithmscanbeaminefieldfortheunwaryprogrammer!Algorithmsforarithmeticoperations mustcarefullysetandusetheconditionflagsinordertoinsurethattheoperationiscorrect. Forexample,whathappens whenthesumofanadditionof twonegativenumbers(bytes) islessthan–128?Youneedto considerthestateoftheoverflowbitoverflowbit.Inthis case,V=1ifatwo’scomplementadditionoverflows,but V=0ifnooverflowoccurs.For
Source Code #include int main(void) { int bigNumber = 2147483640 ; for ( int i = 1; i MOVEQ#-1,D1 PC=00040ESR=2008SS=00000FF0US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=1 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=00000000D1=FFFFFFFFD2=00000000D3=00000000V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.LD1,-4(A6) C=000412SR=2008SS=00000FF0US=00000000X=0 1000 A0=00000000A1=00000000A2=00000000A3=00000000N=1 OFFC OFF8 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 OFF4 D0=00000000D1=FFFFFFFFD2=00000000D3=00000000V=0 OFF0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVEQ#-1,D2 PC=000414SR=2008SS=00000FF0US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=1 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=00000000D1=FFFFFFFFD2=FFFFFFFFD3=00000000V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.LD2,D3
00 00 00 00 00
00 00 00 00 00
00 00 00 00 00
00 00 00 00 00
00 00 FF 00 00
00 00 FF 00 00
00 00 FF 00 00
00 00 FF 00 00
Notethehighlightedinstruction,MOVE.LD1,-4(A6).Thisisanexampleofaddressregister indirectwithoffsetaddressing.ThecompilerisusingA6asastackpointertoplacedatainitslocal stackframe. Thestackframeissimplethevariablestorageregionthatthecompilerestablishesonthestackfor thevariablesofitsfunction.Therestofthesimulationwillbepresentedwithoutanyfurtherelaboration.We’llleaveitasanexerciseforyoutoworkthroughtheinstructionsandstackconditions andconvinceyourselfthatitreallyworksasadvertised. PC=000416SR=2008SS=00000FF0US=00000000X=0 1000 A0=00000000A1=00000000A2=00000000A3=00000000N=1OFFC A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0OFF8 D0=00000000D1=FFFFFFFFD2=FFFFFFFFD3=FFFFFFFFV=0OFF4 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF0 ---------->MOVEQ#0,D2
249
00 00 FF 00 00
00 00 FF 00 00
00 00 FF 00 00
00 00 FF 00 00
Chapter9 PC=000418SR=2004SS=00000FF0US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=1 D0=00000000D1=FFFFFFFFD2=00000000D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.L#5555,-4(A6) PC=000420SR=2000SS=00000FF0US=00000000X=0 1000 A0=00000000A1=00000000A2=00000000A3=00000000N=0OFFC A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0OFF8 D0=00000000D1=FFFFFFFFD2=00000000D3=FFFFFFFFV=0OFF4 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF0 ---------->ADDQ.L#1,D2 PC=000422SR=2000SS=00000FF0US=00000000X=0 A0=00000000A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->LEA.L-4(A6),A0 PC=000426SR=2000SS=00000FF0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->PEA(A0) 1000 PC=000428SR=2000SS=00000FECUS=00000000X=0 OFFC A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 OFF8 A4=00000000A5=00000000A6=00000FFCA7=00000FECZ=0 OFF4 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0 OFF0 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFEC ---------->MOVEA.LD2,A0 PC=00042ASR=2000SS=00000FECUS=00000000X=0 A0=00000001A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FECZ=0 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->PEA(A0) 1000 PC=00042CSR=2000SS=00000FE8US=00000000X=0 OFFC A0=00000001A1=00000000A2=00000000A3=00000000N=0OFF8 A4=00000000A5=00000000A6=00000FFCA7=00000FE8Z=0OFF4 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0OFF0 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFEC ---------->JSR28(PC) OFE8
250
00 00 00 00 00
00 00 00 00 00
00 00 15 00 00
00 00 B3 00 00
00 00 00 00 00 00
00 00 00 00 00 00
00 00 15 00 00 0F
00 00 B3 00 00 F8
00 00 00 00 00 00 00
00 00 00 00 00 00 00
00 00 15 00 00 0F 00
00 00 B3 00 00 F8 01
AdvancedAssemblyLanguageProgrammingConcepts C=00044ASR=2000SS=00000FE4US=00000000X=0 A0=00000001A1=00000000A2=00000000A3=00000000N=01000 A4=00000000A5=00000000A6=00000FFCA7=00000FE4Z=0OFFC D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0OFF8 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF4 OFF0 ---------->LINKA6,#0 PC=00044ESR=2000SS=00000FE0US=00000000X=0 OFEC A0=00000001A1=00000000A2=00000000A3=00000000N=0OFE8 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0OFE4 D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0OFE0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVEA.L12(A6),A0 PC=000452SR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=01000 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0OFFC D0=00000000D1=FFFFFFFFD2=00000001D3=FFFFFFFFV=0OFF8 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF4 OFF0 ---------->MOVE.LA0,D1 PC=000454SR=2000SS=00000FE0US=00000000X=0 OFEC A0=00000FF8A1=00000000A2=00000000A3=00000000N=0OFE8 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0OFE4 D0=00000000D1=00000FF8D2=00000001D3=FFFFFFFFV=0OFE0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->BEQ.S$0000045C PC=000456SR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=01000 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0OFFC D0=00000000D1=00000FF8D2=00000001D3=FFFFFFFFV=0OFF8 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF4 ---------->MOVEQ#-1,D0 OFF0 PC=000458SR=2008SS=00000FE0US=00000000X=0 OFEC A0=00000FF8A1=00000000A2=00000000A3=00000000N=1OFE8 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0OFE4 D0=FFFFFFFFD1=00000FF8D2=00000001D3=FFFFFFFFV=0OFE0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->CMP.LD0,D1 PC=00045ASR=2001SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=FFFFFFFFD1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=1 ---------->BNE.S$00000470 PC=000470SR=2001SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=FFFFFFFFD1=00000FF8D2=00000001D3=FFFFFFFFV=0 251
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 00
00 00 B3 00 00 F8 01 30 00
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 00
00 00 B3 00 00 F8 01 30 00
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
Chapter9 D4=00000000D5=00000000D6=00000000D7=00000000C=1 ---------->MOVE.L8(A6),D0 PC=000474SR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=00000001D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->ADD.L(A0),D0 PC=000476SR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->BRA.L$0000047A PC=00047ASR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->NOP PC=00047CSR=2000SS=00000FE0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FE0A7=00000FE0Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->UNLKA6 PC=00047ESR=2000SS=00000FE4US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FE4Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->RTS PC=000430SR=2000SS=00000FE8US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FE8Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->ADDQ.L#8,SP PC=000432SR=2000SS=00000FF0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=000015B4D1=00000FF8D2=00000001D3=FFFFFFFFV=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->MOVE.LD0,D3 PC=000434SR=2000SS=00000FF0US=00000000X=0 252
1000 OFFC OFF8 OFF4 OFF0 OFEC OFE8 OFE4 OFE0
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
1000 OFFC OFF8 OFF4 OFF0 OFEC OFE8 OFE4 OFE0
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
1000 OFFC OFF8 OFF4 OFF0 OFEC OFE8 OFE4 OFE0
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
AdvancedAssemblyLanguageProgrammingConcepts A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0 D0=000015B4D1=00000FF8D2=00000001D3=000015B4V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->ADDQ.L#1,D3 PC=000436SR=2000SS=00000FF0US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=01000 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0OFFC D0=000015B4D1=00000FF8D2=00000001D3=000015B5V=0OFF8 OFF4 D4=00000000D5=00000000D6=00000000D7=00000000C=0 OFF0 ---------->NOP OFEC PC=000438SR=2000SS=00000FF0US=00000000X=0 OFE8 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0OFE4 A4=00000000A5=00000000A6=00000FFCA7=00000FF0Z=0OFE0 D0=000015B4D1=00000FF8D2=00000001D3=000015B5V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=01000 ---------->MOVE.L(SP)+,D2 OFFC PC=00043ASR=2004SS=00000FF4US=00000000X=0 OFF8 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0OFF4 A4=00000000A5=00000000A6=00000FFCA7=00000FF4Z=1OFF0 D0=000015B4D1=00000FF8D2=00000000D3=000015B5V=0OFEC OFE8 D4=00000000D5=00000000D6=00000000D7=00000000C=0 OFE4 ---------->MOVE.L(SP)+,D3 OFE0 PC=00043CSR=2004SS=00000FF8US=00000000X=0 A0=00000FF8A1=00000000A2=00000000A3=00000000N=01000 A4=00000000A5=00000000A6=00000FFCA7=00000FF8Z=1OFFC D0=000015B4D1=00000FF8D2=00000000D3=00000000V=0OFF8 D4=00000000D5=00000000D6=00000000D7=00000000C=0OFF4 OFF0 ---------->UNLKA6 PC=00043ESR=2004SS=00001000US=00000000X=0 OFEC OFE8 A0=00000FF8A1=00000000A2=00000000A3=00000000N=0 OFE4 A4=00000000A5=00000000A6=00000000A7=00001000Z=1 OFE0 D0=000015B4D1=00000FF8D2=00000000D3=00000000V=0 D4=00000000D5=00000000D6=00000000D7=00000000C=0 ---------->RTS
YoumaybecuriouswhythelastinstructionisanRTS.Ifthisprogramwererunningunderanoperatingsystem,andtheCcompiler expectsthatyouare,thisinstructionwouldreturnthecontroltothe operatingsystem. There’sonelastmattertodiscussbeforeweleave68000assembly languageandmoveontootherarchitectures.Thisisthesubjectof instructionsetdecomposition.We’vealreadylookedatthearchitecturefromthepointofviewoftheinstructionset,theaddressing 253
1000 OFFC OFF8 OFF4 OFF0 OFEC OFE8 OFE4 OFE0
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00
00 00 15 00 00 0F 00 04 0F
00 00 B3 00 00 F8 01 30 FC
Chapter9 modesandtheinternalregisterresources.Now,we’lltrytorelatethatbacktoourdiscussionof statemachinestoseehowtheactualencodingoftheinstructionstakesplace. Theprocessofconvertingmachinelanguagebacktoassemblylanguageiscalleddisassembly.A disassemblerisaprogramthatexaminesthemachinecodeinmemoryandattemptstoconvertit backtomachinelanguage.Thisisanextremelyusefultoolfordebuggingand,infact,mostdebuggershaveabuilt-indisassemblyfeature. Youmayrecallfromourintroductiontoassemblylanguagethatthefirstwordofaninstructionis calledtheopcodeword.Theop-codewordcontainsanopcode,whichtellsthecomputer(whatto do),anditalsocontainszero,oneortwoeffectiveaddressfields(EA).Theeffectiveaddressfields containtheencodedinformationthattelltheprocessorhowtoretrievetheoperandsfrommemory. Inotherwords,whatistheeffectiveaddressoftheoperand(s). Asyoualreadyknow,anoperandmightbeanaddressregisteroradataregister.Theoperand mightbelocatedinmemory,butpointedtobytheaddressregister.Considertheformofthe instructionsshownbelow: OPCode
DestinationEA
SourceEA
DB15DB12 DB11DB6 DB5DB0
Theopcodefieldconsistsof4bits,DB12throughDB15.ThistellsthecomputerthattheinstructionisaMOVEinstruction.Thatis,movethecontentsofthememorylocationspecifiedbythe sourceEAtothememorylocationspecifiedbythedestinationEA.Furthermore,theeffective addressfieldisfurtherdecomposedintoa3-bitModefieldanda3-bitSubclass(register)field. Thisinformationissufficienttotellthecomputereverythingitneedstoknowinordertocompletetheinstruction,buttheop-codeworditselfmayormaynotcontainalloftheinformation necessarytocompletetheinstruction.Thecomputermayhavetogoouttomemoryoneormore additionaltimestofetchadditionalinformationaboutthesourceEAorthedestinationEAinorder tocompletetheinstruction.Thisbeginstomakesenseifwerecallthattheop-codewordisthepart oftheinstructionthatmustbedecodedbythemicrocode-drivenstatemachine.Oncetheproper statemachinesequencehasbeenestablishedbythedecodingoftheop-codeword,theadditional fetchingofoperandsfrommemory,ifnecessary,canproceed. Forexample,supposethatthesourceEAanddestinationEAarebothdataregisters.Inother words,theinstructionis: MOVE.WD0,D1 Sinceboththesourceandthedestinationareinternalregisters,thereisnoadditionalinformation neededbytheprocessortoexecutetheinstruction.Forthisexample,theinstructionisthesame lengthastheopcode,16-bitslong,oronewordinlength. However,supposethattheinstructionis: MOVE.W#$00D0,D1
Nowthe#signtellsusthatthesourceEAisanimmediateoperand(thehexadecimalnumber $00D0)andthedestinationEAisstillregisterD1.Inthiscase,theopcodewordwouldtellusthat thesourceisanimmediateoperand,butitcan’ttelluswhatistheactualvalueoftheimmediate operand.Thecomputermustgoouttomemoryagainandretrieve(fetch)thenextwordinmemory aftertheopcodeword.Thisisthedatavalue$00D0.Ifthisinstructionresidedinmemoryatmem254
AdvancedAssemblyLanguageProgrammingConcepts orylocation$1000,theopcodewordwouldtakeupwordaddress$1000andtheoperandwould beatmemorylocation$1002. Theeffectiveaddressfieldisa6-bitwidefieldthatisfurthersubdividedintotwo,3-bitfieldscalled modeandregister.The3-bitwidemodefieldcanspecifyoneof8possibleaddressingmodesand theregisterfiledcanspecifyoneof8possibleregisters,orasubclassforthemodefield. TheMOVEinstructionisuniqueinthatithastwopossibleeffectiveaddressfields.Alloftheother instructionsthatmaycontainaneffectiveaddressfieldhaveonlyonepossibleeffectiveaddress. Thismightseemstrangetoyouatfirst,butasyou’llsee,almostalloftheotherinstructionsthat usetwooperandsmustinvolvearegisterandasingleeffectiveaddress. Let’sreturnourattentiontotheMOVEinstruction.Ifwecompletelybreakitdowntoitsconstituentpartsitwouldlooklikethis: 00 DB15DB14
Size DB13DB12
DestinationRegister DB11DB10DB9
DestinationMode DB8DB7DB6
SourceMode DB5DB4DB3
SourceRegister DB2DB1DB0
Thesubclassesareonlyusedwithmode7addressingmodes.Thesemodesare: Mode7addressing Subclass YoumayhavenoticedtheMovetoAddress Absoluteword 000 Register(MOVEA)instructionandwondered Absolutelongword 001 whyweneededtocreateaspecialmnemonicfor Immediate 100 anordinaryMOVEinstruction.Wellfirstofall, PCrelativewithdisplacement 010 addressregistersareextremelyusefulandimporPCrelativewithindex 011 tantinternalresources.Havinganaddressregister meansthatwecandoregisterarithmeticandmanipulatepointers.Wecanalsocomparethevalues inregistersandmakedecisionsbasedupontheaddressofdata.
TheMOVEAinstructionisamorestandardtypeofinstructionbecausethereisonlyoneeffective addressforthesourceoperand.Thedestinationoperandisoneofthe7addressregisters.Thus,the destinationmodeishardcodedtobeanaddressregister.However,weneededtocreateaspecial instructionfortheMOVEAoperationbecauseawordfetchonanoddbyteboundaryisanillegal accessandtheMOVEAinstructionpreventsthisfromoccuring. Themajorityoftheinstructionstaketheformshownbelow. OpCode
Register
OpMode
SourceorDestinationEA
DB15DB12 DB11DB9 DB8DB6 DB5DB0
TheopcodefieldwouldspecifywhethertheinstructionisanADD,AND,CMP,etc.Theregisterfieldspecifieswhichoftheinternalregistersisthesourceordestinationoftheoperation.The opmodefieldisanewterm.Thisspecifiesseveraloneof8subdivisionsfortheinstruction.For example,thisfieldwouldspecifyiftheinternalregisterwasthesourceorthedestinationofthe operationandifthesizewasbyte,wordorlongword.We’llseethisabitlater. Singleoperandinstructions,suchasCLR(setthecontentsofthedestinationEAtozero)havea slightlydifferentform. OpCode
Size
DestinationEA
DB15DB8 DB7DB6 DB5DB0 255
Chapter9 TheJMP(Jump)andJSR(JumptoSubroutine)arealsosingleoperandinstructions.Bothplace thedestinationEAintotheprogramcountersothatthenextinstructionisfetchedfromthenew location,ratherthanthenextinstructioninthesequence.TheJSRinstructionwillalsoautomaticallyplacethecurrentvalueoftheprogramcounterontothestackbeforethecontentsofthePC arereplacedwiththedestinationEA.Thus,thereturnlocationissavedonthestack. Thebranchinstructions,Bcc(branchonconditioncode),issimilartothejumpinstructioninthat itmodifiesthecontentsofthePCsothatthenextinstructionmaybefetchedoutofsequence. However,thebranchinstructionsdifferintwofundamentalways: 1. Thereisnoeffectiveaddress.Adisplacementvalueisaddedtothecurrentcontentsof thePCsothatthenewvalueisdeterminedasapositiveornegativeshiftfromthecurrent value.Thus,adisplacementisarelativejump,ratherthananabsolutejump. 2. Thebranchistakenonlyiftheconditioncodebeingtestedevaluatestotrue. 0
1
1
0
DB15DB12
Condition
DB11DB8
Displacement
DB7DB0
Thedisplacementfieldisan8-bitvalue.Thismeansthatthebranchcanmovetoanotherlocations either+127bytesor–128bytesawayfromthepresentlocation.Ifagreaterbranchisdesired, thenthedisplacementfieldissettoallzeroesandthenextmemorywordisusedasanimmediate valueforthedisplacement.Thisgivesarangeof+16,383to–16,384bytes. Thus,ifthecomputerisexecutingatightloop,thenitwilloperatemoreefficientlyifan8-bit displacementisused.The16-bitdisplacementallowsitgofurtheronabranch,butatacostofan additionalmemoryfetchoperation. Earlier,Isaidthatthebranchisexecutediftheconditioncodeevaluatestotrue.Thismeansthat wecantestmorethanjustthestateofoneoftheflagsintheCCR.Thetablebelowshowsusthe possiblewaysthatabranchconditionmaybeevaluated. CC CS EQ GE GT HI LE
carryclear carryset equal greaterorequal greaterthan high lessorequal
0100 0101 0111 1100 1110 0010 1111
C C Z N*V+N*V N*V*Z+N*V*Z C*Z Z+N*V+N*V
LS LT MI NE PL VC VS
loworsame lessthan minus notequal plus overflowclear overflowset
0011 1101 1011 0110 1010 1000 1001
C+Z N*V+N*V N Z N V V
ThereasonforsomanybranchtestconditionsisthatthebranchinstructionsBGT,BGE,BLTand BLEaredesignedtobeusedwithsignedarithmeticoperationsandthebranchinstructionsBHI, BCC,BLSandBCSaredesignedtobeusedwithunsignedarithmeticoperations. Someinstructions,suchasNOPandRTS(ReturnfromSubroutine)takenooperandsandarecompletelyspecifiedbytheiropcodeword. Earlierwediscussedtheroleoftheopmodefieldonhowageneralinstructionoperationmaybe furthermodified.AgoodexampletoillustratehowtheopmodefieldworksistoexaminetheADD instructionandallofitsvariationsinmoredetail.ConsiderFigure9.3
256
AdvancedAssemblyLanguageProgrammingConcepts Noticehowtheopmodefield, containedinbits6,7and8,definetheformatoftheinstruction. TheADDinstructionisrepresentativeofmostofthe“normal” instructions.Other
15 14 13 12 11 10
9
8
7
6
5
4
3
2
1
0
1
1
0
1
n
n
n OP OP OP EA EA EA EA EA EA
ADD.B ,Dn
1
1
0
1
n
n
n
0
0
0
EA EA EA EA EA EA
ADD.W ,Dn
1
1
0
1
n
n
n
0
0
1
EA EA EA EA EA EA
ADD.L ,Dn
1
1
0
1
n
n
n
0
1
0
EA EA EA EA EA EA
ADD.B Dn,
1
1
0
1
n
n
n
1
0
0
EA EA EA EA EA EA
classesofinstructionsrequire ADD.W Dn, 1 1 0 1 n n n 1 0 1 EA EA EA EA EA EA specialconsideration.Forexample,theMOVEinstructioncontains ADD.L Dn, 1 1 0 1 n n n 1 1 0 EA EA EA EA EA EA twoeffectiveaddresses.Also, ADDA.W ,An 1 1 0 1 n n n 0 1 1 EA EA EA EA EA EA allimmediateaddressingmode ADDA.L ,An 1 1 0 1 n n n 1 1 1 EA EA EA EA EA EA instructionshavethesourceoperandaddressingmodehard-coded 0..7 0..7 D intotheinstructionandthus,it Figure9.3OpmodedecompositionfortheADDinstruction isnotreallyaneffectiveaddress. Thiswouldincludeinstructionssuchasaddimmediate(ADDI)andsubtractimmediate(SUBI).
A17
RAM SELECT ROM SELECT
TO I/O
ADDRESS DECODER
Figures9.4and9.5are simplifiedschematic diagramsoftheprocessor andmemoryportionsof a68Kcomputersystem. Notallthesignalsare shown,butthatwon’t takeanythingawayfrom thediscussion.Also,we don’twanttheEE’sgettingtooupsetwithus.
A18 A19 A20 A21 A22 A23
MC68000
ExampleofaRealMachine Beforewemoveontoconsiderotherarchitectureslet’sdosomethingabitdifferent,butstill relevanttoourdiscussionofassemblycodeandcomputerarchitecture.Uptonow,we’vereally ignoredthefactthatultimatelythecodewewritewillhavetorunonarealmachine.Granted, memorytestprogramsaren’tthatexciting,butatleastwecanimaginehowtheymightbeused withactualhardware. A1..A23 Inordertoconcludeour D0..D15 discussionofassembly D15 languagecoding,let’s A1 CS lookatthedesignof ROM H anactual68000-based CS ROM L system. D0
Note: Not all 68000 signals are shown
CS RAM H AS
CS RAM L
UDS DTACK
LDS
BERR
16 MHz Clock
WRITE
CLK IN RESET
RESET GENERATOR
INT7 INT6 INT5 INT4 INT3 INT2 INT1
READ
IPL2 IPL1 IPL0
WR
Figure 9.4 Simplified schematic diagram of a 68000-based computer system.Onlythe68000CPUisshowninthisfigure. 257
Chapter9 ReferringtotheaddressbusofFigure9.4,weseethatthatthereisnoaddressbitlabeledA0.The A0issynthesizedbythestatemachinebyactivatingUDSorLDSaswesawinFigure7.6.The ORgatesontherightsideofthefigureareactuallybeingusedasnegativelogicANDgates,since alloftherelevantcontrolsignalsareassertedLOW. TheValidAddresssignalinthe68KenvironmentiscalledAddressStrobeandislabeledASinthe diagram.Thissignalisroutedtotwoplaces.ItisusedasaqualifyingsignalfortheREADand WRITEoperationsandisgatedthroughthetwolowerORgatesinthefigure(keepthinking,“ negativelogicANDgates”)andtotheADDRESSDECODERblock.Youshouldhavenotrouble identifyingthisfunctionalblockandhowitworks.Inapinch,youcouldprobablydesignityourself!Thus,wewillnotdecodeanaddressrangeuntiltheASsignalisassertedLOW,guaranteeing avalidaddress.TheI/OdecoderalsodecodesourI/Odevices,althoughtheyaren’tshowninthis figure. Sincethe68KdoesnothaveseparateREADandWRITEsignals,wesynthesizetheREADsignal withtheNOTgateandtheORgate.WealsousetheORgateandtheASsignaltoqualifythe READandWRITEsignals,withtheORgatefunctioningasanegativelogicANDgate.Strictly speakingthisisn’tnecessarybecausewearealreadyqualifyingtheADDRESSDECODERinthe sameway. A1..A17
WRITE
CS RAM H
CS RAM L
OE
D0 D1 D2 D3 D4 D5 D6 D7
~OE
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 CS
D0 D1 D2 D3 D4 D5 D6 D7
OE
CS ROM H
D8..D15
CS ROM L
D0..D16
D0..D7 READ
Figure9.5Simplifiedschematicdiagramofthememorysystem forthe68KcomputerofFigure9.4.
258
ROM LOW
D0 D1 D2 D3 D4
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 CS
ROM HIGH
WR A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 ~CS
RAM LOW
RAM HIGH
WR A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
D0 D1 D2 D3 D4 D5 D6 D7
OE
AdvancedAssemblyLanguageProgrammingConcepts ThelastblockofnoteistheInterruptControllerblockatthebottomlefthandsideofthediagram. The68Kuses3activeLOWinterruptinputslabeledIP0,IP1andIP2.Allthreelinesgoinglow wouldsignalalevel7interrupt.We’llstudyinterruptsinthenextchapter.TheinputstotheInterruptControllerare7activelowinputslabelsINT1–INT7.IftheINT7linewasbroughtLOW,all oftheoutputlineswouldalsogoLOW,indicatingthatINT7isalevel7interrupt.Now,ifINT6is low(IP0=1,IP1=0,IP2=0)andINT5alsowentLOW,theoutputswouldn’tchangebecause theInterruptcontrollerbothdecodestheinterruptinputsandprioritizesthem. Figure9.5isthememorysideofthecomputer.Itismadeupoftwo128K×8RAMchipsandtwo 128Kx8ROMchips.Noticehowthepairingofthedevicesandthetwochip-selectsignalsfrom theprocessorgivesusbytewritingcontrol.Therestofthiscircuitshouldlookveryfamiliartoyou. Thememorysysteminthiscomputerdesignconsistsof256KbytesofROM,locatedataddress $000000through$3FFFF.TheRAMislocatedataddress$100000throughaddress$13FFFF.This addressmappingisnotobviousfromthesimplifiedschematicdiagramsbecauseyoudonotknow thedetailsofthecircuitblocklabeled‘AddressDecoder”inFigure9.4.
SummaryofChapter9 Chapter9covered: • Theadvancedaddressingmodesofthe68Karchitectureandtheirrelationshiptohighlevellanguages. • Anoverviewoftheclassesofinstructionsofthe68Karchitecture. • UsingtheTRAP#15instructiontosimulatedI/O. • Howaprogram,writteninCexecutesinassemblylanguageandhowtheCcompiler makesuseoftheadvancedaddressingmodesofthe68Karchitecture. • Howprogramdisassemblyisimplementedanditsrelationtothearchitecture. • Thefunctionalblocksofa68K-basedcomputersystem.
Chapter9:Endnotes AlanClements,68000FamilyAssemblyLanguage,ISBN0-534-93275-4,PWSPublishingCompany,Boston,1994, p.704
1
259
ExercisesforChapter9 1. Examinetheblockofassemblylanguagecodeshownbelow. a. Whatistheaddressofthememorylocationwherethebyte,FF,isstoredintheindicated instruction? b. Isthiscodesegmentrelocatable?Why? org $400
start
movea.w move.w move.b jmp end
#$2000,A0 #$0400,D0 #$ff,($84,A0,D0)*Wheredoesffgo? start $400
Hint:Rememberthatdisplacementsinvolve2’scomplementnumbers.Also,thisinstruction mayalsobewrittenas:
move.b#$FF,$84(A0,D0) 2. Examinethefollowingcodesegmentandthenanswerthequestionabouttheoperationperformedbythecodesegment.Youmayassumethatthestackhasbeenproperlyinitialized. Brieflydescribetheeffectofthehighlightedinstruction.Whatvalueismovedtowhatdestination?
0000100041F90000101810START:LEADATA+2,A0 000010062C5011MOVEA.L(A0),A6 00001008203C00001B0012MOVE.L#$00001B00,D0 0000100E2C0013MOVE.LD0,D6 000010102D80684614MOVE.LD0,(70,A6,D6.L) 0000101460FE15STOP_IT:BRASTOP_IT 0000101600AA0040C830000016DATA:DC.L $00AA0040,$C8300000
3. Examinethefollowingcodesegmentandthenanswerthequestionabouttheoperationperformedbythecodesegment.Youmayassumethatthestackhasbeenproperlyinitialized. Brieflydescribetheeffectofthehighlightedinstruction.WhatisthevalueinregisterD0after thehighlightedinstructionhascompleted?
000004004FF84000START:LEA$4000,SP 000004043F3C1CAAMOVE.W#$1CAA,-(SP) 000004083F3C8000MOVE.W#$8000,-(SP)
260
AdvancedAssemblyLanguageProgrammingConcepts
0000040C223C00000010MOVE.L#16,D1 00000412203C216E0000MOVE.L#$216E0000,D0 00000418E2A0ASR.LD1,D0 0000041A383C1000MOVE.W#$1000,D4 0000041E2C1FMOVE.L(SP)+,D6 00000420C086AND.LD6,D0 0000042260FESTOP_HERE:BRASTOP_HERE
4. Answerthefollowingquestionsinasentenceortwo. a. WhydolocalvariablesgooutofscopewhenaC++functionisexited? b. Givetworeasonswhyvariablesinhighlevellanguages,suchasC,mustbedeclaredbeforetheycanbeused? c. The68000assemblylanguageinstructions,LINKandUNLKwouldberepresentativeof instructionsthatarecreatedinordertosupportahigh-levellanguage.Why? 5. Thefollowingisadisplayof32bytesofmemorythatyoumightseefroma“displaymemory” commandinadebugger.Whatarethe3instructionsshowninthisdisplay?
00000400067955550000AAAA06B9AAAA55550000 00000410FFFE0640AAAA0000FF00000000000000
6. Assumethatyouaretryingtowriteadisassemblerprogramfor68Kinstructionsinmemory. RegisterA6pointstotheopcodewordofthenextinstructionthatyouaretryingtodisassemble.Examinethefollowingalgorithm.Describeinwordshowitworks.Forthisexample,you mayassumethat=$00001A00and=%1101111001100001 shift
EQU
12
*Shift12bits
start
LEA jmp_table,A0 *Indexintothetable CLR.L D0 *Zeroit MOVE.W(A6),D0 *We’llplaywithithere MOVE.B#shift,D1 *Shift12bitstotheright LSR.W D1,D0 *Movethebits MULU #6,D0 *Formoffset JSR 00(A0,D0) *Jumpindirectwithindex {Otherinstructions}
jmp_table
JMP JMP JMP JMP JMP JMP JMP JMP JMP JMP JMP
code0000 code0001 code0010 code0011 code0100 code0101 code0110 code0111 code1000 code1001 code1010 261
Chapter9
JMP JMP JMP JMP JMP
code1011 code1100 code1101 code1110 code1111
7. ConvertthememorytestprogramfromChapter9,exercise#9,toberelocatable.Inorderto seeifyou’vesucceededwritetheprogramasfollows: a. ORGtheprogramat$400. b. Addsomecodeatthebeginningsowhenitbeginstoexecute,itrelocatesitselftothe memoryregionbeginningat$000A0000. c. Testthememoryregionfrom$00000400to$0009FFF0 d. Makesurethatyoulocateyourstacksothatitisnotoverwritten. 8. Thisexercisewillextendyourmasteryofthe68Kwiththeintroductionofseveralnewconceptstothememorytestexercisefromchapter8.Itwillalsobeagoodexerciseforstructuring yourassemblycodewithappropriatesubroutines.Theexerciseintroducestheconceptof userI/OthroughtheuseoftheTRAP#15instruction.YoushouldreadupontheTRAP#15 instructionsbystudyingtheHELPfacilitythatcomeswiththeE68Kprogram.Alternatively, youcanreaduponthedetailsoftheTRAP#15instructiononpage704oftheClements1text.
TheTRAP#15instructionisanartifactofthesimulator.ItwasdesignedtoallowI/Ototake placebetweenthesimulatorandtheuser.IfyouwerereallywritingI/Oroutines,youprobably woulddosomethingsdifferently,butlotsofthethingsarethesame.
AssociatedwiththeTRAP#15instructionarevarioustasks.EachtaskisnumberedandassociatedwitheachtaskisanAPIthatexplainshowitdoesitswork.Forexample,task#0printsa stringtothedisplayandaddsanewlinecharactersothatthecursoradvancestothebeginning ofthenextline.Inordertousetask#0youmustsetupthefollowingregisters(thisistheAPI): • D0holdsthetasknumberasabyte • TheaddressofthebeginningofthestringisheldinA1 • ThelengthofthestringtoprintisstoredasawordinD1 Onceyou’veset-upthethreeregistersyouthencallTRAP#15asaninstructionandyou’re messageisprintedtothedisplay.Here’sasampleprogramthatillustrateshowitworks.To outputthestring,“Helloworld!”,youmightusethefollowingcodesnippet:
********************************************* *Testprogramtoprintastringtothedisplay * ********************************************* OPTCRE task0EQU00 ORG$400 startMOVE.B#task0,D0*LoadtasknumberintoD0 LEAstring,A1*Getaddressofstring 262
AdvancedAssemblyLanguageProgrammingConcepts MOVE.Wstr_len,D1*Getthelengthofthestring TRAP#15*Doit STOP#$2700*Backtosimulator *Dataarea stringDC.B‘Helloworld’*Storethemessagehere str_lenDC.Wstr_len-string*Getthelengthofthestring END$400
Task#1isalmostidenticaltotask#0exceptthatitdoesnotprintthenewlinecharacter.Thisis handywhenyouwanttoprompttheusertoenterinformation.So,youwouldissueaprompt usingtask#1ratherthantask#0.
Also,youmightwanttogetinformationfromtheuser.Suchas,“Wheredotheywanttorun thememorytestandwhattestpatterndotheywanttouse?”Also,youmightaskthemifthey wanttorunthetestagainwithadifferentpattern.
ProblemStatement 1. Extendthememorytestprogram(Chapter9,exercise#9)toenableausertoenterthestarting addressofthememorytest,theendingaddressofthememorytest,andthewordpatterntouse forthetest.Onlytheinputtestpatternwillbeused.Thereisnoneedtocomplementthebitsor doanyothertests. 2. Theregionofmemorythatmaybetestedwillbebetween$00001000and$000A0000. 3. Theprogramfirstprintsoutatestsummaryheadingwiththefollowinginformation: ADDRESSDATAWRITTENDATAREAD 4. Everytimeanerroroccursyouwillprinttheaddressofthefailedlocation,thedatapattern thatyouwroteandthedatapatternthatyoureadback. 5. Locateyourstackabove$A0000. 6. Whentheprogramstartrunningitpromptstheusertoenterthestartingaddress(above $00001000)ofthetest.Theuserentersthestartingaddressforthetest.Theprogramthen promptstheuserfortheendingaddressofthetest(below$FFFF).Theuserenterstheaddress. Finally,theprogrampromptstheuserforthewordpatterntouseforthetest.
Youmustchecktoseethattheendingaddressisatleast1wordlengthsawayfromthestartingaddressandisgreaterthanthestartingaddress.Youdonothavetotestforvalidnumeric entries.Youmayassumeforthisproblemthatonlyvalidaddressesanddatavalueswillbe entered.Validentriesarethenumbers0through9thelowercaselettersathroughfandthe uppercaselettersAthroughF.
Oncethisinformationisentered,thetestsummaryheadinglineisprintedtothedisplayandthe testingbeings.Anyerrorsthatareencounteredareprintedtothedisplayasdescribedabove.
263
Chapter9 Discussion OnceyouobtainthestringfromtheuseryoumustrealizethatitisinASCIIformat.ASCIIisthe codeusedtoprintcharactersandreadcharacters.TheASCIIcodesforthenumbers0thru9are $30..$39.TheASCIIcodesforthelettersathroughfare$61..$66andtheASCIIcodesforthe lettersAthroughFare$41..$46. So,ifIprompttheuserforanaddressandtheusertypesin9B56;whentheTRAP#15instructioncompletestherewillbethefollowingfourbytesinmemory:$39,$42,$35,$36.So,howdo Igofromthistotheaddress$9B56?We’llyouwillhavetowriteanalgorithm.Inotherwords, youmustconverttheASCIIvaluestotheir16-bitwordequivalent.Hereisagoodopportunity topracticeyourshiftingandmaskingtechniques.Inordertogetanumberthatisrepresentedas ASCII0-9,youmustsubtract$30fromtheASCIIvalue.Thisleavesyouwiththenumericvalue. Ifthenumberis“B”,thenyousubtractadifferentnumbertogetthehexvalue$B.Likewise,ifthe numberis“b”,thenyoumustsubtractyetadifferentvalue. Here’swhattheaddress$9B56lookslikeinbinary. DB15DB12DB7DB3DB0 1001101101010110 Now,ifIhavethevalue9decodedfromtheASCII$39anditissittinginbitpositions0through3, howdoIgetittoendupinbitposition13through15?Itmakesgoodsensetoworkoutthe algorithmwithaflowchart.
264
10
CHAPTER
TheIntelx86Architecture
Objectives
Whenyouarefinishedwiththislesson,youwillbeableto: Describetheprocessorarchitectureofthe8086and8088microprocessors; Describethebasicinstructionsetarchitectureofthe8086and8088processors; Addressmemoryusingsegment:offsetaddressingtechniques; Describethedifferencesandsimilaritiesbetweenthe8086architectureandthe68000 architecture; Writesimpleprogramin8086assemblylanguageusingalladdressingmodesandinstructions ofthearchitecture.
Introduction Intelisthelargestsemiconductormanufacturerintheworld.Itrosetothispositionbecauseofits supplierpartnershipwithIBM.IBMneededaprocessorforitsnewPC-XTpersonalcomputer, underdevelopmentinBocaRaton,FL.IBM’sfirstchoicewastheZ-800fromZilog,Inc.,arival ofIntel’s.ZilogwasformedbyIntelemployeeswhoworkedontheoriginalIntel8080processor. Zilogproducedacode-compatibleenhancementofthe8080,theZ80.TheZ80executedallofthe 8080instructionsplussomeadditionalinstructions.Itwasalsosomewhateasiertointerfaceto thenthe8080. HobbyistsembracedtheZ80anditbecametheprocessorofchoiceforalmostalloftheearlyPCs builtupontheCP/MoperatingsystemdevelopedbyGaryKidallatDigitalResearchCorporation. Zilogwasrumoredtobeworkingona16-bitversionoftheZ80,theZ800,whichpromised dramaticperformanceimprovementoverthe8-bitZ80.IBMinitiallyapproachedZilogasthe possibleCPUsupplierforthePC-XT.Today,itcouldbeaguedthatIntelrosetothatpositionof prominencebecauseZilogcouldnotdeliveronitspromiseddeliverydatetoIBM. In1978Intelhada16-bitsuccessortothe8080,the8086/8088.The8088wasinternallyidentical tothe8086withtheexceptionofan8-biteternaldatabus,ratherthan16-bit.Certainlythiswould limittheperformanceofthedevice,buttoIBMithadtheattractivefeatureofsignificantlyloweringthesystemcost.IntelwasabletomeetIBM’sscheduleanda4.077MHz8088CPUdeputedin theoriginalIBMPC-XTcomputer.Foranoperatingsystem,IBMchosea16-bitCP/Mlook-alike fromasmallSeattlesoftwarecompanycalled,Microsoft.AlthoughZilogsurvivedtothisday,they neverrecoveredfromtheirinabilitytodelivertheZ800.Thisshouldbealessonthatallsoftware 265
Chapter10 developerswhomisstheirdeliverytargetsshouldtakethisclassicexampleofamissedschedule deliverytoheart.Asapost-script,theoriginalZ80alsosurvivestothisdateandcontinuestobe usedasanembeddedcontroller.Itishardtoestimateitsimpact,buttheremustbebillionsoflines ofZ80codestillbeingusedintheworld. ThegrowthofthePCindustryessentiallytrackedIntel’scontinueddevelopmentandongoing refinementofthex86architecture.Intelintroducedthe80286asafollow-onCPUandIBMintroducedthePC-ATcomputerwhichusedit.The80286introducedtheconceptofprotectedmode, whichenabledtheoperatingsystemtobegintoemploytaskmanagement,sothatMS-DOS,which wasasingletaskingoperatingsystem,couldnowbeextendedtoallowcrudeformsofmultitaskingtotakeplace.The80286wasstilla16-bitmachine.DuringthisperiodIntelalsointroduced the80186/80188integratedmicrocontrollerfamily.ThesewereCPUswith8086/8088coresand additionalon-chipperipheraldeviceswhichmadethedeviceextremelyattractiveasaone-chip solutionformanyembeddedcomputerproducts.Amongtheon-chipperipheralsare: • 2directmemoryaccesscontrollers(DMA) • Three16-bitprogrammabletimers • Clockgenerator • Chipselectunit • ProgrammableControlRegisters The80186/188familyisstillextremelypopulartothisday,withothersemiconductorcompanies buildingevenmorehighlyintegratedvariants,suchastheE86™familyfromAdvancedMicro Devices(AMD)andtheV-seriesfromNEC.Inparticular,thecombinationofperipheraldevices madethe186familiesveryattractivetodiskdrivemanufacturersforcontrollers. Withtheintroductionofthe80386thex86architecturewasfinallyextendedto32bits.ThePC worldrespondedwithhardwareandsoftware(Windows3.0)whichralliedaroundthisarchitecture.Thefollowonprocessorstothe80386,the80486andPentiumfamilies,continuedtoevolve thebasicarchitecturedefinedinthei386. Ourinterestinthisfamilyleadsustostepbackfromthei386architectureandtofocusonthe original8086familyarchitecture.Thisarchitecturewillprovideuswithagoodpointofreference fortryingtogainacomparativeunderstandingofthe68000versusthe8086.Whilethe68000is usuallyspecifiedasa16/32-bitarchitecture,the8086isalsocapableofdoingmany32-bitoperationsaswell.Also,thedefaultoperandsizeforthe68Kis16-bits,sotherelativecomparisonof thetwoarchitecturesisreasonablyvalid. Therefore,wewillapproachourlookatthex86familyfromtheperspectiveofthe8086.There aremanyotherreferencesavailableonthefollow-onarchitecturestothe8086andtheinterested readerisencouragedtoseekthemout. Aswebegintostudythe8086architectureandinstructionsetarchitectureitwillbecomeobvious thatsomuchoffocusisontheDOSoperatingsystemandthePCruntimeenvironment.Fromthat perspective,wehaveaninterestingcounterexampletothe68Karchitecture.Whenwestudiedthe 68K,wewererunninginan“emulation”environmentbecausethenativeinstructionsetofyour PCandthatofthe68Kareincompatible.So,avirtual68Kiscreatedprogrammaticallyandthat programcreatesthevirtual68Kmachineforyourcodetoexecuteon. 266
TheIntelx86Architecture Inthecaseofthe8086ISA,youareexecutingina“native”environmentwheretheinstructionset ofyourassemblylanguageprogramsiscompatiblewiththemachine. Thus,youcaneasilyexecute(well-behaved)8086programsthatyoucreateinsideofDOSwindowsonyourPC.Infact,theI/Ofromthekeyboardandtothescreencaneasilybeimplemented withcallstotheDOSoperatingthroughtheBasicInputOutputSystem(BIOS)ofyourPC.This isbothablessingandacursebecauseintryingtowriteandexecute8086programswewillbe forcedtohavetodealwithinterfacingtotheDOSenvironmentitself.Thismeanslearningsome assemblerdirectivesthatarenewandareonlyapplicabletoaDOSenvironment. Finally,masteringthebasicinstructionsetarchitectureofthei86requiresasteeperlearningcurve thenthe68Karchitecturethatwestartedwith.I’msurethiswillstartseveralminorreligiouswars, butpleaseexcusemeforallowingmypersonalbiastounveilitself. Inordertodealwiththeseissueswe’lloptforthe“KISS”approach(KeepItSimple,Stupid)and keepoureyesontheultimateobjective.Sincewhatwearetryingtoaccomplishisacomparativeunderstandingofmoderncomputerarchitectureswewillleavetheissuesofmasteringtheart ofprogrammingthei86architectureinassemblylanguageforanothertime.Therefore,wewill placeoutemphasisonlearningtheaddressingmodesandinstructionsofthe8086asourprimary objectiveandmakethehousekeepingrulesofDOSassemblylanguageprogrammingasecondary objectivethatwe’lldealwithonanas-neededbasis.However,youshouldbeabletoobtainany additionalinformationthatI’veomittedforthesakeofclarityfromanyoneofdozensofresources oni86orDOSassemblylanguageprogramming.Onward!
TheArchitectureofthe8086CPU Figure10.1isasimplifiedblockdiagramofthe8086CPU.Dependinguponyourperspective,the blockdiagrammayseemmorecomplextoyouthantheblockdiagramofthe68000processorin Figure7.11.Althoughitseemsquitedifferent,therearesimilaritiesbetweenthetwo.Thegeneral registersroughlycorrespondinpurposetothedataregistersofthe68K.Theregistersontheright sideofthediagram,whichislabeledthebusinterfaceunit,orBIU¸arecalledthesegmentregisters,androughlycorrespondtotheaddressregistersofthe68K. The8086isstrictlyorganizedastwoseparateandautonomousfunctionalblocks.Theexecution unit,orEU,handlesthearithmeticandlogicaloperationsonthedataandhasa6bytefirst-in, first-out(FIFO)instructionqueue(4bytesonthe8088).ThesegmentregistersoftheBIUare responsibleforaccessinstructionsandoperandsfrommemory.Themainlinkagebetweenthetwo functionalblocksistheinstructionqueue,withtheBIUlookingaheadofthecurrentinstructionbeingexecutedinordertokeepthequeuefilledwithinstructionsfortheEUtodecodeandoperateon. ThesymbolontheBIUsidethatlookslikeacarpenter’ssawhorseiscalledamultiplexer,or MUX,anditsfunctionistocombinetheaddressanddatainformationintoasingle,20-bitexternal bus.Themultiplexed(orshared)busallowsthe8086CPUtohaveonly40pinsonthepackage, whilethe68000has64pins,primarilyduetotheextrapinsrequiredforthe23addressand16 datapins.However,nothingisfree.Themultiplexedbusrequiresthatsystemsusingthe8086must haveexternallogicontheboardtolatchtheaddressintoholdingregistersduringthefirstpartof thebuscycleinordertohaveastableaddresstopresenttomemoryduringthesecondhalfofthe 267
Chapter10 Figure10.1:Simplifiedblock diagramoftheIntel8086CPU. FromTabak1.
ADDRESS BUS 20-BITS AH BH CH DH
General Registers
AL BL CL DL SP BP SI DI
DATA BUS CS 16-BITS DS SS ES IP BUS INTERNAL CONTROL COMMUNICATION LOGIC REGISTERS
16-BIT ALU DATA BUS
TEMPORARY REGISTERS
EU CONTROL
ALU
1 2 3 4
5 6
BUS INSTRUCTION QUEUE INTERFACE 6 BYTES UNIT (BIU)
FLAGS EXECUTION UNIT (EU)
cycle.IfyourecallFigure6.23,thetimingdiagramforagenericmicroprocessor,thenwecan describethe8086buscyclesashavingfour“T”states,labeledT1toT4.Ifawaitstateisgoingto beincludedinthecycle,thenitcomesasanextensionoftheT3state. DuringthefallingedgeofT1theprocessorpresentsthe20-bitaddresstotheexternallogic circuitryandissuesalatchingsignal,addresslatchenable,orALE.ALEisusedtolatchthe addressportionofthebuscycle.DataisoutputontherisingedgeofandisreadinonthefallingedgeofT3.The20-bit wideaddressbusgivesthe SEGMENT REGISTERS DATA REGISTERS D15 D0 D15 D8 D7 D0 80861MByteaddressrange. CS CODE SEGMENT AX AH AL Finally,8086doesnotplace DS DATA SEGMENT BX BH BL anyrestrictionsontheword ES EXTRA SEGMENT CX CH CL alignmentofaddresses.A SS STACK SEGMENT DX DH DL wordcanexistonanodd POINTER AND INDEX REGISTERS orevenboundary.TheBIU D15 D0 BP BASE POINTER managestheextrabuscycle INSTRUCTION POINTER AND FLAGS D15 D0 SI SOURCE INDEX requiredtofetchbothbytes IP INSTRUCTION POINTER DI DESTINATION INDEX ofthewordandasidefrom FLAGS CPU STATUS FLAGS SP STACK POINTER theperformancepenalty,the actionistransparenttothe softwaredeveloper. Theprogrammer’smodelof the8086isshowninFigure 10.2.
X
X
X
X
OF
DF
IF
TF
SF
ZF
X
AF
X
PF
X
D15
CF D0
X = RESERVED
Figure10.2:Programmer’smodelofthe8086registerset. 268
TheIntelx86Architecture
Data,IndexandPointerRegisters Theeight,16-bitgeneralpurposeregistersareusedforarithmeticandlogicaloperations.Inaddition, thefourdataregisterslabeledAX,BX,CXandDXmaybefurthersubdividedfor8-bitoperations intoahigh-byteorlow-byteregister,dependingwherethebyteistobestoredintheregister.Thus, forbyteoperations,theregistersmaybeindividuallyaddressed.Forexample,MOVAL,6Dwould placetheimmediatehexadecimalvalue6DintoregisterAL.Yes,Iknow.It’sbackwards. Also,whendatastoredin16-bitregistersarestoredinmemorytheyarestoredinthereverseorder fromhowtheyappearintheregister.Thus,if=109Candthisdatawordisthenwrittento memoryataddress1000and1001,thebyteorderinmemorywouldbe: =9C,=10{LittleEndian!Remember?} Also,thesedataregistersarenotcompletelygeneralpurposeinthesamewaythatD0throughD7 ofthe68Karegeneralpurpose.TheAXregister,aswellasitstwohalfregisters,AHandAL,are alsoknownasanaccumulator.Anaccumulatorisaregisterthatisusedinarithmeticandlogical operations.TheAXregistermustbeusedwhenyouareusingthemultiplyanddivide,MULand DIV,instructions.Forexample,thecodesnippet: MOV AL,10 MOV DH,25 MUL DH
woulddoan8-bitby8-bitmultiplicationofthecontentsoftheALregisterwiththecontentsofthe DHregisterandstoretheresultant16-bitvalueintheAXregister.Noticethatitwasnotnecessary tospecifytheALregisterinthemultiplicationinstructionsinceithadtobeusedforabytemultiplication.For16-bitoperands,theresultisstoredintheDX:AXregisterpairwiththehighorder wordstoredintheDXregisterandtheloworderwordintheAXregister.Forexample, MOV AX,0300 MOV DX,0400 MUL DX
Wouldresultin=0001:D4C0=120,000indecimal. Thereareseveralinterestingpointshere. • Thetypeofinstructiontobeexecuted(byteorword)isimpliedbytheregistersusedand thesizeoftheoperands. • Numbersareconsideredtobeliteralvalueswithoutanyspecialsymbol,suchasthe‘#’ signusedinthe68000assembler. • Undercertaincircumstances,16-bitregistersmaybegangedtogethertoform32-bitwide registers. • Althoughnotshownisthisexample,hexadecimalnumbersareindicatedwithafollowing ‘h’.Also,hexadecimalnumbersbeginningwiththelettersAthroughFshouldhavealeadingzeroappendedsothattheywon’tbeinterpretedaslabels. – 0AC55h=AC55hex – AC55h=label‘AC55h’
269
Chapter10 TheBXregistercanbeusedasa16-bitoffsetmemorypointer.Thefollowingcodesnippetloads absolutememorylocation1000Ahwiththevalue0AAh. movAX,1000h movDS,AX movBX,000Ah mov[BX],0AAh
Inthisexampleweseethat: • Thesegmentregister,DS,mustbeloadedfromaregister,ratherthanwithanimmediate value. • PlacingtheBXregisterinparentheseschangestheeffectiveaddressingmodetoregister indirect.Thecompletememoryloadaddressis[DS:BX].NoticehowtheDSregisteris implied,itisnotspecified. • TheDSregisteristhedefaultregisterusedwithdatamovementoperations,justastheCS registerwouldbeusedforreferencinginstructions.However,itispossibletooverridethe defaultregisterbyexplicitlyspecifyingthesegmentregistertouse.Forexample,thiscode snippetoverridestheDSsegmentregisterandforcestheinstructiontousethe[ES:BX] registerfortheoperation.
movAX,1000h movCX,2000h movDS,AX movES,CX movBX,000Ah es:movw.[BX],055h
Also,noticehowthe‘w.’wasused.Thisexplicitlytoldtheinstructiontointerprettheliteralasa wordvalue,0055h.Otherwise,theassemblerwouldhaveinterpreteditasabytevaluebecauseof itsrepresentationas‘055h’. TheCXregisterisusedasthecounterregister.Itisusedforlooping,shifting,repeatingandcountingoperations.ThefollowingcodesnippetillustratedtheCXregister’sraisond’etre. movCX,5 myLoop: nop loopmyLoop
TheLOOPinstructionfunctionsliketheDBccinstructionin68Klanguage.However,whilethe DBccinstructioncanuseanyofthedataregistersastheloopcounter,theLOOPinstructioncan onlyusetheCXregisterastheloopcounter.Intheabovesnippetofcode,theNOPinstruction willbeexecuted5times.EachtimethroughthelooptheCXregisterwillbeautomaticallydecrementedandtheloopinstructionwillstopexecutingwhen=0.Alsonoticehowthelabel, ‘myLoop’,isterminatedwithacolon‘:’.The8086assemblersrequireacolontoindicatealabel. Thelabelcanalsobeplacedonthelineabovetheinstructionordata: myLoop:
nop
myLoop: nop
Theseareequivalent.
270
TheIntelx86Architecture TheDXregisteristheonlyregisterthanmaybeusedtospecifyI/Oaddresses.ThereisnoI/O spacesegmentationrequiredbecausethe8086canonlyaddress64KofI/Olocations.Thefollowing codesnippetreadstheI/OportatmemorylocationA43EhandplacesthedataintotheAXregister. MOV DX,0A43Eh IN AX,DX
Unlikethe68K,thei86familyhandlesI/Oasaseparatememoryspacewithitsownsetofbus signalsandtiming. TheDXregisterisalsousedfor16-bitand32-bitmultiplicationanddivisionoperations.As you’veseeninapreviousexample,theDXregisterisgangedwiththeAXregisterwhentwo16bitnumbersaremultipliedtogivea32-bitresult.Similarly,thetworegistersaregangedtogether toforma32-bitdividendwhenthedivisoris16-bits. MOVDX,0200 MOVAX,0000 MOVCX,4000 DIVCX
Thisplacesthe32-bitnumber00C80000hintoDX:AXandthisnumberisdividedby0FA0hin theCXregister.Thequotient,0CCCh,isstoredinAXandtheremainder,0C80h,isstoredinDX. TheDestinationIndexandSourceIndexregisters,DIandSI,areusedwithdatamovementand stringoperations.Eachregisterhasaspecificpurposewhenindexingthesourceanddestination operands.TheDIandSIpointersalsodifferinthatduringstringoperations,theSIregisteris pairedwiththeDSsegmentregisterandtheSIregisterispairedwiththeESsegmentregister. DuringnonstringoperationsbotharepairedwiththeDSsegmentregister. Thismayseemstrange,butitisareasonablethingtodowhen,forexample,youaretryingtocopy astringbetweentwomemoryregionsthataregreaterthan64Kapart.Usingtwodifferentsegment registersautomaticallygivesyouthegreatestpossiblespaninmemory.Thefollowingcodesnippet copies5bytesfromthememorylocationinitiallypointedtobyDS:SItothememorylocation pointedtobyES:DI. myLoop:
MOVCX,0005 MOVAX,1000h MOVBX,2000h MOVDS,AX MOVES,BX MOVDI,200h MOVSI,100h MOVSB LOOPmyLoop
TheprograminitializestheDSsegmentregisterto1000handtheSIregisterto100h.Thusthe stringtobecopiedislocatedataddress,oratthephysicalmemorylocation10100h. TheESsegmentregisterisinitializedto2000handtheDIregisterisinitializedto0200h,sothe physicaladdressinmemoryforthedestinationis20200h.TheCXregisterisinitializedforaloop countof5,andtheLOOPinstructioncausestheMOVSB(MOVStringByte)instructiontoexecute5 271
Chapter10 times.EachtimetheMOVSBinstructionexecutes,thecontentsoftheDIandSIregistersareautomaticallyincrementedby1byte. Likeitornot,thesearepowerfulandcompactinstructions. TheBasePointerandStackPointer(BPandSP)general-purposeregistersareusedinconjunctionwiththeStackSegment(SS)registerintheBIUandpointtothebottomandtopofthestack, respectively.Sincethesystemstackismanagedbyadifferentsegmentregister,weneedadditionaloffsetregisterstopointtoaddresseslocatedintheregionpointedtobySS.ThinkoftheSP registeras“the”StackPointer,whiletheBPregisterisageneralpurposememorypointerintothe memoryregionpointedtobytheSSsegmentregister.TheBPisusedbyhigh-levellanguagesto providesupportforthestack-basedoperationssuchasparameterpassingandthemanipulationof localvariables.Inthatsense,thecombinationoftheSSandBPtaketheplaceofthelocalframe pointer(oftenregisterA6)whenhigh-levellanguagesarecompiledforthe68K. Allofthestack-basedinstructions(POP,POPA,POPF,PUSH,PUSHAandPUSHF)usethe StackPointer(SP)register.TheSPregisterisalwaysusedasanoffsetvaluefromtheStackSegment(SS)registertopointtothecurrentstacklocation. Thesepointerandindexregistershaveoneimportantdifferencewiththeir68Kanalogs.Asnotedin theaboveregisterdefinitions,theseregistersareusedinconjunctionwiththesegmentregistersinthe BIUinordertoformthephysicalmemoryaddressoftheoperand.We’lllookatthisinamoment.
FlagRegisters Someofthebitsintheflagregisterhavesimilardefinitionstothebitsinthe68Kstatusregister.Othersdonot.Also,thesettingandresettingoftheflagsismorerestrictivetheninthe68K architecture.Aftertheexecutionofaninstructiontheflagsmaybeset(1),clearedorreset(0), unchangedorundefined.Undefinedmeansthatthevalueoftheflagpriortotheexecutionofan instructionmaynotberetainedanditsvalueaftertheinstructionisexecutedcannotbepredicted2. • Bit0:CarryFlag(CF)Setonahigh-orderbitcarryforanadditionoperationoraborrow operationforasubtractionoperation;clearedotherwise. • Bit1:Reserved. • Bit2:ParityFlag(PF)Setifthelow-order8bitsofaresultcontainanevennumberof 1bits(evenparity);clearedotherwise(oddparity). • Bit3:Reserved. • Bit4:AuxiliaryCarry(AF)Setoncarryorborrowfromthelow-order4bitsoftheAL general-purposeregister;clearedotherwise. • Bit5:Reserved. • Bit6:ZeroFlag(ZF)Setiftheresultiszero;clearedotherwise. • Bit7:SignFlag(SF)Setequaltothevalueofthehigh-orderbitofresult.Setto0ifthe MSB=0(positiveresult).Setto1iftheMSBis1(negativeresult). • Bit8:TraceFlag(TF)WhentheTFflagissetto1,atraceinterruptoccursafterthe executionofeachinstruction.TheTFflagisautomaticallyclearedbythetraceinterrupt aftertheprocessorstatusflagsarepushedontothestack.Thetraceserviceroutinecan continuetracingbypoppingtheflagsbackwithareturnfrominterrupt(IRET)instruction. Thus,thisflagimplementsasingle-stepmechanismfordebugging. 272
TheIntelx86Architecture • Bit9:Interrupt-EnableFlag(IF)Whensetto1,maskable,orlowerpriorityinterruptsare enabledandmayinterrupttheprocessor.Wheninterrupted,theCPUtransferscontrolto thememorylocationspecifiedbyaninterruptvector(pointer). • Bit10:DirectionFlag(DF)SettingtheDFflagcausesstringinstructionstoautoincrementtheappropriateindexregister.Clearingtheflagcausestheinstructionsto auto-decrementtheregister. • Bit11:OverflowFlag(OF)Setifthesignedresultcannotbeexpressedwithinnumberof bitsinthedestinationoperand;clearedotherwise. • Bits12-15:Reserved.
SegmentRegisters Thefour16-bitsegmentregistersarepartoftheBIU.Theseregistersstorethesegment(page) valueoftheaddressofthememoryoperand.Theregisters(CS,DS,ESandSS)definethesegmentsofmemorythatareimmediatelyaddressableforcode,orinstructionfetches(CS),datareads andwrites(DSandES)andstack-based(SS)operations.
InstructionPointer(IP) TheInstructionPointerregistercontainstheoffsetaddressofthenextsequentialinstructiontobe executed.Thus,itfunctionsliketheProgramCounterregisterinthe68K.TheIPregistercannot bedirectlymodified.LikethePC,nonsequentialinstructionswhichcausebranches,jumpsand subroutinecallswillmodifythevalueoftheIPregister. Theseregisterdescriptionshaveslowlybeenintroducingustoanewwayofaddressingmemory, calledsegment-offsetaddressing.Segment-offsetaddressingissimilartopaginginmanyways, butitisnotquitethesameasthepagingmethodwe’vediscussedearlierinthetext.Thesegment registerisusedtopointtothebeginningofanyoneofthe64Ksixteen-byteboundaries(calledparagraphs)thatcanexistina20-bitaddressspace.Figure10.3illustrateshowtheaddressisformed. Segment address 9000h FFFFF
00000
Physical Memory Space Address Range
00000h - FFFFFh
7FFF
0000
8000
Offset Address Physical Address Range 97FFF
0002 0001 0000 FFFF FFFF
7FFF 7FFE
8001 8000
90002 90001 90000 8FFFF 8FFFE
88001 88000
Physical Address = ( Segment Address x 16 ) + Offset Address
Figure10.3:Memoryaddressbaseduponsegmentandoffsetmodel. 273
Chapter10 Itisobviousfromthismodelthatthesamememoryaddresscanbespecifiedinmanydifferent ways.This,insomerespects,isn’tverydifferentfromthealiasingproblemsthatcanarisefrom the68K’saddressingmethod,althoughthe68Kaddressingprobablyseemssomewhatmore straight-forwardtoyouatthispoint.Inanycase,the20-bitaddressiscalculatedbytakingthe 16-bitaddressvalueintheappropriatesegmentregisterandthendoinganarithmeticshiftleft by4bitpositions.Sinceeachleftshifthastheeffectofamultiplicationby2,fourshiftsmultiplytheaddressby16andresultinthebaseaddressesfortheparagraphboundariesasshownin Figure10.3.Theoffsetvalueisatruenegativeorpositivedisplacementcenteredabouttheparagraphboundarysetbythesegmentregister.Figure10.3showsthatthephysicaladdressrange thatisaccessiblefromasegmentregistervalueof9000hextendsfrom88000hto97FFFh.Offset addressesfrom0000hto7FFFhrepresentapositivedisplacement(towardshigheraddresses)and addressesfrom0FFFFhto8000hrepresentnegativedisplacements. Oncetheparagraphboundaryisestablishedby thesegmentregister,theoffsetvalue(eithera literalorregistercontent)issignextendedand addedtotheshiftedvaluefromthesegment registertoformthefull20-bitaddress.Thisis showninFigure10.4. Whilethephysicaladdressofmemory operandsis20-bits,or1MByte,theI/Ospace addressrangeis16-bits,or64K.Thefour, higherorderaddressbitsarenotusedwhen addressingI/Odevices.
1
2
A
4 Segment Base
15
0
0
0
2
15
1
2
A
0
4
0
19
+
0 1
Offset Address
Logical Address
Left shift 4 bits
0
0
0
2
2
15
19
2
Address
2
Add offset
0
A
6
2 Physical address
To memory
0
Figure10.4:Convertingthelogicaladdresstothe
Oneinterestingpointthatshouldbeginto physicaladdressinthe8086architecture. becomecleartoyouisthatformostpurposes, itdoesn’tmatterwhatthephysicaladdressinmemoryactuallyworksouttobe.Mostdebugging toolsthatyouwillbeworkingwithpresentaddressestoyouinsegment:offsetfashion,soitisn’t necessaryforyoutohavetotrytocalculatethephysicaladdressinmemory.Thus,ifyouare presentedwiththeaddress1000:0055,youcanfocusontheoffsetvalueof55hinthesegment paragraphstartingat1000h.Ifyouhavetogotophysicalmemory,thenyouwouldconvertthisto 10055h,butthiswouldbetheexceptionratherthantherule.
SegmentRegisters Asyou’veseeninthepreviousexamples,thereare4segmentregisterslocatedintheBIU.Theyare: • CodeSegment(CS)Register • DataSegment(DS)Register • StackSegment(SS)Register • ExtraSegment(ES)Register TheCSregisteristhebasepointerregisterforfetchinginstructionsfrommemory.TheDSregister isthedefaultsegmentpointerforalldataoperations.However,asyou’veseenthedefaultregisters canbeoverriddenwithanassemblerregisterprefixontheinstructionop-code.Itisobviousthat 274
TheIntelx86Architecture unlikethe68K,thearchitecturewantstohaveacleardivisionbetweenwhatistobefetchedas instructionsandwhatistobemanipulatedasdata. TheStackSegmentregisterfunctionslikethestackpointerregisterinthe68Karchitecturewhenit isusedwiththeSPregister.ThelastregisterintheBIUistheExtraSegmentRegister,orES.This registerprovidesthesecondsegmentpointerforstring-basedoperations. Wecanseefromthesediscussionsthatthebehaviorofthesegmentregistersisverystrictly defined.SincetheyarepartoftheBIUtheycannotbeusedforarithmeticoperationsandtheir contentscanbemodifiedonlywithregister-to-registertransfers.Thismakessenseinlightofthe obviousdivisionoftheCPUintotheEUandBIU. Insomerespectwe’vebeenjumpingaheadofourselveswiththesecodeexamples.Theintentwas togiveyouafeelforthewaythearchitectureworksbeforewediveinandtrytosystematically workthroughtheinstructionsandeffectiveaddressingmodesofthe8086architecture.
MemoryAddressingModes The8086architectureprovidesfor8addressingmodes.Thefirsttwomodesoperateonvaluesthat arestoredininternalregistersorarepartoftheinstruction(immediatevalues).Theseare: 1. RegisterOperandMode:Theoperandislocatedinoneofthe8or16bitregisters. ExampleofRegisterOperandModeinstructionswouldbe: MOV MOV INC
AX,DX AL,BH AX
2. ImmediateOperandMode:Theoperandispartoftheinstruction.Examplesofthe ImmediateOperandModeare: MOV ADD
AX,0AC10h AL,0AAh
Thereisnoprefix,suchasthe‘#’sign,toindicatethattheoperandisanimmediate.
Thenextsixaddressingmodesareusedwithmemoryoperands,ordatavaluesthatarestoredin memory.Theeffectiveaddressingmodesareusedtocalculateandconstructtheoffsetthatiscombinedwiththesegmentregistertocreatetheactualphysicaladdressusedtoretrieveorwritethe memorydata.Thephysicalmemoryaddressissynthesizedfromthelogicaladdresscontainedinthe segmentregisterandanoffsetvalue,whichmayormaynotbecontainedinaregister.Thesegment registersareusuallyimplicitlychosenbythetypeoperationbeingperformed.Thus,instructions arefetchedrelativetothevalueintheCSregister;dataisreadorwrittenrelativetotheDSregister; stackoperationsarerelativetotheSPregisterandstringoperationsarerelativetotheESregister. Registersmaybeoverriddenbyprefacingtheinstructionwiththedesiredregister.Forexample: ES:MOVAX,[0005h]
willfetchthedatafromES:00005,ratherthanfromDS:0005.Theeffectiveaddress(offsetaddress) canbeconstructedbyaddingtogetheranyofthefollowingthreeaddresselements: a. An8-bitor16-bitimmediatedisplacementvaluethatispartoftheinstruction, b. Abaseregistervalue,containedineithertheBXorBPregister, 275
Chapter10
c. AnindexvaluestoredintheDIorSIregisters.
3. DirectMode:Thememoryoffsetvalueiscontainedasan8-bitor16-bitpositivedisplacementvalue.
Unlikethe68K,theoffsetportionoftheaddressisalwaysapositivenumbersincethedisplacementisreferencedfromthebeginningofthesegment.TheDirectModeisclosestto the68K’sabsoluteaddressingmode,however,thedifferenceisthatthereisalwaystheimpliedpresenceofthesegmentregisterneedtocompletethephysicaladdress.Forexample: MOV
AX,[00AAh]
copiesthedatafromDS:00AAintotheAXregister,while, CS:MOV DX,[0FCh] copiesthedatafromCS:00FCintotheDXregister.
Noticehowthesquarebracketsareusedtosymbolizeamemoryinstructionandthe absenceofbracketsmeansanimmediatevalue.Also,twomemoryoperandsarenotpermittedfortheMOVinstruction.Theinstruction, MOV [00AAh],[1000h] isillegal.Unlikethe68KMOVEinstruction,onlyonememoryoperandispermitted.
4.
RegisterIndirectMode:Theoperandoffsetiscontainedinoneofthefollowingregisters: • BP • BX • DI • SI Thesquarebracketsareusedtoindicateindirection.Thefollowingcodesnippetwritesthe value55htomemoryaddressDS:0100. MOV MOV MOV
BX,100h AL,55h [BX],AL
5. BasedMode:Thememoryoperandisthesumofthecontentsofthebaseregister,BXor BPandthe8-bitor16-bitdisplacementvalue.Thefollowingcodesnippetwritesthevalue 0AAhtomemoryaddressDS:0104.
MOV MOV MOV
BX,100h AL,0AAh [BX]4,AL
TheBasedModeinstructioncanalsobewritten:MOV[BX-4],AL.
Sincethedisplacementcanbeapositiveornegativenumber,theaboveinstructionis equivalentto:MOV[BX+0FFFCh],AL 6. IndexedMode:Thememoryoperandisthesumofthecontentsofanindexregister,DIor SI,andthe8-bitor16-bitdisplacementvalue.Thefollowingcodesnippetwritesthevalue 055htomemoryaddressES:0104. 276
TheIntelx86Architecture
MOV MOV MOV
DI,100h AL,0AAh [DI]4,AL
Thecodesnippetaboveillustratesanotherdifferencebetweenusingtheindexregistersand thebaseregisters.TheDIregisterdefaultstoESregisterasitsaddresssegmentsource, althoughtheESregistermaybeoverriddenwithasegmentoverrideprefixontheinstruction.Thus,DS:MOV[DI+4],ALwouldforcetheinstructiontousetheDSregisterrather thantheESregister.
7. BasedIndexedMode:Thememoryoperandoffsetisthesumofthecontentsofoneofthe baseregisters,BPorBXandoneofthedisplacementregisters,DIorSI.TheDIregisteris normallypairedwiththeESregisterintheIndexedMode,butwhentheDIregisterisused inthismode,theDSregisteristhedefaultsegmentregister.Thefollowingcodesnippet writesthevalue0AAhtomemorylocationDS:0200h MOV DX,1000h
MOV MOV MOV MOV MOV
DS,DX DI,100h BX,DI AL,0AAh [DI+BX],AL
TheinstructionMOV[DI+BX],ALmayalsobewritten:MOV[DI][BX],AL
8. BasedIndexedModewithDisplacement:Thememoryoperandoffsetisthesumofthe contentsofoneofthebaseregisters,BPorBX,oneofthedisplacementregisters,DIorSI andan8-bitor16-bitdisplacement.TheDIregisterisnormallypairedwiththeESregister intheIndexedMode,butwhentheDIregisterisusedinthismode,theDSregisteristhe defaultsegmentregister.Thefollowingcodesnippetwritesthevalue0AAhtomemory locationDS:0204h:
MOV MOV MOV MOV MOV MOV
DX,1000h DS,DX DI,100h BX,DI AL,0AAh [DI+BX]4,AL TheinstructionMOV[DI+BX]4,ALmayalsobewritten:MOV[DI][BX]4,ALorMOV [DI+BX+4],AL.
Offsetcalculationscanleadtoveryinterestingandsubtlecodedefects.Examinethecode segmentshownbelow: MOV MOV MOV MOV MOV MOV
DX,1000h DS,DX BX,07000h DI,0FF0h AL,0AAh [DI+BX+0Fh],AL 277
Chapter10
Thiscodeassembleswithouterrorsandwritesthevalue0AAhtoDS:7FFF.Thephysical addressis17FFFh.Now,examinethenextcodesegment: MOV MOV MOV MOV MOV MOV
DX,1000h DS,DX BX,07000h DI,0FF0h AL,0AAh [DI+BX+10h],AL
Thiscodeassembleswithouterrorsandwritesthevalue0AAhtoDS:8000.However, thephysicaladdressisnot18000h,thenextphysicallocationinmemory,itisactually 08000h.Wehaveactuallywrappedaroundtheoffsetandgonefrommemorylocation 17FFFhto8000h.Ifwehadbeeninaloop,writingsuccessivevaluestomemory,we wouldhavehadaninterestingproblemtodebug.
Wecansummarizethememoryaddressingmodesasshownbelow.Considerthateachofthe items#1,#2,or#3isoptional,withthecaveatthatyoumusthaveatleastonepresenttobeable tospecifyanoffsetvalue.Wow!Thisalmostmakessense.However,weneedtokeepinmindthat unlessthedefaultsegmentregisterisoverridden, #1 #2 #3 theDIregisterwillbepairedwiththeESregisterto determinethephysicaladdresswhiletheotherthree BX DI registerswillusetheDSsegmentregister.However, or + or + displacement iftheDIregisterisusedinconjunctionwitheither BP SI theBXorBPregisters,thentheDSsegmentregister isthedefault.
X86InstructionFormat An8086instructionmaybeasshortas1bytein Instruction Prefix lengthupthroughseveralbyteslong.Allassembly Segment Override Prefix languageinstructionsfollowtheformatshownin Opcode Figure10.5.Eachfieldisinterpretedasfollows: Operand Address 1. InstructionPrefixes:CertainstringoperaDisplacement tioninstructionscanberepeatedlyexecuted withtheREP,REPE,REPZ,REPNEand Immediate REPNZprefixes.Also,thereisaprefixterm Figure10.5:8086Instructionformat. forassertingtheBUSLOCKsignaltothe externalinterface. DB7 DB6 DB5 DB4 DB3 DB2 DB1 DB0 2. SegmentOverridePrefix:Inordertooverride 0 0 1 R R 1 1 0 thedefaultsegmentregisterusedforthememory operandaddresscalculation,thesegmentoverride 00 = ES byteprefixisappendedtothebeginningofthe 01 = CS 10 = SS instruction.Onlyonesegmentoverrideprefixmay 11 = DS beused.Theformatofthesegmentoverrideprefix Figure 10.6: Format of the Segment isshowninFigure10.6. OverridePrefix. 278
TheIntelx86Architecture 3. Opcode:Theop-codefieldspecifiesthemachine DB7 DB6 DB5 DB4 DB3 DB2 DB1 B0 languageop-codefortheinstruction.Most op-codesareonebyteinlength,althoughthere aresomeinstructionswhichrequireasecond MOD AUX r/m op-codebyte. Figure10.7:FormatoftheOperand 4. OperandAddress:Theoperandaddressbyteis AddressByte. somewhatinvolved.Theformofthebyteisshown inFigure10.7.Therearethreefields,labeledmod,auxandr/m,respectively.Theoperand addressbytecontrolstheformoftheaddressingoftheinstruction. ThedefinitionoftheModFieldsisasfollows: DB7
1 0 0 1
DB6
1 0 1 0
Description
r/mistreatedasaregisterfield DISPField=0;disp-lowanddisp-highareabsent DISPField=disp-lowsignextendedto16-bits,disp-highisabsent DISPField=disp-high:disp-low
Themod(modifier)fieldisusedinconjunctionwithther/mfieldtodeterminehowther/m fieldisinterpreted.Ifmod=11,thenther/mfieldisinterpretedasaregisteroperand.For amemoryoperand,themodfielddeterminesifthememoryoperandisaddresseddirectly orindirectly.Forindirectlyaddressedoperands,themodfieldspecifiesthenumberof bytesofdisplacementthatappearsintheinstruction.
Theregister/memory(r/m)fieldspecifiesifaregisteroramemoryaddressistheoperand. Thetypeofoperandisspecifiedbythemodfield.Ifthemodbitsare11,thenther/mfield istreatedasaregisteridentifier.Ifthemodfieldhasthevalues00,10or01,thenther/m fieldspecifiesoneoftheeffectiveaddressingmodesthatwe’vepreviouslydiscussed.The r/mfieldvaluesmaybesummarizedasfollows: DB2
0 0 0 0 1 1 1 1
DB1
0 0 1 1 0 0 1 1
DB0
0 1 0 1 0 1 0 1
Effectiveaddress
[BX]+[SI]+DISP [BX]+[DI]+DISP [BP]+[SI]+DISP [BP]+[DI]+DISP [SI]+DISP [DI]+DISP [BP]+DISP(ifmod=00thenEA=disp-high:disp-low) [BX]+DISP
Whenther/misusedtospecifyaregisterthenthebitcodeisasfollows: DB2 0 0 0 0 1 1 1 1
DB1 0 0 1 1 0 0 1 1
DB0 0 1 0 1 0 1 0 1
Word AX CX DX BX SP BP SI DI
Byte AL CL DL BL AH CH DH BH 279
Chapter10
Referringtotheabovetable,youcanseehowthebitcodefortheregisterchangesdependinguponthesizeofoperandsintheinstruction.Theoperandsizeisdeterminedbythe opcodeportionoftheinstruction.We’llseehowthisworksinamoment.
Theauxiliaryfield(Aux)isusedfortwopurposes.Certaininstructionsmayrequirean opcodeextensionfieldandwhenneeded,theextensionisincludedintheauxfield.Also, whentheinstructionrequiresasecondregisteroperandtobespecified,theAuxiliaryField isusedtospecifythesecondregisteraccordingtotheabovetable. 5. Displacement:Thedisplacementvalueisan8-bitor16-bitnumbertobeaddedtothe offsetportionoftheeffectiveaddresscalculation. 6. Immediate:Theimmediatefieldcontainsupto2bytesofimmediatedata. Asyoucansee,the8086instructionformatisquiteabitmoreinvolvedthenthe68Kinstruction format.Muchofthiscomplexityistraceabletothesegment:offsetaddressingformat,butthatisn’t thewholestory.Weshouldconsiderthe8086architectureasawholetoseehowthebitpatterns areusedtocreatethepaththroughthemicrocodethatdeterminesthestatemachinepathforthe instruction.Thelastpieceofthepuzzlethatweneedtoconsideristheopcodeportionofinstruction. Recallthatinthe68Karchitecture,theopcodeportionoftheinstructioncouldreadilybedissected intoafixedpatternofbits,aregisterfield,amodefield,andaneffectiveaddressfield.Inthe8086 architecture,thedistinctionsarelessapparentandthecontrolismoredistributed.Let’sconsider thepossibleopcodesfortheMOVinstruction.Thetablebelowliststhepossibleinstructioncodes. ThevalueintheOpcodecolumnprecededbyaforwardslash‘/’isthevalueintheAuxiliaryField oftheOperandAddressByte.ThevalueintheOpcodecolumnfollowingtheplussign‘+’isthe registercodewhichisaddedtothebasevalueoftheopcode. Format
Opcode Description
MOVr/m8,r8
88/r
MOVr/m16,r16 MOVr8,r/m8 MOVr16,r/m16 MOVr/m16,sreg MOVsreg,r/m16 MOVAL,moffs8 MOVAX,moffs16 MOVmoffs8,AL MOVmoffs16,AX MOVr8,imm8 MOVr16,imm16
Copythebytevaluestoredinregister/rtotheregisterormemory byter/m8 Copythewordvaluestoredinregister/rtotheregisterormemory 89/r wordr/m16 8A/r Copythebytevaluestoredinr/m8tothebyteregister/r Copythevaluestoredintheregisterormemorywordr/m16tothe 8B/r wordregister/r 8C/sr Copythesegmentregister/srtotheregisterormemorywordr/m16 8E/sr Copytheregisterormemorywordr/m16tothesegmentregister Copythebytevaluestoredinmemoryatsegment:moffs8totheAL A0 register Copythewordvaluestoredinmemoryatsegment:moffs16tothe A1 AXregister CopythecontentsoftheALregistertothememorybyteaddressat A2 segment:moffs8 CopythecontentsoftheAXregistertothememorywordaddress A3 atsegment:moffs16 B0+rb Copytheimmediatebyte,imm8,totheregisterrb B8+rw Copytheimmediateword,imm16,totheregisterrw
280
TheIntelx86Architecture Format
Opcode Description
MOVr/m8,imm8
C6/0
MOVr/m16,imm16 C7/0
Copytheimmediatebyte,imm8totheregisterormemorybyte r/m8 Copytheimmediateword,imm16,totheregisterormemoryword r/m16
Referringtotheopcodemap,above,wecanseethatwheneverabyteoperationisspecifiedthen theleastsignificantbitoftheopcodeis0.Conversely,wheneverawordoperationisspecified,the leastsignificantbitisa1.Whenthesourceordestinationisasegmentregister,thenthecodefor theappropriatesegmentregister,DS,SSorESappearsintheauxfieldoftheoperandaddressbyte as011,010or000,respectively.Thesearethesamecodesthatareusedforthesegmentoverride bits.TheCSregistercannotbedirectlymodifiedbytheMOVinstruction. Thisisalottotrytoabsorbinafewpagesoflightreading,solet’strytopullitalltogetherby lookingatthebitpatternsforsomerealinstructionsandseeiftheyactuallyagreewithwhatwe wouldpredict. Mnemonic
MOV
AX,BX
Code
mod
8BC3
11
aux
000
r/m
011
Theopcode,8Bh,tellsusthatthisisaMOVinstructionoftheformMOVr/16,r/m16.ThedestinationregisteristhewordregisterAX.Also,theregisterisspecifiedintheAUXfieldandits valueis000,whichcorrespondstotheAXregister.TheMODfieldis11,whichindicatesthatthe r/m16fieldisaregistervalue.Ther/mfieldvaleis011,whichmatchestheBXregister.Thus,our dissectionoftheinstructionstartswiththeappropriateformoftheopcodewhichleadsustothe appropriateformsoftheoperandbyte. Thatwasprettystraightforward.Let’sratchetitupanotch.Let’saddasegmentoverrideanda memoryoffset. Mnemonic
ES:MOV[100h],CX
Code
mod
26890E0001
aux
00
r/m
001
110
Thesegmentoverridebyte,26h,identifiestheESregisterandtheopcode,89h,tellsusthatthe instructionisoftheformMOVr/m16,r16.Theauxfieldtellsusthatthesourceregisteris CX(001)andthemodfieldtellsusthattheregister/memoryaddressisamemorylocationbut thereisnodisplacementvalueneededtocalculatetheaddressoffset.Ther/mfieldvalue110indicatesthatthememoryoffsetisprovidedasdisp-high:disp-low,whichisstoredinmemoryas0001 (littleEndian). Forthefinalexamplewe’llputitalltogether. Mnemonic
CS:MOV[BX+DI+0AAh],55h
Code
2EC681AA0055
mod
10
aux
000
r/m
001
Canwemakesenseofthisinstruction?Theoverridebyteindicatesthattheoverridesegmentis CS(01).Theopcode,C6h,tellsusthattheinstructionisoftheform: MOVr/m8,imm8
TheOperandAddressBytegivesusthemodvalue10,whichtellsusthatthereisadisplacement fieldpresentthatmustbeusedtocalculatetheoffsetaddressanditisofftheformdisp-high:disp-low. 281
Chapter10 Weseethisinthenexttwobytesoftheinstruction,AA00.Ther/mfieldvalue001thengivesus thefinalformoftheaddresscalculation,[BX+DI+DISP].Thefinalbyteoftheinstruction,55h, isthe8-bitimmediatevalue,imm8,asshownintheformatoftheopcode. Ifwemodifytheinstructionasfollows: CS:MOV [BX+DI+0AAh],5555h
Theinstructioncodebecomes2EC781AA005555.TheopcodechangesfromC6toC7,indicatingawordoperationandtheimmediatefieldisnowtwobyteslong.Theentireinstructionis now7byteslong,sowecanseewhythe8086architecturemustallowfornonalignedaccesses frommemorysincethenextinstructionwouldbestartingonanoddboundaryifthisinstruction beganonawordboundary.
8086InstructionSetSummary Whileitisbeyondthescopeofthistexttocoverthe8086familyinstructionsindetailwecanlook atrepresentativeinstructionsineachclassandusetheseinstructionsastogainanunderstanding ofalloftheinstructions.Also,someoftheinstructionswillalsoextendourinsightintotheoverall x86familyarchitecture. Wecanclassifytheoriginalx86instructionsetintothefollowinginstructionfamilies: • Datatransferinstructions • Arithmeticinstructions • Logicinstructions • Stringmanipulationinstructions • Controltransferinstructions Mostoftheseinstructionclassesshouldlookfamiliartoyou.Othersarenew.The68Kfamily instructionsetdidnothavededicatedinstructionsforloopcontrolorstringmanipulation.Whether ornotthisrepresentsashortcomingofthearchitectureisdifficulttoassess.Clearlyyouareable tomanipulatestringsandformsoftwareloopsinthe68Karchitectureaswell.Let’snowlookat somerepresentativeinstructionsineachoftheseclasses.
DataTransferInstructions JustaswiththeMOVEinstructionofthe68Kfamily,theMOVinstructionisprobablythemostoftusedinstructionintheinstructionset.We’velookedattheMOVinstructioninsomedetailinthis chapter,usingitasaprototypeinstructionforunderstandinghowthex86instructionsetarchitecturefunctions.Wemaysummarizetheinstructionsinthedatatransferclassinthefollowingtable6. Notethatnotalloftheinstructionmnemonicsarelistedandsomeftheinstructionshaveadditional variationswithuniquemnemonics.Thistableismeanttosummarizethemostgeneralclassification. Mnemonic (©Intel)
Instruction
Description
MOV
MOVE
PUSH
PUSH
POP
POP
Copiesdatafromasourceoperandtoadestinationoperand Createsstoragespaceforanoperandonthestackandthencopiestheoperandontothestack Copiescomponentfromthetopofthestacktomemoryorregister 282
TheIntelx86Architecture Mnemonic (©Intel)
Instruction
Description
XCHG IN OUT
EXCHANGE INPUT OUTPUT
XLAT
TRANSLATE
Exchangestwooperands. ReaddatafromaddressinI/Ospace WritedatatoaddressinI/Ospace Translatestheoffsetaddressofbyteinmemorytothevalueofthe byte. Calculatestheeffectiveaddressoffsetofadataelementandloads thevalueintoaregister Readsafulladdresspointerstoredinmemoryasa32-bitdouble wordandstoresthesegmentportioninDSandtheoffsetportion inaregister SameasLDSinstructionwiththeexceptionthattheESregisteris loadedwiththesegmentportionofthememorypointerrather thanDS CopiestheFlagportionoftheProcessorStatusRegistertotheAH register CopiesthecontentsoftheAHregistertotheFlagportionofthe ProcessorStatusRegister Createsstoragespaceonthestackandcopiestheflagportionof theProcessorStatusRegisterontothestack. CopiesthedatafromthetopofthestacktotheFlagportionof theProcessorStatusRegisterandremovesthestoragespacefrom thestack.
LEA LDS LES
LoadEffective Address LoadDSwith Segmentand RegisterwithOffset LoadESwith Segmentand RegisterwithOffset
LAHF
LoadAHwithFlags
SAHF
StoreAHinFlags
PUSHF
PushFlagsonto stack
POPF
PopFlagsfrom stack
Reviewingthesetofdatatransferinstructions,weseethatallhaveanalogsinthe68Kinstruction setexcepttheXLAT,INandOUTinstructions.
ArithmeticInstructions Thefollowingtablesummarizesthe8086arithmeticinstructions, Mnemonic (©Intel)
ADD ADC INC
AAA DAA SUB SBB DEC NEG
Instruction
Description
Add
Addtwointegersorunsignednumbers Addtwointegersorunsignednumbersandthecontentsofthe Addwithcarry CarryFlag Increment Incrementthecontentsofaregisterormemoryvalueby1 Convertsanunsignedbinarynumberthatisthesumoftwo ASCIIadjustAL unpackedbinarycodeddecimalnumberstotheunpackdecimal afteraddition equivalent. Convertsan8-bitunsignedvaluethatistheresultofadditionto DecimalAddAdjust thecorrectbinarycodeddecimalequivalent Subtract Subtracttwointegersorunsignednumbers Subtractwith Subtractsanintegeroranunsignednumberandthecontentsof borrow theCarryFlagfromanumberofthesametype Decrement Decrementthecontentsofaregisterormemorylocationby1 Replacesthecontentsofaregisterormemorywithitstwo’s Negate complement.
283
Chapter10 Mnemonic (©Intel)
CMP AAS
DAS MUL IMUL AAM DIV IDIV AAD CWD CBW
Instruction
Description
Subtractstwointegersorunsignednumbersandsetstheflags accordinglybutdoesnotsavetheresultofthesubtraction.Neitherthesourcenorthedestinationoperandischanged Convertsanunsignedbinarynumberthatisthedifferenceoftwo ASCIIadjustafter unpackedbinarycodeddecimalnumberstotheunpackeddecimal subtraction equivalent. Convertsan8-bitunsignedvaluethatistheresultofsubtraction Decimaladjust tothecorrectbinarycodeddecimalequivalent aftersubtraction Compare
Multiply IntegerMultiply
Multipliestwounsignednumbers Multipliestwosignednumbers Convertsanunsignedbinarynumberthatistheproductoftwo ASCIIadjustfor unpackedbinarycodeddecimalnumberstotheunpackeddecimal Multiply equivalent. Divide Dividestwounsignednumbers Integerdivide Dividestwosignednumbers Convertsanunsignedbinarynumberthatisthedivisionoftwo ASCIIadjustafter unpackedbinarycodeddecimalnumberstotheunpackeddecimal division equivalent. Convertwordto Convertsa16-bitintegertoasignextended32-bitinteger double Convertbyteto Convertsan8-bitintegertosignextended16-bitinteger word
Alloftheaboveinstructionshaveanalogsinthe68Karchitecturewiththeexceptionofthe4 ASCIIadjustinstructionswhichareusedtoconvertBCDnumberstoaformthatmaybereadily convertedtoASCIIequivalents.
LogicInstructions Thefollowingtablesummarizesthe8086logicinstructions, Mnemonic (©Intel)
NOT SHL SAL SHR SAR ROL
Instruction
Description
Invert Logicalshiftleft
One’scomplementnegationofregisterormemoryoperand SHLandSALshifttheoperandbitstotheleft,fillingtheshifted bitpositionswith0’s.ThehighorderbitsareshiftedintotheCF Arithmeticshiftleft bitposition Shiftsbitsofanoperandtotheright,fillingthevacantbitposiLogicalshiftright tionswith0’stheloworderbitisshiftedintotheCFposition Shiftsbitsofanoperandtotheright,fillingthevacantbitposiArithmeticshift tionswiththeoriginalvalueofthehighestbit.Theloworderbitis right shiftedintotheCFposition Rotatesthebitsoftheoperandtotheleftandplacingthehigh Rotateleft orderbitthatisshiftedoutintotheCFpositionandloworderbit position. 284
TheIntelx86Architecture Mnemonic (©Intel)
ROR RCL RCR AND TEST OR XOR
Instruction
Description
Rotatesthebitsoftheoperandtotherightandplacingthelow orderbitthatisshiftedoutintotheCFpositionandthehighorder position. Rotatethrough Rotatesthebitsoftheoperandtotheleftandplacingthehigh CarryFlagtothe orderbitthatisshiftedoutintotheCFpositionandthecontents left oftheCFintotheloworderbitposition Rotatethrough Rotatesthebitsoftheoperandtotherightandplacingthelow CarryFlagtothe orderbitthatisshiftedoutintotheCFpositionandthecontents right oftheCFintothehighorderbitposition BitwiseAND ComputesthebitwiselogicalANDofthetwooperands. Determineswhetherparticularbitsofanoperandaresetto1. Logicalcompare Resultsarenotsavedandonlyflagsareaffected. ComputesthebitwiselogicalORofthetwooperands. BitwiseOR ExclusiveOR ComputesthebitwiselogicalexclusiveORofthetwooperands. Rotateright
Asyoucansee,thefamilyoflogicalinstructionsisprettyclosetothoseofthe68Kfamily.
StringManipulation Thisfamilyofinstructionshasnodirectanaloginthe68Karchitectureandprovidesapowerful groupofcompactstringmanipulationoperations.Thefollowingtablesummarizesthe8086string manipulationinstructions: Mnemonic (©Intel)
REP
MOVS CMPS SCAS LODS STOS
Instruction
Description
Repeat Moveastring component Compareastring component Scanastringfor component Loadstring component Storethestring component
Repeatedlyexecutesasinglestringinstruction Copyabyteelementofastring(MOVSB)orwordelementofa string(MOVESW)fromonelocationtoanother. Comparesthebyte(CMPSB)orword(CMPSW)elementofone stringtothebyteorwordelementofasecondstring Comparesthebyte(SCASB)orword(SCASW)elementofastring toavalueinaregister Copiesthebyte(LODSB)orword(LODSW)elementofastringto theAL(byte)orAX(word)register. Copiesthebyte(STOSB)orword(STOSW)intheAL(byte)orAX (word)registertotheelementofastring
TheREPEATinstructionhasseveralvariantformsthatarenotlistedintheabovetable.Likethe 68KDBccinstructiontherepeatingoftheinstructioniscontingentuponavalueofoneoftheflag registerbits.ItisclearhowthissetofinstructionscaneasilybeusedtoimplementthestringmanipulationinstructionsoftheCandC++libraries. TheREPEATinstructionisuniqueinthatitisnotusedasaseparateopcode,butratheritisplaced infrontoftheotherstringinstructionthatyouwishtorepeat.Thus, REPMOVSB
wouldcausetheMOVSBinstructiontoberepeatedthenumberoftimesstoredintheCXregister. 285
Chapter10 Also,theMOVSinstructionautomaticallyadvancesordecrementstheDIandSIregisters’contents,so,inordertodoastringcopyoperation,youwoulddothefollowingsteps: 1. InitializetheDirectionFlag,DF, 2. Initialthecountingregister,CX, 3. Initializethesourceindexregister,SI, 4. Initializethedestinationindexregister,DI, 5. ExecutetheREPMOVSBinstruction. Comparethiswiththeequivalentoperationforthe68Kinstructionset.Thereisnodirectionflagto set,butsteps2through4wouldneedtobeexecutedinananalogousmanner.TheautoincrementingorautodecrementingaddressmodewouldbeusedwiththeMOVEinstruction,soyouwould mimictheMOVSBinstructionwith: MOVE.B(A0)+,(A1)+ Thedifferenceisthatyouwouldneedtohaveanadditionalinstruction,suchasDBcc,tobeyour loopcontroller.Doesthismeanthatthe8086wouldoutperformthe68Kinsuchastringcopyoperation?That’shardtosaywithoutsomein-depthanalysis.BecausetheREPMOVSBinstruction isarathercomplexinstruction,itisreasonabletoassumethatitmighttakemoreclockcyclesto executethenasimplerinstruction.The186EM4processorfromAMDtakes8+8*nclockcycles toexecutetheinstruction.Here‘n’isthenumberoftimestheinstructionisrepeated.Thus,the instructioncouldtakeaminimum808clockcyclestocopy100bytesbetweentwomemorylocations.However,inordertoexecutethe68KMOVEandDBccinstructionpair,wewouldalsohave tofetcheachinstructionfrommemoryoveranoveragain,sotheoverheadofthememoryfetch operationwouldbeasignificantpartofthecomparison.
ControlTransfer Mnemonic (©Intel)
CALL JMP RET JE JZ JL JNGE
Instruction
Description
Suspendsexecutionofthecurrentinstructionsequence,savesthe segment(ifnecessary)andoffsetofthenextinstructionandtransfersexecutiontotheinstructionpointedtobytheoperand. StopsexecutionofthecurrentsequenceofinstructionsandtransUnconditionaljump fercontroltotheinstructionpointedtobytheoperand. UsedinconjunctionwiththeCALLinstructiontheRETinstruction Returnfrom restoresthecontentsoftheIPregisterandmayalsorestorethe procedure contentsoftheCSregister. Jumpifequal IftheZeroFlag(ZF)issetcontrolwillbetransferredtotheaddress oftheinstructionpointedtobytheoperand.IfZFiscleared,the Jumpifzero instructionisignored. Jumponlessthan IftheSignFlag(SF)andOverflowFlag(OF)arenotthesame, thencontrolwillbetransferredtotheaddressoftheinstruction Jumponnot pointedtobytheoperand.Iftheyarethesametheinstructionis greaterofequal ignored. Callprocedure
286
TheIntelx86Architecture Mnemonic (©Intel)
JB
JNAE JC JBE JNA JP JPE JO JS JNE JNZ JNL JGE JNLE JG JNB JAE JNC JNBE JA JNP JO JNO JNS
Instruction
Description
Jumponbelow IftheCarryFlag(CF)issetcontrolwillbetransferredtotheadJumponnotabove dressoftheinstructionpointedtobytheoperand.IfCFiscleared, orequal theinstructionisignored. Jumponcarry Jumponbelowor IftheCarryFlag(CF)ortheZeroFlag(ZF)issetcontrolwillbe equal transferredtotheaddressoftheinstructionpointedtobythe operand.IfCFandtheCFarebothcleared,theinstructionis Jumponnotabove ignored. Jumponparity IftheParityFlag(PF)issetcontrolwillbetransferredtotheadJumponparity dressoftheinstructionpointedtobytheoperand.IfPFiscleared, even theinstructionisignored. IftheOverflowFlag(OF)issetcontrolwillbetransferredtothe Jumponoverflow addressoftheinstructionpointedtobytheoperand.IfOFis cleared,theinstructionisignored. IftheSignFlag(SF)issetcontrolwillbetransferredtotheaddress Jumponsign oftheinstructionpointedtobytheoperand.IfSFiscleared,the instructionisignored. Jumponnotequal IftheZeroFlag(ZF)isclearedcontrolwillbetransferredtotheaddressoftheinstructionpointedtobytheoperand.IfZFisset,the Jumponnotzero instructionisignored. Jumponnotless IftheSignFlag(SF)andOverflowFlag(OF)arethesame,then controlwillbetransferredtotheaddressoftheinstruction Jumpongreateror pointedtobytheoperand.Iftheyarenotthesame,theinstrucequal tionisignored. Jumponnotless IfthelogicalexpressionZF*(SFXOROF)evaluatestoTRUE thencontrolwillbetransferredtotheaddressoftheinstruction thanorequal Jumpongreater pointedtobytheoperand.IftheexpressionevaluatestoFALSE, than theinstructionisignored. Jumponnotbelow IftheCarryFlag(CF)isclearedcontrolwillbetransferredtothe Jumponaboveor addressoftheinstructionpointedtobytheoperand.IfCFisset, equal theinstructionisignored Jumponnotbelow IftheCarryFlag(CF)ortheZeroFlag(ZF)arebothclearedcontrol orequal willbetransferredtotheaddressoftheinstructionpointedtoby Jumponabove theoperand.Ifeitherflagisset,theinstructionisignored. Jumponnotparity IftheParityFlag(PF)iscleared,controlwillbetransferredtothe addressoftheinstructionpointedtobytheoperand.IfPFisset, Jumponoddparity theinstructionisignored. IftheOverflowFlag(OF)isclearedcontrolwillbetransferredto Jumponnot theaddressoftheinstructionpointedtobytheoperand.IfOFis overflow set,theinstructionisignored. IftheSignFlag(SF)isclearedcontrolwillbetransferredtothe Jumponnotsign addressoftheinstructionpointedtobytheoperand.IfSFisset, theinstructionisignored.
287
Chapter10 Mnemonic (©Intel)
LOOP LOOPZ LOOPE LOOPNZ LOOPNE JCXZ
INT IRET
Instruction
Description
LoopwhiletheCX Repeatedlyexecuteasequenceofinstructions.Thenumberof registerisnotzero timestheloopisrepeatedisstoredintheCXregister. Loopwhilezero Repeatedlyexecuteasequenceofinstructions.Themaximum numberoftimestheloopisrepeatedisstoredintheCXregister. Loopwhileequal TheloopisterminatedbeforethecountinCXreacheszeroifthe ZeroFlag(ZF)isset. Loopwhilenotzero Repeatedlyexecuteasequenceofinstructions.Themaximum numberoftimestheloopisrepeatedisstoredintheCXregister. Loopwhilenot TheloopisterminatedbeforethecountinCXreacheszeroifthe equal ZeroFlag(ZF)iscleared. Ifthepreviousinstructionleaves0intheCXregister,thencontrol JumponCXzero istransferredtotheaddressoftheinstructionpointedtobythe operand. ThecurrentinstructionsequenceissuspendedandtheProcessorStatusFlags,theInstructionPointer(IP)registerandtheCS Generateinterrupt registerarepushedontothestack.Instructioncontinuesatthe memoryaddressstoredinappropriateinterruptvectorlocation. Returnfrom RestoresthecontentsoftheFlagsregister,theIPandtheCS interrupt register.
Althoughthelistofpossibleconditionaljumpsislongandimpressive,youshouldnotethatmostof themnemonicsaresynonymsandtestthesamestatusflagconditions.Also,thesetofJUMP-type instructionsneedsfurtherexplanationbecauseofthesegmentationmethodofmemoryaddressing. Jumpscanbeoftwotypes,dependinguponhowfarawaythedestinationofthejumpresidesfrom thepresentlocationofthejumpinstruction.Ifyouarejumpingtoanotherlocationinthesame regionofmemorypointedtobythecurrentvalueoftheCSregisterthenyouareexecutingan intrasegmentjump.Conversely,ifthedestinationofthejumpisbeyondthespanoftheCSpointer, thenyouareexecutinganintersegmentjump.IntersegmentjumpsrequirethattheCSregisteris alsomodifiedtoenablethejumptocovertheentirerangeofphysicalmemory. Theoperandofthejumpmaytakeseveralforms.Thefollowingareoperandsofthejumpinstruction: • Short-label:An8-bitdisplacementvalue.Theaddressoftheinstructionidentifiedby thelabeliswithinthespanofasigned8-bitdisplacementfromtheaddressofthejump instructionitself. • Nearlabel:A16-bitdisplacementvalue.Theaddressoftheinstructionidentifiedbythe labeliswithinthespanofthecurrentcodesegment.Thevalueof • Memptr16orRegptr16:A16-bitoffsetvaluestoredinamemorylocationoraregister. ThevaluestoredinthememorylocationortheregisteriscopiedintotheIPregisterand formstheoffsetportionofthenextinstructiontobefetchedfrommemory.Thevaluein theCSregisterisunchanged,sothistypeofanoperandcanonlyleadtoajumpwithinthe currentcodesegment. • Far-labelorMemptr32:Theaddressofthejumpoperandisa32-bitimmediatevalue. Thefirst16-bitsareloadedintotheoffsetportionoftheIPregister.Thesecond16-bitsare loadedintotheCSregister.Thememptr32operandmayalsobeusedtospecifyadouble 288
TheIntelx86Architecture wordlengthindirectjump.Thatis,thetwosuccessive16-bitmemorylocationsspecified bymemptr32containtheIPandCSvaluesforthejumpaddress.Also,certainregister pairs,suchasDSandDXmaybepairedtoprovidetheCSandIPvaluesforthejump. Thetypeofjumpinstructionthatyouneedtouseisnormallyhandledbytheassembler,unlessyou overridethedefaultvalueswithassemblerdirectives.Welldiscussthispointlateronthischapter.
AssemblyLanguageProgrammingthe8086Architecture Theprinciplesofassemblylanguageprogrammingthatwe’vecoveredinthepreviouschapters arenodifferentthenthoseofthe68Kfamily.However,whiletheprinciplesmaybethesame,the implementationmethodsaresomewhatdifferentbecause: 1. the8086issodeeplylinkedtothearchitectureofthePCanditsoperatingsystems, 2. thesegmentedmemoryarchitecturerequiresthatwedeclarethetypeofprogramthatwe intendtowriteandspecifyamemorymodelfortheopcodesthattheassemblerisgoingto generate. Inordertowriteanexecutableassemblylanguageprogramforthe8086processorthatwillrun nativelyonyourPCyoumust,ataminimum,followtherulesofMSDOS®.Thisrequiresthatyou donotpresetthevaluesofthecodesegmentregisterbecauseitwillbeuptotheoperatingsystemto initializethisregistervaluewhenitloadstheprogramintomemory.Thus,inthe68Kenvironment whenwewanttowriterelocatablecode,wewouldusethePCoraddressregisterrelativeaddressingmodes.Here,weallowtheoperatingsystemtospecifytheinitialvalueoftheCSregister. AssemblerssuchasBorland’sTurboAssembler(TASM®)andMicrosoft’sMASM®assembler handlemanyofthesehousekeepingtasksforyou.So,aslongasyoufollowtherulesyoumaystill beabletowriteassemblylanguageprogramsthatarewell-behaved.Certainlytheseprogramscan runonanymachinethatisstillrunningthe16-bitcompatibleversionsofthevariousPCoperating systems.Thenewer,32-bitversionsaremoreproblematicbecausetherunolderDOSprogramsin anemulationmodewhichmayormaynotrecognizetheolderBIOScalls.However,mostsimple assemblylanguageprogramswhichdosimpleconsoleI/OshouldrunwithoutdifficultyinaDOS window.BeingatrueLuddite,Istillhavemytrusty486machinerunninggoodoldMS-DOS,even thoughI’mwritingthistextonasystemwithWindowsXP. Let’sfirstlookattheissueofthesegmentdirectivesandmemorymodels.Ingeneral,itisnecessarytoexplicitlyidentifytheportionsofyourprogramthatwilldealwiththecode,thedataand thestack.Thisissimilartowhatyou’vealreadyseen.Weusethedirectives: • .code • .stack • .data todenotethelocationsofthesesegmentsinyourcode(notethatthedirectivesareprecededbya period).Forexample,ifyouusethedirective: .stack 100h
youarereserving256bytesofstackspaceforthisprogram.Youdonothavetospecifywherethe stackitselfislocatedbecausetheoperatingsystemismanagingthatforyouandtheoperating systemisalreadyupandrunningwhenitisloadingthisprogram. 289
Chapter10 The.datadirectiveidentifiesthedataspaceofyourprogram.Forexample,youmighthavethe followingvariablesinyourprogram: .data var16 dw var8 db initMsg db
0AAAAh 55h ‘HelloWorld’,0Ah,0Dh
Thisdataspacedeclaresthreevariables,var16,var8andinitMsgandinitializesthem.Inorderfor youtousethisdataspaceinyourprogramyoumustinitializetheDSsegmentregistertoaddress ofthedatasegment.Butsinceyoudon’tknowwherethisis,youdoitindirectly: MOVAX,@data MOV DS,AX
;Addressofdatasegment
Here,@dataisareservedwordthatcausestheassemblertocalculatethecorrectDSsegmentvalue. The.codedirectiveidentifiesthebeginningofyourcodesegment.TheCSregisterwillinitialized topointtothebeginningofthissegmentwhenevertheprogramisloadedintomemory. Inadditiontoidentifyingwhereinmemorythevariousprogramsegmentswillresideyouneedto providetheassembler(andtheoperatingsystemwithsomeideaofthetypeofaddressingthatwill berequiredandtheamountofmemoryresourcesthatyourprogramwillneed.Youdothiswith the.modeldirective.Justaswe’veseenwiththedifferenttypesofpointersneededtoexecute anintrasegmentjumpandanintersegmentjump,specifyingthemodelindicatesthesizeofyour programanddataspacerequirements.Theavailablememorymodelsare3: • Tiny:Bothprogramcodeanddatafitwithinthesame64Ksegment.Also,bothcodeand dataaredefinedasnear,whichmeansthattheyarebranchedtobyreloadingtheIPregister. • Small:Programcodefitsentirelywithinasingle64Ksegmentandthedatafitsentirely withinaseparate64Ksegment.Bothcodeanddataarenear. • Medium:Programcodemaybelargerthan64Kbutprogramdatamustbesmallenough tofitwithinasingle64Ksegment.Codeisdefinedasfar,whichmeansthatbothsegment andoffsetmustbespecifiedwhiledataaccessesareallnear. • Compact:Programcodefitswithinasingle64Ksegmentbutthesizeofthedatamay exceed64K,withnosingledataelement,suchasanarray,beinglargerthan64K.Code accessesarenearanddataaccessesarefar. • Large:Bothcodeanddataspacesmaybelargerthan64K.However,nosingledataarray maybelargerthan64K.Alldataandcodeaccessesarefar. • Huge:Bothcodeanddataspacesmaybelargerthan64Kanddataarraysmaybelarger than64K.Faraddressingmodesareusedforallcode,dataandarraypointers. Theuseofmemorymodelsisimportantbecausetheyareconsistentwiththememorymodelsused bycompilersforthePC.Itguaranteesthatanassemblylanguagemodulethatwillbelinkedin withmoduleswritteninahighlevellanguagewillbecompatiblewitheachother. Let’sexamineasimpleprogramthatcouldrunoninaDOSemulationwindowonyourPC.
.MODEL .STACK
small 100h
290
TheIntelx86Architecture .DATA PrnStrg db .CODE Start: mov ax,@data mov ds,ax mov dx,OFFSET mov ah,09 int 21h mov ah,4Ch int21h END Start
‘HelloWorld$’
;Stringtoprint
;setdatasegment ;initializedatasegmentregister PrnStrg;Loaddxwithoffsettodata ;DOScalltoprintstring ;callDOStoprintstring ;preparetoexit ;quitandreturntoDOS
Asyoumightguess,thisprogramrepresentsthefirstbabystepsof8086assemblylanguageprogramming.Youshouldbeallteary-eyed,recallingtheveryfirstC++programthatyouactuallygot tocompileandrun. Weareusingthe‘small’memorymodel,althoughthe‘tiny’modelwouldworkjustaswell.We’ve reserved256bytesforthestackspace,butitisdifficulttosayifwe’veusedanystackspaceatall, sincewedidn’tmakeanysubroutinecalls. Thedataspaceisdefinedwiththe.datadirectiveandwedefineabytestring,“HelloWorld$”.The ‘$’isusedtotellDOStoterminatethestringprinting.Borland7suggeststhatinstructionlabels beonlinesbythemselvesbecauseitiseasiertoidentifyalabelandifaninstructionneedstobe addedafterthelabelitismarginallyeasiertodo.However,thelabelmayappearonthesameline astheinstructionthatitreferences.Labelswhichreferenceinstructionsmustbeterminatedwith acolonandlabelswhichreferencedataobjectsdonothavecolons.Colonsarenotusedwhenthe labelisthetargetinaprogram,suchasaforalooporjumpinstruction. Thereservedword,offset,isusedtoinstructtheassemblertocalculatetheoffsetfromtheinstructiontothelabel,‘PrnStrg’andplacethevalueintheDXregister.Thiscompletesthecodethatis necessarytocompletelyspecifythesegmentandoffsetofthedatastringtoprint.Oncewehave establishedthepointertothestring,wecanloadtheAHregisterwiththeDOSfunctioncallto printastring,09.Thecallismadeviaasoftwareinterrupt,INT21h,whichhasthesamefunction astheTRAP#15instructiondidforthe68Ksimulator. TheprogramisterminatedbyaDOSterminationcall(INT21hwithAH=4Ch)andtheEND reservedwordtellstheassemblertostopassembling.ThelabelfollowingtheENDdirectivetells theassemblerwhereprogramexecutionistobegin.Thiscanbedifferentfromthebeginningofthe codesegmentandisusefulifyouwanttoentertheprogramatsomeplaceotherthanthebeginningofthecodesegment.
SystemVectors Likethe68K,thefirst1Kmemoryaddressesarereservedforthesysteminterruptvectorsand exceptions.Inthe8086architecture,theinterruptnumberisan8-bitunsignedvaluefrom0to255. Theinterruptoperandisshiftedleft2times(multipliedby4)toobtaintheaddressofthepointer totheinterrupthandlercode.Thus,theINT21hinstructionwouldcausetheprocessortovector throughmemorylocation00084htopick-upthe4bytesofthesegmentandoffsetfortheoperating 291
Chapter10 systementrypoint.Inthiscase,theIPoffsetwouldbelocatedatwordaddress00084handtheCS pointerwouldbelocatedatwordaddress00086h.Thefunctioncode,09intheAHregistercause DOStoprintthestringpointedtobyDS:DX.
SystemStartup An8086-basedsystemcomesoutofRESETwithalltheregisterssetequaltozerowiththe exceptionoftheCSregister,whichissetequalto0FFFFh.Thus,thephysicaladdressofthefirst instructionfetchwouldbe0FFFFh:0000,or0FFFF0h.Thisisanaddresslocated16-bytesfrom thetopofphysicalmemory.Thus,an8086systemusuallyhasnonvolatilememorylocatedinhigh memorysothatitcontainsthebootcodewhentheprocessorcomesoutofRESET.Once,outof RESET,the16bytesisenoughtoexecuteafewinstructions,includingajumptothebeginningof theactualinitializationcode.OnceintothebeginningoftheROMcode,thesystemwillusually initializetheinterruptvectorsinlowmemorybywritingtheirvaluestoRAM,whichoccupiesthe lowmemoryportionoftheaddressspace.
Wrap-Up Youmayeitherbeoverjoyedordisappointedthatthischapteriscomingtoanend.Afterall,we dissectedthe68Kinstructionsetandlookedatnumerousassemblylanguageprogrammingexamples.Inthischapterwelookedatnumerouscodefragmentsandonlyone,rathertrivial,program. Whatgives? Earlierinthetextwewerebothlearningthefundamentalsofassemblylanguageprogramming andlearningacomputer’sarchitectureatthesametime.Thearchitectureofthe68Kfamilyitself isfairlystraight-forwardandallowsustofocusonbasicprinciplesofaddressingandalgorithms. The8086architectureisamorechallengingarchitecturetoabsorb,sowedelayeditsintroduction untillaterinthetext.Nowthatyou’vebeenexposedtothegeneralmethodsofassemblylanguage programming,wecouldfocusoureffortsonmasteringtheintricaciesofthe8086architecture itself.Anyway,that’sthetheory. Inthenextchapterwe’llexamineathirdarchitecturethat,onceagain,youmayfindeithervery refreshingorveryfrustrating,toworkwith.Frustratingbecauseyoudon’thaveallthepowerfulinstructionsandaddressingmodestoworkwiththatyouhavewiththe8086architecture;and refreshingbecauseyoudon’thaveallofthepowerfulandcomplexinstructionsandaddressing modestomaster.
SummaryofChapter10 Chapter10covered: • Thebasicarchitectureofthe8086and8088microprocessors • 8086memorymodelsandaddressing • Theinstructionsetarchitectureofthe8086family • Thebasicsof8086assemblylanguageprogramming
292
TheIntelx86Architecture Chapter10:Endnotes 1 DanielTabak,AdvancedMicroprocessors,SecondEdition,ISBN0-07-062843-2,McGraw-Hill,NY,1995,p.186. AdvancedMicroDevices,Inc,Am186™ESandAm188™ESUser’sManual,1997,p.2-2.
2
Borland,TurboAssembler2.0User’sGuide,BorlandInternational,Inc.ScottsValley,1988.
3
AdvancedMicroDevices,Inc,Am186™ESandAm188™ESInstructionSetManual,1997.
4
WalterA.TriebelandAvatarSingh,The8088and8086Microprocessors,ThirdEdition,ISBN0-13-010560-0, Prentice-Hall,UpperSaddleRiver,NJ,2000.Chapters5and6.
5
IntelCorporation,808616-BitHMOSMicroprocessor,DataSheetNumber231455-005,September1990,pp.25–29.
6
Borland,opcit,p.83.
7
293
ExercisesforChapter10 1. Thecontentsofmemorylocation0C0020h=0C7handthecontentsofmemorylocation 0C0021h=15h.Whatisthewordstoredat0C0020h?Isitalignedornonaligned? 2. Assumethatyouhaveapointer(segment:offset)storedinmemoryatbyteaddresses0A3004h through0A3007hasfollows: =00 =10h =0C3h =50h Expressthispointerintermsofsegment:offsetvalue. 3. Whatwouldtheoffsetvaluebeforthephysicalmemoryaddress0A257Chifthecontentsof thesegmentregisteris0A300h? 4. Convertthefollowingassemblylanguageinstructionstotheirobjectcodeequivalents: MOV AX,DX MOVBX[SI],BX MOVDX,0A34h 5. Writeasimplecodesnippetthat: a. Loadsthevalue10intotheBXregisterandthevalue4intotheCXregister, b. executesaloopthatincrementsBXby1anddecrementsCXuntilthe=00 6. 7. 8. 9.
LoadregisterAXwiththevalue0AA55handthenswapthebytesintheregister. WhatisarethecontentsoftheAXregisterafterthefollowingtwoinstructionsareexecuted? MOV AX,0AFF6h ADD AL,47h SupposeyouwanttoperformthemathematicaloperationX=Y*Z,where: Xisa32-bitunsignedvariablelocatedatoffsetaddress200h, Yisa16-bitunsignedvariablelocatedataddress204h, Zisa16-bitunsignedvariablelocatedataddress206h, writean8086assemblylanguagecodesnippetthatperformsthisoperation. Writeaprogramsnippetthatmoves1000bytesofdatabeginningataddress82000Hto address82200H. 10. Modifytheprogramofproblem9sothattheprogrammoves1000bytesofdatafrom82000H toC4000H. 294
11
CHAPTER
TheARMArchitecture
Objectives
Whenyouarefinishedwiththislesson,youwillbeableto: DescribetheprocessorarchitectureoftheARMfamily; DescribethebasicinstructionsetarchitectureoftheARM7processors; DescribethedifferencesandsimilaritiesbetweentheARMarchitectureandthe68000 architecture; WritesimplecodesnippetsinARMassemblylanguageusingalladdressingmodesand instructionsofthearchitecture.
Introduction We’regoingtoturnourattentionawayfromthe68Kand8086architecturesandheadinanew direction.Youmayfindthischangeofdirectiontobequiterefreshingbecausewe’regoingtolook atanarchitecturethatmaybecharacterizedbyhowitremovedallbutthemostessentialinstructionsandaddressingmodes.WecallacomputerthatisbuiltaroundthisarchitectureaRISC computer,whereRISCisanacronymforReducedInstructionSetComputer.The68Kandthe 8086processorsarecharacteristicofanarchitecturecalledComplexInstructionSetComputer,or CISC.We’llcomparethetwoarchitecturesinalaterchapter.Fornow,let’sjustmarchonwardand learntheARMarchitectureasifweneverheardofCISCandRISC. In1999theARM32-bitarchitecturefinallyovertooktheMotorola68Karchitectureintermsof popularity1.The68Karchitecturehaddominatedtheembeddedsystemsworldsinceitwasfirst invented,buttheARMarchitecturehasemergedastoday’smostpopular,32-bitembeddedprocessor.Also,ARMprocessorstodayoutselltheIntelPentiumfamilybya3to1margin2.Thus,you’ve justseenmyrationaleforteachingthese3microprocessorarchitectures. IfyouhappentobeinAustin,TexasyoucouldheadtothesouthsideoftownandvisitAMD’s impressivesiliconfoundry,FAB25.Inthismodern,multibilliondollarfactory,siliconwafersare convertedtoAthlonmicroprocessors.Ashortdistanceaway,Freescale’s(Motorola)FAB(fabrication facility)cranksoutPowerPC®processors.IntelbuildsitsprocessorsinFABsinChandler,Arizona andSanJose,CA,aswellasothersitesworldwide.Where’sARM’sFABlocated?Don’tfret,thisis atrickquestion.ARMdoesn’thaveaFAB.ARMisaFABlessmanufacturerofmicroprocessors. ARMHoldingsplcwasfoundedin1990asAdvancedRISCMachinesLtd.3Itwasbasedinthe UnitedKingdomasajointventurebetweenAcornComputerGroup,AppleandVLSITechnology. 295
Chapter11 ARMdoesnotmanufacturechipsinitsownright.Itlicensesitschipdesignstopartnerssuchas VLSITechnology,TexasInstruments,Sharp,GECPlesseyandCirruslogicwhoincorporatethe ARMprocessorsincustomdevicesthattheymanufactureandsell.ItwasARMthatcreatedthis modelofsellingIntellectualProperty,orIP,ratherthanasiliconchipmountedinapackage.In thatsenseitisnodifferentthenbuyingsoftware,which,infact,itis.Acustomerwhowantsto buildasystem-on-silicon,suchasaPDA/Cellphone/Camera/MP3playerwouldcontractwithVLSI technologytobuildthephysicalpart.VLSI,asanARMlicensee,offersthecustomeranencrypted libraryinahardwaredescriptionlanguage,suchasVerilog.TogetherwithotherIPthatthecustomer maylicense,andIPthattheydesignthemselves,aVerilogdescriptionofthechipiscreatedthat VLSIcanusetocreatethephysicalpart.Thus,youcanseewiththeemergenceofcompanieslike ARM,theintegratedcircuitdesignmodelpredictedbyMeadandConwayhascometrue. Today,ARMoffersarangeofprocessordesignsforawiderangeofapplications.Justaswe’ve donewiththe68Kandthe8086,we’regoingtofocusoureffortsonthebasic32-bitARMarchitecturethatiscommontomostoftheproductsinthefamily. WhenwetalkabouttheARMprocessor,we’lloftendiscussitintermsofacore.Thecoreisthe naked,mostbasicportionofamicroprocessor.Whensystems-on-silicon(orsystems-on-chip)are designed,oneormoremicroprocessorcoresarecombinedwithperipheralcomponentstocreatean entiresystemdesignonasinglesilicondie.SimplerformsofSOCshavebeenaroundforquitea while.Todaywecallthesecommerciallyavailablepartsmicrocontrollers.BothIntelandMotorola pioneeredthecreationofmicrocontrollers.Historically,asemiconductorcompany,suchasMotorola,woulddevelopanewmicroprocessor,suchasthe68Kfamilyandsellthenewpartataprice premiumtothosecustomerswhoweredoingtheleadingedgedesignsandwerewillingtopaya pricepremiumtogetthelatestprocessorwiththebestperformance. Astheprocessorgainedwideracceptanceandthesemiconductorcompanyrefinedtheirfabricationprocesses,they(thecompanies)wouldoftenlifttheCPUcoreandplaceitinanotherdevice thatincludedotherperipheraldevicessuchastimers,memorycontrollers,serialportsandparallelports.Thesepartsbecameextremelypopularinmorecost-consciousapplicationsbecauseof theirhigherlevelofintegration.However,apotentialcustomerwaslimitedtobuyingtheparticular partsfromthevendor’sinventory.Ifthecustomerwantedavariantpart,theyeitherboughtthe closestparttheycouldfindandthenplacedtheadditionalcircuitryonaprintedcircuitboard,or workedwiththevendortodesignacustommicrocontrollerfortheirproducts. Thus,wecantalkaboutMotorolaMicrocontrollersthatusetheoriginal68Kcore,called CPU16,suchasthe68302,ormoreadvancedmicrocontrollersthatusethefull32-bit68Kcore (CPU32)indevicessuchasthe68360.The80186processorfromIntelusesthe8086coreandthe SC520(Elan)fromAMDusesthe486coretobuildanentirePConasinglechip.Theoriginal PalmPilot®usedaMotorola68328microcontroller(codenameDragonball)asitsengine.Thus, sincealloftheARMprocessorsarethemselvescores,tobedesignedintosystems-on-chip,we’ll continuetousethatterminology.
ARMArchitecture Figure11.1isasimplifiedschematicdiagramoftheARMcorearchitecture.Dataandinstructions comeinandareroutedtoeithertheinstructiondecoderortooneofgeneralpurposeregisters, 296
TheARMArchitecture labeledr0–r15.Unlikethesingle bi-directionaldatabustomemoryofthe 68Kand8086processors,thevarious ARMimplementationsmaybedesigned withseparatedatabussesandaddress bussesgoingtoinstructionmemory (codespace)anddatamemory.This typeofimplementation,withseparate dataandinstructionmemoriesiscalleda HarvardArchitecture.
Instruction Decoder
DATA
Sign Extender Data Write
Data Read Rd
Register Files r0 – r15
Program Counter r15
Rn A
Rm B Barrel Shifter
Acc B
A
Alldatamanipulationstakeplaceinthe Multiply Accumulate registerfile,agroupof16,32-bitwide, (MAC) ALU general-purposeregisters.Thisisavery Result deferentconceptfromthededicated Rd ADDRESS Address address,dataandindexregistersthat Register we’vepreviouslydealtwith.Although Incrementer someoftheregistersdohavespecific Figure11.1:TheARMarchitecture. functions,therestoftheregistersmay beusedassourceoperands,destination operandsormemorypointers.8-bitand16-bitwidedatacomingfrommemorytotheregistersis automaticallyconvertedtoasignextended32-bitnumberbeforebeingstoredinaregister.These registersaresaidtobeorthogonal,becausetheyarecompletelyinterchangeablewithrespectto addressordatastorage. Allarithmeticandlogicaloperationstakeplacebetweenregisters.Instructionslikethe68K’s mixedmemoryoperandandregisteraddition,shownbelow,arenotpermitted. ADD D5,$10AA *AddD5to$10AAandstoreresultin$10AA
Also,mostarithmeticandlogicaloperationsinvolve3operands.Thus,twooperands,Rnand Rm,aremanipulatedandtheresult,Rd,isreturnedtothedestinationregister.Forexample,the instruction: ADD r7,r1,r0
addstogetherthe32-bitsignedcontentsofregistersr0andr1andplacestheresultinr7.In general,3operandinstructionsareoftheform: opcode Rd,Rn,Rm
withtheRmoperandalsopassingthroughabarrelshifterunitbeforeenteringtheALU.This meansthatbitshiftsmaybeperformedontheRmoperandinoneassemblylanguageinstruction. InadditiontothestandardALUtheARMarchitecturealsoincludesadedicatedmultiplyaccumulate(MAC)unitwhichcaneitherdoastandardmultiplicationoftworegisters,or accumulatetheresultwithanotherregister.MAC-basedinstructionsareveryimportantinsignal processingapplicationsbecausetheMACoperationisfundamentaltonumericalintegration. ConsidertheexamplenumericalintegrationshowninFigure11.2. 297
Chapter11 b Inordertocalculatetheareaunderthe F(x)dx Y = curve,orsolveanintegralequation, a Accumulate wecanuseanumericalapproximation method.Eachsuccessivecalculationof Multiply F(x) b theareaunderthecurveinvolvescalY ≈ ∑ 1/2[ F(x) + F(x + ∆x) ] ∆x culatingtheareaofasmallrectangular x=a prismandsummingalloftheareas. a b x Ax Eachareacalculationisamultiplication Figure 11.2: A multiply-accumulate (MAC) unit can andthesummationoftheprismsisthe acceleratenumericalintegration. totalarea.TheMACunitdoesthemultiplicationandkeepstherunningsummationinoneoperation.
TheARMprocessorsuseaload/storearchitecture.Loadsaredatatransfersfrommemorytoaregisterintheregisterfileandstoresaredatatransfersfromtheregisterfiletothememory.Theload andstoreinstructionsusetheALUtocomputeaddressvaluesthatarestoredintheaddressregister fortransfertotheaddressbus,orbusses.Theincrementerisusedtoadvancetheaddressregister valueforsequentialmemoryloadorstoreoperations. Atanypointintime,anARMprocessormaybeinoneofsevenoperationalmodes.Themost basicmodeiscalledtheusermode.Theusermodehasthelowestprivilegeleveloftheseven modes.Whentheprocessorisinusermodeitisexecutingusercode.Inthismodethereare 18activeregisters;the1632-bitdataregistersand2,32-bitwidestatusregisters.Ofthe16data registers,r13,r14,r15areassignedtospecifictasks. • r13:Stackpointer.Thisregisterpointstothetopofthestackinthecurrentoperating mode.Undercertaincircumstances,thisregistermayalsobeusedasanothergeneral purposeregister,however,whenrunningunderanoperatingsystemthisregisterisusually assumedtobepointingtoavalidstackframe. • r14:Linkregister.Holdsthereturnaddresswhentheprocessortakesasubroutinebranch. Undercertainconditions,thisregistermayalsobeusedasageneralpurposeregister • r15:Programcounter:Holdstheaddressofthenextinstructiontobefetchedfrommemory. Duringoperationintheusermodethecurrentprogramstatusregister(cpsr)functionsasthestandardrepositoryforprogramflagsandprocessorstatus.Thecpsrisalsopartoftheregisterfileand is32-bitswide(althoughmanyofthebitsarenotusedinthebasicARMarchitecture). Thereareactuallytwoprogramstatus registers.Asecondprogramstatus register,calledthesavedprogram statusregister(spsr)isusedtostore thestateofthecpsrwhenamode changeoccurs.Thus,theARMarchitecturesavesthestatusregisterina speciallocationratherthanpushingit ontothestackwhenacontextswitch occurs.Theprogramstatusregisteris showninFigure11.3.
FLAGS
STATUS
EXTENSION
31 30 29 28 27 N Z C V
CONTROL
8 7 6 5 4 3 2 1 0
RESERVED
INTERRUPT MASKS
I F T
PROCESSOR MODE THUMB
Figure11.3:Statusregisterconfiguration. 298
BITS
TheARMArchitecture Theprocessorstatusregisterisdividedupinto4fields;Flags,Status,ExtensionandControl.The StatusandExtensionfieldsarenotimplementedinthebasicARMarchitectureandarereserved forfutureexpansion.TheFlagsfieldcontainsthefourstatusflags: • Nbit:Negativeflag.Setifbit31oftheresultisnegative. • Zbit:Zeroflag.Setiftheresultiszeroorequal. • Cbit:Carryflag.Setiftheresultcausesanunsignedcarry. • Vbit:Overflowflag.Setiftheresultcausesasignedoverflow. TheInterruptMaskbitsareusedtoenableordisableeitherofthetwotypesofinterruptrequests totheprocessor.Whenenabled,theprocessormayacceptnormalinterruptrequests(IRQ)orFast InterruptRequests(FIQ).Wheneitherbitissetto1thecorrespondingtypeofinterruptismasked, orblockedfromallowinganinterruptsourcefromstoppingtheprocessor’scurrentexecution threadandservicingtheinterrupt. TheThumbModeBithasnothingtodowithyourhand.Itisaspecialmodedesignedtoimprove thecodedensityofARMinstructionsbycompressingtheoriginal32-bitARMinstructionsetinto a16-bitform;thusachievinga2:1reductionintheprogrammemoryspaceneeded.Special on-chiphardwaredoesanon-the-flydecompressionoftheThumbinstructionsbacktothestandard 32-bitinstructionwidth.However,nothingcomesforfree,andtherearesomerestrictionsinherent inusingThumbmode.Forexample,onlythegeneralpurposeregistersr0–r7areavailablewhen theprocessorisinThumbmode. Bit0throughbit4definethecurrentprocessoroperatingmode.TheARMprocessormaybein oneofsevenmodesasdefinedinthefollowingtable: Mode
Abort FastInterruptRequest InterruptRequest Supervisor System Undefined User
Abbreviation
abt fiq irq svc sys und usr
Privileged
yes yes yes yes yes yes no
Modebits[4:0]
10111 10001 10010 10011 11111 11011 10000
Theusermodehasthelowestprivilegelevel,whichmeansitcannotalterthecontentsoftheprocessorstatusregister.Inotherwords,itcannotenableordisableinterrupts,enterThumbmodeor changetheprocessor’soperatingmode.Beforewediscusseachofthemodesweneedtolookat howeachmodeusestheregisterfiles.Thereareatotalof37registersinthebasicARMarchitecture.Wediscussedthe18thatareaccessibleinusermode.Theremaining19registerscomeinto playwhentheotheroperatingmodesbecomeactive.Figure11.4showstheregisterconfiguration foreachoftheprocessor’soperatingmodes. WhentheprocessorisrunninginUserModeorSystemModethe13general-purposeregisters, sp,lr,pcandcpsrregistersareactive.IftheprocessorenterstheFastInterruptRequestMode, theregisterslabeledr8_fiqthroughr14_fiqareautomaticallyexchangedwiththecorresponding registers,r8throughr14.Thecontentsofthecurrentprogramstatusregister,cpsr,arealsoautomaticallytransferredtothesavedprogramstatusregister,spsr. 299
Chapter11 Thus,whenafastinterruptrequestcomes intotheprocessor,andtheFIQmaskbit intheCPSRisenabled,theprocessorcan quicklyandautomaticallymakeanentire newgroupofregistersavailabletoservice thefastinterruptrequest.Also,sincethe contentsofthecpsrareneededtorestore thecontextoftheuser’sprogramwhen theinterruptrequestisserviced,thecpsr isautomaticallysavedtothespsr_fiq.The spsrregisterautomaticallysavesthecontextofthecpsrwheneveranyofthemodes otherthanuserandsystembecomeactive.
User and System r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 sp r14 lr r15 pc
Fast Interrupt Request r8_fiq r9_fiq r10_fiq r11_fiq r12_fiq r13_fiq r14_fiq
Interrupt Request r14_irq r15_irq
Supervisor Undefined r14_svc r14_und r15_svc r15_und
Abort r14_abt r15_abt
cpsr
Whenevertheprocessorchangesmodesit spsr_fiq spsr_irq spsr_svc spsr_und spsr_abt ---musthavebeabletoeventuallyreturnto Figure11.4:Registerfilestructure. thepreviousmodeandcorrectlystartagain fromtheexactpointwhereitleftoff.Thus,whenitentersanewmode,thenewregisterspoint tothememorystackforthenewmodeinr13_xxxandalsothereturnaddressofthepriormode inr14_xxx.Thus,whentheprocessordoesswitchmodesthenewcontextisautomaticallyestablishedandtheoldcontextisautomaticallysaved.Ofcourse,wecoulddoallofthisinsoftwareas well,buthavingtheseadditionalhardwareresourcesoffersbetterprocessorperformancefortimecriticalapplications. Sincetheregistersusedareswappedinasagroup(calledabankswitch)theircontentsmaybe preloadedwiththeappropriatevaluesnecessarytoservicethefastinterruptrequest.Also,sincethe FIQmodeisahigherprivilegemode,theprogramcodeusedtoservicethefastinterruptrequest canalsochangetheoperationalmodebacktouserorsystemwhentheinterruptisover.We’lldiscussthegeneralconceptsofinterruptsandservicinginterruptsinmoredetailinalaterchapter. Theotherprocessormodes;InterruptRequest,Supervisor,UndefinedandAbortallbehavein muchthesamewayastheFastInterruptRequestmode.Whenthenewmodeisentered,theparticularregistersappropriateforthatmodearebankswitchedwiththeregistersoftheusermode. Thecurrentcontentsofthebaseregistersarenotchanged,sothatwhentheusermodeisrestored, theoldworkingregistersarereactivatedwiththeirvaluepriortothebankswitchtakingplace. Ourlastremainingtaskistolookatthesevenprocessormodes. • Usermode:Thisisthenormalprogramexecutionmode.Theprocessorhasgeneraluseof registersr0throughr12andtheflagbitsinthecpsrarechangedaccordingtotheresultsof theexecutionoftheassemblylanguageinstructionsthatcanmodifytheflags. • Systemmode:Systemmodeisusermodewiththehigherprivilegelevel.Thismeansthat insystemmodetheprocessormayalterthevaluesinthecpsr.Youwillrecallthe68Kalso hadauserandsupervisormode,whichgaveaccesstoadditionalinstructionswhichcould modifythebitsinthestatusregister.Systemmodewouldbetheappropriatemodetouse ifyourapplicationwasnotsocomplexthatitrequiredtheadditionofanoperatingsystem. 300
TheARMArchitecture
•
• • • •
Forsimplerapplications,theaccessibilityoftheprocessormodeswouldbeanadvantage, sothesystemmodewouldbeanobviouschoice. FastInterruptRequestMode:TheFIQandIRQmodesaredesignedforhandlingprocessorinterrupts.TheFastInterruptRequestModeprovidesmorebankregistersthanthe standardinterruptrequestmodesothattheprocessorhaslessoverheadtodealwithif additionalregistersmustbesavedduringtheinterruptserviceroutine.Also,theseven bankedregistersusedintheFIQmodearesufficientforcreatingasoftwareemulationofa DirectMemoryAccess(DMA)controller. InterruptRequestMode:LiketheFIQmode,theIRQmodeisdesignedfortheservicing ofprocessorinterrupts.Twobankedregistersareavailablewhentheprocessoracknowledgestheinterruptandswitchescontexttotheinterruptserviceroutine. Supervisormode:Supervisormodeisdesignedtobeusedwithanoperatingsystem kernel.WheninsupervisormodealltheresourcesoftheCPUareavailable.Supervisor modeistheactivemodewhentheprocessorfirstcomesoutofreset. Undefinedmode:Thismodeisreservedforillegalinstructionsorforinstructionsthatare notsupportedbytheparticularversionoftheARMarchitecturethatisinuse. Abortmode:Undercertainconditionstheprocessormayattempttomakeanillegal memoryaccess.Theabortmodeisreservedfordealingwithattemptstoaccessrestricted memory.Forexample,specialhardwaremightbedesignedtodetectillegalwriteattempts tomemorythatissupposedtoberead-only.
ConditionalExecution TheARMarchitectureoffersaratheruniquefeaturethatwe’venotpreviouslyconsidered.Thatis, theabilitytoconditionallyexecutemostinstructionsbaseduponthestates,orlogicalcombination ofstates,oftheconditionflags.Theconditioncodesandtheirlogicaldefinitionsareshowninthe followingtable: Code
EQ NE CSHS CCLO MI PL VS VC HI LS GE LT GT LE AL NV
Description
Equaltozero Notequaltozero Carryset/unsignedhigherorthesame Carrycleared/unsignedlower Negativeorminus Positiveorplus Overflowset Overflowcleared unsignedhigher unsignedlowerorthesame signedgreaterthanorequal signedlessthan signedgreaterthan signedlessthanorequal always(unconditional) never(unconditional)
301
Flags
Z=1 Z=0 C=1 C=0 N=1 N=0 V=1 V=0 Z*C Z+C (N*V)+(N*V) NxorV (N*Z*V)+(N*Z*V) Z+(NxorV) notused notused
OP-Code[31:28]
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Chapter11 Theconditioncodemaybeappendedtoaninstructionmnemonictocausetheinstructiontobe conditionallyexecuted,baseduponthecurrentstateoftheflags.Forexample: SUB r3,r10,r5
subtractsthecontentsofregisterr5fromregisterr10andplacestheresultinregisterr3.Writing theinstructionas: SUBNEr3,r10,r5
wouldperformthesubtractionoperationonlyifthezeroflag=0. Thus,conditionalexecutionofinstructionscouldbeimplementedwithoutthenecessityofrequiringtheinsertionofaspecialbranchorjumpinstruction.Sincemanyassemblylanguageconstructs involvesjumpingoverthenextinstruction,theadditionofaconditionalexecutionmechanism greatlyimprovesthecodedensityoftheinstructionset. AnotheruniquefeatureoftheARMarchitectureisthattheclassesofinstructionsknowasdata processinginstructions;includingmoveinstructions,arithmeticandlogicalinstructions,comparisoninstructionsandmultiplicationinstructions,donotautomaticallychangethestateofthe conditioncodeflags.Justasinthepreviousexampleoftheoptiontoconditionallyexecuteaninstructionbaseduponthestateoftheconditioncodeflag,itispossibletocontrolwhetherornotthe resultofadataprocessinginstructionwillchangethestateoftheappropriateflags.Forexample: SUB r3,r10,r5
performsthesubtractionoperationbutdoesnotchangethestateoftheflags. SUBS r3,r10,r5
performsthesamesubtractionoperationbutdoeschangethestateoftheflags. SUBNES r3,r10,r5
conditionallyperformsthesameoperationandchangesthestateoftheflags.Theconditionalexecutionflag,‘NE’,isappendedfirst,followedbytheconditionalflagupdate,‘S’.Thus, SUBNES r3,r10,r5
isalegalopcode,but SUBSNE r3,r10,r5
isnotlegalandwillcauseanunknownopcodeerror.
BarrelShifter ReferringtoFigure11.1weseethattheRmoperandpassesthroughabarrelshifterhardware blockbeforeitenterstheALU.Thus,itispossibletoexecutebothanarithmeticoperationanda shiftoperationinoneinstruction.Thebarrelshiftercanshiftthe32-bitRmoperandanynumber ofbitpositions(upto32bits,leftorright)beforeenteringtheALU.Thebitshiftisentirelyan asynchronousoperation,sonoadditionalclockcyclesarerequiredtofacilitatetheshiftoperation. We’llleaveittoyouasanexercisetodesignthegateimplementationofabitshifterforvarious degreesofshifts.
302
TheARMArchitecture Forexample,theMOVinstructionisusedforregistertoregisterdatatransfersandforloading registerswithimmediatevalues.Considerthefollowinginstruction:
MOV r6,r1
;Copyr1tor6
Now,considerthesameinstructionusingwiththelogicalshiftleftoperationadded:
MOV r6,r1,LSL#3 ;Copiesr1*8tor6
Inthesecondexample,thecontentsofregisterr1areshiftedleftby3bitpositions(multipliedby 8)andthencopiedtoregisterr6.Thenumberofbitspositionsthatareshiftedmaybespecified asanimmediatevalueorthecontentsofaregister.Thiscodesnippetperformsthesameshiftand loadasinthepreviousexample:
MOV r9,#3 ;Initializer9 MOV r6,r1,LSLR9 ;Copiesr1*8tor6
Thefollowingtablesummarizesthebarrelshifteroperations: Mnemonic
LSL LSR ASR ROR RRX
Operation
LogicalShiftLeft LogicalShiftRight ArithmeticShiftRight RotateRight RotateRightExtended
ShiftAmount
#0-31,orregister #1-32,orregister #1-32,orregister #1-32,orregister 33
TheRotateRightExtendedeffectivelyshiftsthebitsrightby1bitpositionandcopiesbit31into thecarryflagposition.Notethattheflagbitsareconditionallyupdatedbaseduponthestateofthe Sbittotheinstructionmnemonic.
OperandSize TheARMarchitecturepermitsoperationsonbytes(8-bit),half-words(16-bits)andwords (32-bits).Onceagain,youareconfrontedwithaslightlydifferentsetofdefinitions.However, beingthemostmodernofthethreearchitectures,theARMdefinitionsareprobablythemost naturalandincloseragreementwiththeClanguagestandards. Bytesmaybeaddressedonanyboundary,whilehalf-wordsmustbealignedonevenaddress boundaries(A0=0)andwordsmustbealignedonquadbyteboundaries(A0andA1=0).This isevenmorerestrictivethanthe68Karchitecture,butisunderstandablebecausemostARM implementationswouldhaveafull32-bitwidedatapathtomemory.Thus,fetchinga32-bitwide memoryvaluefromanyaddresswhereA0=0andA0=1wouldrequiretwoaccessoperations,or theequivalentofanonalignedaccess. TheARMarchitecturesupportsbothbigendianandlittleendiandatapackingthroughtheproper configurationofthecore.However,thearchitecturedoesdefaulttolittleendianasthenative mode.Instructionsforloadingorstoringdatamayoptionallybemodifiedbyappendingthe operandtypeandsizetotheloadorstoreop-code.Thefollowingtableliststhetypeofload/store operationsintheARMarchitecture:
303
Chapter11 Mnemonic
LDR STR LDRB STRB LDRH STRH LDRSB LDRSH
Description
Operation
Loadawordfrommemoryintoaregister Register