390 47 2MB
English Pages 287 Year 2005
Co-VerificationofHardwareand SoftwareforARMSoCDesign
This page intentionally left blank
Co-VerificationofHardwareand SoftwareforARMSoCDesign byJasonR.Andrews
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Newnes is an imprint of Elsevier
NewnesisanimprintofElsevier 200WheelerRoad,Burlington,MA01803,USA LinacreHouse,JordanHill,OxfordOX28DP,UK Copyright©2005,ElsevierInc.Allrightsreserved. Nopartofthispublicationmaybereproduced,storedinaretrievalsystem,ortransmittedin anyformorbyanymeans,electronic,mechanical,photocopying,recording,orotherwise, withoutthepriorwrittenpermissionofthepublisher. PermissionsmaybesoughtdirectlyfromElsevier’sScience&TechnologyRightsDepartmentinOxford,UK:phone:(+44)1865843830,fax:(+44)1865853333,e-mail: [email protected].Youmayalsocompleteyourrequeston-lineviatheElsevier homepage(http://elsevier.com),byselecting“CustomerSupport”andthen“ObtainingPermissions.” Recognizingtheimportanceofpreservingwhathasbeenwritten,Elsevierprintsitsbookson acid-freepaperwheneverpossible.
LibraryofCongressCataloging-in-PublicationData
Andrews,JasonR. Co-verificationofhardwareandsoftwareforARMSoCdesign/JasonR.Andrews p.cm. ISBN0-7506-7730-9 1.Integratedcircuits--Vertification.2.Computersoftware--Verification.3.Systemsona chip.I.Title. TK7874.A5952004 005.1'4--dc22 BritishLibraryCataloguing-in-PublicationData AcataloguerecordforthisbookisavailablefromtheBritishLibrary. ForinformationonallNewnespublications visitourwebsiteatwww.newnespress.com 04050607080910987654321 PrintedintheUnitedStatesofAmerica.
2004053860
Contents
Foreword...................................................................................... xiii Preface.......................................................................................... xv WhyIsThisBookImportant?.............................................................................. xv Audience............................................................................................................. xvi PrerequisiteKnowledge....................................................................................... xvi AboutHardware/SoftwareCo-Verification........................................................ xvi
Acknowledgments.........................................................................xvii AbouttheAuthor......................................................................... xix AboutVerisity............................................................................... xxi What’sontheCD-ROM?.............................................................xxiii Chapter1:EmbeddedSystemVerification:AnIntroduction.............1 What’sanEmbeddedSystem?................................................................................ 3 EmbeddedSystemsAreEverywhere...................................................................... 5 ConsumerElectronics............................................................................................ 5 Wireless................................................................................................................... 5 Medical................................................................................................................... 5 Networking............................................................................................................. 5 Security................................................................................................................... 5 Imaging................................................................................................................... 5 Storage.................................................................................................................... 5 Automotive............................................................................................................ 5 DesignConstraints................................................................................................. 6 Cost......................................................................................................................... 6 Memory................................................................................................................... 6 Power...................................................................................................................... 7 Real-TimeResponse............................................................................................... 7 v
Contents Performance............................................................................................................ 7 SystemSize............................................................................................................. 8 Reliability............................................................................................................... 8 Time-to-Market...................................................................................................... 8 EmbeddedSystemsDecomposition........................................................................ 9 Microprocessors,ChipsandBoards........................................................................ 9 EmbeddedSystemClassifications......................................................................... 11 LittleorNoCustomHardwareDesign................................................................ 12 ALotofCustomHardware–SoBDesign........................................................... 12 ALotofCustomHardware–SoCDesign........................................................... 13 EmbeddedSystemDesignProcess........................................................................ 14 Requirements........................................................................................................ 15 SystemArchitecture............................................................................................. 15 MicroprocessorSelection..................................................................................... 15 HardwareDesign.................................................................................................. 16 SoftwareDesign.................................................................................................... 16 HardwareandSoftwareIntegration..................................................................... 16 VerificationandValidation.................................................................................. 16 Verification:DoesitWork?.................................................................................. 17 Validation:DidWeBuildtheRightThing?........................................................ 17 HumanInteraction............................................................................................... 18 WhatisthisBookAbout?.................................................................................... 20 ScopeandOutline................................................................................................ 22
Chapter2:HardwareandSoftwareDesignProcess.......................25 ThreeComponentsofSoCVerification.............................................................. 25 VerificationPlatform............................................................................................ 26 SoftwareEngineer’sViewoftheWorld................................................................ 35 HardwareEngineer’sViewoftheWorld.............................................................. 37 Example................................................................................................................ 37 SoftwareDevelopmentTools............................................................................... 39 Editor.................................................................................................................... 39 SourceCodeRevisionControl............................................................................ 39 Compiler............................................................................................................... 40 Debugger............................................................................................................... 41 Simulator.............................................................................................................. 41 DevelopmentBoard.............................................................................................. 42 IntegratedDevelopmentEnvironment(IDE)..................................................... 42 vi
Contents SoftwareDebuggingConnections........................................................................ 42 JTAG.................................................................................................................... 43 Stub....................................................................................................................... 43 DirectConnection............................................................................................... 44 TypesofSoftware.................................................................................................. 44 SystemInitializationandHAL............................................................................ 44 DiagnosticSuite................................................................................................... 45 Real-TimeOperatingSystem(RTOS)................................................................. 45 DeviceDriversandApplicationSoftware.......................................................... 45 SoftwareDevelopmentProcess............................................................................ 46 HardwareDevelopmentTools.............................................................................. 52 Editor.................................................................................................................... 52 SourceCodeRevisionControl............................................................................ 53 LintTools.............................................................................................................. 54 CodeCoverage..................................................................................................... 54 DebuggingTools................................................................................................... 55 VerificationLanguages......................................................................................... 55 Assertions............................................................................................................. 56 DebuggingDefined............................................................................................... 58 MemoryModels.................................................................................................... 59 MicroprocessorModels......................................................................................... 61 HardwareDesignProcess...................................................................................... 62 MicroprocessorReview......................................................................................... 63 HardwareandSoftwareInteraction..................................................................... 64 SoftwareDebuggingCharacteristics.................................................................... 64 HardwareDebuggingCharacteristics................................................................... 64
Chapter3:SoCVerificationTopicsfortheARMArchitecture.........69 ARMBackground................................................................................................. 69 ARMArchitecture............................................................................................... 70 ARMArchitectures,Families,andCPUCores................................................... 71 ThumbInstructionSet......................................................................................... 75 ProgrammingModel............................................................................................. 76 InstructionSet...................................................................................................... 78 DataTransferInstructions.................................................................................... 78 CoprocessorInstructions...................................................................................... 79 ExceptionsandInterrupts.................................................................................... 80 MemoryLayoutandByteOrder........................................................................... 83 vii
Contents ARMBusInterfaceProtocols............................................................................... 84 ARM7TDMIBusProtocol................................................................................... 85 AMBASpecification............................................................................................ 89 IntroductiontoAMBAProtocols........................................................................ 91 AMBAASB......................................................................................................... 91 AMBAAHB........................................................................................................ 92 AMBAAPB......................................................................................................... 92 AMBA3.0andAXI............................................................................................. 92 SummaryofARMCPUBusInterfaces................................................................ 93 AHBTutorial........................................................................................................ 94 ConfigurationatReset.......................................................................................... 98 PhasesofAHBTransfer........................................................................................ 99 AHBArbitration.................................................................................................. 99 AHBAddressPhase.......................................................................................... 101 AHBDataPhase................................................................................................ 104 AHB-Lite............................................................................................................ 106 Single-LayerandMultilayerAHB..................................................................... 107 ARM926EJ-SExample....................................................................................... 107 InterruptSignals................................................................................................. 111 InstructionandDataCaches.............................................................................. 111 TightlyCoupledMemory(TCM)...................................................................... 115 ARMSummary.................................................................................................. 118
Chapter4:Hardware/SoftwareCo-Verification............................119 HistoryofHardware/SoftwareCo-Verification.................................................. 119 CommercialCo-VerificationToolsAppear....................................................... 121 Co-VerificationDefined..................................................................................... 124 Definition........................................................................................................... 124 BenefitsofCo-Verification................................................................................. 125 ProjectScheduleSavings................................................................................... 125 Co-VerificationEnablesLearningbyProvidingVisibility................................. 127 Co-VerificationImprovesCommunication....................................................... 127 Co-VerificationversusCo-Simulation............................................................... 128 Co-VerificationversusCo-Design...................................................................... 128 IsCo-VerificationReallyNecessary?.................................................................. 129 Co-VerificationMethods.................................................................................... 129 NativeCompilingSoftware................................................................................ 130 InstructionSetSimulation................................................................................. 130 viii
Contents HardwareStubs................................................................................................... 131 Real-TimeOperatingSystem(RTOS)Simulator.............................................. 132 MicroprocessorEvaluationBoard...................................................................... 132 Waveforms,LogFiles,andDisassembly............................................................. 133 ASampleofCo-VerificationMethods.............................................................. 134 Host-CodeModewithLogicSimulation........................................................... 134 InstructionSetSimulationwithLogicSimulation............................................ 137 CSimulation...................................................................................................... 140 RTLModelofCPUwithSoftwareDebugging.................................................. 144 HardwareModelwithLogicSimulation............................................................ 147 EvaluationBoardwithLogicSimulation........................................................... 149 In-CircuitEmulation......................................................................................... 150 FPGAPrototype................................................................................................. 153 Co-VerificationMetrics...................................................................................... 154 Performance........................................................................................................ 155 VerificationAccuracy......................................................................................... 155 AHBArbitrationandCycleAccuracyIssues.................................................... 158 ModelingSummary............................................................................................ 160 Synchronization................................................................................................. 161 TypesofSoftware................................................................................................ 162 OtherMetrics..................................................................................................... 162
Chapter5:AdvancedHardware/SoftwareCo-Verification............165 DirectAccesstoSimulationMemories.............................................................. 165 MemoryOptimizationsandPerformance.......................................................... 171 ModesofSynchronization.................................................................................. 175 InterprocessCommunication............................................................................. 177 MixingHDLandCModels............................................................................... 180 ImplicitAccess................................................................................................... 183 SaveandRestart................................................................................................. 186 Post-ProcessingSoftwareDebuggingTechniques.............................................. 188 EmbeddedSoftwareToolIssues.......................................................................... 193 DebuggingCo-VerificationIssues..................................................................... 194
Chapter6:HardwareVerificationEnvironmentand Co-Verification.......................................................................197
BusMonitor........................................................................................................ 197 ProtocolChecking.............................................................................................. 207 AlignedAddresses.............................................................................................. 207 ix
Contents IssuingIdleTransfers.......................................................................................... 207 Assertions........................................................................................................... 208 AssertionDefinitions.......................................................................................... 208 AssertionApproaches........................................................................................ 210 DeclarativeAssertions........................................................................................ 210 ProceduralAssertions......................................................................................... 212 FormalPropertyLanguage.................................................................................. 212 Pseudo-CommentDirectives.............................................................................. 213 Post-ProcessingSimulationHistory................................................................... 213 AssertionsforSimulationAccelerationandEmulation.................................... 214 TestbenchesUsingBusFunctionalModels........................................................ 215 DirectedTests..................................................................................................... 216 ConstrainedRandomTests................................................................................ 217 TestbenchArchitecture...................................................................................... 218 FunctionalCoverage.......................................................................................... 220 ComplianceSuite............................................................................................... 221 SoftwareVerification.......................................................................................... 221 SoftwarePrintStatements.................................................................................. 222 Summary............................................................................................................. 227
Chapter7:MethodologyforanExampleARMSoC......................229 SoCMethodologyDifficulty.............................................................................. 230 VerificationEfficiency........................................................................................ 231 TheDebuggingLoop.......................................................................................... 232 Co-VerificationMethodology............................................................................ 234 SystemInitializationandHALDevelopment................................................... 235 Diagnostics.......................................................................................................... 235 RTOSandDeviceDrivers.................................................................................. 236 ApplicationSoftware......................................................................................... 236 TestbenchDevelopment..................................................................................... 236 ThreeVerificationPhases................................................................................... 237 ExampleofARMVerificationFlow................................................................... 239 BlockandSubsystemVerification...................................................................... 239 InitialSystemIntegration.................................................................................. 240 FocusedHardwareVerification.......................................................................... 242 Hardware/SoftwareCo-Verification................................................................... 243 SystemSoftwareTesting..................................................................................... 244 TheCo-VerificationEngineer............................................................................ 246 x
Contents Conclusion......................................................................................................... 248 MethodologyGridlock....................................................................................... 249
Afterward....................................................................................253 Index...........................................................................................255
xi
This page intentionally left blank
Foreword
Thisisaremarkablebook. JasonAndrewsknowsaboutthehardwareandthesoftware.Heknowsaboutthe people,thetools,andthemethodologiesinthemiddlegroundbetweenhardware andsoftware. Hecanalsowrite,explainingcomplexthingssothatyoucanreallyunderstandthem. Oneofthemainreasonsthismiddleareaissocomplexistherearejusttoomany interactingissuestounderstandandtoomanydecisionstomake. Jasontakescaretoenumeratetheissues,explainhowtheyinteract,anddescribethe optionsfordealingwiththem. Bestofall,heexplainswhichtoolsandmethodologiesareapplicableforeachsituation.Thisiscrucialbecausetherearemanydistinctsolutionsfortheproblem,and youcannotpossiblyusethemall.Youneedtomakeaninformedjudgmentonwhat todowhen. Jasonhaseitherusedorimplementedmostofthesesolutions,someofthemtwice, andhegivesaveryinformedtourofthelandandguidesyouthroughthepossible compromises. PleasenotethatwhileJasonandIworkforaverificationcompany(Verisity)that wouldlovetosellyouverificationsolutions,thisbookisdecidedlygeneric.Ittells youwhatworks,whatdoesnot,andwhy. WhilethetitleofthebookisCo-VerificationofHardwareandSoftwareforARMSoC Design,Ithinkthisbookhaswiderapplicability.Infact,ifanyofthefollowingapply, thenyoushouldbeginbyreadingthisbook:
xiii
Foreword ■
Youareinvolvedintheverificationofproductsthatcontainbothhardware andsoftware,regardlessofwhethertheyareSoC-basedorARM-based.
■
YouareworkingononesideoftheHW/SWdivide,andwanttoseewhatthe othersidelookslike.
■
Youareinterestedincreatingtoolsforthisarea.
YoavHollander FounderandCTO,VerisityInc. July2004
xiv
Preface
WhyIsThisBookImportant? Thisbookisthefirsttodocumentandteachimportantinformationabouttheverificationtechniqueknownashardware/softwareco-verification.Traditionalembedded systemdesignhasevolvedintosinglechipdesignsthatarepushingpast1Mlogic gatesandheadedtoward10Mgates.InthiseraofSoCdesign,chipsnowinclude microprocessorsandrequiresoftwaretobedevelopedbeforehardwarefabrication.To developqualityproductseffectivelyandinatimelymanner,engineersmustbearmed withnecessaryinformationtomakeeducateddecisionsaboutwhichtoolsandmethodologytodeploy.SoCverificationrequiresamixofexpertisefromthedisciplinesof microprocessorandcomputerarchitecture,logicdesignandsimulation,andCand assemblylanguageembeddedsoftware.Individualbooksexistineacharea,butuntil nowtherelevantinformationandhowitallfitstogetherhasnotbeenavailableina singlevolume.Thisbookprovidesunique,in-depthinformationabouthowco-verificationreallyworks,howtobesuccessfulusingit,andthepitfallstoavoid. Thisbookalsocontainsanaddedbonus.ItcoversimportantinformationaboutdevelopingandverifyingSoCdesignsusingARMmicroprocessorcores.Inthelastfew yearsARMhasachievedadominantmarketpositioninthe32-bitembeddedmicroprocessorspaceandhasbecomethedefactostandardformanymarketsegments. Thisbookillustratestheconceptsofhardware/softwareco-verificationusingconcrete ARMSoCexamplesandprovidesusefulinformationaboutco-verificationofdesigns utilizinganARMmicroprocessor.
xv
Preface
Audience Theprimaryaudienceistheengineerlookingtodevelopbestpracticetechniques forSoCverificationofbothhardwareandsoftwarenotonlytoincreaseconfidence inthedesign,butalsotocompleteverificationinashorterperiodoftime.Both hardwareandsoftwareengineerswillbenefitfromabetterunderstandingofeachdiscipline.Projectmanagerswillalsobenefitfromanunderstandingoftheinteraction betweenhardwareandsoftwareteamsandhowtoencouragecollaborationbetween thetwoteams.EngineersinvolvedinARMSoCdesignprojectswillalsobenefit fromtheinformationinthebook.
PrerequisiteKnowledge Readersshouldhavesomeknowledgeofembeddedsystemdesignincludingsystems withmicroprocessorsandsoftware.Readerswithahardwareengineeringbackground shouldbefamiliarwithdigitallogicdesignandverification.Aworkingknowledge ofVerilogorVHDLisusefulaswellasfamiliaritywithcommonsimulationtools. ReaderswithasoftwarebackgroundshouldbeproficientinCandassemblylanguage programmingandshouldbefamiliarwithembeddedsystemconcepts.Verilogisused topresentconceptsandexamples,buteverythingappliesequallytoVHDL.
AboutHardware/SoftwareCo-Verification Hardware/softwareco-verificationisaboutmakingsureembeddedsystemsoftware workswellwithhardwarebeforechipsandboardsareavailable.It’salsoaboutmakingsurehardwarehasbeendesignedcorrectlytorunthesoftwaresuccessfully.For applicationswheretime-to-marketandprojectcostareimportant,co-verification savestimeandreducestheriskofcostlyhardwaredesignerrors.
xvi
Acknowledgments
ThankstoeverybodyatAxisSystemsandVerisityforencouragingmetowritethis book. ThankstoRichDavenportforhiringmeintotheworldofco-verificationwhenIrespondedtothejobadintheSt.PaulPioneerPresswithoutknowingwhereitwould leadme. ThankstoDavidBurnswhovolunteeredtoreviewthemanuscript(justforthefunof it)andprovidedvaluablefeedback. ThankstoYoavHollanderforhisreviewcommentsandforwritingtheforeword. ThankstoRussKleinforthestoryofSeamlessandmakingsureIgotallthefacts straight. ThankstoalltheEDAcompaniesIhaveworkedat,allthegreatpeopleIhave workedwith,andtoalltheusersthatenabledmetolearnsomuchaboutthisinterestingareaofverification. Mostofall,bigthanksgoesouttomywifeDeborahandchildrenHannah,Caroline, Philip,andCharlotteforputtingupwithallmytravelasaremoteworkerinMinnesotaandfortheirsacrificesinhelpingmefinishthisbook.
xvii
This page intentionally left blank
AbouttheAuthor
JasonAndrewsiscurrentlyworkingintheareasofhardware/softwareco-verification andtestbenchmethodologyforSoCdesignatVerisity.Hehasimplementedmultiple commercialco-verificationtoolsaswellasmanycustomco-verificationsolutions. HisexperienceintheEDAandembeddedmarketplaceincludessoftwaredevelopmentandproductmanagementatVerisity,AxisSystems,Simpod,SummitDesign, andSimulationTechnologies.Hehaspresentedtechnicalpapersandtutorialsatthe EmbeddedSystemsConference,CommunicationDesignConferenceandIP/SoC andwrittennumerousarticlesrelatedtoHW/SWco-verificationanddesignverification.HehasaB.S.inelectricalengineeringfromTheCitadel,Charleston,SC,and anM.S.inelectricalengineeringfromtheUniversityofMinnesota.Hecurrently livesintheMinneapolisareawithhiswife,Deborah,andtheirfourchildren.
xix
This page intentionally left blank
AboutVerisity
VerisityLtd.(NASDAQ:VRST)istheleadingproviderofverificationprocess automation(VPA)solutionsthatautomateandsimplifythecompleteverification processtoincreaseproductivity,predictabilityandquality.Verisityaddressescritical businessissueswithitsverificationsystemsandintellectualproperty(IP)thateffectivelyverifythedesignofelectronicsystemsandcomplexintegratedcircuitsfor thecommunications,computingandconsumerelectronicsmarkets.Verisity’sVPA solutionsenableprojectstomovefromanexecutableverificationplantounit,chip, systemandprojectlevel‘totalcoverage’andverificationclosure.Verisityisaglobal organizationwithofficesthroughoutAsia,Europe,andNorthAmerica.Formore information,visitwww.verisity.comandalsolookfortheproductsummariesinthe Afterwardattheendofthisbook.
xxi
This page intentionally left blank
What’sontheCD-ROM?
IncludedontheaccompanyingCD-ROM: ■
AfullysearchableeBookversionofthetextinAdobePDFformat
■
SourcecodeforFigure6-1
■
Verisityproductliterature
xxiii
This page intentionally left blank
CHAPTER
1
EmbeddedSystemVerification: AnIntroduction Itworks!Thesearetwoofthemostgratifyingwordsengineersmayeverhear.A co-workeroncetoldmethatengineering(especiallyhardwareengineering)consists ofextendedperiodsofboredomfollowedbyafewbriefmomentsofexcitementthat resultineitherbitterdisappointmentorgreatsatisfaction.Theabilitytodefine, architect,design,integrate,verify,test,anddeliveraworkingproductprovidesthe driveforcontinuedinnovationintheelectronicsindustry. Recently,IworkedonaprojectthatrequiredmetodoFPGAdesignforanARM CPUboardtobeusedasamicroprocessormodelforin-circuitemulationofARM SoCdesigns(don’tworryifyoudon’tunderstandthisyet,youwillbytheendofthe book).Afterworkinginthesimulationworldforsometime,itwasexcitingforme totakeashotatarealhardwaredesignproject,evenifitwasonlyoneprogrammabledevice.IdiligentlycreatedthenecessaryVHDLsourcefilesfortheFPGAand constructedamixed-languageVerilogandVHDLsimulationenvironmentofthe CPUboardincludingaVerilogmodelfortheARMCPUandmyVHDLcodeforthe FPGAandanotherCPLDontheboard.Iconnectedmysimulatedboardtoacouple oftestdesignstomakesurethewholethingworkedtogether.Afterfixingacouple ofbugsintheFPGAcode,IfoundthenecessarysynthesisandFPGAplace-androutetoolstoturntheVHDLcodeintoasuitablebitstreamfileforprogrammingthe FPGA.Aftercheckingandrechecking,themomentoftruthhadcome.Itwastime totryoutthedesigninthelab.Theresultwouldbeeitherdisappointmentortremendoussatisfaction.Asluckwouldhaveit,thedesignworkedonthefirsttry.Iwas successfullyabletousetheCPUboardforin-circuitemulationofARMdesigns.
1
Chapter1 Contrastthisexperiencetooneofmypreviousjobsatastartupthatsetouttobuild acomplexmultiprocessorserverofupto32Intelprocessorscompletelyoutofsmall programmablelogicdevices(PLDs).Atthattime,allprogrammablelogicdevices weresmall(bytoday’sstandards),andthesmallertheywere,thefastertheywent. Theengineer’srallycrywas“pluganddebug.”Themanagement’smindsetwasthat sincethesystemwasconstructedtotallyoutofprogrammabledevicesitwasmost expedienttodoabitofdesign,buildacoupleboards,andhitthelab.Ifitdidnot workonthefirsttry,engineerscouldsimplychangetheprogrammablelogicandtry again.Therecipeincludediterationuntilthesystemfinallyworked.Theonlything standinginthewayofashippingproductwasonelasttweakofaPLD.Thenagain, consideringtheprimitivelogicanalyzersbeingusedandthenearlyinfinitecombinationsforprogrammingaPLD,thesystemmightneverwork.Evenifthiswastheway togetaproductworkingintheshortesttime,itwasdifficultonmorale.NowIknow whytheyhadaspinningsireninthelabthatwasactivatedwheneversomething worked,foritwasabitofamiracleeverytimeithappened.Bytheway,thesystem didworkeventually(mostofthetime),thoughthehighpartcountmadereliabilitya constantproblem. Mostengineeringprojectsprobablyfallsomewhereinbetweenthesetwoextremes; noteverythingworksonthefirsttry,butitdoeshaveaveryhighchanceofworking, eventually.Thepurposeofthisbookistoincreasetheexperiencesofgreatsatisfactionandminimizethedisappointment.Thebestwaytoincreasetheoddsofsuccess intheembeddedsystemworldistoverifythatthehardwareandsoftwareworkwell togetherevenbeforetheprojecthitsthelab.Thisbookwilldemonstratehowplanningandpatience,combinedwithprovenengineeringtechniques,canhelpensure yournextembeddedsystemprojectisasuccess. Thischapterprovidesthenecessaryembeddedsystembackgroundtoformafoundationtodiscusstimeandmoney-savingco-verificationtechniques.
2
EmbeddedSystemVerification:AnIntroduction
What’sanEmbeddedSystem? Thereisnoformaldefinitionofanembeddedsystem,butitisgenerallyacceptedto bededicatedcomputerhardwarewithsoftwaredesignedtosolveaspecificproblemor task.Thisisincontrasttoageneral-purposecomputer,suchasapersonalcomputer (PC)orworkstation,designedtorunanysoftwareapplicationprogrammerscreate anduserschoosetoinstall.Marketingcampaignslike“IntelInside”havetaughteven non-technicalpeoplethatthereisamicroprocessorinsideeveryPC.Incontrast, embeddedsystemsutilize“hidden”microprocessors.Productliteraturemaynoteven listhowmanyorwhatkindofmicroprocessorsareusedinaproduct.Asonewhois alwayscuriousaboutwhatisinsideaparticularproductorchip,Iamoftenpuzzledas towhythisinformationisnotreadilyavailableinproductbrochuresanddatasheets. Thisisespeciallytrueforproductswherethesoftwarecontentcanbeaddedor changed.Asconfirmation,askyourself,“Whatkindofmicroprocessorisusedinmy mobilephone?”Fewpeoplehaveanyideawhatisinsidethephone. Embeddedsystemstypicallyuseamicroprocessor,combinedwithotherhardware andsoftware,tosolveaspecificcomputingproblem.Microprocessorsrangefrom simple(bytoday’sstandards)8-bitmicrocontrollerstotheworld’sfastestandmost sophisticated64-bitmicroprocessors.Ataminimumsomerandomaccessmemory (RAM)orread-onlymemory(ROM)isrequiredtostorethesoftware.Flashmemory iscommonlyusedasnonvolatilememorytoholdthesoftwareandstillallowforfield upgradeswhendefectsarefixedorothersoftwareenhancementsaremade.Inadditiontothemicroprocessorandmemory,embeddedsystemsgenerallyhaveamixof hardwarefunctionssuchastimers,interruptcontrollers,UARTs,general-purpose inputandoutput(GPIO)pins,directmemoryaccess(DMA)controllers,realtime clocks,andliquidcrystaldisplay(LCD)controllers.Themixofhardwareperipherals variesgreatlyinembeddedsystemsandistailoredspecificallyfortherequirementsof theproduct.
3
Chapter1 Embeddedsystemsoftwarecanbedividedintotwomainclasses,theoperating softwareandtheapplicationsoftware.Operatingsoftwarerangesfromasmallexecutivetoalargereal-timeoperatingsystem(RTOS).Thefunctionoftheoperating softwareistoprovideasetofservicestotheapplicationsoftwarewithoutforcingthe applicationsoftwaretolearnaboutthedetailsofthehardwareimplementation.The applicationsoftwareimplementsthespecifictaskstheembeddedsystemisdesigned toperform.Anexampleofapplicationsoftwareisthegraphicaluserinterface(GUI) thatisusedtoconfiguretheproduct.Thedifferenttypesofembeddedsystemsoftwarewillbefurtherclassifiedinthenextchapter.Figure1-1illustratesthebasicsof anembeddedsystem.
ROM (Software)
RAM
Custom Hardware
CPU
I/O Devices
Peripherals
LCD
Keypad
Figure1-1:Basicdiagramofanembeddedsystem
4
EmbeddedSystemVerification:AnIntroduction
EmbeddedSystemsAreEverywhere Theembeddedsystemlandscapeisasdiverseastheworld’spopulation;notwosystemsarethesame.Embeddedsystemsrangefromlargecomputers,suchasairtraffic controlsystems,tosmallcomputers,suchasahandheldcomputerthatfitsintoyour pocket.Followingarejustafewofmanyproductsweexperienceeachday.
ConsumerElectronics ProductsincludePDAs,MP3players,DVDplayers,digitalcamerasandset-top boxes.
Wireless Productsincludemobilephones,basestationsandwirelessnetworkingproducts.
Medical Productsincludeimplantablepacemakersandmagneticresonanceimaging(MRI) machines.
Networking Productsincluderouters,switchesandgateways.
Security Productsincludeencryptionprocessorsandbiometricidentificationsystems.
Imaging Productsincludeprinters,scannersandfaxmachines.
Storage Productsincludediskdrives,tapedrivesandmemorycards.
Automotive Productsincludeenginecontrolsystems,anti-lockbrakesandnavigationsystems.
5
Chapter1 Connectivityandintegrationarethecurrentfocusofmanyproducts.Nolonger arenetworkingandwirelessaclassofproductsbythemselves,butinsteadtheyare featuresofalmosteveryembeddedsystem.Theconvergenceofproductssuchas thePDA,mobilephone,andMP3playerisoccurring.Wearequicklyapproaching thepointwhereeverythingisconnectedandtherearemoremicroprocessorsthan people.Trytoidentifyallofthemicroprocessorsthatareembeddedintheproducts youuseeveryday.Yourcarislikelytohavethemostmicroprocessorsofanything youown.Forexample,Irecentlyreadanarticleindicatingthatthecurrent7-Series BMWandS-ClassMercedeseachhaveaboutone-hundredprocessors.Evenabasic non-luxurycarhasabouttwenty-fiveprocessors.
DesignConstraints Embeddedsystemdesignshavedifferentconstraintsdependingonthetarget application.
Cost Expectedproductvolumedirectlyimpactscostconstraints.Lowvolumeproducts withhighmarginscanaffordtrade-offsthatmayincreasethemanufacturingcostin exchangeforusefulbenefitssuchasflexibilityortheabilitytoupdatetheproductat alatertime.Highvolumeproductsaremorerestrictivesincesavingasmallamount ofmoneyperunitleadstoamajorcostsavingsdowntheroad.
Memory Oneareaforcostsavingsinhighvolumedesignsismemorysizeormemorytype. Someembeddedmicroprocessorshavedevelopedaspecialmodethatallowsa32-bit architecturetorun16-bitinstructions.Thistechniqueofferstheperformanceof a32-bitprocessorandthememoryrequirementsofa16-bitprocessorforsoftware storage.Ofcourse,thereissomeoverheadrequiredtoprocess16bitinstructionsina 32-bitCPU.ExamplesincludetheARMThumbandMIPS16instructionsets.
6
EmbeddedSystemVerification:AnIntroduction
Power Portableproductswithbatteriesrequireadesignthatisoptimizedforlowpower consumption.Tominimizepower,bothstaticanddynamicpowerdissipationmust beconsidered.Staticpowerdissipation(whenthesystemisnotactive)canbe minimizedusingsleepmodesandspecialcircuitswithlowleakagecurrent.Dynamic powerdissipation(whenthesystemisrunning)canbeminimizedbyreducingoperatingvoltagesandcontrollingtheclockfrequency.Clockgatingtechniquessave powerbyshuttingofftheclocktopartsofthedesignnotbeingused.Thesystem clockmaybesloweddownorstoppedasawaytoinsertwaitstatesonthemicroprocessorbusinsteadofusingawaitsignaltoindicatetheinsertionofwaitstates.
Real-TimeResponse Responsetimeisacriticalconstraintinembeddedsystems.Therearespecificconstraintsforhowlongthesystemcantaketorespondtocriticalevents.Thisisalso referredtoasinterruptlatency.Therearetwokindsofreal-timecomputingsystems, hardreal-timeandsoftreal-time.Systemswithhardreal-timeconstraintsmusthave aguaranteedresponsetimewithpotentiallytragicresultsiftheserequirementsare notmet.Factoryautomationisanexampleofhardreal-timesystem.Ifthesoftware cannotperformspecifictasksfollowingeventssuchasexternalinterrupts,machines shutdownormalfunction.Softreal-timesystemsalsohaveconstraintsonresponse timetoexternalevents,thoughthepenaltyforfailingtomeettherequirementsis lesssevere.Amobilephoneisoneexampleofasoftreal-timesystem.Ifthephone softwarecannotrespondquickly,thecallinprogressislost.
Performance Performanceisoneofthemostimportantconstraintsandoneofthemostdifficult topredictandverify.Ideally,performancecanbecomputedorpredictedbeforethe systemisconstructed.Inreality,onlyafterthehardwareisdesignedandthesoftware isrunonthehardwarecanthearchitectsbecertainthatthedesignmeetstheperformancerequirements.Itiscrucialtohavetoolsthatprovidevisibilityandtheability toquantifyexpectedperformance.
7
Chapter1
SystemSize Productsizerequirementsmayrestrictthepossiblesolutions.Amobilephonethat canrunanyprotocolfromanywhereintheworldandplay3Dgamesat1GHzmay notsucceedbecauseofitsexcesssizeandweight.Incontrast,customersmayperceive morevaluefromaproductthatislarger.Abiggercarmustcostmorethanasmaller carbecauseittakesmorematerialtomakeit.
Reliability Embeddedsystemsarevieweddifferentlythangeneral-purposecomputers.Ithas becomecommonforuserstoacceptthatgeneral-purposecomputerssuchasdesktop PCsoftencrashandneedtoberestarted.Incriticalapplications,suchasservers, techniquessuchasECC(errorcorrectingcodes)andRAID(redundantarrayof inexpensivedisks)areusedtoprovidehigherreliabilityformemoryanddiskstorage. Mostembeddedsystemsarenotallowedtocrash.Reliabilityisaccomplishedusing bothhardwareandsoftwaretodetectproblemsandcorrectthem.Awatchdogtimer isanexampleofasoftwaretechniqueusedtocorrectaproblem.Embeddedsystems alsohaveahigherprobabilityofbeingreliablesincethesoftwareistightlycontrolled andthoroughlytested.Generally,userscannotaddorchangetheembeddedsystem software.
Time-to-Market Time-to-marketisoneofthemosttalkedaboutconceptsinembeddedsystemdesign. Beingfirsttodeliveranewanduniqueproductcanpropelitintobecomingthede factostandardandproducehigherrevenuethaniftherearetwosimilarproducts available.Muchofthehigh-techworldofelectronicsandsoftwarerevolvesaround thedeliveryofuniquetechnologythatprovidesvaluablebenefitstousers.Inthe worldofelectronicdesignautomation(EDA)andembeddeddevelopmenttools (EDT),everycompanyistryingtodeliverproductsthathelpengineersgetproducts tomarketfasterwithfewerproblems.Relatedtotime-to-marketconcernsistotal developmentcost.Inpooreconomictimes,companiestrytomaintaincashandstay alive.Insuchanenvironment,savingmoneyismoreimportantthantime-to-market sinceconsumersmaynothavetheresourcestoadopteverynewemergingtechnology.
8
EmbeddedSystemVerification:AnIntroduction
EmbeddedSystemsDecomposition Thereareendlesstopicsrelatedtoembeddedsystemsandmanygoodbooksonthe subject.Theaimofthisbookisnottocoveralloftherelatedtopicssuchashardware design,writingembeddedsoftware,portingoperatingsystems,debuggingsoftware, andwritingdiagnosticstotesthardwaresystems.Thegoalistounderstandthenecessarybackgroundtohelpengineersverifythatthesoftwareworkswiththehardware andtovalidatethatthesystemmeetsthedesignrequirements.Toreachthisgoal, somebackgroundonmicroprocessorhistoryisusefultoidentifytheclassesofembeddedsystemsthatarethemostatriskintheareaofhardware/softwareinteraction.
Microprocessors,ChipsandBoards In1989,AndyGroveannouncedattheEmbeddedSystemsConferencethatbythe endofthecenturyIntelwouldhaveamicroprocessorchipnumbered80786with 100milliontransistorsanda250MHzclockrate.Asitturnedout,Inteldroppedthe 80x86numberingsystemsothemarketingpeoplecouldputonabettercampaign usingtermslikePentium,butGrove’sperformancepredictionwasexceededwhen Intelproducedachiprunningatmorethan1GHz.Intel’sdominanceinPCmicroprocessorsandstrongmarketinggivesthemhighernamerecognitionthancompanies makingprocessorsforembeddedsystems.Majorcompaniesprovidingmicroprocessors forembeddedsystemsareARM,Hitachi,IBM,MIPS,Motorola,andTexasInstruments.Engineerscanchoosefromsomewherearound100differentmicroprocessors foraspecificembeddedsystemdesignproject.ShipmentsofembeddedmicroprocessorsgreatlyoutnumbermicroprocessorsusedinPCsandworkstations. Microprocessorvendorstargetdesignwinsintwoways.First,somesellchipsfor useonaprintedcircuitboard(PCB).Thismarketisreferredtoassystem-on-board (SoB).Themicroprocessorcompanyissellingchipstocompaniesmakingboardsand systems.Thesecondwaymicroprocessorvendorsoperateistosellmicroprocessors foruseinanapplicationspecificintegratedcircuit(ASIC)orapplicationspecific standardproduct(ASSP).Thismarketisreferredtoassystem-on-chip(SoC).The microprocessorvendorissellingtocompaniesmakingchipsandpotentiallyalsousingthosechipsinsystems.NoactualchipsarebeingsoldforuseintheSoC,butthe designfilesforthemicroprocessorarebeingsold.Thedesignfilesarereferredtoas
9
Chapter1 intellectualproperty(IP)andthecompaniesthatsellthemarelabeled“IPCompanies”sincetheysellfiles,morelikesellingsoftware. ThetermSoChasnoformal,agreedupondefinition,butSoChasgenerallycome tomeanasinglechipthatincludesoneormoremicroprocessors,applicationspecific customlogicfunctions,andembeddedsystemsoftware. SomeviewaSoCasasystemwhereasinglechipprovidesalloftherequireddigital logic,andtheexistenceofamicroprocessorisoptional.Japanesecompaniesoften callthisintegrateddeviceaSystemLSI(largescaleintegration).SoCisalwaysa bitmisleadingsinceaproductorsystemalwaysneedsmorethanonesingledevice. Externalmemoriesandconnectionstohardwaresuchasdisplayandkeyboardare usuallyrequired.Afterall,achipbyitselfisnotmuchuseunlessitisputonaboard andconnectedtosomethingmoremeaningful.Thereisalsotheworldofanalog design.Evenwhenasingledigitalchipisused,analogcomponentsarenotalways includedinthechipfortechnologyandcostreasons.Theworldofembeddedsystem designcontinuestointegratemoreandmorefunctionalityintolargersemiconductor devices,andtheinclusionofmicroprocessorspresentsnewchallengesforSoChardwareandsoftwareengineers. VendorssellingchipstotheSoBmarketprovidevalueanddifferentiationinoneof twoareas,eitherhigh-performanceorhigh-integration.Highperformancechipsare thelatestandgreatestchipswiththefastestclockrates.Manyembeddedsystems requiremaximumprocessorperformancetomeetdesignrequirements.Examples ofhigh-performancemicroprocessorsusedinembeddedsystemsaretheMotorola MPC7455andthePMC-SierraRM9000.Theseare32-and64-bitdevicescapableof clockspeedsinexcessof1GHz. Thesecondareawherechipvendorsdelivervalueanddifferentiationisbyprovidinglessthanleadingedgeperformanceyetmaximizingintegration.Thesedevices typicallystartwithaCPUthathasalreadybeenprovenandaddtoitadditional peripheralssothatnearlyalloftherequiredfunctionalityisavailableinasinglechip. Thislowersthetotalcomponentcountandprovidesasolutionrequiringlesscustom hardwaretobedesignedfromscratch.Examplesofhigh-integrationmicroproces-
10
EmbeddedSystemVerification:AnIntroduction sorsusedinembeddedsystemsaretheMotorolaPowerQUICCfamily.Thisfamilyof chipsusesprovenPowerPCcorescombinedwithnearlyeveryperipheralneededfor anynetworkingapplication. TherecentgrowthoftheSoCmarketisaresultofthenaturalprogressioninelectronicdesigndrivenbyincreasedperformance(faster),higherintegration(smaller) andlowercost(cheaper)products.EmbeddedproductsthatstartedasSoBproducts builtwithanoff-the-shelfmicroprocessorchipandothercomponentshaveevolved. Overtime,itbecameadvantageoustointegratethemicroprocessorandthecustom logicintoasingledevice.Forexample,theNokiamobilephonestartedasasetof discretecomponentsfortheCPU,DSP,memoryandsupportinghardware.Thiswas oneofthefirstapplicationstomergetheCPU,DSPandotherlogicintowhatwe nowcallanSoC. Increasedintegrationbringsincreasedriskanddecreasedflexibility.Integration decisionscanbedifficultbecauseoftheriskinvolved.TheNokiaphonehadmany choicesfordiscreteCPUs,thoughchoicesbecamelimitedoncetheCPUand DSPwereintegratedintoamorecomplexcustomchip.Systemengineersfocuson choosingthebestmicroprocessorandsystemarchitecturetosatisfytheprojectrequirementswithinthegivenconstraints.
EmbeddedSystemClassifications Classificationofembeddedsystemsmaybebasedontheprofileofthehardwaredesign.Hardwaresolutionsthatsatisfythedesignconstraintscanbeobtainedinmany differentways,rangingfromassemblingoff-the-shelfboardsandcomponentstodesigningfull-customintegratedcircuits.Followingisarathersimplisticcategorization ofembeddedsystemhardwaretohelpidentifythetypesofsystemsthatarethebest candidatesforimprovingcollaborationandintegrationofhardwareandsoftware.
11
Chapter1
LittleorNoCustomHardwareDesign Somesystemsaredesignedusinglittleornocustomhardware.Thesolutionisbuilt usingoff-the-shelfboardsorevencompletesystems.Appropriatesoftwaredevelopmentsolvestheproblemandsatisfiesthedesignrequirementswithoutanyhardware modifications.Minormodificationsmaybemadetoanoff-the-shelfboardorreferencedesignfromamicroprocessorvendortoaddsomecustomlogictosatisfythe designrequirements.Thecustomhardwareoftentakestheformofprogrammable logic.Again,softwareistheprimarydifferentiatoroftheproduct.Theseprojects havethefollowinghardwaredesigncharacteristics: ■
Hardwaredesignisminimized(togettheproducttomarketquickly)
■
Hardwareuseshigh-integrationmicroprocessorsoroff-the-shelfboards
AnexampleofthistypeofdesignisapieceofhospitalequipmentIrecentlysawfor monitoringbrainwaves.ThesolutionusedaWindowsPCwithsomecustomhardwaretoconnectthePCtoasetofprobes.Likely,mostoftheworktoproducethe productwasinthesoftwarethatwasrunningonthePCtoproduceusefulresultsfor doctorsandnurses.
ALotofCustomHardware–SoBDesign Boarddesignsthatmakeuseofhigh-performancemicroprocessorshaveperformance asatoppriority.ComplexcustomlogicintheformofFPGAsandASICsisoften requiredtomeetperformanceobjectives.Integrationisusedprimarilyasameans toachieveperformancerequirements.Thesedesignshavethefollowinghardware designcharacteristics: ■
High-performancemicroprocessor,suchasPowerPCandMIPSchipsare utilized
■
Boardsincludelargecustomlogic,FPGAsorASICs
■
Boardsareoftenverylargeandmayutilizemultipleprocessors
■
Systemscanbecomprisedofmultipleboards
12
EmbeddedSystemVerification:AnIntroduction Anexampleofthistypeofdesignisthehigh-performanceroutersproducedbyCisco. Toachievetherequiredperformance,acombinationofcustomASICsandthefastestMIPSmicroprocessorchipsareused.Sizeandpowerarelessofafactorinrouter design.Alargeeffortforbothhardwareandsoftwareisrequiredtoproduceafinal product.
ALotofCustomHardware–SoCDesign ApplicationsthatrequiresmallproductsizeandlowpoweroftenturntoSoCdesign. SoCcanalsobeusedtomeetperformancerequirementsthatcouldnotbeachieved bylimitationsimposedbyprintedcircuitboardtechnology.Forthepurposeofthis section,SoCisdefinedasanASICorASSPthatincludesoneormoremicroprocessorsonachip.Microprocessorcompaniescreatedesignstosellhigh-integration chips.NumerousotherfablesssemiconductorcompaniesalsoparticipateinSoC designbylicensingmicroprocessorIPtodevelopandsellchipstargetedatspecific industries.Thesedesignshavethefollowinghardwaredesigncharacteristics: ■
Oneormoremicroprocessorsonachip
■
IncorporatesmicroprocessorIPsuchasARM,MIPSorTensilica
■
DSPcoresareoftenincluded
■
Integration,performanceandpowerarecrucial
■
Costtodevelopisveryhigh
■
Containshighcustomhardwarecontent
Goodexamplesofthistypeofdesignarereadilyfoundinconsumerelectronicssuch asMP3playersanddigitalcameras.Inconsumerelectronicsthevolumeishigherand diesizeandpowerareimportant. Customhardwarerequirescustomsoftwaretoprovidesystemspecificdiagnostics,initializationsoftware,anddevicedrivers.Byunderstandingthenatureofthehardware designitbecomeseasiertoidentifytherisksinaspecificembeddedsystemdesign projectandformulateastrategytoincreasethechancesofprojectsuccess.
13
Chapter1
EmbeddedSystemDesignProcess Theprocessofembeddedsystemdesigngenerallystartswithasetofrequirements forwhattheproductmustdoandendswithaworkingproductthatmeetsallofthe requirements.Followingisalistofthestepsintheprocessandashortsummaryof whathappensateachstateofthedesign.ThestepsareshowninFigure1-2. Product Requirements
System Architecture
Microprocessor Selection
Software Design
Hardware Design
Hardware and Software Integration
Figure1-2:Embeddedsystemdesignprocess
14
EmbeddedSystemVerification:AnIntroduction
Requirements Therequirementsandproductspecificationphasedocumentsanddefinestherequiredfeaturesandfunctionalityoftheproduct.Marketing,sales,engineering,or anyotherindividualswhoareexpertsinthefieldandunderstandwhatcustomers needandwillbuytosolveaspecificproblem,candocumentproductrequirements. Capturingthecorrectrequirementsgetstheprojectofftoagoodstart,minimizesthe chancesoffutureproductmodifications,andensuresthereisamarketfortheproduct ifitisdesignedandbuilt.Goodproductssolverealneeds,havetangiblebenefits,and areeasytouse.
SystemArchitecture Systemarchitecturedefinesthemajorblocksandfunctionsofthesystem.Interfaces, busstructure,hardwarefunctionalityandsoftwarefunctionalityaredetermined.Systemdesignersusesimulationtools,softwaremodels,andspreadsheetstodetermine thearchitecturethatbestmeetsthesystemrequirements.Systemarchitectsprovide answerstoquestionssuchas,“Howmanypackets/seccanthisrouterdesignhandle?” or“WhatisthememorybandwidthrequiredtosupporttwosimultaneousMPEG streams?”
MicroprocessorSelection Oneofthemostdifficultstepsinembeddedsystemdesigncanbethechoiceofthe microprocessor.Thereareanendlessnumberofwaystocomparemicroprocessors, bothtechnicalandnon-technical.Importantfactorsincludeperformance,cost, power,softwaredevelopmenttools,legacysoftware,RTOSchoices,andavailable simulationmodels.Benchmarkdataisgenerallyavailable,thoughapples-to-apples comparisonsareoftendifficulttoobtain.Creatingafeaturematrixisagoodwayto siftthroughthedatatomakecomparisons. Softwareinvestmentisamajorconsiderationforswitchingtheprocessor.Embedded guruJackGansslesaystheruleofthumbistodecideif70%ofthesoftwarecanbe reused;ifso,don’tchangetheprocessor.Mostcompanieswillnotchangeprocessors unlessthereissomethingseriouslydeficientwiththecurrentarchitecture.Whenin doubt,thebestpracticeistostickwiththecurrentarchitecture.
15
Chapter1
HardwareDesign Oncethearchitectureissetandtheprocessor(s)havebeenselected,thenextstepis hardwaredesign,componentselection,VerilogandVHDLcoding,synthesis,timing analysisandphysicaldesignofchipsandboards. Thehardwaredesignteamwillgeneratesomeimportantdataforthesoftwareteam suchastheCPUaddressmap(s)andtheregisterdefinitionsforallsoftwareprogrammableregisters.Aswewillsee,theaccuracyofthisinformationiscrucialtothe successoftheentireproject.
SoftwareDesign Oncethememorymapisdefinedandthehardwareregistersaredocumented,work beginstodevelopmanydifferentkindsofsoftware.Examplesincludebootcodeto startuptheCPUandinitializethesystem,hardwarediagnostics,real-timeoperating system(RTOS),devicedriversandapplicationsoftware. Duringthisphase,toolsforcompilationanddebuggingareselectedandcodingis done.
HardwareandSoftwareIntegration Themostcrucialstepinembeddedsystemdesignistheintegrationofhardwareand software.Somewhereduringtheprojectthenewlycodedsoftwaremeetsthenewly designedhardware.Howandwhenhardwareandsoftwarewillmeetforthefirsttime toresolvebugsshouldbedecidedearlyintheproject.Therearenumerouswaysto performthisintegration.Doingitsoonerisbetterthanlater,thoughitmustbedone smartlytoavoidwastedtimedebugginggoodsoftwareonbrokenhardwareordebugginggoodhardwarerunningbrokensoftware.
VerificationandValidation Twoimportantconceptsofintegratinghardwareandsoftwareareverificationand validation.Thesearethefinalstepstoensurethataworkingsystemmeetsthedesign requirements.
16
EmbeddedSystemVerification:AnIntroduction
Verification:DoesItWork? Embeddedsystemverificationreferstothetoolsandtechniquesusedtoverifythat asystemdoesnothavehardwareorsoftwarebugs.Softwareverificationaimsto executethesoftwareandobserveitsbehavior,whilehardwareverificationinvolves makingsurethehardwareperformscorrectlyinresponsetooutsidestimuliandthe executingsoftware.Theoldestformofembeddedsystemverificationistobuildthe system,runthesoftware,andhopeforthebest.Ifbychanceitdoesnotwork,try todowhatyoucantomodifythesoftwareandhardwaretogetthesystemtowork. Thispracticeiscalledtestinganditisnotascomprehensiveasverification.Unfortunately,findingoutwhatisnotworkingwhilethesystemisrunningisnotalwayseasy. Controllingandobservingthesystemwhileitisrunningmaynotevenbepossible. Tocopewiththedifficultiesofdebuggingtheembeddedsystemmanytoolsand techniqueshavebeenintroducedtohelpengineersgetembeddedsystemsworking soonerandinamoresystematicway.Ideally,allofthisverificationisdonebeforethe hardwareisbuilt.Theearlierintheprocessproblemsarediscovered,theeasierand cheapertheyaretocorrect.Verificationanswersthequestion,“Doesthethingwe builtwork?”
Validation:DidWeBuildtheRightThing? Embeddedsystemvalidationreferstothetoolsandtechniquesusedtovalidatethat thesystemmeetsorexceedstherequirements.Validationaimstoconfirmthatthe requirementsinareassuchasfunctionality,performanceandpoweraresatisfied.It answersthequestion,“Didwebuildtherightthing?”Validationconfirmsthatthe architectureiscorrectandthesystemisperformingoptimally. IonceworkedwithanembeddedprojectthatusedacommonMIPSprocessorand areal-timeoperatingsystem(RTOS)forsystemsoftware.Forvariousreasonsitwas decidedtochangetheRTOSforthenextreleaseoftheproduct.ThenewRTOS waswellsuitedforthehardwareplatformandtheengineerswereabletobringit upwithoutmuchdifficulty.Allapplicationtestsappearedtofunctionproperlyand everythinglookedpositiveforanon-scheduledeliveryofthenewrelease.Justbefore theproductwasreadytoship,itwasdiscoveredthattheapplicationswererunning
17
Chapter1 about10timesslowerthanwiththepreviousRTOS.Suddenly,panicsetinandthe projectschedulewasindanger.Softwareengineerswhowrotetheapplicationsoftwarestruggledtofigureoutwhytheperformancewassomuchlowersincenotmuch hadchangedintheapplicationcode.Hardwareengineerstriedtostudythehardware behavior,butusinglogicanalyzersthatarebettersuitedfortriggeringonerrorsthan providingwidevisibilityoveralongrangeoftime,itwasdifficulttoevendecide wheretolook.TheRTOSvendorprovidedmostofthesystemsoftwareandsothere waslittlesourcecodetostudy.Finally,oneoftheengineershadahunchthatthe cacheoftheMIPSprocessorwasnotbeingproperlyenabled.Thisindeedturnedout tobethecaseandaftertheproblemwascorrected,systemperformancewasconfirmed.Thisexampledemonstratestheimportanceofvalidation.Likeverification,it isbesttodothisbeforethehardwareisbuilt.Toolsthatprovidegoodvisibilitymake validationeasier.
HumanInteraction Embeddedsystemdesignismorethanaroboticprocessofexecutingstepsinanalgorithmtodefinerequirements,implementhardware,implementsoftware,andverify thatitworks.Therearenumeroushumanaspectstoaprojectthatplayanimportant roleinthesuccessorfailureofaproject. Thefirstplacetolookistheorganizationalstructureoftheprojectteams.Thereare twocommonlyusedstructures.Figure1-3showsastructurewithseparatehardware andsoftwareteams,whereasFigure1-4showsastructurewithonegroupofcombined hardwareandsoftwareengineersthatshareacommonmanagementteam. Separateprojectteamsmakesenseinmarketswheretime-to-marketislesscritical. Staggeringtheprojectteamssothatthesoftwareteamisalwaysoneprojectbehind thehardwareteamcanbeusedtoincreaseefficiency.Thisway,thesoftwareteam alwayshasavailablehardwarebeforetheystartanysoftwareintegrationphase.Once thehardwareispassedtothesoftwareengineers,thehardwareengineerscangoonto thenextproject.Thisstructureavoidshavingthesoftwareengineerssittingaround waitingforhardware.
18
EmbeddedSystemVerification:AnIntroduction
Software Engineer
Vice President Software Development
Vice President Hardware Development
Software Development Manager
Hardware Development Manager
Software Engineer
Software Engineer
Hardware Engineer
Hardware Engineer
Hardware Engineer
Figure1-3:Managementstructurewithseparateengineeringteams
Vice President Engineering
Responsible for both hardware and software
Project Manager
Lead Hardware Engineer
Lead Software Engineer
Software Engineer
Software Engineer
Software Engineer
Hardware Engineer
Hardware Engineer
Hardware Engineer
Figure1-4:Managementstructurewithcombinedengineeringteams
19
Chapter1 Acombinedprojectteamismostefficientforaddressingtime-to-marketconstraints. Thebestsituationtoworkunderisacommonmanagementstructurethatisresponsibleforprojectsuccess,notjustoneareasuchashardwareengineersorsoftware engineers.Companiesthatarerunningmostefficientlyhaveremovedstructuralbarriersandworktogethertogettheprojectdone.Intheend,thesuccessoftheproject isbasedontheentireproductworkingwell,notjustthehardwareorsoftware. Ionceworkedinacompanythattotallyseparatedhardwareandsoftwareengineers. Therewasnosharedmanagement.Whentheprototypesweredeliveredandbrought upinthelab,themanagerofeachgroupwouldpacebackandforthtryingtodeterminewhatworkedandwhatwasbroken.Whatusuallyendeduphappeningwasthat thehardwareengineerwouldtellhismanagerthattherewassomethingwrongwith thesoftwarejusttogetthemanagertogoaway.Mostengineersprefertobeleftalone duringthesecriticalprojectphases.Thereisnothingworsethanastatusmeetingto reportthatyourdesignisnotworkingwhenyoucouldbeworkingtofixtheproblems insteadofexplainingthem.Idon’tknowwhatthesoftwareteamwascommunicating toitsmanagement,butIalsoenvisionedsomethingaboutthehardwarenotworkingortheinabilitytogettimetousethehardware.Attheendoftheday,thetwo managersprobablywenttotheCEOtoreporttheothergroupwasstillworkingtofix itsbugs. Everybodyhasaroletoplayontheprojectteam.Understandingtherolesandskills ofeachpersonaswellasthepersonalitiesmakesforasuccessfulprojectaswellasan enjoyableworkenvironment.Engineerslikechallengingtechnicalwork.Ihaveno datatoconfirmit,butIthinkmoreengineersseeknewemploymentbecauseofdifficultieswiththepeopletheyworkwithorthemoraleofthegroupthanbecausethey areseekingnewtechnicalchallenges.
WhatisthisBookAbout? Thischapterprovidedanintroductionintotheinteractionofhardwareandsoftware forembeddedsystemprojects.Thepurposeoftheremainingchaptersistodocument andteachimportantinformationabouttheverificationtechniqueknownasHardware/SoftwareCo-Verification.Itisclearthatmanyprojectsarenotcompletedon timeandthosethataremaybeabletoshrinkscheduleandlowercostevenmore.
20
EmbeddedSystemVerification:AnIntroduction Arecentsurveyintoembeddedsystemsprojectsfoundthatmorethan50%ofdesigns arenotcompletedontime.Typicallythosedesignsare3to4monthsoffthepace, projectcancellationsaverage11–12%andaveragetimetocancellationis4-and-ahalfmonths.JerryKrasnerofElectronicsMarketForecastersJune2001. Hardware/softwareco-verificationaimstoverifyembeddedsystemsoftwareexecutescorrectlyonarepresentationofthehardwaredesign.Itperformsearly integrationofsoftwarewithhardware,beforeanychipsorboardsareavailable. TheprimaryfocusofthisbookisonSoCverificationtechniques.Althoughallembeddedsystemswithcustomhardwarecanbenefitfromco-verification,theareaof SoCverificationismostimportantbecauseitinvolvesthemostriskandispositioned toreapthemostbenefit.TheARMarchitectureisthemostcommonmicroprocessor usedinSoCdesignandservesasareferencetoteachmanyoftheconceptspresented inthebook. Ifanyofthefollowingstatementsaretrueforyou,thisbookwillprovidevaluable information: 1. Youareasoftwareengineerdevelopingcodethatinteractsdirectlywith hardware. 2. Youarecuriousabouttherelationshipbetweenhardwareandsoftware. 3. Youwouldliketolearnmoreaboutdebugginghardwareandsoftwareinteractionproblems. 4. YoudesiretolearnmoreabouteitherthehardwareorsoftwaredesignprocessesforSoCprojects. 5. Youareanapplicationengineerinacompanysellingco-verificationproducts. 6. Youwanttogetyourprojectsdonesoonerandbetheheroatyourcompany. 7. Youaregettingtiredofthemanagerbuggingyouinthelabasking,“Doesit workyet?” 8. Youareamanagerandyouaretiredofbuggingtheengineersasking,“Doesit workyet?”andwouldliketopestertheengineersinamoremeaningfulway.
21
Chapter1 9. Youhavenocluewhatthisstuffisallaboutandwanttolearnsomethingto atleastsoundintelligentaboutthetopicatyournextinterview.
ScopeandOutline ThisbookpresentspracticaltechniquestoverifyintegrationofSoChardwareand software.ItprovidesdetailedinformationandplentyofexamplesforthemostcommonSoCbeingdesignedtoday,thoseusingARMmicroprocessors.Theexamples mostdirectlyrelatetotheSoC/ASIC/ASSPmarketwheretheriskisgreatestand mistakesequalmoney.Itismostrelevanttosoftwareengineersdevelopingcodethat dealswithhardwareoperation. Thebookisnotadesignbooktocreateanewchiporsoftware;ithasnograndand glorioustop-downschemesforsystem-leveldesignandhardware/softwarepartitioning.TherearemanyotherbooksaboutdesignpracticesusingVerilogandVHDLand associatedsimulationandsynthesistools. Chapter2providesinformationabouttheseparateyetrelatedhardwareandsoftware designprocesses.Itdocumentsthetoolsandtechniquescommonlyusedineach discipline.Theboundarieswherehardwaremeetssoftwareandtherelationships attheseboundariesaredescribed.Theviewoftheworldforboththehardware engineerandthesoftwareengineerispresented. Chapter3givesanoverviewandsomedetailedinformationabouttheARMarchitectureusedthroughoutthebook.ForengineersinvolvedinARMprojectsit providesbothatutorialonthearchitectureandbusprotocolsaswellasdetails aboutimportantareasrelatedtoco-verification. Chapter4providesthefoundationforhardware/softwareco-verificationincluding thedefinitions,techniques,benefitsandexamples. Chapter5coversadvancedtopicsonhardware/softwareco-verificationthatmost engineersdon’ttypicallylearnexceptfromon-the-jobtrainingwhiletryingtouse co-verificationonanembeddedsystemproject. Chapter6explainstherelationshipbetweenco-verificationandthetestbench. Traditionalhardwareverificationtechniquesandhowtheyrelatetothemicroprocessorportionofthedesignarecovered.
22
EmbeddedSystemVerification:AnIntroduction Chapter7putsallofthepreviousinformationintopracticeandpresentsamethodologyforSoCverificationusingthetechniquescoveredintheearlychaptersona realARMSoCdesignexample.
23
This page intentionally left blank
CHAPTER
2
HardwareandSoftware DesignProcess Tounderstandhardware/softwareco-verification,itisnecessarytounderstand thetoolsandtheprocessusedtodevelophardwareandsoftware.Untilrecently, theintegrationofsoftwarewithhardwarewasperformedinalabenvironmentby constructingthehardwareandrunningthesoftware.Debuggingwasdoneusing equipmentsuchasin-circuitemulators(ICE),logicanalyzers,andoscilloscopes. Debugginglateinthedesigncycle,whenthepressureoftheprojectscheduleisthe greatest,isatediousandstressfultask.Co-verificationchangesthisenvironmentby usingavirtualprototypeofthehardwaretoexecutethesoftwarewellbeforeprototypesareavailable.Withthisinmind,let’sdigintothetechnicalaspectsofhow hardwareandsoftwareengineersworkandidentifytheboundarieswheretheymeet.
ThreeComponentsofSoCVerification Beforegettingintothedetailsofhardwareandsoftwaredesignandexaminingthe intersectionofthetwo,itisworthwhiletoreviewofthethreecomponentsofSoC verification.Tobuildacohesivemethodologyforhardwareandsoftwareverification, engineersmustunderstandspecifictoolsineachareaaswellastheinteroperability betweenthem.Thethreecomponentsare: 1. Verificationplatform 2. Hardwareverificationtoolsandtechniques 3. Softwaredebuggingtoolsandtechniques
25
Chapter2
VerificationPlatform Theverificationplatformisthemethodusedtoexecuteadescriptionofthehardware design.Ithasothercommonnamessuchasexecutionengineorvirtualprototype. Thehardwaredesignprocessconsistsofdescribingthehardwareusingoneofthe twocommonhardwaredescriptionlanguages(HDLs),VerilogorVHDL.ThisHDL representationofthehardwaredesigncanbeexecutedusinganynumberofplatforms orexecutionengines. Fourdistinctmethodshavebeenidentifiedandusedfortheexecutionofthehardwaredesign: ■
Logicsimulation
■
Simulationacceleration
■
Emulation
■
Hardwareprototyping
Eachhardwareexecutionmethodhasspecificdebuggingtechniquesassociatedwith it,eachwithitsownsetofbenefitsandlimitations.Themethodsrangefromthe slowestexecutionmethod,withthemostflexibilityandbestdebugging,tothefastest,withlessflexibilityanddebugging.Thefollowingdefinitionsareusedtodescribe thetypesofplatformsandthewaytheyoperate. HardwareDescriptionLanguage(HDL)referstoadedicatedlanguagedesignedto describehardware.ThetwolanguagesusedtodayareVerilogHDLandVHDL.These languagesareusedtospecifythebehaviorofachiporaboardinthesamewayasoftwareprogramspecifiesthebehaviorofamicroprocessororembeddedsystem.HDLs containthekeywords,syntaxandsemanticstomodelhardwarecircuits.Software toolscanthenusethesemodelstosimulatehardwarebehaviorandtosynthesize HDLmodelsintostructuralrepresentationsthatcanbeusedtobuildhardware. HDLsareusedtospecifytheimplementationofASICandFPGAdevices.HDLsare alsousedtomodeloff-the-shelfelectroniccomponentssuchasmemoriesandother digitallogic,aswellastomodelthestimulitoandfromtheinterfacesofthechip orsystem.FollowingisashorthistoryofVerilogandVHDL.Numerousbooksand resourcesareavailableforthosewishingmoreinformationontheusesanddetailsof VerilogandVHDL. 26
HardwareandSoftwareDesignProcess VerilogHDLoriginatedatAutomatedIntegratedDesignSystems(laterrenamed GatewayDesignAutomation)in1985.VerilogHDLwasdesignedbyPhilMoorby, wholaterbecametheChiefDesignerforVerilog-XLandthefirstCorporateFellowat CadenceDesignSystems.GatewayDesignAutomationgrewrapidlywiththesuccess ofVerilogandwasacquiredbyCadenceDesignSystems,SanJose,CAin1989. CadenceDesignSystemsdecidedtoopentheVeriloglanguagetothepublicin 1990.ThestandardsbodycreatedtooverseeVerilogwascalledOVI(OpenVerilog International).WhenOVIwasformedin1991,anumberofsmallcompaniesbegan workonVerilogsimulators.Thefirstofthesecametomarketin1992.Themost successfulsimulatortofollowVerilog-XLwasVCS,theVerilogCompiledSimulator, fromChronologicSimulation.VCSwasacompilerasopposedtoaninterpreterlike Verilog-XL.Asaresult,compiletimewaslonger,butsimulationexecutionspeedwas muchfaster.TodaythereareVerilogsimulatorsavailablefromseveralsourcesincludingvendorssuchasCadence,MentorGraphics,andSynopsys. AnIEEE(InstituteofElectricalandElectronicEngineers)workinggroupwasestablishedin1993undertheDesignAutomationSub-Committee(DASC)toproduce theIEEEVerilogstandard1364.TheIEEEmaintainsstandardsforavarietyof engineeringfields.VerilogbecameIEEEStandard1364in1995.Thelatestversion oftheIEEE1364VerilogHDLandPLI(programminglanguageinterface)standard iscalledVerilog2001.TheVerilogPLIprovidesaprogramminginterfacethatallows engineerstocustomizeandextendthecapabilitiesofVerilogsimulators.Engineers usethePLIforavarietyofprogrammingtasksduringsimulation.ThereareactuallythreegenerationsofPLI.ThefirstgenerationPLIroutines(startingwithtf_) workonlywiththeparameterspassedtothem.Itwassoonapparentthatalldesign parameterscouldnotbepassedtoCfunctionsasparameters.Asaresult,moresophisticatedPLIfunctionsemerged.Secondgenerationfunctionsareknownasaccess routines(startingwithacc_),andcoveravarietyofdesignobjectswhilekeepingthe userinterfaceassimpleaspossible.Accessroutinesdidagoodjobkeepingtheuser interfacesimple,butoftentheyareinconsistent.In1995,Cadencecameupwiththe thirdgenerationofPLIforVerilog,theVerilogProceduralInterface(VPI).Allthree generationsofPLIarepartofIEEEStandard1364-1995.
27
Chapter2 VHDL,theveryhigh-speedintegratedcircuit(VHSIC)hardwaredescriptionlanguage,isthesecondlanguageusedbyhardwareengineerstodescribehardware. UnlikeVerilog,whichwasconceivedbyaprivatecompany,VHDLisaproductof theVHSICprogramfundedbytheDepartmentofDefenseinthe1970sand1980s. Itwasintendedtoserveasawaytodocumentcircuitsaswellasamodelinglanguage forsimulation.VHDLwasfirstapprovedastheIEEE1076standardin1987andwas updatedin1993totheIEEE1164standard.WhileVerilogtendstolookmorelikeC, VHDLtendstolookmorelikeAda,duetoitsrootsintheDepartmentofDefense. ForseveralyearstheindustryhasdebatedaboutwhichHDLisbetter,Verilogor VHDL.Thislanguagewarisprettymuchovertoday.Neitherlanguagewasever declaredthewinner.Botharewidelyusedtoday.Verilogislooselytypedandconsideredlessverbose,althoughthiscanleadtounexpectedbehavior.VHDLisstrongly typedsoitismoredifficulttolearn,butsomewouldarguethatsavestimeinthelong run.TherearenohardrulesaboutwhousesVerilogandwhousesVHDL,butsome generalizationsarepossible.VerilogismorecommonforASICdesignandVHDLis morecommonforFPGAdesign.ItisgenerallyacceptedthatVerilogismorewidely usedinNorthAmerica,VHDLismorewidelyusedinEurope,andJapanisapretty evenmixofVerilogandVHDL. BothVerilogandVHDLcontinuetoevolve.ThenextversionofVerilogbeingdevelopedisSystemVerilog3.X(the3rdgenerationofVerilog)anditwilllikelyresult inanewIEEE1364standard.VHDLalsosawtheformationofaworkinggroupin 2003tostartworkonnextversionoftheIEEE1076(language)and1164(packages) standards.Bothlanguagesareaddingmoresupportfortestbenchfeatures,assertions, bettermodelingandanexpandedsynthesissubset. Sincepartsofachiporsystemmaybedesignedbydifferentgroupsinacompanyor purchasedfromothercompanies,manydesignsuseacombinationofbothlanguages. ThissituationofusingbothVerilogandVHDLiscalledmixed-languagedesign,and simulatorsthatcansimulateVerilogandVHDLinthesamesimulationareknownas mixed-languagesimulators.Mostlogicsimulatorstodayaremixed-languagecapable andsomecanrunadditionallanguages,suchasSystemC,beyondVerilogandVHDL.
28
HardwareandSoftwareDesignProcess Forthepurposesofthisbook,SoftwareSimulationreferstoanevent-drivenlogic simulator.SoftwaresimulatorsrunonworkstationsanduseVerilogandVHDLas simulationlanguagestodescribethedesignandthetestbench(stimulusandresponse checking). Themostcommontypeofdigitalsimulatorisanevent-drivensimulator.Whenthe valueofasignalchanges,thetime,thesignalandthenewvaluearecollectivelyreferredtoasanevent.Theeventisscheduledbyputtingitinaneventqueueorevent list.Whenthespecifiedtimeisreached,thelogicvalueofthesignalischanged. Thechangeaffectsothersignalsthathavethissignalasaninput.Alloftheaffected signalsmustbeevaluated,whichmayaddmoreeventstotheeventlist.Thesimulatorkeepstrackofthecurrenttime,thecurrenttimestepandtheeventlistthatholds futureevents.Foreachsignal,thesimulatorkeepstrackofthelogicstateandthe strengthofthesourceorsourcesdrivingthesignal.Thelogicsimulatoristhemost commontoolusedtosimulatethebehaviorofhardwaredesigns.Logicsimulatorsfollowoneoftwomodels,interpreted-codeorcompiled-code. Aninterpreted-codesimulatorusestheHDLmodelasdata,compilinganexecutable modelaspartofthesimulatorstructure,andthenexecutesthemodel.Thistypeof simulatorusuallyhasashortcompiletimebutalongerexecutiontimecomparedto acompiled-codesimulator.Anexampleofaninterpreted-codesimulatorisVerilogXL.Acompiled-codesimulatorconvertstheHDLmodeltoanintermediateform (usuallyC)andthenusesaseparatecompilertocreateexecutablebinarycode(an executable).Thisresultsinalongercompiletimebutshorterexecutiontimethan aninterpreted-codesimulator.Mostsimulatorstodayarecategorizedasnative-codecompiledsincetheybypasstheintermediaterepresentation(suchasC)andconvert theHDLdirectlytoanexecutablefortheworkstation.Native-codecompiledsimulatorsofferthefastestexecutiontime.Therearealsohybridsimulatorsthatcanbe configuredtorunininterpretedmodeorcompiledmodedependingonuserpreference. Therearemanymoreaspectstolearnabouttheworkingsofamodernlogicsimulator, andgurusinthisareawilldebateaboutthedetails,butthesedescriptionswillsufficefor thepurposeofunderstandingtheroleofthelogicsimulatorinSoCverification.
29
Chapter2 SimulationAccelerationreferstotheprocessofmappingthesynthesizableportion ofthedesignintoahardwareplatformspecificallydesignedtoincreaseperformance byevaluatingHDLconstructsinparallel.VerilogandVHDLareinherentlyparallellanguagessincetheyareusedtodescribehardwareoperationsthatinvolvemany concurrentoperations.Thisparallelismiswhatmakesthemdifferentfromthe sequentialnatureofmostsoftwareprogramminglanguages.Simulationacceleration takesadvantageofthisparallelism.Sincelogicsimulatorsrunsonaworkstation, theparallelconstructsendupbeingserializedintotheCPUinstructionsetofthe workstation.Overtheyears,manypeoplehavetriedtousemultipleprocessorsor multipleworkstationstotakeadvantageoftheparallelismofVerilogandVHDL simulation,butwithlittlesuccess;however,simulationaccelerationusingcustom hardwareinsteadofgeneral-purposeprocessorshashadgoodsuccessinproviding fastersimulation.Insimulationaccelerationtherearetwocomponentsthatcontributetotheoverallsimulationtime;theportionofthesimulationthatcanbemapped intothecustomhardware,andtheremainingportionsofthesimulationthatare notmappedintohardware.Thelaterportionsruninasoftwaresimulatorandwork inconjunctionwiththehardwareplatformtoexchangesimulationdata.Simulationaccelerationprovideshigherperformancebecauseitremovessimulationevents fromthelogicsimulatorandevaluatesthemusingparallelprocessinghardware.This removalofsimulationeventsincreasesperformance.Theeasywaytothinkaboutaccelerationistosaythatwhatevercanberemovedfromthesimulatorexecutesinzero time.Thefinalperformanceisdeterminedbythepercentageofthesimulationthatis leftrunninginthelogicsimulator.Forexample,if50%ofthesimulationeventscan beremovedfromthelogicsimulatorandevaluatedinzerotime,thenthetotalsimulationtimeiscutinhalf.ThesimulationaccelerationsetupisshowninFigure2-1.
30
HardwareandSoftwareDesignProcess
Logic
CPU
Testbench Memory
Clock
Workstation
Hardware Engine
(master)
(slave)
Figure2-1:Simulationacceleration
Thegoalofaccelerationistoincreaseperformance.Thefinalperformanceisbased onthespeedoftheaccelerationplatformandthepercentageofthesimulationthat canberuninsidetheaccelerationhardware.Inatypicalaccelerationsituation, someofthesimulationisleftontheworkstation.Theratioofthepercentageofthe simulationontheworkstationversusthepercentageofthedesignintheaccelerator determinesthefinalperformance.Anexampleofprofileoutputfromasimulatoris showninFigure2-2.DuetotestbenchandPLIprograms,thisexampleshowsthat onlya5Xspeedupcanbeexpectedfromsimulationacceleration. C8 > $finish at simulation time 27386280 ns -- Simulation execs 2738628 time steps in 128.17 sec (46.80 us/ts, 21367.15 ts/sec) -- Profile: TB=5.87% UTF=9.76% RCC=0.00% DUT=80.51% SYS=3.86% TB 3.20% in TBplatform.uPlatform.uProcSubSys.uProcCoreMod.uARM926EJS TB 2.14% in clkgen.sw_clks TB 0.53% in TBplatform.uTrickWrapper.uAHBTube UTF 9.76% in TBplatform.uPlatform.uProcSubSys.uProcCoreMod.uARM926EJS Simulation: cpu time = 128.17 secs, elapsed time = 285 secs, heap memory size = 453.05M bytes
Figure2-2:Examplesimulationprofile
31
Chapter2 Emulationreferstotheprocessofmappinganentiredesignintoahardwareplatform designedtoincreaseperformance.Thereisnoconstantconnectiontotheworkstationduringexecution,andtheemulatorreceivesnoinputfromtheworkstation.By eliminatingtheconnectiontotheworkstation,thehardwareplatformnowrunsatits fullspeedanddoesnotneedtowaitforanycommunication. Therearethreecommonlyusedtechniquesforsimulationaccelerationandemulation.Oneusesanarrayofcustomprocessorstoexecutethedesignandothertwouse anarrayofFPGAstoachieveparallelism.Inthepast,accelerationandemulation technologyconsistedofperformingawire-for-wireandgate-for-gatemappingofan RTLdesignintoagatelevelrepresentationforFPGAsjustasisdoneinprototyping. Re-timingissues,glitches,andraceconditionsoftenmadesuchtechnologydifficulttouseandledtoitsdemise.MorerecentadvancesuseeitherASICsorFPGAs toimplementmanyprocessingelementsthatevaluateonlyasmallportionofthe design.Thesecomputingelementsarescheduledinasimilarwaytohowasoftware logicsimulatorworkstoproducethesamesimulationresultsatmuchhigherspeed duetoparallelprocessing. Therearemanydifferentdefinitionsofwhatthetermemulationmeansandhow itrelatestosimulationacceleration.Followingaresomeofthecharacteristicsthat defineemulation: ■
Thereisnotestbenchrunningontheworkstation
■
Theemulatoristhemasterandtheworkstationistheslave
■
Allclocksaregeneratedbytheemulator,nottheworkstation
Simulationaccelerationhastheoppositecharacteristics: ■
Thereissometestbenchrunningontheworkstation(hopefullynotmuch,to getgoodperformance)
■
Theworkstationisthemasterandtheemulator/acceleratoristheslave
■
Allclocksaregeneratedbythetestbenchrunningontheworkstation
32
HardwareandSoftwareDesignProcess In-Circuitreferstotheuseofexternalhardwarecoupledtoanemulatorforthe purposeofprovidingamorerealisticenvironmentforthedesignbeingsimulated. Thishardwarecommonlytakestheformofprintedcircuitboards,sometimescalled targetboardsoratargetsystem,andtestequipmentcabledtotheemulator.Targetless emulationreferstorunningwithnotestbenchinputfromtheworkstation,butalso notargetsystem.Withtargetlessemulationallstimulusgeneration(testbench)is synthesizableandrunsintheemulator.Therearealwaysgrayareasinthesedefinitionssincecurrentgenerationproductsperformbothsimulationaccelerationand emulationandcanswitchbetweenthesemodesbyissuingasimplecommand.For example,whenrunninginemulationmodeausermaypressCtrl+ctostoptheemulatorsotheremustbesomeconnectionbacktotheworkstation.Someemulatorsalso allowsynthesizabletestbenchestouseprintstatementssuchas$displayinVerilog toprintmessagesontheworkstationfordebuggingpurposes;anothersignofaloose connectionbacktotheworkstation.Diagramsfortargetlessemulationandin-circuit emulationareshowninFigures2-3and2-4,respectively.
on-demand workstation access
Logic
CPU Infrequent $display(); $readmemh();
Clock
Memory Testbench
Workstation
Hardware Engine
(slave)
(master)
Figure2-3:Targetlessemulation
33
Chapter2
Target Boards on-demand workstation access
CPU Infrequent $display(); $readmemh();
Logic
Clock
Memory
Testbench
Workstation
Hardware Engine
(slave)
(master)
Figure2-4:In-circuitemulation
HardwarePrototypereferstotheconstructionofcustomhardwareortheuseof reusablehardware(breadboard)toconstructahardwarerepresentationofthesystem. Prototypeisarepresentationofthefinalsystemthatcanbeconstructedfasterand availablesoonerthantheactualproduct.Thisisachievedbymakingtradeoffsin productrequirementssuchasperformanceandpackaging.Acommontradeofffora prototypeistosavetimebysubstitutingprogrammablelogicforASICs.Thisallows thedesigntobeavailablesooner,butinarepresentationthatrunsslowerandis muchlarger.
34
HardwareandSoftwareDesignProcess
SoftwareEngineer’sViewoftheWorld Tothesoftwareengineertheentireworldrevolvesaroundtheprogrammingmodelof theembeddedsystem.Hereisacomputerscientist’sdefinition: Aprogrammingmodelisamodelusedtoprovidecertainoperationstotheprogramminglevelaboveandrequiringimplementationsofallofthearchitecturesbelow. Practically,theprogrammingmodelforamicroprocessorconsistsofthekeyattributes oftheCPUthatarenecessarytoabstracttheprocessorforthepurposeofsoftware development.Asanexampleofaprogrammingmodel,considertheARM9E-SCPU fromARM. Fromthetechnicalreferencemanual(TRM)wefindthattheARM9E-Simplements theARMv5TEinstructionsetthatincludesthe32-bitARMinstructionsetandthe 16-bitThumbinstructionset.Thedetailsoftheinstructionsetareanimportantpart oftheprogrammingmodel.AlsocoveredbytheprogrammingmodelaredetailsrelatedtotheoperatingmodesoftheCPU,memoryformat,datatypes,generalpurpose registerset,statusregisters,andinterruptsandexceptions.Allofthesedetailsofthe microprocessorareimportanttothesoftwareengineer. Beyondthemicroprocessor,softwareengineersareinterestedinthememorymapfor theembeddedsystem.Fora32-bitaddressspace,thereis4GBofphysicalmemory thatcanbeaccessed.Embeddedsystemsuseonlyasubsetofthisphysicaladdress spaceandthememorymapdefineswhereintheaddressspacevarioustypesofmemoryandotherhardwarecontrolregistersarelocated.Thememorymapmayalsodefine whathappensifaddresseswherenophysicalmemoryexistsareaccessed.Figure2-5 showsanexampleofamemorymap.
35
Chapter2 0xFFFFFFFF
System Bus
0x100C0000
Alternate SRAM 0x10040000
Control Registers 0x10000000
SDRAM
0x80000
SRAM
0x0
Figure2-5:Examplememorymap
CommontypesofmemoryinanembeddedsystemareROMtoholdtheinitialsoftwaretorunontheCPU,flashmemory,DRAM,SDRAM,orDDRmemory,SRAM forfastdatastorage,andmemorymappedperipherals.Peripheralscanbeanydedicatedhardwarethatissoftwareprogrammable.Thesecanrangefromsmallfunctions suchasaUARTortimertomorecomplexhardwarelikeaJPEGencoder/decoder.
36
HardwareandSoftwareDesignProcess Thecombinationofthemicroprocessorprogrammingmodel,thememorymap, andtheindividualhardwarecontrolregistersformthesoftwareengineer’sviewof theembeddedsystem.Thisinformationbecomesthelawtofollowforallsoftware developmentandisavailableintheformoftechnicalmanualsonthemicroprocessorcombinedwiththesystemspecificationssuppliedbythehardwareengineersor systemarchitects.
HardwareEngineer’sViewoftheWorld Hardwareengineershaveadifferentviewoftheembeddedsystem.Wesawhowthe internaloperationofthemicroprocessorisimportanttosoftwareengineers.The internalworkingsoftheCPUaremuchlessimportanttohardwareengineers,and thebusinterfaceiswhatmattersmost.Forthehardwaredesigntoworkcorrectly,the logicconnectedtothemicroprocessormustobeyalloftherulesofthebusprotocol. Iftherulesofthebusprotocolareobeyed,thedetailsofwhatthesoftwareisdoing arenotimportant.Tohardwareengineers,themicroprocessorisnothingmorethana bustransactiongenerator. Allmicroprocessorsusesometypeofprotocoltoreadandwritememory.Atthe hardwareengineer’slevel,themicroprocessorisviewedasaseriesofmemoryreads andwrites.Thesereadsandwritesareusedforfetchinginstructions,accessingperipherals,doingDMAtransfers,andmanyotherthings,butintheendtheyare nothingmorethanasequenceofreadsandwritesonthebus.
Example Todemonstratethedifferencesinhowsoftwareandhardwareengineersviewthe world,lookatthefollowingexample.Consideraregisterthatisprogrammablefrom software.TheregisteristhedefinitionofacontrolregisterfromtheARM926EJ-S MMU.Itisaccessedusingcoprocessorreadandwriteinstructions.ThebitdefinitionsoftheregisterareshowninFigure2-6.Softwareengineerswritinglow-level codetypicallyhavetensorhundredsofsuchregisterstoprogram.Theycanbeaccessedusingcoprocessorinstructionsorbememorymappedandaccessedusingdata loadandstoreinstructions.
37
Chapter2 CP15 Translation Table Base Registers r2
Translation Table Base 31
Should Be Zero 14 13
0
MRC p15, 0, R1, c2, c0, 0 ; read TTBR MCR p15, 0, R1, c2, c0, 0 ; write TTBR
Figure2-6:Exampleprogrammableregister
Thehardwaredesignlooksmuchdifferent.Theparticularblockofcodeimplementingtheregisterwillnotuseall32bitsofaddress,onlyasubsetoftheaddress.Also, notallbitsoftheregistermaybeeasilylocatedasa32-bitwideregisterinverilog. ThepathfromtheCPUbustothisregisterisnotsoeasytotrace.Afragmentofhow aconfigurationregisterforamemorycontrollerisimplementedisshowninFigure2-7. Forthis32-bitregisteronly8bitsaremeaningfulandtheregisterisimplementedas multiplesmallerregistersthatareconcatenatedtogethertoformtheregistervalue whenreadfromsoftware. // // // // //
----------------------------------------------------------------------------Offset | Register Name | R/W | Valid Bits | Reset | Description ----------------------------------------------------------------------------0x00 MPMCStConfig R/W 20:19,8:6, 0x0 Static Memory 3,1:0 configuration
assign MPMCStConfig
= {HWDataReg20to19Q, 10'b0000000000, HWDataReg8to6Q[8:6], 2'b00, HWDataReg4to0Q[3], 1'b0, HWDataReg4to0Q[1:0]);
Figure2-7:Exampleregisterimplementation
38
HardwareandSoftwareDesignProcess
SoftwareDevelopmentTools Editor OK,maybeaneditorisnotreallyasoftwaredevelopmenttool,butmostsoftware engineersspendmoretimewiththeireditorthananyotherprogram.It’sabitof aparadoxtoconsiderthepowerandcomplexityofdesktopapplicationsavailable todaytocomputerusers(thinkofthefeaturebloatofMicrosoft®Wordforexample), butatthesametimeengineersthatmakealivingwithcomputersstillusethesame editorsthatwereusedthirtyyearsago,programslikeviandemacs.Experiencehas shownthatsoftwareengineersaremorelikelytohavebettereditorsthantheirhardwarecounterparts,butthereisusuallyroomforproductivityimprovements.
SourceCodeRevisionControl Oncesomesoftwareexists,thenextstepistomakesureitdoesn’tgetlostorotherwisebrokenbyaccident.Revisioncontrolisameansofrecordingincrementalsteps duringsoftwaredevelopment.Usingrevisioncontrol,itbecomespossibletoundo changesthathavebeenmadeifthechangescauseproblems.Itcanalsoprovidea meanstolimitwhocanmodifyparticularfilesaswellastoidentifywhomadeaparticularchange. Thisispotentiallyveryusefulinthedevelopmentofsoftwarewheretherearemultipledeveloperswhomaybeworkingindifferentlocations.Withoutit,projects operateinanenvironmentwherecontrolofthesourcecodeismaintainedentirely byword-of-mouth. Therearemanysuchrevisioncontrolsystemsrangingfromfreetoexpensive.Concurrentversionssystem(CVS)isthedefactostandardintheworldtoday.Itrunsonmost everyplatformandprovidesaneasyclient/servermodelforpeopleinanylocation.It maynotbethebest,butitisthemostcommonlyknown.Someexamplesofhowtouse CVStocheckoutfilesandviewfilehistoryaregiveninFigure2-8.
39
Chapter2 x8:46 % cvs update Makefile U Makefile x8:44 % cvs status Makefile =================================================================== File: Makefile Status: Locally Modified Working revision: Repository revision: Sticky Tag: Sticky Date: Sticky Options:
1.20 1.20 (none) (none) (none)
/home/tools/x-cvs-repos/arm/Makefile,v
sp8:47 % cvs log Makefile RCS file: /home/tools/x-cvs-repos/arm/Makefile,v Working file: Makefile head: 1.20 branch: locks: strict access list: symbolic names: release_2002_1_1: 1.17 release_2002_1_1: 1.16 release_2001_3_2: 1.9 start: 1.1.1.1 keyword substitution: kv total revisions: 21; selected revisions: 21 description: ---------------------------revision 1.20 date: 2002/11/19 18:37:30; author: toolman; state: Exp; Fixed gtk target for Linux platform.
lines: +1 -3
Figure2-8:UsingCVS
Compiler AcompilerisatooltotranslateC/C++andassemblylanguagetextfilesintoaprogramthecomputercanrun.Thisformatiscalledmachinelanguageandisstoredina filecalledanexecutablefile. Embeddedsystemprojectsnormallyuseacrosscompiler.Acrosscompilerrunsona computerwithadifferentprocessortypethantheoneusedintheembeddedsystem. Thismeanstheexecutablefileproducedbythecompilercannotberunonthemachinethatcreateditandmustbetransferredtotheembeddedsystemtoberun.
40
HardwareandSoftwareDesignProcess
Debugger Unfortunately,notallprogramsruncorrectlythefirsttime.Itmaybemoreaccurate tosaythat100%ofallprogramsdonotworkcorrectly.Evenifaprogramseemsto workcorrectly,itisimpossibletotestitundereverysituationandcircumstance,soit couldneverbeproventoalwaysworkcorrectly.Adebuggerhelpssoftwareengineers findenoughproblemssothatthecoderunscorrectlyformostofthesituationsit faces.Adebuggerallowsthesoftwareengineertoinspectthesequenceofthecode, theCPUregistersandmemorytofigureoutwhatishappening. Commondebuggerfunctionsinclude: ■
Viewandchangeregisters
■
Viewandchangememory
■
Displayfunctioncallstack(backtrace)
■
Setbreakpoints
■
Setwatchpoints(databreakpoint)
■
SinglestepbyCstatementorassemblyinstruction
■
InterrupttherunningCPU
Simulator Forasoftwareengineer,asimulatormodelstheinternalworkingsoftheCPU.CPU simulatorsareoftenprovidedalongwithsoftwaredevelopmenttoolsusedtocompilesoftware.Anothernameforthistypeofsimulatorisaninstructionsetsimulator (ISS).Oneapplicationofthesimulatorallowscompilerwriterstotestcompiler outputwithouthavingaCPUchipavailable.Whenanewprocessorisdesignedthe compilermustbecompletedearlysoprogramscanberunimmediatelyonthefirst implementationoftheprocessor.TheISSisalsousedbymicroprocessordesignersto cross-checktheCPUdesignagainstthemodeltospotdifferences.Usingthesimulatorasagoldenreferencemodelisagoodwaytoverifyanddebugmicroprocessor operation.Softwareengineersmayalsouseasimulatortotestcodewithouthaving achiporboardavailable.Forearlysoftwaredesignitiseasytoruntheinitialboot codeonasimulatortomakesurethedetailsoftheCPUsetuparecorrect.
41
Chapter2
DevelopmentBoard AdevelopmentboardcontainsaCPU,memoryandperipherals,andawayto downloadanddebugprograms.Thedevelopmentboardisthedefactostandard forsoftwareengineers.Whenanewprojectstartsoranewprocessorisintroduced softwareengineerswillimmediatelygetadevelopmentboardandstarttryingtobuild andrunprogramstomakesurethecompilationanddebuggingtoolsareworking.MicroprocessorvendorsprovidetheseboardsforCPUevaluationandtoshowtheCPU isreal,itworks,andsoftwareengineerscantryitoutveryeasily.
IntegratedDevelopmentEnvironment(IDE) TheIDEcombinesallnecessaryfunctionsandtoolsforthesoftwareengineerintoa single,integratedproductthatnormallyincludesprojectediting,sourcecodesearch, filenavigation,fileediting,andprojectbuilding.Theideaisthatlearningasingle toolmakeslifeeasierthanhavingdifferentapplicationsthatmustberunmanually foreachfunction.Efficientlynavigatingsourcecodesavestimeandmakesiteasierto understandhowthesoftwareworks.
SoftwareDebuggingConnections Softwaredebuggersconnecttoembeddedsystemsinmanydifferentways.Even thoughthedebuggingconceptsarethesameforallconnectionmethods,itisusefultounderstandtheunderlyingmechanismthedebuggerisusingtocommunicate withthemicroprocessorandthememory.Debuggersworkbysendingcommandsto theembeddedsystemtoperformoperationsontheCPUregistersandmemory.The twomostusedfunctionsarereadingCPUregistersandreadingmemory.Withthis informationthedebuggercandeterminethecontextofsoftwareexecution.Almost alldebuggercommandsrequirememorytoberead;examplesincludeprintingvariablesandviewingdisassembledsoftwareinstructions.Otherdebuggerfunctionsare thelessoftenusedregisterandmemorywriteandcontroloperationstostartandstop softwareexecution.Followingaresomeofthewayssoftwaredebuggersexchange informationwithembeddedsystemstoprovidedebuggingfunctions.
42
HardwareandSoftwareDesignProcess UnlikedebuggingaprogramonaPCorworkstation,theembeddedsystemsoftware debuggerrunsonanothermachine,nottheembeddedprocessor.Thisconceptis oftencalledremotedebugging.Thetargetprocessormaynotworkcorrectlyandmay nothaveanykeyboardandmonitoranyway.
JTAG OnewaydebuggerscanconnecttoaCPUisJTAG.JTAGwasdevelopedasthe IEEE1149.1standardforboundaryscantestingofprintedcircuitboards.Itusesfive wirestosendandreceiveserialcommandsanddatafromdevicesthatimplementthe JTAGstandard.MicroprocessorsusethesameJTAGprotocolnottotestPCBconnections,butinsteadtosendandreceiveinformationfromtheembeddedsystem. JTAGdebuggerconnectionsprovideawaytolinkamachinerunningthedebugger toaconnectorontheembeddedsystemthatimplementstherequiredJTAGsignals. SincetheserialJTAGsequencescanbeverylong,asmallboxthatconvertscommandsintotheserialdatasignalsisusedtoprovidegoodperformance.TouseJTAG debugging,themicroprocessormustimplementspecialhardwareforthispurpose.
Stub Forprocessorsthathavenohardwaresupportfordebugging,thesoftwaredebugger cancommunicatewiththeembeddedsystemusingastub.Astubissomespecial codethattheuseraddstotheembeddedsoftwareforthesolepurposeofcommunicatingwiththedebugger.Thestubsoftwarecommunicateswiththedebuggerusinga communicationchannelsuchasaserialportorEthernetconnection.Thestubperformsdebuggerrequestssuchasreadingregistersandmemoryandsendstheresults tothedebuggerviathecommunicationschannel.Italsousesaninterruptservice routinetogaincontrolwhenthedebuggerwantstostop,suchaswhentheuserhits Ctrl+ctostopexecution.Astubissomewhatintrusivesincespecialcodethatisnot requiredforproductoperationmustberunontheembeddedsystem,butwhenthe stubisworkingitisaveryflexiblewaytodebugsoftware.
43
Chapter2
DirectConnection WhenusinganISSthereisnoneedforeitheraJTAGconnectionorastubbetween thesoftwaredebuggerandtheembeddedsystem.SincetheISSisasoftwaremodel, thedebuggercaneasilyaccessallrequiredinformationbymakingfunctioncallsto theISS.Adirectconnectionisthebestconnectionbetweendebuggerandembedded systemsinceitdoesnotrequireanyspecialcodetobeinsertedintotheembeddedsoftwareanddoesnotrequireanyhardwareconnectionorcables.Alldebugger requestscanbesatisfiedimmediately,withouttheneedtomodifythestateofthe CPUorenteranyspecialdebugmode.Thisprovidesthemostaccurateandrealistic pictureofhowtheembeddedsystemoperates.
TypesofSoftware Fivedistincttypesofembeddedsystemsoftwarehavebeenidentified,withsoftware designproceedinginthisorder.Thesoftwarecontent(linesofcode)increaseswith eachstep: ■
Systeminitializationsoftwareandhardwareabstractionlayer(HAL)
■
Hardwarediagnostictestsuite
■
Real-timeoperatingsystem(RTOS)
■
RTOSdevicedrivers
■
Applicationsoftware
SystemInitializationandHAL Thefirstsoftware-codingtaskistowriteandtestthemicroprocessorinitialization software.Thiscodeincludesconfiguringtheoperatingmodesandperipherals(things likecacheconfiguration,memoryprotectionunitorMMUprogramming,interrupt controllerconfiguration,timersetup,andDRAMinitialization). Thehardwareabstractionlayeristhenextlayerofsoftwarethatworkswiththe initializationcodetoprovideacommoninterfaceforhigher-levelsoftwaretousefor hardware-specificfunctionalityafterthesystemisinitialized.TheHALabstractsthe underlyinghardwareoftheprocessorarchitectureandtheplatformtoalevelsufficientfortheRTOSkerneltobeported. 44
HardwareandSoftwareDesignProcess
DiagnosticSuite OncetheinitializationsoftwareandHALarestable,thenextphaseofsoftwaredevelopmentconsistsofdevelopingadetailedtestsuiteforthehardwaredesign.Inthe pastthisusuallytooktheformofahardwaretestbench.Whiletestbenchesarestill necessarytoprovidestimulusforexternalinterfacessuchasnetworkingprotocols, thesoftwarenowservesasthetestbenchfortheCPUcorebus.Acomprehensiveset ofdiagnostictestsaredevelopedtoverifyeachsubsystemandperipheral.Thisstarts withthememorysubsystem,progressestointerrupttesting,thenmovestootherIP blocksliketimers,DMAcontrollers,videocontrollers,MPEGdecoders,andother specialtyhardware.Mostofthesetestsdonotseetheirwayintothefinalproduct, buttheyareveryimportantbecausetheybuildthecaseforasolidhardwaredesign. Creatingtheprogramsgivesthesoftwareengineersaverygoodunderstandingof thehardwareandservesasachancetolearnaboutthehardwarespecificsinamore secludedenvironment.Diagnosticscanalsobereusedtoverifyoperationwhenthe hardwarearrives.
Real-TimeOperatingSystem(RTOS) ThefirstassumptionoftheRTOSengineeristhatthehardwareisstable.Thisistrueif thediagnosticsuitewasdonewell.TheinitialRTOSworkconsistsofjustgettingitto bootontheplatform.HowbigatasktheRTOSisdependsonhowstandardtheplatformisandhowwelltheHALwasdesigned.Duringtheinitialportingphase,device driversforapplicationspecifichardwaremaybemissing,buttheRTOScanstillboot.
DeviceDriversandApplicationSoftware Oncetheplatformiscomplete,itistimetotestdevicedriversandapplicationsoftware.Attheapplicationlevelthehardwareisassumedcorrect,andthetaskbecomes morelikeworkstationprogramming.Iftheapplicationcrashes,itcanbesafelyassumeditisasoftwareproblem,notsomethingwrongwiththehardware. Applicationsusuallywanttointerfacetorealnetworktraffic,seethingsonthe screen,andusethepointerormouse.Duringapplicationdevelopment,hardwareand lower-levelsoftwarebugsarefewandfarbetweenandthesoftwareengineerisfocusedonprovidingrobustapplicationswithdifferentiatingfeaturesforendusers.
45
Chapter2 Inthecomingchapters,methodstoverifyalltypesofsoftwarewillbediscussed,but theprimaryfocuswillbeonhardwaredependentsoftware.Thesoftwarethatinteractsmostwiththespecificsofthehardwaredesignisthecriticalplaceintheproject wherehardwareandsoftwareintegrationismostimportant.
SoftwareDevelopmentProcess Writingembeddedsoftwareisdifferentfromwritingaprogramforaworkstationor PC.Programmingaspectsthatarenotimportantonaworkstationsuddenlybecome importantwhenworkingintheconstrainedenvironmentofanembeddedsystem. Asanexample,considerthememoryconstraintsofanembeddedproject.Memory layoutisimportant,howmuchmemoryusedisimportant,howtogettherightdata intothememoryisimportant.Thisisdifferentfromaworkstationprogramrunning onamodernoperatingsystemwithlargeamountsofvirtualmemory. Forthelastfewyears,Ihavebeenfollowingthedevelopmentsofanopensourcetool thatisusedtoorganizesoftwareprojects,analyzethecode,makeiteasiertonavigate, buildanddebug.Inthesoftwareworld,itwouldbeclassifiedasanintegrateddevelopmentenvironment(IDE).ItookinterestinitbecauseIhadextendedittowork withtheVeriloglanguageanduseditformydailydevelopmenttasks.Likemostopen sourceprojects(includingtheLinuxkernel)thedevelopmenteffortwasallvolunteeranddrivenbye-maillists.Oneofthee-mailsonthelistremindedmeagain howdifferentembeddedsoftwaredevelopmentisfromwritingapplicationsforPC orworkstation.Anembeddedsoftwareengineerstartedoutbyremindingeverybody thathedoesnotreallylikethesekindofgraphicalIDEtoolsandhereallydoesn’t needthemsincehecandoeverythingusingthecommandline,butinthiscaseit savedhim.Hewentontotellhowhisproducthadrunoutofmemoryandhewas requiredtoaddonemorefeature.Programmingthefeaturewastrivial,butsqueezing itintomemorywasnot.Usingthetool,hewasabletoanalyzehisprojectandfreeup 1.1kbytesofunusedcode.Intheworkstationworldwhereawebbrowserchewsup 20MBofmemory,1.1kbytesmaynotseemlikemuch,butitwasenoughforthisguy toaddthefeatureandfinishtheproject.
46
HardwareandSoftwareDesignProcess Softwareengineerswritecodeinatexteditorusingtheprogramminglanguageof choice.Thenextstepistocompilethesourcecodeintoobjectfiles,thenlinkthose objectfilestogether,usuallywithsomelibrariesprovidedbythecompilerandoperating system.Theresultisanexecutablefilethatcanbedirectlyexecutedonthecomputer. ThisisthesameprocessfordevelopingsoftwarethatrunsonaPCorworkstation. Onefileformatoftenusedinbothworkstationprogramsandembeddedprogramsis ELF(executableandlinkingformat).ELFfilesareobjectfilesproducedbythecompilerandlinkerthatarebinaryrepresentationsofprogramsintendedtobeexecuted directlyonaprocessor.ThestandardforELFdefinesthefileformatincludinga headeranddifferentsectionsthatallowthefilestobeeasilyidentifiedandunderstoodinamachineindependentway.Understandingthedetailsofthefileformatis notusuallyrequiredtodesignanddevelopsoftwareforembeddedsystems,butsome familiarityisuseful. Theworkstationprogramcaneasilyusepre-compiledlibrariestoaccesscommon functionsprovidedbytheoperatingsystemtoaccessgraphicsdisplays,network interfacesandinputdeviceslikekeyboardandmouse.Workstationsalsousedynamic linkingtoavoidduplicationofdatabyloadinglibrariesatruntime,whentheprogramactuallyneedsthem.Dynamiclinkingdoesnotincludeallofthelibrariesinside thefinalexecutable,butincludesonlyreferencesaboutwheretofindthem.This keepsexecutablefilesizesmallandtheoperatingsystemcanloadthelibrarieswhen theyareactuallyneeded. Anexampleofrunning“helloworld”onaLinuxworkstationisshowninFigure2-9. % cat hello.c main() { printf("Hello World\n"); } % gcc –o hello hello.c % ldd hello libc.so.6 => /lib/libc.so.6 (0x4001e000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) % hello Hello World
Figure2-9:Exampleworkstationprogram 47
Chapter2 Contrastthissoftwaredevelopmentprocesstothatusedwhenwritingadiagnostic programforanembeddedprocessor.Thediagnosticprogramhasnooperatingsystem andnostandardlibrariestocommunicatewithinputandoutputdevices(ifthereare any).Engineerswilloftencallthisoperatingenvironmentbarehardware.Thecode usuallygoessomethinglikethefollowing: ■
Softwarestartsexecutionfromtheresetvector.Itisdifferentforeacharchitecture,butinthecaseofARMitisusuallyaddress0.
■
Setupinterruptvectorsbyprogrammingaddressesofinterrupthandlersinto thetableofvectors.Asanexample,intheARMarchitectureaddress0x18 containstheaddressofthehandlerforthenIRQinterrupt(normalinterrupt) signal.
■
Configureandenableotherhardware.Thiscanincludeenablingtheinstructionanddatacacheandprogrammingthememorycontroller.
■
Setupastackandcall__maintoprepareforprogramexecution.
■
Startexecutionfromthemain()Cfunction.
EverythinguptothefinalstepisdoneinassemblylanguagetopreparetheCPUto runeventhesimplestCprogramstartingfrommain.Thisflowprovidesalookinto whatisnecessarytoworkintheworldofembeddedsystemscomparedtoworkstation programming. Nowlet’stakealookatthecompileprocessforanembeddedprogramtounderstand howitdiffersfromaworkstationprogram.IsawthemessageshowninFigure2-10in anewsgrouprelatedtoembeddedsoftwaredevelopment.
48
HardwareandSoftwareDesignProcess Hi, I am new to this, but I understand that I need a certain amount of assembly code to "boot" the arm720t processor. Now this should be a standard thing for you guys, so although I have been unable to dig up any info about this so far, I was hoping you could either point me in the right direction, or perhaps even help me aquire the needed code. Or do I really need to "re-invent the wheel" so to speak!? The arm7 must be set up properly before it can execute my c-program. As all arm720t cpus are equal (?!) so should their init code be (!?), so why can't I just download this code from www.arm.com !? ...or from any site dealing with arm development—kinda hard to develop without even getting the thing running. I'm sure this is obvious to you guys, but please help me out. In danger of having misunderstood how things work, I am open to all input on the matter!
Figure2-10:Anengineertryingtounderstand embeddedsoftwaredevelopment OncethediagnosticprogramiscodedinCandassemblylanguageitiscompiledinto objectfilesforthetargetprocessor.Thisisthesameasaworkstationprogram.Figures 2-11and2-12showthecompilationcommandsforbothCandassemblylanguage requiredforasmalltestprogram. armasm -32 -bigend -checkreglist -CPU 5T -keep -apcs /inter -g \ -i include init.s -list init.lst -o init.o armcc -ansi -c -cpu 5T -zo -bigend -fy -g -O2 -I include main.c -o main.o
Figure2-11:ExamplecommandstocompileCand assemblylanguageintoobjectfiles Thefirstcommandisanexampleofanassemblylanguagefilethatisusedtoinitialize theCPU.ThesecondcommandisaCfilethatcontainsmain().Thenextstepisto takealloftheobjectfilesandlinkthemtogethertoformanELFfilenamedtest.elf.
49
Chapter2 armlink -debug -scatter link.txt -remove -noscanlib -info \ sizes,totals,veneers,unused -map -symbols -xref main.o cp15init.o \ /tools/ads11/common/lib/armlib/c_a__un.b \ /tools/ads11/common/lib/armlib/f_a_m.b \ /tools/ads11/common/lib/armlib/m_a_mu.b -list test.map -o test.elf
Figure2-12:LinkingobjectfilestoproduceanELFfile Forembeddedsystems,thememorylayoutisextremelyimportant.Onaworkstation thesoftwareengineerdoesnotcarewhereinmemorytheprogramisloadedorabout anythingrelatedtothephysicaladdressesofthehardware.Inembeddedprograms, thesoftwareengineermustusealinkmapasshowninFigure2-13tospecifythe locationofthememoryandwheretoputROM,RAM,andstackdata. LR_1 0x0 { E_INIT_VEC 0x00 { int_vectors.o(+RO) } E_RO +0 { .ANY(+RO) } E_TABLE +0 UNINIT { init.o(image) } E_RW 0xFFFE0300 { .ANY(+RW) } ER_ZI +0 { *(+ZI) } }
Figure2-13:Examplelinkmap
50
HardwareandSoftwareDesignProcess OncetheELFfileiscompleteandlocatedcorrectlyinmemory,thefinalstepisto translatetheELFfileintoaformatthatcanbeusedtoprogramaflashmemoryor otherwiseloadthedataintotheembeddedsystem.Sincethisbookisfocusedonthe verificationofhardwareandsoftwarebeforeanychipsorboardsareconstructed,the ELFfilemustbetranslatedintoaformatthatissuitableforaverificationplatform suchasalogicsimulatororemulator. Alogicsimulationenvironmentwillcontainmemorymodelsforthevarioustypesof memoryinthedesign.Commonpracticeistoloadthecodeintothesememoriesat thestartofsimulation.Onewaytodothisistogeneratehexadecimalmemoryfiles thatcanbereaddirectlyintothememorymodelsasinFigure2-14. % fromelf -vhx -16x2 test.elf -output rom0.dat
Figure2-14:CommandtotranslateELFfileintoVerilogmemoryformat Verilogprovidesasystemtask,$readmemh,thatwillreadthememorydatafrom afileintothememoryarray.Thetasktakestwoarguments;thefilenameandthe memoryarraynametoloadthedatainto.AcodefragmentisshowninFigure2-15.
// Memory reg [15:0] data [0:16383]; initial $readmemh(“rom0.dat”, data);
Figure2-15:Loadingmemorydata TheaboveexampleshowsallofthestepstogofromCandassemblylanguagesourcecode toasimulationmemorymodelandfinallytoacompletesimulationoftheARMdesign. Whentheresetofthemicroprocessoriscompleteditwillbegintofetchinstructionsfrom memoryandexecutetheprogram.Understandinghowembeddedsoftwareisdevelopedand simulatediscrucialtounderstandingtheworldofhardware/softwareco-verification.
51
Chapter2
HardwareDevelopmentTools Therearetwoprimaryclassesoftoolsusedbyhardwareengineers;thosedealingwith designimplementationandthosedealingwithdesignverification.Implementationis theprocessofdescribingadesignandtransformingthedesigndescriptionintonecessaryformatstomanufactureachiporaboard.Verificationistheprocessofmaking surethedesignworks,itdoesnotcrash,anditdoesthisaccordingtothespecification.Forourpurposesdesignverificationisthefocus,solet’sreviewsomeofthemost commonverificationtools. Wehavealreadydiscussedtheverificationplatformasthemethodusedtoexecute adescriptionofthehardwaredesign.ThemostcommonplatformistheVerilogand VHDLsimulatorrunningonaPCorworkstation.Wealsodiscussedotherplatforms thatprovidefeaturesandperformancebeyondthesimulatorforthepurposesof simulationaccelerationandin-circuitemulation.Anentireindustryhassprungup toprovideproductsthataugmenttheverificationplatformtohelpengineersdevelop testsandinterprettheresults.Therearespecialhardwareverificationlanguagesthat havebeendesignedtoimproveefficiencyandverificationquality.Othercommonly usedtoolsarecodecoverage,linttools,anddebuggingtoolstovisualizeresults. Anothertechniquequicklygainingpopularityistheuseofassertionsasawayto documentthedesigner’sassumptionsandthepropertiesofthedesign.Assertionsare apowerfultooltocrosscheckthedesign’sactualversusintendedbehavior.Assertions arealsovaluabletoverificationandsystemengineerstospecifytheintendedbehaviorofthesystemformallyandtomakesureitisactingaccordingtospecification.
Editor Justlikesoftwareengineers,mosthardwareengineersenteradesigndescriptionand testbenchusingatexteditor.Inthecaseofhardwareengineers,theeditingtechniquesusedareevenmoreprimitivethansoftwareengineers.Onereasonisthatmost hardwareengineersinvolvedincomplexdesignareworkinginaUNIXorLinux computingenvironment.Theprimaryfocusisonsimulationperformanceandthe abilitytoshareanetworkofworkstationsformosttasks.Anotherreasonfortheuse ofprimitiveeditingtoolsisthattherelativesizeofthemarketforengineersdevelop-
52
HardwareandSoftwareDesignProcess inginVerilogandVHDLismuchlessthanforasoftwaredevelopmentlanguagelike C++orJava.Thissmallermarketdoesnotgetasmuchattentionfrompotentialtool providers,especiallywhenyouconsiderthatmostofthemarketisfortoolsonUNIX andLinux.
SourceCodeRevisionControl Anotherareawherehardwareengineersarefollowingsoftwareengineersisthearea ofrevisioncontrol.Fromthebeginningoftime,softwaredevelopmenthasalways involvedmanagingalargenumberoftextfilesthatareproducedbymanyengineers andcompiledintooneormoreexecutableprograms.WiththeadventofVerilogand VHDLdesignflows,hardwareengineersaredealingwiththesametypeofenvironmentandmustuserevisioncontroltoolstomanageprojects.Manyrevisioncontrol toolsdonotreallycareifthefilesbeingcheckedinandoutareC++,Java,Verilog, orVHDL;afterall,afileisafileinanylanguage. Oneareawhererevisioncontrolmaydifferbetweenhardwareandsoftwareisthe amountofdatageneratedandstoredinhardwaredesignprojects.Compiledsoftware projectsconsistofmanyobjectfilesandsomenumberofexecutablefiles.Intermediateversionsofthebinaryfilesarenotusuallyarchivedpermanentlybecauseevena verylargeprojectcanberebuiltinamatterofhours.Hardwareprojectsaredifferentinthatitmaytakemanydaystocreatetheproperbinaryfilesanddatabasesto gofromHDLsourcecodetoalayoutofachiporboard.Thisprocessinvolvesmany differenttoolsandintermediatefiles.Thisleadstomuchmoredataandahigher importancetosavethisdatausingrevisioncontroltools.Fortheseapplications,it isimportanttofindatoolthatiscapableofhandlinglargeamountsofbinarydata efficiently.Thereareevensometoolsthatarespecificallytargetedforrevisioncontrolandsynchronizationofthesehardwaredatabasesacrossdesignsitesindifferent locations.
53
Chapter2
LintTools Linttoolsprovideaneasywaytoanalyzecodeforcommonmistakesbysimplyreadingthesourcecode.LintoriginatedintheearlydaysofCprogrammingasatoolto finderrorsbeforeruntime.Commonlintchecksincludeunuseddeclarations,type inconsistencies,usebeforedefinition,unreachablecode,ignoredreturnvalues,executionpathswithnoreturn,likelyinfiniteloops,andfallthroughcases. TheconceptoflinthasbeenextendedtoVerilogandVHDL.HDLlinttoolsprovide syntaxchecksandcodingstylechecksforsimulationandforsynthesis.Theycanalso detectraceconditionsandanalyzepropertiesoffinitestatemachines.Thebenefitof linttoolsistoanalyzethecodebeforethecompileandsimulationprocess.Thereis nothingworsethanstartingacompileofalargeprojectonlytofindoutthecompiler reportsacodingstyleproblem15minutesintothecompile,ortofindaraceconditionafterspending45minutestocompile,runsimulation,anduseawaveformto examinetheresult.Thegoaloflinttoolsistoachievehigherqualitycodeandcatch bugsearlierbyanalyzingthedesignbeforesimulation.
CodeCoverage Codecoverageisanothertechniqueborrowedfromsoftwareengineering.Coverage measuresareasofcodeexercisedwhentestsarerun.Basedoncoverageresults,additionaltestscanbedevelopedtoincreasecoverage.Indirectly,increasingcoverage willincreaseproductquality.ForHDLdesigntherearedifferenttypesofcoverage metricsthataremeasured,suchasstatementcoverage,branchcoverageandFSM coverage. Statementcoverage,alsocalledlinecoverage,measureswhichstatementswere executed.Statementisthemostbasictypeofcoveragemetricthatidentifiesunexecutedlinesofcodeforwhicheitheratestshouldbedevelopedormaybethecode canberemoved. Branchcoverageidentifiesconditionsthatweresatisfiedinbranchstatementssuch asif-then-elseandcasestatements.Itprovidesmoredetailthansimplestatement coverageandwillidentifyareaswherenotallpossibleconditionswereexercisedduringtesting.
54
HardwareandSoftwareDesignProcess FSMcoverageidentifiesdesignbehaviorbyidentifyingasetofstatementsasafinite statemachineandanalyzesthesetofFSMtransitionsthatoccur,tomakesureall behavioristested. CodecoverageisagoodwaytomakesureHDLcodeistested,butdoesn’tguarantee thedesigniscorrect.Evenifanincorrectdesigniswelltested,thisdoesnothelpin theendsincecoverageprovidesmetricsontheimplementationandhasnocorrelationbacktothedesignspecification.
DebuggingTools Unlikesomeoftheprevioustools,whichwereadoptedbyhardwareengineering fromsoftwareengineering,hardwaredebuggingisverydifferentfromsoftwaredebugging.Hardwareengineersoperateprimarilyinabatchenvironment.Whetherthey areworkingonlargesimulationsandusingafarmofcomputerstorunthejobsor workingonasinglemoduleandsimulatingitonalocalworkstation,thedebuggingis oftendoneusingpost-processingtechniques.Thismeansthatthestepofrunningthe simulationandcapturingresultsisseparatefromthestepofinterpretingresultsand debuggingproblems. Themostcommontoolforhardwaredebuggingisawaveformviewer.Hardware engineersoperatebyviewingsignalvaluesacrossaspanofsimulationtime.Aparticularsimulationtimeisreferencedbythesimulationtimestamp.Today’swaveform toolssavedataincompressedformatsformaximumperformanceandminimalfile size.TheIEEEVerilogstandarddoesspecifyaformatforwaveformfiles,knownas valuechangedump(VCD).Thistextformatisfineforsmallsimulationswithsmall amountsofdatatostore,butbecomesimpossibleforverylargesimulations.Forlarge simulations,proprietaryformatswithcompressionmustbeused.
VerificationLanguages Asdesigncomplexityhasincreased,ithasbecomemoredifficulttousesimulationlanguagestoverifyadesignadequately.Newlanguageshavebeenproposed anddevelopedthatarespecificallytunedforverification.Althoughtherearemany languages,oneofthemostpopularistheelanguageinventedbyYoavHollander, founderofVerisity,whichiscurrentlyincommitteetobecomeIEEEstandard1674.
55
Chapter2 ToimproveverificationofHDLdesigns,additionalcapabilitiesarerequiredthan whatiscurrentlyavailableinVerilog,VHDLorC.Someoftheusefulaspectsofa verificationlanguageare: ■
Easytoreusecodeformultipletests
■
Constraintsfordirected-randomtestgeneration
■
Builtinassertionsandfunctionalcoveragesupport
■
Datatypesthatprovidebothhigh-levelfunctionalityand interfacewelltoHDLdesigns
Theideaofalanguagethatisspecificforfunctionalverificationisappealingtomany projectsthatarestrugglingwithcomplexverificationandthemanualworkrequired todeveloptestcases.
Assertions Theuseofassertionstospecifydesignintentandthepropertiesofthedesignisrapidlygrowinginpopularity.Assertionsareapowerfultooltocrosscheckthedesign’s actualversusintendedbehavior.Theyarealsovaluabletoverificationandsystem engineerstoformallyspecifytheintendedbehaviorofthesystemandtomakesureit isactingaccordingtospecification.Recently,muchhasbeenwrittenaboutassertions andeventhoughtheconcepthasexistedasanadhoctechniqueforyears,thereis stillmuchconfusionovertheiruse.SomeofthedetailsofhowassertionsarespecifiedandimplementedarepresentedinChapter6.Fornow,let’stakealookatthe goalsandbenefitsofassertions. Assertionsprovideacommonformatformultipletools.Withoutanagreedupon methodofspecifyingassertions,thereisnochancetoautomateactivitiessuchas functionalcoveragemetrics,errorreportingandseveritylevels.Acommonassertionmethodologylendsitselftoincreasedautomationinbothformalverification andsimulation.Assertionscanalsobenefitfroma“write-once,run-anywhere” characteristic.Onceinsertedintoadesign,theycanbeusedbymanytoolsand arequiteportablefordesignreuse.
56
HardwareandSoftwareDesignProcess Assertionsarebetterthanapapertestplan.Verificationinvolvesderivingalistof featurestobetestedfromthedesignspecification.Mostprojectswriteatestplan thatdictateshowthefeaturesofthedesignaretobeverified.Thesefeaturesarethen documented,andatestcaseisdevelopedtoverifyeachfeature.Testcasesusually taketheformofatestbenchthatwillcausethedesiredfeaturetobeexercised.This directedapproachmaybeaugmentedwithrandomtesting.Assertionshelptoautomatethemanualprocessofrunningatestcase;visuallyverifyingthetesthascovered thefeature,andaddingthetesttotheregressionsuite.Withoutusingassertions, thereisoftennowaytoguaranteethetestcasestillexercisesthefeatureasthedesign evolves.Assertionsprovideamechanismtomeasurefunctionalcoverageandquantifytestresults. Assertionseasedebuggingandreducesimulationtime.Oneofthemainmotivationsforassertionsisreduceddebugtime.Sinceassertionsareawhite-box verificationtechniquetheyprovideincreasedvisibilityandcontrollabilityof thedesignundertest.Assertionswilldetectdesignerrorsassoonastheyoccur withoutwaitingfortheeffectstobepropagatedtothedeviceorsystemboundaries.Havinganimmediateindicationofaproblemcansavehoursoftryingto lookbackwardtogettotherootcauseoftheproblem.Theuseofassertionshas beenreportedtoreducedebugtimebyasmuchas50%.Assertionsclearlyhave thepotentialtoreducedebuggingtime.Arunawayormeaninglesssimulation canwastevaluablesimulationtimeifitoccursduringanovernightregressionrun whenmoretestscouldbegivenachancetorun. Assertionsverifyinterfacesbetweenblocks.Duringintegrationphases,assertions actaswatchdogsatthemoduleorblockinterfaceboundaries,makingsureeach blockobeystheagreeduponprotocol.Thiscanbecrucial,sinceitispossiblefor atestcasetopassbasedonthedataobservedatthechiporsystemboundaries evenwhenthereisaviolationofaprotocolinsidethedesign.Assertionshelp eliminatesuchsituations.
57
Chapter2 AssertionsareagoodRTLcodingpractice.Assertionsarefarmorevaluablethan commentsaloneinthedesign.Well-commentedcodemakesiteasiertomaintainandtounderstandtheintendedfunctionality.Assertionsgoonestepfurther bydocumentingexactlywhatthecodeisexpectedtodoinawaythatcanbe verifiedusingtoolsinsteadofahumanreadingthecommentsandattempting tounderstandifthecodeisworkingcorrectly.Assertionsarealsoagoodwayto sanitizethecodebeforecheck-in.Toolscanreadadesignfileanditsassertions andindicatepossibleproblemareas.Inthesamewayalintprogramchecksfor errorsinthecode’ssyntaxandstructure,assertionscanbeusedtocheckforerrors inthedesign’sbehavior. Assertionsprovidemanybenefitstosystem-levelengineers,designengineersdoingRTLcoding,andverificationengineers.Eachgroupbringstothetabledifferent knowledgeaboutthedesignanditsoperation,butacommonassertionmethodology allowsallpartiestobenefit.
DebuggingDefined Whatmostengineerscall“debugging”isactuallyacombinationoftwoseparateactivities:detectionanddebugging. Detectionistheprocessofdeterminingthatthereisaprobleminadesignortest. Mostprojectsusesimulationastheprimarymeanstodetectproblems.Morerecently, formalverificationtoolshavebeenusedtodetectdesignproblems.Insimulation, therearemanywaystocommunicatetheexistenceofaproblem.Twocommon waystocommunicateproblemsareusing$display(print)statementsandcomparingmemoryvalueswithexpectedresults.Thelackofa$displaystatementmayalso indicateaproblemwithatestresult.Assertionsareabetterwaytoformalizeadhoc detectionmethods. Toincreaseproblemdetection,engineerscanapplymorecomputersandsimulators oruseotherwaystoincreaseperformance,suchassimulationaccelerationoremulation.Inadditiontothebrute-forcemethodsofaddingmoreperformance,engineers candevelopadditionalorsmarterteststhatwilluncovertheproblemsfasterordetect themsooner.Examplesincludedevelopingdirectedteststhattargetaspecificarea,or usingfeedbackfromcodecoveragetoordertestsinabetterway.
58
HardwareandSoftwareDesignProcess Onceaproblemisdetecteditmustbedebugged.Debuggingistheprocessoffinding therootcauseoftheproblemandchangingthedesignortesttocorrectit.Debuggingisamanualprocesswhencomparedtodetection,becauseitrequiresanengineer tospendtimetofigureoutwhatisrightandwhatiswrong.Determiningthecorrect behaviorrequiresgoodknowledgeofthedesignandhowitshouldwork.Moreor fastercomputersdonothelpthedebuggingprocess.Toolscanhelpimprovetheprocessbyhelpingtoimprovetheunderstandingofthedesignandbyprovidingbetter viewsofsimulationresults. Theprimarymeansofdebuggingisviewingwaveforms.Usingthedetectioninformation,engineersmustlookatlogicsignalsatdifferenttimesduringsimulationandfind outwhatisnotworkingcorrectly.Thisprocesstypicallyconsumesmanyhoursoftryingtofigureouthowadesignissupposedtoworkandwhyitisnotworkingcorrectly. Waveformdumpingcanalsodrasticallyslowdownsimulationperformanceandresult inverylargedatafiles.Engineersspendtimetryingtodecidebeforesimulationwhen togeneratewaveformfilesandwhichpartsofthedesigntocapture. Inembeddedsystemverificationtherearetwoimportantareasthatmustbewell understoodandwillbereviewedinthefollowingsections;memorymodelsandmicroprocessormodels.
MemoryModels Memorymodelsfordiscretememorycomponentsaswellasembeddedon-chipmemoriesareusedinlogicsimulation.Aswehavediscussed,softwaremustbetranslated frombinaryformatslikeELFintosomethingsuitableformemorymodelstoload. TherearemanykindsofmemorymodelsrangingfromVerilogandVHDLmodels toCmodelsthatusethelogicsimulator’sCinterfacetocommunicatewithanHDL wrapper.AnexampleofasimplememorymodelisshowninFigure2-16. TheIEEEVCDformatforstoringwaveformdatadoesnothaveanyprovisionfor handlingthecontentsofmemories.Tobetterunderstandmemoryactivity,special handingmustbedone.Onealternativeistouseaproprietarywaveformfileformat thatiscapableofstoringmemoryhistory.Mosthardwaredebuggingtoolsincluded withlogicsimulatorsaswellasindependentdebuggingtoolsprovidethisprovision. Forexample,DebussyfromNovasSoftwareprovidesthe$fsdbDumpMemtodump
59
Chapter2 // Simple read-only memory module ROM(A, OE, D); input [13:0] A; input OE; output [7:0] D;
// Address // Output enable // Data output
reg [7:0] D; reg [7:0] Mem [0:(16 * 1024)]; // memory array // Read from memory always @(OE or A) begin if (OE) D = Mem[A]; else D = 8'bz; end endmodule
Figure2-16:Simplememorymodel memorycontentsintoitsproprietaryfileformatandtheSynopsysVCSsimulator offersasimilartask$vcdplusmemontodumpmemoryintoitsproprietaryfileformat.BothtoolsprovideGUIfunctionstodisplaymemorycontentsassavedinthe waveformfiles.TheseworkwellformemoriesdefinedusingaVerilogarray.The secondalternativeistousememorymodels,suchasthosefromDenaliSoftware,that arewritteninCandhaveanAPItoreadandwritememorycontents.Thisallows graphicaltoolstoviewmemoryhistorybothinteractivelyduringsimulationandafter simulationinapost-processingmode.Italsoallowsmemorytobechangedduring simulationusingaGUI. Understandinghowmemorymodelsloaddataandwheretheyareinthedesign’s memorymapiscrucialbecausememoryisoneofthekeyareaswherehardwareand softwaremeet.Aswewillsee,memoryisoneofthemostimportantareasinhardwareandsoftwareco-verification.
60
HardwareandSoftwareDesignProcess
MicroprocessorModels Anothertoolusedbyhardwareengineersisthemicroprocessormodel.Thetwo primarymodelsusedforhardwaredesignarethebusfunctionalmodel(BFM)and thefull-functionalmodel(FFM).Afull-functionallogicsimulationmodelisnecessaryforsystem-on-chipdesigns.Itisimpossible(oratleastriskyandimpractical)to buildanASICcontainingamicroprocessorwithoutafullchipsimulation.Software isnecessarytouseafull-functionalmodel.Thesoftwaremustbecross-compiledfor thetargetandloadedintothelogicsimulationmemories.Theprocessorwillthen fetchandexecuteinstructions,thusverifyingthesystemdesign.Full-functional modelscomeinavarietyofformats.ThefastestarewritteninCanduseonlyawrapperorshelltocommunicatewiththelogicsimulator.SometimesRTLorgatelevel descriptionsofthemicroprocessorareusedasfull-functionalmodels.Thiscanbethe actualdesigndatabase(sometimesencrypted)thatisusedtomanufacturethedevice. Whilefull-functionalmodelsarenecessary,manyhardwareengineersdonotwantto becomesoftwareengineersjusttotestthehardwaredesign.Forthisreason,hardware engineersalsousebusfunctionalmodels. Inthehighperformancesystem-on-boardarea,afull-functionalmodelisnotusually availableforhighperformancemicroprocessorchipssuchasthePowerPCorMIPS. Tofillthisneed,hardwareengineershaveturnedtotheBFMandthehardware modelforverification.Thebusfunctionalmodelislesscomplexthanthefull-functionalmodel.Itjustimplementsthebusinterfaceportionofthemicroprocessor.Bus transactionslikememoryread,memorywrite,instructionfetchandinterruptacknowledgearesimulatedatatasklevel.Atestbenchisusedtogenerateasequenceof possibletransactionstheprocessormayperform.Thistestbench-drivenBFMmayor maynotreflectthetransactionsequencethatwilloccurwhentheembeddedsystem softwareisruninthetargetsystem,butismucheasiertouseforahardwareengineer. BusfunctionalmodelsareusuallywritteninCusingthelogicsimulator’sCinterface orinbehavioralVerilogorVHDL.RecentadvancesinthemicroprocessorbusprotocolsaremakingBFMshardertowrite.Techniquessuchasbuspipelining,bursting, out-of-ordertransactioncompletion,cachesnooping,andcacheinterventionsmake canmakeitimpracticaltoconstructahome-brewBFM.
61
Chapter2 Anothertechniqueusedforboardsimulationisthehardwaremodel.Atechnologythathasexistedforover15years,thehardwaremodelusesactualdevicesilicon mountedinasocketthatinterfaceswithalogicsimulator.Thedeviceservesasits ownmodel.Hardwaremodelscanbeusedformostanydigitaldeviceforwhicha simulationmodelisneeded.Thingslikemicroprocessors,digitalsignalprocessors, businterfacechips,networkprocessors,andsystemcontrollersarecommon.Hardwaremodelsareusuallydeployedforthosedevicesforwhichasoftwaremodelwould betoocomplicatedtowriteandfullfunctionalityisneeded. Allofthesemicroprocessormodelsarebeingusedtohelpsolvethehardwareand softwareintegrationproblem.Bythemselvestheyareusefulforhardwareverification, butdonotnaturallyinvolvethesoftwareengineersandsoftwaredebugging.Wewill seeinChapter4thatsomeofthesemodelswillserveasabasetechnologyforbuildingmoreusefulwaystosolvethehardwareandsoftwareintegrationproblemthatare suitableforbothhardwareandsoftwareengineersanddon’trequiresoftwaretobe debuggedusingsimulationlogfilesandwaveforms.
HardwareDesignProcess Thehardwaredesignprocessconsistsofdecidingonthearchitectureandbehaviorof thedesign,proceedingwithimplementationusingVerilogorVHDLattheregister transferlevel(RTL),andtakingtheRTLcodethroughasetofstepstoproducechips andboards.ThesetofstepsfromRTLtochipsandboardsiswelldefined.Activities atthearchitecturelevelaremuchlessdefinedandvarygreatly. Fromaverificationviewpoint,theprocessisfirsttoverifyeachRTLblockasitis produced,thenaccumulatetheblocksintosubsystemsandsystemsperformingverificationalongthewayuntiltheentiredesignisverifiedalltogether. ThekeyissueinSoCdesigniswhentospendtimeonhardwareandsoftwareintegration.Someprojectsadvocatedoingthisveryearly,inthearchitecturestage,tomake suretherearenoperformanceissuesthatwillshowuplater.Atthisearlyphasean abstractmodelofthedesigncanalsobeusedtoprovideahigh-performancewayto runsoftware.OthersadvocatedelayingintegrationuntiltheRTLcodeofthedesign isreadyandalloftheblockshavebeenverifiedtogetherandallmajorsubsystemsare
62
HardwareandSoftwareDesignProcess working.Regardlessofthespecificmethodology,itisimportanttodothehardware andsoftwaredesigninparallelandmeetforintegrationsometimebeforethehardwareiscommittedforfabrication.
MicroprocessorReview Aswehaveseen,hardwareandsoftwareengineershaveadifferentviewofthe microprocessor.Itisconvenienttobreakdownthefunctionsofthemicroprocessor intotwoparts:theinternaloperationandtheexternalinterface.Internalfunctions include: ■
InstructionSet.Theformatfortheinstructionsthattheprocessorcanrun.Instructionscanbeencodedinmanyways.ForRISCprocessorstheinstructions areafixedlengthsuchas32bitor16bitinstructions.
■
Registers.Somecombinationofgeneral-purposeregisters,statusregisters,and programcounter.
■
Cache.Specialmemorytostorefrequentlyuseddata.Decidingwhichdatato cacheandmaintainingcachecoherencyistypicallydonebyhardware.
■
Pipeline.Thedifferentstagesofinstructionprocessingtoprovideincreased performance.EmbeddedRISCprocessorsmayusea3or5stagepipeline.
■
MemoryManagementUnit(MMU).TheMMUprovidesaddresstranslations necessarytoimplementvirtualmemoryandisusedbyoperatingsystemto provideprotectionfromprogramsortaskscrashingthesystem.
Togetoutsidethemicroprocessorthereareloadandstoreinstructions(RISC)and amemorymapthatdefineswhichmemoriesorregisterswillbeaccessed.Theexternalinterfaceofthemicroprocessorisofprimaryinteresttohardwareengineers. Thisinterfaceconsistsofasetofsignalsthatfollowadefinedprotocol.Thisexternal interfaceincludes: ■
MemoryBusInterface.Therulesforbusarbitration,masterandslavetoobey inreadingandwritingdataonthebus.Thisishowthemicroprocessorgets datainandout.
63
Chapter2 ■
CoprocessorBusInterface.Sometimesaseparate,dedicatedinterfaceisusedfor communicationwithahardwarefunctionthatrequiresspecialattentionto performance.
■
Interrupts.Asetofsignalsusedtotellthemicroprocessorthatserviceis needed.Typicallytherearemultipleinterruptsourceswithdifferentpriority levels.
Tobecomeproficientinembeddedsystemsagoodunderstandingofcomputerarchitectureandmicroprocessoroperationisrequired.Readerswishingmoredetailinthis areashouldconsultoneofthemanytextbooksavailableoncomputerarchitecture.
HardwareandSoftwareInteraction Tosetthestageforthetechniquesusedtoverifyhardwareandsoftware,considerthe debuggingmethodsusedbyengineers.
SoftwareDebuggingCharacteristics ■
Completelyinteractive
■
Historicallydoneinalabwithaboardrunningat25MHzorfaster
■
Useprintf()statementtotraceexecutionandvariables
■
Tracesoftwareexecutionwithsource-leveldebugger
■
Usebreakpointstostopexecutionandinspectmemory(variablesanddata structures),callstackandregistercontents
■
Iterativelyrebootthesystemandadjustbreakpointsuntilbugsarefound
HardwareDebuggingCharacteristics ■
Runlogicsimulationatspeedsof10–100Hz
■
Useprintstatements(orlackofprintstatements)todetecterrors
■
Useofwaveformdumpstoexaminesignalvaluesuntilbugsarefound
64
HardwareandSoftwareDesignProcess Thislistofcharacteristicsdemonstratesthechallengeofverifyingadesignwithboth hardwareandsoftware.Wehaveseenthattheprimaryinteractionbetweenhardwareandsoftwarecomesatthemicroprocessorbusboundaryandinthecontentsof programmableregistersandmemory.Memoryisprobablythemostimportantplace wherehardwareandsoftwaremeet,andyetitisoneofthemostmisunderstood.For example,hardwareengineersareaccustomedtoloadingmemorydataintoasimulationmodelatthestartofsimulationusingcommandssuchas$readmemh.Oncethe simulationstartsupandresetiscompleted,thememorydataisreadytobereadfrom orwrittento.Softwareengineershaveadifferentconcept.Themostcommonway towriteatestprogramforanembeddedsystemistocompileitintoanexecutable fileformatsuchasELF,gotoalab,power-onaboardwiththemicroprocessor,connectasoftwaredebuggerusingaJTAGcableconnectedbetweentheboardandthe parallelportofaPCandusethedebuggertodownloadtheELFfileintomemoryon theboardandstartrunningtheprogram.Ofcourse,theJTAGconnectionusesthe microprocessortoperformalongseriesofmemorywritesontheCPUbustogetthe dataintomemoryinsteadofjustpre-loadingthedatalikeinthesimulationcase.I’m constantlyremindedofthesedifferencesbetweenhardwareandsoftware. Notlongago,Iwastalkingtoanengineerabouttheseconceptsofmemoryand explainingtheconnectionbetweentheoperationsofasoftwaredebuggerandthe simulatedhardwaredesign.Allatonce,alightseemedtogoonandheaskedifhe couldtryanexperiment.Ijustgotoutofthewayandgavehimthekeyboard.He askedwhereintheCPUmemorymapwasanSRAM.ItoldhimthereisSRAMat address0x10000000.Heusedthesoftwaredebuggertowriteonewordintothememoryatthisaddress.HethenproceededtostopthelogicsimulatorbyhittingCtrl+c andaskedfortheVeriloghierarchypathtooneofthe16-bitwideSRAMmodels.I toldittohimandproceededtoissueacommandtodumpthecontentsoftheSRAM intoafile.Hethenwenttoanotherxtermandopenedthefilewiththetrustyvieditorandamazinglyenoughthedatavaluehehadwrittenintothismemorywasthere! Heseemedalmostinshockthathenowunderstoodtheconnectionbetweenthe softwaredebuggerandthesimulationmemorymodels.Theconnectionbetweenthe softwaredebuggerandthememoryinthelogicsimulationisshowninFigure2-17.
65
Chapter2 From the software debugger command line write a new data value to address 0: arm9sd: ex 0,1 0x00000000: 0xea000012 arm9sd: let 0 = 0x12345678 arm9sd: ex 0,1 0x00000000: 0x12345678 arm9sd:
"...." "xV4."
From the Verilog command prompt dump the new memory contents to a file: C1> $dumpmemh(“mem.dat”,top.mem.Rom); xp8:4 % @000000 @000008 @000010
more mem.dat 12345678 ea0005ef ea0005f6 ea00062a ea000621 eafffffe ea000611 ea000602 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 e3a00000 e3a01000 e3a0dd84 e92d0003
Sure enough, address 0 has the data value written from the debugger.
Figure2-17:Thememoryconnection
Aswemoveintothedetailsofhardwareandsoftwareco-verification,let’sreviewthe keypoints.Hardwaresimulationalwayshasthenotionofasimulationtimestampto marktheplaceintimewheneventsoccur.Thisisthekeyreferencepointthatisused toidentifywhenthingshappen.SoftwaregeneratesbustransactionsontheCPUbus toaccessmemoryandcontrolregisters.Hardwareperipheralsgenerateinterruptsto requesthelporservicefromthesoftware.Thesearetheimportantareaswherethe hardwareworldmeetsthesoftwareworld. Learningabouttherelationshipbetweenhardwareandsoftwareengineeringislike theoldjokeaboutamechanicalengineer,electricalengineer,andsoftwareengineer. Amechanicalengineer,anelectricalengineerandasoftwareengineerareinacar thatbreaksdown.Themechanicalengineersays:“Maybeit’sastuckvalveinthe engine.”Theelectricalengineersays:“Maybethebatteryisdeadorafuseisblown.” Thesoftwareengineersays:“Iknow.Let’sallgetoutofthecarandgetbackinagain andseeiftheproblemgoesaway.”
66
HardwareandSoftwareDesignProcess Understandingthetoolsandtechniquesofhardwareandsoftwareengineersandthe conceptsofembeddedsystemsandmicroprocessorsiscriticaltoverifyingdesigns madeupofbothhardwareandsoftware.
67
This page intentionally left blank
CHAPTER
3
SoCVerificationTopicsforthe ARMArchitecture Understandingmanyoftheconceptsofhardware/softwareco-verificationisbest donebyexample.Understandingaspecificmicroprocessorarchitectureisusefulfor discussionofco-verificationtopics.Intoday’sSoCprojects,theARMarchitecture holdsthelargestmarketsharefor16/32bitprocessors.Thegoalofthischapterisnot toteacheverydetailofthearchitecture,buttoprovidesomebackgroundonARM andtoteachthethingsthataredirectlyrelatedtoverifyingahardwaredesignwith softwarerunningonanARMmicroprocessor.EngineerscurrentlyinvolvedinARM projectswillfindthematerialdirectlyapplicabletotheirproject,whilethosenot yetinvolvedinanARMprojectcanusethematerialtobecomefamiliarwiththe architectureandshouldhavelittletroublerelatingtheconceptstootherprocessor architectures.
ARMBackground ARMstartedasabranchofAcornComputerinCambridge,England,withthe formationofajointventurebetweenAcorn,AppleandVLSITechnology.Ateam oftwelveemployeesproducedthedesignofthefirstARMmicroprocessorbetween 1983and1985.TheARMwasthefirstRISCprocessordevelopedforcommercial use. In1990anewcompany,AdvancedRISCMachinesLtd.,spunoutfromAcorn,to focusoncreatingmicroprocessors.TheARMacronymoriginallystoodforAcorn RISCMachine,butwaschangedtoAdvancedRISCMachinetoreflectthespinout fromAcorn.TodaytheARMnameisnotanacronym,justARMLtd.Thecom-
69
Chapter3 panyisstillheadquarteredinCambridgeandemploysabout800peopleworldwide. ItoperatesdesigncentersinSheffield,Maidenhead,aroundEuropeandtheUnited States,andsalesofficesthroughouttheworld.ARMcurrentlyholdsa75–80%marketsharein32-bitmicroprocessors. BesidespioneeringthefirstRISCmicroprocessorforcommercialapplications,ARM haspioneeredanewbusinessmodelintheelectronicsindustry.Thecompanymakes nochips.Almostallofthecompanyrevenueisrealizedbylicensingitstechnology (primarilymicroprocessordesigns)tosemiconductorpartners.Asofthiswriting, ARMcounted118semiconductorpartnersutilizingARMtechnologyandthe numbergrowseveryquarter.ThesepartnersuseARMintellectualproperty(IP)to developchipsandproductsfortheelectronicsmarket.ARMrevenuecomesfrom threeprimarysources: 1. LicensefeesformicroprocessorsandotherIPblocks. 2. Per-chiproyaltiesonshipmentsofchipsusingARMIP. 3. Toolsandboardstosupportdevelopmentanddebuggingof ARMapplications(developmentsystems). ARMiscurrentlythelargestandmostsuccessfulIPcompany.Itofferscustomers time-to-marketadvantagesbyoffloadingsomeofthedesignwork.Duetotheglobal acceptanceoftheARMstandard,usersareneverlockedintoasinglesemiconductorpartner,asmanysourcesofARMtechnologyareavailable.Thelargenetwork ofpartnersalsomakestheARMarchitecturepopularwithengineers,asthereare manysupportingtoolstochoosefrom.Aswewillsee,co-verificationisnoexception. ARMisnumberoneintermsofpartnersupportformodeling,co-design,andco-verificationtools.
ARMArchitecture AllARMprocessorsuseRISC(reducedinstructionsetcomputer)principles.The othertypeofmicroprocessordesigniscalledCISC(complexinstructionsetcomputer).ThegoalsofARMRISCprocessorsaresimplicity,highinstructionthroughput, excellentreal-timeresponse,andasmall,low-powerdesign.ARMhashistoricallybeenknownastheleaderinlow-powerdesign.AllARMprocessorsarestatic
70
SoCVerificationTopicsfortheARMArchitecture designs.Thismeanstheclockcanbestoppedatanytimetosavepower.Theselowpowerfeaturesmakethearchitecturepopularinportable,battery-operated applications. RISCdesignstartedinIBMwhenengineersnoticedthatmanyinstructionsinthe CISCinstructionsetwereneverorrarelyused.Intheearly1980sengineersinIBM andHP,aswellasresearchersinStanford,begantodevelopmicroprocessorsthat providedhigherperformancebysimplifyingtheinstructionset.Theavailabilityof anoperatingsystem(UNIX)writteninCthatcouldbeeasilyportedandcompiled forRISCarchitecturesmadethecombinationofUNIXandRISCthestandardfor engineeringworkstations.Thisisstilltruetoday,althoughitmaybechangingwith thewidespreadadoptionofLinuxonIntelPChardwareasthestandardengineering workstation.Today,RISCprocessorsdominatetheworldofembeddedsystems.RISC shipmentsforvariousarchitecturesareshowninFigure3-1.ARMhascontinuedto gainmarketshareinthelastcoupleofyears. Merchant RISC Microprocessor Shipments (1000s) thru 1994 ARM/StrongARM 2,170 3,254 MIPS 2,800 Hitachi SH PowerPC 2,090 Total 30,499
1995
1996
2,100 5,500 14,000 3,300 33,830
4,200 19,200 18,300 4,300 58,480
1997
1998 9,800 50,400 48,000 53,200 23,800 2,600 3,800 6,800 98,220 149,080
1999 152,000 57,000 33,000 8,300 262,820
2000 414,000 62,800 50,000 18,800 556,800
2001 402,000 62,000 45,000
23,000 538,860
Source: Andrew Allison
Figure3-1:RISCshipmentsfrom1994to2001
ARMArchitectures,Families,andCPUCores RISCprocessorsemployfixedlengthinstructions.TheARMinstructionsetisdefinedbyaparticularversionoftheARMarchitecture.EachspecificARMCPUcore implementsaspecificarchitectureversion.TheARMarchitecturedatesalltheway backtoversion1thatcorrespondstothefirstdesigndevelopedatAcorn.Hereare theprimaryarchitectureversionsusedinrecentprocessorfamilies:
71
Chapter3 ■
ARMv4T
■
ARMv5T
■
ARMv5TE
■
ARMv5TEJ
■
ARMv6
TherearefourfamiliesofARMprocessorscurrentlybeingusedinnewSoCdesigns: ■
ARM7familyusesARMv4T
■
ARM9familyusesARMv4T,ARMv5TEandtheARMv5TEJ
■
ARM10familyusesARMv5T,ARMv5TEandtheARMv5TEJ
■
ARM11familyusesARMv6
TheARM7familyusesavon-Neumannarchitecture(asinglebusforinstructionand data)whiletheARM9familyusesaHarvardarchitecture(differentbusesforinstructionanddata). ARMcoresareavailableinbothhardmacroformorinsoftform.Ahardmacrois adesignthathasbeentakenallthewaytosiliconimplementation.Itisprovided totheuserasalayoutobjecttobeincludedwiththerestofthedesignatthephysicaldesignstage.Hardcoresprovidethehighestperformanceandsmallestdiearea becausetheyhavebeenfullyoptimizedbyARM.Hardcoresarenotlimitedbyany designrestrictionsoftheuser’sfront-endchipdesignflow.Thedrawbackofahard coreistheportabilitybetweendifferentsiliconprocesses.Sincethehardcoreis alreadycommittedtoaspecificsemiconductorprocess,auserisforcedtothisprocess insteadofadifferentonethatmaybemoresuitablefortherestofthechip. SoftcoresaredeliveredinVerilogRTLformat.Theuserisresponsibleforperforming synthesisandphysicalimplementationoftheCPU.Softcoresoffermoreflexibility fortheuser,relatedtothesiliconprocess,butcannotprovideperformanceasgood asthehardcore.Overtime,ARMhasshiftedmoreandmoreintosoftcores.Today’s front-endEDAtoolsarecapableofmixedlanguage(VerilogandVHDL)designsand advancedsemiconductorprocessesreducetheneedfortotaloptimization.Wewill
72
SoCVerificationTopicsfortheARMArchitecture seeinfuturechaptersthattheimplementationusedfortheARMcoreisimportant fordeterminingwhatkindofmodelsareavailableforco-verification. TherearemanyARMprocessorsthatareactivelybeingusedinnewdesignstoday. Followingisalistofthemostcommonvariantsandthehighlightsofeach: ■
ARM7TDMI:Developedin1995,itisstilloneofthemostpopulartoday. Usesathree-stagepipelineandthev4Tarchitecturetoprovidegoodperformanceandverylowpowerandsmallsize.It’savailableasbothahardandsoft macrocell.
■
ARM720T:IncludestheARM7TDMIcoreplusan8kunified(instruction anddata)cache,MMU,andwritebufferandcanrunoperatingsystemssuch asWindowsCEandSymbianOS.Availableasahardmacrocell.
■
ARM9TDMI:AnupgradetotheARM7TDMIthatusesthesamev4T architecture,butmovestoafive-stagepipelineandHarvardarchitecture. Availableasahardmacrocell.
■
ARM940T:IncludestheARM9TDMIcoreplusdual4kcaches(separate instructionanddata)andanMPUthatsupportsmostreal-timeoperating systems.Availableasahardmacrocell.
■
ARM920T/ARM922T:IncludestheARM9TDMIcoreplusdual16k cachesforARM920Tanddual8kcachesforARM922T.Otherthancache sizetheyareidentical.BothsupportoperatingsystemslikeWindowsCE, SymbianOS,PalmOSandLinux.Availableashardmacrocells.
■
ARM9E-S:Usesafive-stagepipelinetosupportthev5TEarchitecturethat includesanenhancedmultiplierforimprovedDSPperformance.Availableas asoftmacrocell.RepresentsthefirstshifttosynthesizableCPUcores.
■
ARM966E-S:IncludesanARM9E-SCPUandinstructionanddatatightly coupledmemory(TCM).Targets“hard”real-timeembeddedapplications,no cachesorMPU/MMU.Availableasasoftmacrocell.
■
ARM946E-S:AddsuserconfigurableinstructionanddatacachesandMPU totheARM966E-S.Supportspopularreal-timeoperatingsystems.
73
Chapter3 ■
ARM9EJ-S:AddsJazelleJavatechnologytotheARM9E-Scoretosupport thev5TEJarchitecture.Availableasasoftmacrocell.
■
ARM926EJ-S:IncludestheARM9EJ-Scoreandinstructioncache,data cacheandMMU.SupportsoperatingsystemssuchasWindowsCE,Symbian OS,PalmOSandLinux.Availableasasoftmacrocell.Themostpopular ARM9core.
■
ARM1020E/ARM1022E:CombinestheARM10EintegercorewithsixstagepipelinewithDSPextensionsandsupportsthev5TEarchitecturewith instructionanddatacachesandMMU.Supportsoperatingsystemssuchas WindowsCE,SymbianOS,PalmOSandLinux.Firstdesigntouse64-bitinternaldatapathsand64-bitexternaldatabuses.ARM1020EandARM1022E areidenticalexceptforcachesizes.ARM1020Eis32k/32kandARM1022Eis 16k/16k.Availableasahardmacrocell.
■
ARM1026EJ-S:CombinestheARM10EJintegercorewithsix-stage pipeline,DSPinstructions,Jazelletechnologyandsupportsthev5TEJarchitecturewithinstructionanddatacaches,MMUandinstructionanddata TCM.Fullysynthesizabledesignthatallowscachesizesandexternalbus widthstobeconfiguredbytheuser.Capableofrunningalloperatingsystems. Availableasasoftmacrocell.
■
ARM1136J-S:Representsthefirstimplementationofthev6architecture (some80+newinstructions)witheight-stagepipeline.Thehighestperformanceprocessorwithmultiple64-bitexternalbusses.Designisavailableasa softmacrocell.
BecauseoftheuniquesemiconductorpartnerrelationshipsmaintainedbyARM, thecoresthatareavailableprimarilyinsoftformmayalsobeavailableashard macrocellsfromspecificsemiconductorpartners.Thisallowsthepartnerstoprovide optimizeddesignsforspecificprocessesandofferthemdirectlytousers.Forexample, mostprojectsusetheARM946E-SasVerilogRTL,butsomemayuseafoundryhardenedARM946E.
74
SoCVerificationTopicsfortheARMArchitecture AbriefexplanationofsomeofthenamingconventionsusedbyARMisuseful.One ofthemostcommonquestionsis“Whatdoes7TDMIstandfor?Doesitmeananything?”Actually,itdoes,andhereisthemeaning: 7=ARM7family. T=Thumbextensionsspecifiedinthev4Tarchitecture. D=debughardware. M=enhancedmultiplier. I=embeddedICEforJTAGdebuggerconnections. OthercommonlyusedconventionsaretheEforthefurtherenhancedmultiplierto improveDSPperformance(v5TEarchitecture)andtheJtospecifytheJazelletechnologythatallowsthedirectexecutionofJavabytecodes.Sindicatesasynthesizable orsoftcore.
ThumbInstructionSet AlloftheaboveCPUcoresare32-bitarchitectures.32-bitarchitecturesprovidethe bestperformancetooperateon32-bitdata.Theregistersare32bitswideasarethe datapathsinsidetheCPU.Oneofthedrawbacksofa32-bitRISCarchitecturewhen comparedtoaCISCarchitectureistheamountofmemoryrequiredtoholdthesoftwareinstructions.Themeasureofhowmuchmemoryisrequiredintheembedded systemtoholdinstructionsiscalledthecodedensity.Inembeddedsystems,there isoftenaconstrainttominimizememorysize.Thisisespeciallytrueforon-chip memory.On-chipmemorysuchascacheusuallytakesupmorespaceonthechip thantheCPUandtheotherrandomlogicoftheSoC.Toaddressmemorysensitive applications,theARMarchitectureincludesamodethatallowsittorun16-bitinstructionscalledThumb.TheThumbinstructionsetisagroupof16-bitinstructions thatareasubsetofthemostpopular32-bitinstructions.Specialpurposehardwareis usedto“decompress”the16-bitinstructionsinto32-bitinstructions.Ofcourse,the extrasteprequiredtohandle16-bitinstructionsincurssomeperformancepenalty. Eachapplicationhasthefreedomtotradeoffperformanceformemorysize.Theuse
75
Chapter3 ofThumbinstructionsmayshrinktherequiredmemoryby1/3anddecreaseperformancebyabout1/3.ARMusesaveryinnovativeapproachthatallowstheCPUto changedynamicallybetween32-bitinstructionsand16-bitinstructions.Thisgives thesoftwareengineertheabilitytorunperformancecriticalsoftwareusing32-bit instructionsandlesscriticalsoftwarein16-bitmodetosavememory.TheThumb instructionsetwasintroducedintheARMv4Tarchitectureandfirstmadepopularin theARM7TDMI. ARMmicroprocessorsuseapipelinetoincreaseinstructionthroughput.Theuseof apipelineallowsmultipleoperationstobedoneinparallel.AstheARMarchitecture hasadvancedthelengthofthepipelinehasincreasedfromthreestages(ARM7)to fivestages(ARM9)tosixstages(ARM10)toeightstages(ARM11).Adiagramofthe ARM7pipelineandhowitisusedtoincreaseperformanceisshowninFigure3-2.
Fetch
Decode
Execute
Fetch
Decode
Execute
Fetch
Decode
Execute
Figure3-2:ARM7three-stagepipeline
ProgrammingModel TheARMprogrammingmodelconsistsof16general-purpose32-bitregistersknown asR0throughR15.Thearchitectureallowsfortwooperatingmodes,userand supervisor.Theregisterdefinitionsareslightlydifferentdependingonthemodeof operation.TheCPUwillstartoperationinsupervisormodeandcanbeswitchedto usermode.Applicationswillruninusermodeandoperatingsystemorothersystem softwarewillnormallyoperateinsupervisormode.
76
SoCVerificationTopicsfortheARMArchitecture EngineersfocusedonverificationofARMSoCdesignsareprobablynotextremely interestedinallofthedetailsofthearchitecture,theinstructionsetandpipeline. However,thereareafewthingsthatareimportanttoknow.Thefirstthingistobe abletoidentifytheprogramcounter(PC)asR15.Mostsimulationmodelslistthe registersasR0throughR15withnospecialdistinctionoftheprogramcounter.BeingabletofindthePCinthesimulationwaveformsisusefultofigureoutwherethe softwareisexecuting.TheotherregisterthatissometimesworthlookingatisR14, thelinkregister(LR).Thisregisterholdsthereturnaddressthatwillbecomethe programcounterwhenthesoftwareleavesthecurrentsubroutine.Thiscanalsohelp identifywherethesoftwareisexecuting.R13iscommonlyusedasthestackpointer (SP).TheCPSRregisterstandsforCurrentProgramStatusRegisterandholdsthe conditioncodebits,interruptflagsandoperatingmodebits.Figure3-3showsthebasicARMregisters.ThereisalottolearntounderstandtheARMregistersetinthe differentmodes,includingbothARMstateandThumbstate,butfamiliaritywiththe basicregistersisusefulforverification. R0 R1 R2 R3 R4
Figure3-3:BasicARMregisterset (ARMstate,system/usermode)
R5 R6 R7 R8 R9 R10 R11 R12 R13 (SP) R14 (LR) R15 (PC) CPSR
77
Chapter3
InstructionSet Detailsoftheinstructionsetarenotextremelyimportantforverification.However, someinstructionsthathaveadirectrelationshiptohardwareoperationareusefulto understandforthepurposesofverification.
DataTransferInstructions Memoryloadsandstoresareinstructionsthatmovedatafrommemorytoregisters (load)andfromregisterstomemory(store).Theseinstructionscreatearead(load) orawrite(store)ontheCPUdatabus.Theinstructionsetallowsdatatobeaccessed indifferentsizes.AlthoughsomeARM10andARM11coreshave64-bitbuses,we willfocusonthestandard32-bitbuseshere.OneinterestingnoteisthatearlyversionsoftheARMarchitecturedidnotsupporthalf-wordloads/storesorsignedbyte loadsandstores,onlywordandunsignedbyteaccesses.ThismadethingsinterestingforCprogrammers.Recentversionshaveaddedsupportforhalf-wordsaswellas signedbytes.SomeexamplesofloadandstoreinstructionsareshowninFigure3-4. STR r0,[r1],#4 LDR
pc,[r4,r6, LSL #2]
LDRH
r0, [r0]
STRH
r4, [r2], #3
STRB
r4, [r2]
LDRBT
r3, [r6], #0
; this is actually a function call
Figure3-4:ExamplesofLoadandStoreInstructions
78
SoCVerificationTopicsfortheARMArchitecture MRC
p15, 0, r3, c1, c0, 0
MCREQ
p15, 0, r12, c7, c14, 1
MCR
p15, 0, r0, c9, c1, 1
; read from coprocessor ; write based on condition ; write to coprocessor
Figure3-5:ExamplesofcoprocessorInstructions
CoprocessorInstructions TheARMarchitectureallowshardwarecoprocessorstobeconnectedtotheprocessorandprovidesspecialcoprocessorinstructionstoaccessthishardware.The coprocessorsarealsoaccessedvia32-bitloadandstoreinstructions.Upto16coprocessorsaresupported.SomeexamplesofcoprocessorinstructionsareshowninFigure 3-5.Notallprocessorssupportallcoprocessorinstructiontypes.Themostcommon areMCRandMRC.Normally,coprocessoraccessisalsorestrictedtoprivileged modetokeepapplicationsoftwarefromchangingsettings. SomeCPUs,liketheARM7TDMI,provideasetofpinsthatserveasthecoprocessorinterface.Usersarefreetoconnectspecialhardwaretothisinterfacetoprovide adedicated,directconnectionbetweentheCPUandthecoprocessorwithoutsharingthememorybus.Themostcommonuseofcoprocessorsiscoprocessor15.ARM hasdesignatedcoprocessor15tobeusedtocontrolandconfigurecaches,memory managementunits,memoryprotectionunits,buffers,andanyotherconfigurable hardwarerelatedtotheCPU.RecentdesignssuchastheARM926EJ-Susecoprocessor15extensivelyforconfigurationandcontrol.Someofthethingsprovidedby coprocessor15intheARM926EJ-Sare: ■
CPUidentification
■
Detailsofcachesuchassizeandothercacheattributes
■
Abilitytosetcacheattributes
■
Abilitytoinvalidateentirecacheorspecificcachelines
■
Abilitytolockspecificdatainthecache
79
Chapter3 ■
Presenceoftightlycoupledmemory(TCM),itssize,andtheabilitytoenable/ disableTCM
■
AbilitytoconfigureTCMaddressranges
■
Configurationofaddresstranslation,protection,andotherMMUfunctions
ExceptionsandInterrupts ExceptionsareeventsthatbreakthenormalexecutionofCPUinstructionsand causeittodosomethingelse.Therearegenerallymanythingsthatfallunderthe categoryofexceptions.Interruptsareonetypeofexceptionandaredefinedas eventsthatoriginateoutsidetheCPUandsignaltotheCPUtotakeanexception. Interruptsplayakeyroleinallembeddedsystems.Whilesoftwarenormallycommunicateswithhardwarebywritingandreadingregistersandmemory,hardware initiatescommunicationwithsoftwareusinginterrupts. LearningaboutexceptionsandexceptionvectorsisalsoveryusefulforARMSoC verification.ARMprocessorsstartexecutionafterresetbyfetchingthefirstinstructionfromaddress0(theresetvector).Resetisaninterruptsinceanexternalsignal tellstheCPUtostartover.Someprocessorshaveaprovisiontouseanalternateaddressfortheresetvector,calledthehigh-resetvector,locatedataddress0xffff0000. AnexampleofthisistheARM926EJ-S.IfthesignalVINITHIislowduringreset thentheexceptionvectorsstartataddress0.IfVINITHIishighduringresetthen theexceptionvectorsstartat0xffff0000.Thisalternativelocationenablesthe useofadifferent(usuallyfaster)typeofmemorytohandleexceptions.Thealternateexceptionaddressisalsousefulfordualprocessordesigns.Theaddressesforthe exceptionvectorsaregiveninFigure3-6.Ifthehighexceptionvectorsareusedthe offsetsarenotfrom0,butfrom0xffff0000instead.
80
SoCVerificationTopicsfortheARMArchitecture Reset
0x0
Undefined Instruction Software Interrupt
0x4
0x8
Instruction Fetch Abort
0xc
Data Abort
0x10 0x18
Interrupt (nIRQ) Fast Interrupt (nFIQ)
0x1c
Figure3-6:Exceptionaddresses ThepriorityfortheCPUtohandleexceptionsis: 1. Reset 2. DataAbort 3. FastInterrupt 4. NormalInterrupt 5. InstructionFetchAbort 6. SoftwareInterrupt AllARMCPUsusetwointerruptsignalsonthebus,nIRQandnFIQ.nIRQisthe normalinterruptrequestandnFIQisthefastinterruptrequest.Thebussignalsfor thesetwointerruptsareactivelowsignals,sodrivingthesignallowindicatesan interrupt. Anotherpairofexceptionsthatareimportantforco-verificationaretheabortexceptions.Therearetwokindsofaborts,instructionprefetchanddataaborts.IftheCPU triestoaccessmemorythatcannotbehandledbythememorysystem,anabortwill besignaledtotheCPU.TheABORTsignalisusedforsomeARMCPUssuchasthe ARM7TDMI.ForCPUsusingtheAHBinterface(describedbelow),HRESPisused tosignalabort.
81
Chapter3 Theremainingtwoexceptionsintheexceptiontablearetheundefinedinstruction andsoftwareinterrupt(SWI).Thesetendtoplaylessofaroleinco-verification.The softwareinterruptisgeneratedbyasoftwareinstructionandhasnospecialconnectiontothehardwaredesign.SeeminglytheUndefinedInstructionexceptionwould beimportantifthememorysystemprovidedbadinstructionvaluestotheCPUin asituationwheretherewasabuginthehardwaredesign,butmoreoftenthannot whateverbaddataisprovidedtotheCPUisusuallyinterpretedassomeinstruction andtheCPUcontinuestorunwithnoclearindicationthatdisasterhasalready occurred.Forexample,ifthememorysystemissomehownotworkingcorrectlyand providesaninstructionof0xfffffffftheCPUstillrunsforwardincrementingthe programcounterby4forever.Figure3-7showsascreenshotofanARMdebugger runningaprogramwherealldatais0xffffffffandnoexceptionoccurs.
Figure3-7:Runningameaninglessprogramdoesn’tgenerateanyexception
82
SoCVerificationTopicsfortheARMArchitecture
MemoryLayoutandByteOrder Differentmicroprocessorsusedifferentwaystoorderthebytesofdataonthebus. Thecomputertermendianisusedtodescribetheorderofbytesofamultibytedata value.Therearetwowaystoorderthebytes:
Big-endianmeansthemostsignificantbytehasthelowestaddress.
Little-endianmeansthemostsignificantbytehasthehighestaddress.
Historically,Intelmicroprocessorshaveusedlittle-endianbyteorderandRISC microprocessorshaveusedbig-endian.TheARMmicroprocessorsdiscussedinthis chapterallowtheusertochoosethebyteorder.Althoughitisprobablynotaword, thebyteorderchoseniscalledendianness.Figure3-8and3-9showhowtoassignaddressesfora32-bitwordforeachbyteorder. Address 3
DATA[31:0] 31
Address 2
24 23
Address 1
16 15
Address 0
87
0
Figure3-8:Littleendianbyteorder
Address 0
DATA[31:0] 31
Address 1
24 23
Address 2
16 15
Figure3-9:Bigendianbyteorder
83
Address 3
87
0
Chapter3 Endiannessisoneofthemostmisunderstoodtopicsinco-verification.OnemisunderstoodfactthatisthatforARMcoreswitha32-bitbus,aslongastheCPUis transferringwordvaluesonthebus,endiannessdoesnotmatter.Whendatatransfers areafullword,thedataonthebusisidenticalforbothbig-endianandlittle-endian byteorder.Endiannessbecomesimportantwhendatatransfersonthebusarefor1or 2bytes.Inthesecases,bothmasterandslavemustknowwhichbytesarevalidand whicharenot.Incorrectinterpretationofthedatabyeithermasterorslaveleadsto certaindatacorruption. Thesecondmisunderstandingisthathowabusisdeclareddefinesendianness.In Figure3-8thedatabusisdeclaredas[31:0],butthisdoesnotsayanythingaboutbyte order. OneofthefirstquestionsIalwaysaskforanewdesignis,“Whatisthebyteorderofthe design,bigorlittleendian?”Workinginco-verificationrequiresmetotalkwithapplicationengineersaboutprojectsusingco-verification.WhenIaskaboutendiannessImay getablankstare,ormaybetheyfiguretheyhavea50-50chancesotheyjusttakeaguess. Oneresponsewas:“TheonlythingIknowisthatthebusisfrom“31downto0”which, tome,islittleendian,correct?”Understandingbyteorderisimportantwhentryingto findproblemsrelatedtoincorrectdatatransferredonthebus.
ARMBusInterfaceProtocols ThebusprotocolusedbytheCPUisanimportantaspectofco-verificationsincethis isthemaincommunicationbetweentheCPU,memory,andothercustomhardware. ARMprocessorsusedifferentbusprotocolsdependingonwhenthecorewasdesigned.ThissectionwillcovertheprotocolusedfortheARM7TDMIandtheCPU busprotocolscoveredbytheadvancedmicrocontrollerbusarchitecture(AMBA) specification.TheARM7TDMIutilizesanolderbusprotocolbutitisstillwidely usedtodaywithouttheAMBAspecificationsoitisworthlearning.Thegoalofthis sectionisnottoteacheverydetailabouttheprotocols,buttogiveagoodoverview suchthatengineersdealingwithverificationwillhavethenecessaryskillstounderstandwhatishappeningonthebusandhowthebusactivityrelatestosoftware execution.Fordetailedinformationonhowthebusprotocolsoperate,theappropriatespecificationshouldbeconsulted.
84
SoCVerificationTopicsfortheARMArchitecture
ARM7TDMIBusProtocol TheARM7TDMIusesasingle32-bitaddressand32-bitdatabustoaccessmemory. Theprotocolhastwocharacteristicsthatmakeitnon-trivialtounderstand,buspipelininganddualedgeclocking.Thecorecontainsaverylargenumberofsignalsthat atfirstseemoverwhelming,butmostofthesearenotimportantfornormaloperation andcertainlynotneededinthecontextofco-verification. TheprimarysignalsusedbytheARM7TDMIbusprotocolaredescribedbelow.A goodunderstandingofthissubsetissufficientformostsituations.Adiagramofthe ARM7TDMIsignalsisshowninFigure3-10.Acompletedescriptionofallsignals andfunctionalityisavailablefromARMintheARM7TDMITechnicalReference Manual(TRM).
MCLK nRESET
nWAIT ABE BUSEN DIN[31:0] DBE BIGEND
ARM7TDMI
ABORT nIRQ nFIQ
ISYNC
Figure3-10:ARM7TDMIsignals
85
nMREQ SEQ A[31:0] MAS[1:0] nRW D[31:0] DOUT[31:0] BL[3:0] nOPC nTRANS LOCK TBIT nM[4:0]
Chapter3 MCLKisthemaininputclocktothecore.SincetheARM7TDMIisafullystatic design,theclockdoesnothavetoberegular.Infact,stoppingtheclockisa commonlyusedwaytoinsertwaitstatesonmemoryaccesses. nRESETistheprocessorreset.Assertingitlowforatleast2clocksandthenreleasingitwillcauseexecutiontostartfromtheresetvector(address0).While nRESETislowtheCPUwillperformidletransactions,incrementingtheaddresson eachtransactiontoindicateitisintheresetstate. nWAITprovidesanotherwaytoinsertwaitstatesonthebus.Thecoreisclockedby thelogicalANDofMCLKandnWAIT.nWAITmustchangeonlywhen MCLKislow. nMREQislowwhentheprocessorstartstoperformamemoryaccess.SEQand nMREQtogetherdefinethetypeofbuscyclebeingperformed.Thecycle typesareshowninFigure3-11. SEQindicatesamemoryaccesswithanaddressrelatedtothepreviousaddress. DependingontheCPUmode,thenewaddressisanincrementof2or4 bytesfromthepreviousaddress. A[31:0]isthe32-bitaddressbus. ABEmustbedrivenhightoenabletheaddressbus. MAS[1:0]indicatesthesizeofadatatransferonthe32-bitdatabus.Thevaluesare 00forbyte,01forhalfword(16-bit),and10orword(32-bit).Thevalueof 11isnotused. nRWindicatesaread(low)orawrite(high). BUSENselectseitherthebi-directionaldatabus(low)ortheuni-directional databusses(high). D[31:0]isthe32-bitbi-directionaldatabususedwhenBUSENislow. DIN[31:0]isthe32-bitinputdatabususedwhenBUSENishigh.
86
SoCVerificationTopicsfortheARMArchitecture DOUT[31:0]istheoutputdatabususedwhenBUSENishigh. DBEmustbedrivenhightoenablethedatabus. BIGENDselectsthebyteorder,highforbigendianandloworlittleendian. ABORTindicatesthatamemoryaccessisnotallowed.Itwillcausetheprocessorto takethedataabortexception(vectoraddress0x10). BL[3:0]providesawaytoconnecttomemorysystemsthatarelessthan32-bit.BL isusedtolatchdataontothe32-bitdatabus.Thissignalisrarelyused,since mostdesignsuseexternallatchestointerfacetomemoriesthatarelessthan 32-bitswide. nOPCislowtoindicateaninstructionfetchandhightoindicateadataaccess. nTRANSislowtoindicateUserModeandhightoindicatePrivilegedMode. LOCKishightoindicateanatomicread/writeoperationsuchastheSWPorSWPB instructions. TBITishighwhenoperatinginARMstateandlowwhenoperatinginThumb state. nM[4:0]indicatestheprocessormode.Thesebitsmirrortheleastsignificant5bits oftheCPSRandareintendedfordebuggingonly.ValuesarelistedinFigure3-12. nIRQisthenormalinterrupt(lowerpriority)request.AlowvaluecausestheprocessortotakethenIRQexception(vectoraddress0x18). nFIQisthefastinterrupt(highpriority)request.Alowvaluecausestheprocessor totakethenFIQexception(vectoraddress0x1c). ISYNChightellstheprocessortosamplenIRQandnFIQwiththerisingedgeof MCLK(synchronousinterrupts).
87
Chapter3 nMREQ
SEQ
Non-Sequential Cycle
0
0
Sequential Cycle
0
1
Internal Cycle
1
0
Coprocessor Cycle
1
1
Cycle Type
Figure3-11:ARM7TDMIbuscycletypes
Mode
nM[4:0]
System
00000
Unidentified
00100
Abort
01000
Supervisor
01100
IRQ
01101
FIQ
01110
User
01111
Figure3-12:ARM7TDMIprocessormodevalues
88
SoCVerificationTopicsfortheARMArchitecture OneofthedifficultieswiththeARM7TDMImemorybusisfiguringoutwhenthe varioussignalsarevalidonthebus.Sincethebususespipeliningandbothedgesof MCLKitisnotalwayseasy.Inthenormalpipelinedmode(APE=1andALE=1), thecycletypeindicators,nMREQandSEQ,aresampledontherisingedgeof MCLK.Theaddressrelatedsignalsaresampledonthenextfallingedgeoftheclock, andassumingnowaitstates(vianWAIT)thedataissampledonthenextfalling edgeoftheclock.AwaveformoftheARM7TDMImemorybusafterresetisshown inFigure3-13.
Figure3-13:ARM7TDMIwaveformafterreset
AMBASpecification TheevolutionoftheARM7TDMIbusinterfaceledtothecreationoftheadvanced microcontrollerbusarchitecture(AMBA).SinceARMisfocusedonsellingintellectualpropertythatincludesbothmicroprocessorsandotherperipherals,itbecame usefultopropagateastandardbusstructurethatisopenforengineerstousein designingwithARMIP.Issuessuchasdesignreuseandmodularsystemdesignare importanttoARM.AMBAenablesevenmoreIPthatworkstogetherwithARMIP andprovidesacommonsetofprotocolsforbusinfrastructurethathasbecomethede factostandardusedinSoCdesigntoday.AMBAhasbecomesoubiquitousthateven designsnotbasedonARMmicroprocessorsareusingAMBA. 89
Chapter3 Theprimaryspecificationinusetodayisversion2.0ofAMBA.Thisspecification definesthreebusses:twohigh-speedbussesandoneperipheralbus.Thehigh-speed bussesareadvancedsystembus(ASB)andadvancedhigh-performancebus(AHB). Theperipheralbusisadvancedperipheralbus(APB).It’spurespeculation,butthe term“advanced”inallthreespecificationsmaystemfromoneofthemeaningsofthe ARMacronym,AdvancedRiskMachines,oritmayhavebeenchosenjusttoform aniceacronym.Eitherway,noneofthebussesareparticularly“advanced”asbus protocolsgo,butneverthelesstheyareallusefulformostclassesofSoCdesigns. ThetypicaldesignusingAMBAisadiagramthatisseenintheAMBAspecificationandoverandoveragainthroughoutARMliterature.ItcontainsasingleARM processorandimportanthardwarefunctionssuchasamemorycontrollerforfast memoryandDMAattachedtothehigh-speedbus(ASBorAHB)andabridgeto theslowerperipheralbus(ABP)thatconnectsslowerperipheralsthatareoptimized forpowerinsteadofspeed.AblockdiagramofatypicalAMBAsystemisshownin Figure3-14.
Image Image Processor Processor
AHB Arbiter
ARM7 CPU
Dual Port RAM
DSP
AHB wrapper
Control Logic
AHB wrapper
AHB
Memory Controller
External ExternalBus Bus
AHB to APB Bridge
AHB Decoder
64kByte 64kByte ROM/Flash/ Mask ROM ROM Mask
Timers APB
Ethernet Ethernet
Interrupt Interrupt Controller
UART
GPIO
Figure3-14:Example“typical”AMBAsystem 90
SoCVerificationTopicsfortheARMArchitecture
IntroductiontoAMBAProtocols ThissectionprovidesashortintroductiontothethreeAMBAprotocols.Thenext sectionprovidesmoredetailsofthemostcommonprotocolusedforARMmicroprocessors(AHB)sincethisistheprotocolthatismostimportantforco-verification.
AMBAASB ASBisthefirst-generationsystembusthatevolvedfromtheARM7TDMIbusprotocol.Itsupportspipelining,bursts,andmultiplebusmasters.Thefourbusagentsin ASBare: ■
Arbiter:Implementsasimplerequest/grantstructuretosupportmultiplebus masters.
■
Decoder:Centralizedaddressdecodertodeterminewhichslaveisresponsible forservicingabustransaction.
■
Master:Initiatesreadsandwritesonthebus.
■
Slave:Respondstomasterinitiatedreadsandwrites.
AswewillseeinthenextsectiontherearetwoprimarydrawbacksofASBthat ledtothedevelopmentofAHB;double-edgedclockingandthebi-directionaldata bus.ASBusesbothedgesofthebusclock,whichmakesitnotonlymoredifficultto understandanddesignwith,butalsoimposedincreasedcomplexityformostASIC designflowsandsynthesistoolsthatarebasedonusingonlytherisingedgeofclock. Similarly,thebi-directionaldatabus(andtri-statesignalsingeneral)isnotpossible undermanyASICdesignrules.Eveniftri-statesignalsarepossible,busturnaround timesalwayscausesomeperformancepenalty.AMBAAHBcorrectstheseissues. Forhighestperformance,typicaldesignsbasedonASBuseanARMprocessorwith awrite-backcache.Awrite-backcacheisacachealgorithmthatallowsdatatobe writtenintocachewithoutupdatingthesystemmemory.SinceASBdoesnothave anyprovisionsformaintainingcachecoherencyofmultiplecachingbusmastersonly oneprocessorcanbeusedonASB.
91
Chapter3
AMBAAHB AHBisthesecond-generationsystembusthatevolvedfromASBandimprovesupon itsbusclockingandbushandover.LikeASB,AHBalsosupportspipelining,bursts, andmultiple-masteroperation.Itusesthesamefourbusagents:Arbiter,Decoder, Master,andSlave.Allbusclockingissingle-edgeandusestherisingedgeofthebus clocktoeasethesynthesisflow.AllAHBsignalsareuni-directional,whichguaranteesimplementationonallASICdesignprocessesandenablessingle-cyclebus masterhandoff.ThedatabuswidthofAHBisspecifiedas8,16,32,64,128,256, 512,or1024bitswide,butpracticallyitisimplementedas32,64,or128bitswide. AHBalsosupportssplittransfersinwhichtheaddresstransferisseparatedfromthe datatransfer.Slavescanuseasplitresponsewhenitwilltakemanycyclestocompletethebustransaction.Overallbusutilizationisimprovedwhenslowerslavesdo notstallthebusforlongperiodsoftime.
AMBAAPB APBisaperipheralbusthatisoptimizedforlowpowerandinterfacingtoslower peripherals.APBisnormallyencapsulatedbehindabusbridgethatconnectsAPB toeitherASBorAHB.ThebridgeisaslaveonASBorAHB.LikeAHB,APBhas beenimprovedtouseonlytherisingedgeofthebusclock.TheAPBprotocoldoes notsupportanypipelining(addressandcontrolsignalsremainonthebusforthe entiretransfer).
AMBA3.0andAXI Attheendof2002,ARMannouncedAMBA3.0andalargelistofpartnersfor thisnewversionofAHB.ThespecificationwaslaterreleasedunderthenameAdvancedExtensibleInterface(AXI)anditisthenextgenerationofhigh-performance AMBA.AXIisalsoanopenspecificationthatiseasilydownloadablefromtheARM website.SomeofthefeaturesofAXIare: ■
Separateaddress/controlanddataphases
■
Supportforunaligneddatatransfersusingbytestrobes
■
Burst-basedtransactionswithonlystartaddressbroadcast
92
SoCVerificationTopicsfortheARMArchitecture ■
Separatereadandwritedatachannelstoenablelowcostdirectmemory access(DMA)
■
Issuingofmultipleoutstandingaddresses
■
Out-of-ordertransactioncompletion
AXIhasrecentlybeenincorporatedintonewARM11CPUcoressuchasthe ARM1156T2-SandtheARM1176JZ-S.ThesecoreswereannouncedatMicroprocessorForuminOctober2003andwillbeavailableforuseinmid-2004.AXIisthe nextevolutionoftheARMsystembusandwilllikelybeusedonallnewhigh-performancedesigns.
SummaryofARMCPUBusInterfaces BelowisasummaryofthebusprotocolsusedbythepopularARMCPUcores.When referencingaCPUtechnicalreferencemanual(TRM),sometimesthesignalnames arenotexactlythesameasthoselistedintheAMBASpecification,butthebehavior isusuallyequivalent.TRMsthatwereupdatedafterAMBA2.0wasreleasedmay provideinformationontheequivalenceoftheAMBAsignalnamesandtheCPU signalnames.AnexampleofthisistheARM920TRev1TRM.OtherTRMsthat werereleasedbeforeAMBA2.0willnothavethisinformation,butthereaderwill beabletomaketheconnectionssincethesignalnamesareusuallyverysimilar.Each CPUhasadditionalmemorybussignalsbeyondthoselistedintheAMBAspecification,soitisbesttofirstlearntheAMBAprotocolsandthenusetheTRMasa supersetofAMBAtolearnthedetailsofaCPUmemorybusinterface.MostTRMs assumethisAMBAknowledge.Forexample,theARM926EJ-SRev0TRMhasonly sevenpagesofinformationonthebusinterface.Thesepagesareusedonlytoqualify whichfunctionalityofAHBisusedbytheARM926EJ-S,soitassumescomplete knowledgeofAHB. ■
ARM7TDMIusesnon-AMBA(ARM7TDMInative)businterface.Itcan beconvertedtoAHBusingHDLwrappers.
■
ARM720TstartedwithASBinterfacebutrev4hasmigratedtoanAHB interface.
93
Chapter3 ■
ARM9TDMIusesnon-AMBA(ARM9TDMInative)businterface.Canbe convertedtoAHBusingHDLwrappers.
■
ARM940TusesanASBinterface.ItincludesextrasignalstoenableconversiontoAHBusinganHDLwrapper.
■
ARM920TandARM922TbothuseanASBinterface.Theyincludeextra signalstoenableconversiontoAHBusinganHDLwrapper.
■
ARM9E-Susesanon-AMBAHarvardarchitecture(separateinstructionand databuses).
■
ARM966E-SusesasingleAHBinterface.
■
ARM946E-SusesasingleAHBinterface.
■
ARM9EJ-Susesanon-AMBAHarvardarchitecture(separateinstruction anddatabuses).
■
ARM926EJ-SusesadualAHBinterface,oneAHBforinstructionsandone AHBfordata.
■
ARM1020EandARM1022EbothusedualAHBinterfaces.
■
ARM1026EJ-SusesadualAHBinterface.
■
ARM1136J-Susesfour64-bitAHBinterfacesplusa32-bitAHBinterfaceas aperipheralport.
AHBTutorial ThissectionpresentsatutorialonAMBAAHBfromtheviewofco-verification. ThepurposeisnottoteachhowtoimplementadesignforAHB,butrathertobe abletounderstandtheprotocolandtobeabletodiagnoseproblemsinthecontext ofdesignverification.Thegoalisnottorepeatalloftheinformationthatisprovided intheAMBAspecification,buttosummarizetheimportantpointsandoffersome insightintointerestinginformationthathasbeenlearnedbyexperienceusingreal ARMdesignsandvarioustypesofARMmodels.AdiagramofthesignalsofanAHB masterisshowninFigure3-15.
94
SoCVerificationTopicsfortheARMArchitecture HCLK HRESETn HGRANT HRESP[1:0] HRDATA[31:0] HREADY
HBUSREQ HLOCK
AHB Master
nIRQ nFIQ HCLKEN
HADDR[31:0] HTRANS[1:0] HSIZE[2:0]
HWRITE HBURST[2:0] HPROT[3:0] HWDATA[31:0]
Figure3-15:AHBmastersignals AHBasimplementedonARM7andARM9CPUcoresisasingle32-bitaddressand 32-bitdatabustoaccessmemory.Allsignals,withtheexceptionofthereset,are activehigh.Allactivityusestherisingedgeofthebusclockandallsignalsareunidirectional(notristatevalues).Interruptsignalsarealsosynchronous.PreviousCPU coressuchasARM7TDMIprovidedtheoptionforasynchronousinterruptsviathe ISYNCsignal,butwithAHBcorestheinterruptsignalsmustsynchronizedbefore presentingthemtotheCPU.TheprimaryAHBsignalsaredescribedbelow: HCLKisthemainbusclock.AllsignalsaresampledonanddrivenfromtherisingedgeofHCLK.Onmostpre-AHBCPUcoressuchasARM920Tand ARM940T,thereweretwoclocks,onefortheCPUcoreandoneforthebus interface.OnnewerAHBcoressuchasARM946E-SandARM926EJ-Sthereis onlyoneclockinputCLKthatclockstheCPUcoredirectly.TheAHBinterfacescanberunataslowerratebygatingCLKwithHCLKENtoproducethe AHBclockHCLK. HRESETnisanactivelowsignalusedtoresetthebusandCPU.Whenassertedit willcausetheCPUtorestartfromtheresetvector.FormanyARMcoresusing AHB,resetisalsousedtosampleconfigurationsignalsforendiannessandthe locationoftheexceptionvectors.
95
Chapter3 HBUSREQisthebusrequestsignalfromabusmasterthatwantstousethebus. EachbusmasterwillhaveitsownHBUSREQ.ForARMCPUcoreswith multipleAHBinterfacessuchasARM926EJ-S,eachAHBinterfacewillhaveits ownHBUSREQ. HGRANTisthebusgrantfromthearbitertothemasterthatsignalsthemaster canusethebus.ThemasterisgrantedthebuswhenHGRANTandHREADY arebothhigh.FailingtoalsounderstandthatHREADYmustalsobehighto indicateabusgrantisoneofthemostcommonmistakesengineersmakein tryingtounderstandAHBoperation. HLOCKindicatesthemasterisperformingalockedaccess(read-modify-write sequence)andtellsthearbiternottograntthebustoanothermasterduringthis sequence. HADDR[31:0]isthe32-bitaddressbus. HTRANS[1:0]indicatesthecurrenttransfertype,eitherIDLE,BUSY, NONSEQUENTIAL,orSEQUENTIAL.ValuesforHTRANSareshown inFigure3-18. HSIZE[2:0]indicatesthesizeofthetransfer.Usingthreebitsallowsforsizesupto 1024bitstobetransferred,butwiththe32-bitdatabusonlyvaluesof8,16,and 32-bitsareusedandonly2bitsofHSIZEareused.Forexample,theARM946E-S hasHSIZE[2]permanentlytiedlow.ValuesforHSIZEareshowninFigure3-20. HWRITEspecifiesaread(low)orawrite(high). HBURST[2:0]indicateswhetherornotthetransferispartofaburst.Using3bits, thereareeightdifferentburstvaluesthataresupported.IndividualARMcores willnotuseallofthebursttypes.Forexample,ARM946E-SusesSINGLE, INCR,INCR4,andINCR8.TheARM926EJ-SusesSINGLE,INCR4,INCR8, andWRAP8.AllofthevaluesforHBURSTarelistedinFigure3-19. HPROT[3:0]isdrivenbythemastertoprovidemoreinformationaboutthetypeof transfer.Itcanbeusedtoimplementprotectioncontrol.Informationprovided relatestodataorinstructionfetch(bit0),privilegedoruseraccess(bit1), bufferableornotbufferable(bit2),cacheableornotcacheable(bit3).Bus
96
SoCVerificationTopicsfortheARMArchitecture
mastersthatarenotprocessorswillnotusuallyprovidethisinformation.Values forHPROTareshowninFigure3-21.
HRDATA[31:0]isthe32-bitreaddatabus. HWDATA[31:0]isthe32-bitwritedatabus. HREADYisdrivenbyaslavetoindicatethatitsdatatransferisfinished;theslave canuseittoinsertwaitstates.Sincethebusispipelined,HREADYdoesnot onlypertaintothedataphaseofatransfer,italsohasimplicationsrelatedtothe terminationoftheaddressphaseandarbitration.Whenthelasttransferofadata phaseiscompleteditmeansthetransfercurrentlyintheaddressphasenow movestothedataphase.HREADYisalsousedtodeterminethebusgrantfor thenextmasterthatwillgainaccesstothebus.HREADYcanbeviewedas advancingthebuspipelinetothenextstage.SlavesmustbothsampleHREADY onthebustotrackthebusprotocolandalsodriveHREADYfortransfer whereitistheslave.ARMrecommendscallingtheHREADYinputtoaslave HREADYandtheoutputHREADYOUT. HRESP[1:0]istheslaveresponseindicatingifthetransferwascompletedsuccessfully,hadanerror,orneedstobecompletedusingthesplitorretryprotocol. ThefollowingsignalsarenotpartofabusmastersuchasanARMCPU,butare includedinaslaveorarbiter: HSELselectstheslavethatisresponsibleforthetransaction.Itisgeneratedfrom thedecoderbasedonthemastersuppliedaddress. HMASTER[3:0]isgeneratedbythearbitertoindicatewhichmasterisusingthe bus.Itisgeneratedwiththeaddressphaseofatransfer.Upto16busmastersare possiblewiththe4bitsofHMASTER. HMASTLOCKisgeneratedbythearbiterandindicatesthemasterisperforminga lockedtransfer. HSPLIT[15:0]isgeneratedbyaslavethatisnowreadytocompleteapreviously postponedtransferusingtheprotocolforsplittransfers.Thearbiterwillseethe HSPLITfromtheslaveandinformthemastertoretrythetransfer.
97
Chapter3
ConfigurationatReset Attheendofreset(risingedgeofHRESETn)ARMCPUcoressamplesomeinputs tosettheoperatingconfiguration.EachCPUhasdifferentconfigurationinformation.Someofthecommonconfigurationinformationdeterminedatresetisthe locationoftheexceptionvectors,memoryorganization(endianness),andenabling oftightlycoupledmemory(TCM).Followingaresomeexamplesforprocessorswith AHBinterfaces. TheARM946E-SandtheARM926EJ-SsamplethesignalVINITHItodetermine thelocationoftheexceptionvectors.IfVINITHIislow,thevectorsarelocatedat address0,butifVINITHIishighthenthevectorsarelocatedataddress0xffff0000. Asmentionedpreviously,endiannessisoneofthemostmisunderstoodissuesin co-verification.Toaddtotheconfusion,ARMcoreshaveuseddifferentwaystoconfigureendianness.TheARM7TDMIstartedbyusinganinputpinthatwassampled atresettosettheendianness.Later,coresliketheARM946E-Smovedtosoftware configuration.Thecorealwaysstartsoffinlittleendianmodeanddesignsthatwish tousebigendianmodeshouldreadonlywords(sinceendiannessdoesn’tmatterif youreadandwritewords)untilsuchtimeasthesoftwarechangesaregisterbitto specifybigendian.ThereisalsoanoutputpinnamedBIGENDOUTthatreflects thestateoftheregisterbit.Oncetheswitchtobigendianmodeismadethecorecan begintoaccessdatausingbyteandhalf-wordtransfers. Thedebatebetweenhardwareversussoftwarecontrolmustnothaveresultedina clearwinnersincetheARM926EJ-Sadoptedbothmethods.TheARM926EJ-Suses thesamesetupastheARM946E-Sexcepttheoriginalvalueoftheregisterissetby aninputsignalnamedBIGENDINITtodeterminethememoryorganization.If BIGENDINITishightheCPUwillstartinbigendianmodeandiflowitwillstart inlittleendianmode.AftertheinitialvalueissetbyBIGENDINITsoftwaremay changetoadifferentbyteorderbychangingtheregistervalue.ForARM926EJ-Sthe outputsignalthatreflectstheregistervalueiscalledCFGBIGEND.Itappearssoftwarewinsoutintheendsinceithasthelastchancetodetermineendianness,unless ofcoursehardwaredecidestobecomedifficultandissueanotherresettochange endiannessyetagain.Explainingendiannesscandefinitelycauseconfusion.After
98
SoCVerificationTopicsfortheARMArchitecture readingthiscanyourememberthedifferencebetweenthesignalsBIGEND,BIGENDOUT,BIGENDINIT,andCFGBIGEND?IusethemoftenandIstillmust lookthemupmuchofthetime. Anotherconfigurationoptionsampledatresetistheenablingofthetightlycoupled memory(TCM).TCMdetailsarecoveredinafuturesection,butTCMcanoptionally beenabledforthepurposeofbootingtheCPUfromcodelocatedinTCM.
PhasesofAHBTransfer TherearethreephasesoftheAHBtransfer:arbitration,addressphase,anddata phase.Toincreasebusutilization,allthreeofthesephasescanbeperformedin parallel.Thetechniqueofperformingthethreephasesinparallelisknownasbus pipelining.Whenwaitstatesareinsertedintothedataphase,ithasthesideeffectof delayingtheadvancementoftheaddressphaseandarbitrationphase.Notunlikea CPUpipeline,thebuspipelineisshowninFigure3-16. Arbitrate
Address Phase
Data Phase
Arbitrate
Address Phase
Data Phase
Arbitrate
Address Phase
Data Phase
Figure3-16:ThreephasesofAHBtransaction
AHBArbitration Thebasicarbitrationmechanismisstraightforwardtounderstand,butIhavelearned thatitismorecomplicatedthanitlooks.Eachbusmasterhasitsownrequestand grantsignalconnectedtothearbiter.BusmasterswillassertHBUSREQwhenthey desiretousethebus.ThearbiterwillgrantthebusbyassertingHGRANT.The masterhasbeengrantedthebuswhenitsamplesHGRANTandHREADYhighon therisingedgeoftheclock.TherequirementforHREADYresultsfromthepossibil-
99
Chapter3 itythatwaitstatescanbeinsertedintoadataphase.Whilethedataphaseisstalled, theaddressphaseisalsostalled,andthearbitrationisalsostalled. LOCKisalsopartofarbitration,andisusedwhenabusmasterwantstodoasequenceoftransfersthatmustnotbeinterrupted.Examplesincluderead/modify/write operationsusedtoimplementsemaphoresandotheratomicoperations.Thearbiter musthonorLOCKandmakesurenootherbusmasterisgrantedthebusuntilthe lockedsequencefinishes.ThearbiteralsoassertsHMASTLOCKtoindicateto slavesthatalockedsequenceisinprogress. SplittransferswereintroducedintheAHBprotocoltoimprovebusutilizationby decouplingtheaddressanddataphasesforthosecaseswhereslavesrequiremany cyclestocompleteatransfer.ThesignalsinvolvedareHSPLITandHRESP.The splitprotocolallowsthedataphasetobepostponedintothefutureandallowsother transferstobestartedonthebus.Inconcept,thesplitprotocolsoundsveryuseful, butfeedbackfromengineersindicatesitisnotoftenused(althoughalmosteverybody claimstheywilluseitinthefuture).Indesignswithslowperipheralsorinterfaces wherethetimetocompleteatransferisvariable,mostdesignsfindotherwaysto interfacetheseperipheralstoAHB.Forthisreason,nomoredetailaboutthesplit protocolisprovidedhere. Interestingsituationsoccurwheneitherthearbiterorthedefaultslaveuses HREADYtoinfluencethearbitration.TheAMBAspecificationmakesitnecessary toassertHREADYwhenthereisnodataphaseinprogress.Takeforinstancethe firstinstructionfetchafterreset.Sincethisisthefirstnon-IDLEtransfer,HREADY mustbeassertedsotheCPUcanreceiveabusgrant.Thissituationexpandsthe meaningofHREADYbeyondthebasicuseofinsertingwaitstatesintheaddress phase.OnceitisrealizedthatHREADYdoesmorethanjustinsertwaitstates,itcan beusedcreativelytoinfluencearbitration.Forexample,thefirstfetchafterresetis usuallyfromaslowmemorysuchasflashorotheroff-chipmemory.SinceHREADY mustbeassertedtoenablethefirstbusgrant,whynotassertHREADYfor2clocks toprovidethesecondbusgrantaswellandgetthesecondaddressbeingaccessed (address4).ThissituationisshowninFigure3-17.Noticeaddress0isonlyonthe busfor1clock.
100
SoCVerificationTopicsfortheARMArchitecture
Figure3-17:UseofHREADYinarbitration
AHBAddressPhase Onceamasterisgrantedthebus,thenextphaseistheaddressphase.Duringthe addressphase,themasterwillputtheaddressonthebus,alongwiththeother attributesthatdefinethetransfer.Theaddressphaseisonly1clocklong,butasmentionedwaitstatesinsertedintothedataphasebytheuseofHREADYhavetheside effectofextendingthenextaddressphasesinceitisstalledfromusingthedatabus. Slaveswillsampletheaddressphasesignalsandwillpreparetheresponsesignalsfor thenextclockcycle.Theaddressphasesignalsdrivenbythemasterare:HTRANS, HWRITE,HSIZE,HBURST,andHPROT.Valuesforsomeofthesesignalsare giveninFigures3-18,3-19,3-20,and3-21. Cycle Type
Description
HTRANS[1:0]
IDLE
No bus activity
00
BUSY
Master inserting wait states
01
Transfer with address NONnot related to the SEQUENTIAL previous Transfer with address SEQUENTIAL related to the previous transfer
10 11
Figure3-18:Transfertypevalues
101
Chapter3 Cycle Type
Description
HBURST[2:0]
SINGLE
Single Transfer
000
INCR
Incrementing Burst (length unknown)
001
WRAP4
Burst length 4 Wrapping Address
010
INCR4
Burst length 4 Incrementing Address
011
WRAP8
Burst length 8 Wrapping Address
100
INCR8
Burst length 8 Incrementing Address
101
WRAP16
Burst length 16 Wrapping Address
110
INCR16
Burst length 16 Incrementing Address
111
Figure3-19:Burstvalues
102
SoCVerificationTopicsfortheARMArchitecture Size
HSIZE[2:0]
8 bits (byte)
000
16 bits (half word)
001
32 bits (word)
010
64 bits
011
128 bits
100
256 bits
101
512 bits
110
1024 bits
111
Figure3-20:Sizevalues
Description
HPROT[0]
Description
HPROT[1]
Opcode Fetch
0
User Access
0
Data Access
1
Privileged Access
1
Description
HPROT[2]
Description
HPROT[3]
Not bufferable
0
Not Cacheable
0
Bufferable
1
Cacheable
1
Figure3-21:Protectionvalues
103
Chapter3 AbasicAHBtransferisshowninFigure3-22.
Figure3-22:BasicAHBtransfer
AHBDataPhase Thedataphaseisusedtotransferdataonthebusbetweenmasterandslave.The mastercaninsertwaitstatesusingHTRANS=BUSY.Theslavemayinsertwait statesbybringingHREADYlow.TheslavealsousesHRESPtoprovidethestatus ofthetransfer.ThepossiblevaluesforHRESPareshowninFigure3-23. Description
HRESP[1:0]
Completed Sucessfully
00
Error occured
01
Master should retry
10
Perform Split Protocol
11
Figure3-23: Slavetransferresponsevalues
104
SoCVerificationTopicsfortheARMArchitecture ForCPUcores,anHRESPofERRORfromtheslavemapstoanabortintheCPU exceptiontable.Therearetwokindsofaborts,dataandinstructionprefetchasalreadydiscussedintheinterruptandexceptionsectionofthischapter. AHBusesseparatedatabussesforreaddataandwritedata.Thisavoidstheneedfor tri-statesignals.WhenthemasterperformsawriteHWDATAisusedbythemaster forthedatabeingwritten,andwhenthemasterperformsareadHRDATAisused bytheslaveforthereaddata.DatatransfersoccurontherisingedgeofHCLKwhen HREADYishigh. Thedatabususesall32addressbitsalongwiththetransfersize,HSIZE,todeterminewhichportionofthedatabusisbeingusedonaparticulartransfer.Theaddress isalignedtothetransfersizeandtheportionsofthedatabususedonhalf-wordand bytetransfersaredeterminedbyendianness.Themastermustputdataincorrect bytelanesonwrites,andtheslavemustdothesameonreads.Unalignedmemory accesseswerenotpartoftheARMarchitectureuntiltherecentlyintroducedv6 extensions. Somemastersreplicatedataonwrites,butitisnotrequired.Datareplicationapplies totransfersthatarelessthanthebuswidth.Inthecaseofabytewrite,themaster willputthewritedataonallbytesofthedatabus,regardlessoftheaddress.Thesame istruefor16-bitwritesona32-bitbus;themasterwillputthedataonbothhalvesof thedatabus.Ifthemasterdoesreplicationtheendiannessdoesn’tmatter,sincethe slavewillsamplethecorrectdatanomatterwhichbytesorhalfwordittakesfrom thebus.Thispracticeisnotagreatidea,sinceithasbeenshowntomaskproblems inslaves.IfaslaveistestedwithaCPUcorethatperformsdatareplicationonwrites andthenislaterusedwithamasterthatdoesnotdoreplication,aproblemmay occur.Thisisespeciallytrueforslavesthataredesignedtoworkwithbothbigand littleendianconfigurations,sincethereisnoguaranteetheslaveishandlingendianesscorrectly.Itisnotcommonforslavestodoanydatareplicationforreadtransfers. TheARM7TDMIalwaysreplicatesdataonwritestomakeiteasierforslavestotake datafromanyplace.ThisdatareplicationisshowninFigure3-24.TheCPUiswritingaddress0x20000000withdatavalueof0x61whichisdrivenonall4bytelanes.
105
Chapter3
Figure3-24:Datareplicationonwrites AHBdoesnotuseanybyteenablestospecifywhichbytelanesareactive.This meansallmastersandslavesmustknowandunderstandthebyteorder(endianness) onthebustointerprettheportionofthedatabusbeingusedonhalf-wordandbyte transfers.ThisendiannessagreementwasperfectlyfineuntiltheARM926EJ-Sadded awritebuffer.ThewritebufferrequiredanewsignalnamedDHBLbeaddedto specifywhichbytelanesareactiveduringawrite.Thereasonisthatendiannesscan bedynamic,sotheslaveneedstoknowmoreaboutthetransaction.Softwarecan changeendianness.ThiswillchangethevalueoftheBIGENDOUTsignal.DelayingwritesbyusingawritebufferhasthepotentialofBIGENDOUTchangingtoa newendiannessvaluewhilewritesoftheoldendiannessarestillinthewritebuffer. Whenthesewritesmakeittothebustheywillbeinconsistentwiththevalueof BIGENDOUT,hencetheneedforDHBL.
AHB-Lite AnothervarietyofAHBthatisnotcoveredintheAHBspecification,butiscoveredinaseparatedocument,iscalledAHB-Lite.Therearemanysituationswhere AHBisdeployedandonlyonebusmasterispresentonthebus.Inthesecasesthere isnoneedforanarbitertotakecareofrequestandgrant.Thereisalsonobenefitto thesplitorretryprotocolsincethereisnoothermasterthatcanusethebusanyway whenaslavewouldrequirealongtimetocompleteatransfer.AHB-Literemovesthe HBUSREQandHGRANTsignalsandspecifiesthatslavescannotusethesplitand retryresponse.AnyAHBmasterisautomaticallyanAHB-Litemasterbysimplynot
106
SoCVerificationTopicsfortheARMArchitecture connectingHBUSREQandconnectingtheHGRANTto1.AllAHBslavesthat don’timplementsplitorretry(almostallslavesinexistencetoday)areautomatically AHB-Liteslaves.AHBslavesthatdoimplementsplitorretrycanbefittedwithan HDLwrappertohandlethesplitandretrywithnochangestotheslave.
Single-LayerandMultilayerAHB ForanAHBsystemwithmultiplemasters,therearetwodifferentstylesofimplementation.Thetraditionalbusstructurewheretherearemultiplemastersandslaveson asharedbuscontrolledbyanarbitrationprotocolusingrequestsandgrantsiscalled single-layerAHB.Withsingle-layerAHBallmasterssharethebusbandwidthand apartfrompipeliningthereisonlyonedatapathbetweentheactivemasterandslave atanytime.ForSoCdesignwhereextrasignalsdon’timpactthenumberofpinsand packagesize,itispossibletoallowformultipleconnectionsbetweenmastersandslaves tobeactiveinparallel.Thisincreasesthebusbandwidthandincreasesperformance. AHBincludesaspecificationonhowtodothiscalledmultilayerAHB.Multilayer AHBreplacesthesharedbuswithaninterconnectionmatrixusingamultiplexing schemetoconnectmastersandslavestogether.Inadditiontoperformancebenefits, multilayerAHBalsoremovestherequirementforarbitrationonthemastersandmoves arbitrationtotheinterconnectionmatrix.Theinterconnectionmatrixconsistsofaset ofmultiplexerfunctions.EachmasterandslaveinmultilayerAHBcanusetheAHBLiteprotocolinsteadofthefullAHBprotocolthatincludesarbitration.
ARM926EJ-SExample ThissectionusestheARM926EJ-SasanexampleofdifferentAHBbusarchitectures.TheARM926EJ-SutilizesaHarvardarchitecturewithtwoAHBinterfaces, onefortheinstructionbusandoneforthedatabus.Fromtheabovedescriptions ofsingle-layerAHB,multilayerAHB,andAHB-Litetherearemanydifferent combinationsofhowtheARM926canfitintoanAHBsubsystem.Threepossible alternativesarediscussed: ■
Bothmastersconnectedtosingle-layerAHB
■
EachmasterconnectedtoadifferentAHB
■
BothmastersconnectedtoamultilayerAHBinterconnectmatrix
107
Chapter3 ThefirstimplementationfortheARM926isconnectingbothmasterstosingle-layer AHB.EachAHBmasterhasabusrequest(IHBUSREQandDHBUSREQ)anda grant(IHGRANTandDHGRANT).Anarbiterwilldeterminewhichbusmaster isallowedtousethebus.Thearbiternormallygivesahigherprioritytothedata AHB.ThisstructureisshowninFigure3-25.Forsingle-layerAHB,bothmasters mustrunatthesameclockspeedsincetheyareonasharedbus.
Arbiter
I AHB Master
Decoder
AHB Slave #1
D AHB Master
AHB Slave #2
Figure3-25:ARM926usingsingle-layerAHB
ThesecondalternativeistoconnecteachmastertoaseparateAHB.Sincethemastersare(almost)independent,theydonotneedtouseasharedAHBbusorevenrun atthesameclockfrequency.Internally,theARM926willuseasingleclock,buteach AHBinterfacehasseparateclockenables(IHCLKENandDHCLKEN).Effectively thisallowstheAHBbusinterfaceunitstorunatdifferentclockspeedsthataremultiplesofeachother.Figure3-26showsadiagramofconnectingeachAHBmasterto separatebusses.
108
SoCVerificationTopicsfortheARMArchitecture
I AHB Master
Arbiter
Arbiter
Decoder
Decoder
AHB Slave #1
AHB Slave #2
AHB Bridge #1
AHB Slave #2
D AHB Master
Figure3-26:EachAHBonseparatebusses ThethirdalternativeistousemultilayerAHBandconnecteachmastertoaninterconnectionmatrix.Thismovesthearbitrationresponsibilitytotheinterconnection matrixandcanofferhigherperformance.TheuseofmultilayerAHBisshownin Figure3-27.Thisalternativeallowsconcurrenttransactionsbetweenmultiplemastersandslavestoimproveperformance.
109
Chapter3
I AHB Master
AHB Slave #1
D AHB Master
AHB Slave #2
Interconnection Matrix
Figure3-27:MultilayerAHB Inadditiontotheimpactonthehardwaredesign,softwareimpactmustalsobe consideredwhendeterminingbusarchitecture.Althoughthereisusuallynodependencybetweeninstructionmemoryanddatamemory,therearesituationswhensuch dependenciesexist.Someexamplesofdependenciesbetweeninstructionsanddata areloadingorcopyinginstructionstoadifferentmemory,suchascopyingcritical codefromflashtoSDRAM.Otherexamplesareself-modifyingcode,wheresoftware explicitlymodifiesothersoftwareinstructions.Althoughself-modifyingcodeisnot commonpractice,oneexampleofthisistheuseofsoftwarebreakpoints.Onetechniquefordebuggerstosetbreakpointsistooverwritememorywithanewinstruction thatwillcauseexecutiontostopatthebreakpoint.Sincethewritetomodifythe instructionwilltakeplaceonthedataAHBandtheinstructionwillbefetchedfrom theinstructionAHBitispossiblethatthenewinstructionmaynotbefetchedifthe writedidnotreachmemorybeforetheinstructionfetchoccurred.Therearemany possiblereasonsforthedatawritenotreachingmemory.Thedatacouldhavebeen writtenintothedatacache(whichisawrite-backcache)andmaybesittinginthe cacheinthemodifiedstate.Thedatacouldbeinthewritebuffer.Thedatawrite mayalsonotbecompleteduetospeeddifferencesbetweenthetwoAHBinterfaces (theD-AHBisrunningveryslowandtheI-AHBisrunningveryfast).Theseand otherissuesaretakencareofinsoftwareusingcommonfunctionstoinvalidate caches,makingsurewritebuffersandprefetchbuffersareflushed.
110
SoCVerificationTopicsfortheARMArchitecture
InterruptSignals AlthoughinterruptsarenotpartoftheformalAMBAspecifications,ARMCPU coresusetwointerruptsignalstoindicatearequestforservice,nFIQ(fastinterrupt request)andnIRQ(normalinterruptrequest).Aninterruptwillcausetheprocessortofetchandexecutetheinterruptserviceroutine.ThisbehaviorontheAHBis showninFigure3-28.
Figure3-28Fetchinginterruptvectorafteraninterrupt
InstructionandDataCaches Almostallmicroprocessorsusecachingasawaytoincreaseperformance.Caches providehighmemoryperformancebystoringfrequentlyusedinstructionsanddata. Cacheoperationismostlytransparenttosoftwareoperation.Thismeanssoftware isnotrequiredtoexplicitlycontrolwhatisstoredinthecache.Thecachecontrollerhardwarewillautomaticallydeterminewhichdatashouldbestoredinthecache basedonthecachealgorithm.Whileitisnottheaimofthissectiontoexplainall oftheprinciplesofcaching,itprovidesagoodoverviewofARMcachesandsome basicsofhowtheywork.
111
Chapter3 Cachesoperatebydefiningaunitofdatacalledacacheline.Thisistheamountof datathecachewilloperateon.Sinceburstingdataonthebusismoreefficientthan movingsinglewords,thecachelinesizeischosenassomenumberofbytesthatrepresentsaburstaccessonthebus.For32-bitdatabussesacommonlinesizeis16or32 bytes.Thiscorrespondstoaburstoffouroreightwords.Cacheoperationswillloada cachelineintothecacheorwritealinefromcachetomemory.IftheCPUneedsto readorwritedatathatisnotinthecacheitiscalledacachemiss.IftheCPUneeds toreadorwritedatathatisalreadyinthecacheitiscalledacachehit. Memoryreadsthatmissthecacheandareretrievedviathebusarealwaysplacedin thecache.Theassumptionincachingisthereisahighprobabilitythesoftwarewill accessthesameinformationoraddressesveryneartothedatabeingread(withinthe samecacheline).Writesthatmissthecachecanbehandleddifferentways.Oneway istowritememoryandleavethecacheasitwas.Anotherwayistowritememory andthenreadthelineintocachewiththeassumptionwillbeusedagain.Aneven moreefficientwayistoreadthelineintocacheandwritethecontentsofcache withoutwritingthedatatomemory.Thisissometimescalledawrite-allocatebecause itmakesallwriteslooklikecachelinereads. Therearemanyvariations,butalgorithmsthatensurethelatestdataisconsistent withmemoryarecalledwrite-throughcaches.Withawrite-throughcache,writehits tothecachewillupdatethecacheandalsomemory(tomaintainconsistency).An algorithmthatallowsthecachetocontaindifferentdatathanmemoryiscalled awrite-backcache.Write-backcachesaremorecomplexsincetheinconsistencies betweencacheandmemorymustberesolvedifsomeotherbusmasterwouldlike toreaddatafrommemoryforanaddressthathasbeenmodifiedinoneofthedata caches.Intheworkstationandserverworlditiscommontohavemultipleprocessor systemswheremanyprocessors,eachwithmultiplelevelsofwrite-backcaches,are connectedtoacommonbus.Thisarchitectureisknownassymmetricmulti-processing (SMP).Debuggingcachecoherencyissuescanbehair-pullingatbest.Fortunately, theworldofembeddedsystemsandSoCdesignhasnotreachedthislevelofcomplexity,yet.
112
SoCVerificationTopicsfortheARMArchitecture MostARMcoreshaveoneormorecaches,butnotall.Somearewrite-backand somearenot.Synthesizablecoresallowfordifferentcachesizestobeselected.Cachesarecontrolledviacoprocessor15,partoftheCPUmacrocell. FollowingisalistoftheARMcoresandasummaryofthetypeofcachestheyuse: ■
ARM7TDMIhasnocache.
■
ARM720Tcontainsan8kbunifiedinstructionanddatacache.Cacheline sizeis16bytes(4words)anditusesa4-waysetassociativealgorithm.
■
ARM9TDMIhasnocache.
■
ARM940Tusesseparateinstructionanddatacaches,eachis8kb.Cache linesizeis16bytes(4words)anditusesa64-waysetassociativealgorithm. Thedatacachescanbeconfiguredforwrite-throughorwrite-backoperation.
■
ARM920Tusesseparateinstructionanddatacaches,eachis16kb.Cache linesizeis32bytes(8words)anditusesa64-waysetassociativealgorithm. Thedatacachescanbeconfiguredforwrite-throughorwritebackoperation.
■
ARM922TisidenticaltoARM920Texceptthecachesizesareonly8kb.
■
ARM9E-Shasnocache.
■
ARM966E-Shasnocache.
■
ARM946E-Susesseparateinstructionanddatacaches.Itallowsthedesigner toselectfromcachesizesbetween0and1Mb.Cachelinesizeis32bytes(8 words)anditusesa4-waysetassociativealgorithm.Thedatacachescanbe configuredforwrite-throughorwritebackoperation.
■
ARM9EJ-Shasnocache.
■
ARM926EJ-Susesseparateinstructionanddatacaches.Itallowsthedesignertoselectfromcachesizesbetween4kband128kb.Cachelinesizeis32 bytes(8words)anditusesa4-waysetassociativealgorithm.Thedatacaches canbeconfiguredforwrite-throughorwritebackoperation.
113
Chapter3 ■
ARM1022Eusesseparateinstructionanddatacaches,eachis16kb.Cache linesizeis32bytes(8words)anditusesa64-waysetassociativealgorithm. Thedatacachescanbeconfiguredforwrite-throughorwritebackoperation.
■
ARM1026EJ-Susesseparateinstructionanddatacaches.Itallowsthedesignertoselectfromcachesizesbetween0and128kb.Cachelinesizeis32 bytes(8words)anditusesa4-waysetassociativealgorithm.Thedatacaches canbeconfiguredforwrite-throughorwritebackoperation.
■
ARM1136J-Susesseparateinstructionanddatacaches.Itallowsthedesignertoselectfromcachesizesbetween4and64kb.Cachelinesizeis32bytes (8words)anditusesa4-waysetassociativealgorithm.Thedatacachescan beconfiguredforwrite-throughorwritebackoperation.
Cachecontrolisdoneviacoprocessor15registers.Figure3-29showsasmallexample ofhowtoturnoncachingfortheARM926EJ-S. ; Enable the caches MOV R1, #0xFFFFFFFF MCR p15, 0, R1, c3, c0, 0 MRC p15, 0, R0, c1, c0, 0 ORR R0, R0, #0x7D ; bits 3:6 should be HIGH ; bit 0: mmu ; bit 2: d-cache ORR R0, R0, #0x5000 ; bit 12: i-cache ; bit 14: Round Robin replacement MCR p15, 0, R0, c1, c0, 0 MOV
PC,
R14
; return
Figure3-29:InstructionstoenablecacheonARM926EJ-S
114
SoCVerificationTopicsfortheARMArchitecture
TightlyCoupledMemory(TCM) Embeddedapplicationsoftenhaverequirementsforfast,deterministicmemoryto storereal-timedataandperformancecriticalinstructionsequences.SomeARM coresprovidetightly-coupledmemory(TCM)tosatisfythisrequirement.TheARM coreprovidesaninterfacetoTCM,butthememoryitselfisimplementedoutsideof theCPUcore.WhenTCMfunctionalityisprovidedforHarvardarchitecturecores, thereareseparatememoryinterfacesforinstruction(ITCM)anddata(DTCM).By locatingthememoryoutsideofthecore,designershavethegreatestflexibilityin systemdesignandcanworkwithdifferencesinRAMlibrariesforspecificsemiconductorprocesses. TCMisdifferentfromcachememorysinceitcanbeaddresseddirectlybysoftwareat aspecificlocationinthemicroprocessormemorymap.TCMisusuallyimplemented assingle-cycleSRAM,andTCMsizeisspecifiedusinginputsignalstotheCPU core.TCMstatusandcontrolisdoneviaprogrammingregistersincoprocessor15. SoftwarecanenableanddisableTCMaswellasassignthelocationinthememory map.Asanexample,informationabouttheTCMinterfacefortheARM926EJ-Sis presentedbelow. DataTCMisalwaysdisabledatresetandmustbeexplicitlyenabledbysoftware. InstructionTCMisdisabledatresetunlesstheINITRAMsignalishigh.With INITRAMhigh,ITCMstartsenabledandwillrespondtomemoryrequestsataddress0.ThisoptionallowstheCPUtobootdirectlyfromTCM.Ofcourse,booting fromTCMimpliesthedesigncanloadinstructionsintotheITCMbeforeresetand withouttheuseofsoftware.IfITCMistobeenabledatreset,butnotusedforthe resetvectors,theARM926EJ-SofferstheVINITHIsignaltotelltheCPUtoboot fromaddress0xffff0000instead.WhenVINITHIishigh,theCPUwilllocatethe exceptionvectorsat0xffff0000. BecausetheTCMinterfaceisoptimizedforsingle-cycleperformance,thereisno protectionagainstsoftwarereadingfromthismemory.TheMMUcanbeusedwith TCMtoprotectagainstunauthorizedaccess,butonlyabortedwritesareguaranteed nottotakeplace.Abortedreadswillstillreadmemory,sothereisnowaytoprotect againstTCMreads.
115
Chapter3 TheITCMandDTCMsignalsfortheARM926EJ-Saresummarizedbelow.Signals startingwithIRarefortheinstructionTCM(ITCM)andthosestartingwithDR areforthedataTCM(DTCM).NotallARMcoreswithTCMusethesamesignal names,buttheprotocolisverysimilar.ThelessfrequentlyusedDMAsignalsarenot coveredhere. IRADDR[17:0]andDRADDR[17:0]aretheaddressbussesforITCMandDTCM. IRCSandDRCSarethechipselectsthatenablethememory. IRIDLEandDRIDLEindicatetheTCMinterfaceisidle.Thisallowsforthedesign tostoptheclocktothememoriesorevenpowerdownthememorieswhenthey arenotbeingused. IRnRWandDRnRWindicateiftheaccessisaread(low)orwrite(high). IRRD[31:0]andDRRD[31:0]arethedatabussesusedforreads. IRWD[31:0]andDRWD[31:0]arethedatabussesusedforwrites. IRSEQandDRSEQindicateasequentialaddressisbeingaccessed.Ifsingle-cycle memoryisusedthereisnouseforthissignal,butitisusefulforpipelinedburst memoriesthattaketwocyclesforthefirstaccessthenonecycleonsuccessive accesses. IRSIZE[3:0]andDRSIZE[3:0]specifythesizeofthememory.Sizevaluesare giveninFigure3-30.Sizevaluesshouldmatchthevalueprogrammedincp15by software. IRWAITandDRWAITinsertwaitstatesinthememoryaccess. IRWBL[3:0]andDRWBL[3:0]arethewriteenablesforeachofthe4databyte lanestoallowforbyteandhalf-wordwrites.Thesesignalsarenotusedforreads sinceallreadsarewordreads.
116
SoCVerificationTopicsfortheARMArchitecture IRSIZE[3:0] DRSIZE[3:0]
Size
8 kb 4
0011 0100 0011
16 kb
0101
32 kb
0110
64 kb 4 kb
0111 0011
128 kb 512 kb
1000 1001 0011 1010
1 Mb
1011
4 kb
4 kbkb 256
Figure3-30:ValuesforTCMsize
FollowingisanexampleofenablingtheTCMintheCPUmemorymap.Tosetup a32kbDTCMandITCMataddresses0x40008000and0x4001000respectively, register9ofcoprocessor15mustbeprogrammed.Bits[31:12]ofthis32-bitregister containthebaseaddressoftheTCM,bits[5:2]areprogrammedaccordingtoTCM size,andbit0istheenable/disablebit.Allotherbitsshouldbeprogrammedto0. Theinstructionstoprogramr9ofcp15areshowninFigure3-31. ; Set up the TCMs and enable them. ; This must be done before any stack use LDR R0, =0x40010019 MCR p15, 0, R0, c9, c1, 1 ; Initialise ITCM LDR R0, =0x40008019 MCR p15, 0, R0, c9, c1, 0 ; Initialise DTCM
Figure3-31:InstructiontosetupTCM
117
Chapter3 Whenphysicaladdress0x40008000isaccessedbysoftware,aDTCMaccesswilltake placewithDRADDRof0.Whensoftwareaccessesaddress0x40008004,TCMaccesswithDRADDRof4willbedone.
ARMSummary ARMissuccessfulbecauseitoffersawidevarietyofCPUcoresthatmeettherequirementsofmanySoCapplications.ARMoriginallybuiltitsreputationon low-powerasanearlypioneerembedding32-bitmicroprocessorsintochipsdesigned forhandheldconsumerelectronics.Certainly,astheARMarchitecturehasgrown andmorecomplexdesignshavebeencompletedtoprovidehigherperformance,the powermustalsoincreasetomeetperformancelevels.Eventhoughthereareprobably otherCPUdesignswithlowerpowerorhigherperformancecomparedtoARM,the companyhascontinuedtodominatethemarketbyprovidingthebroadestsetofdesigns,tools,andpartnershipstoprovidemanywaysfordesignprojectstomakeuseof ARMtechnologyandthesupportingtoolsprovidedbyARMpartners.AnengineerI metataconferencesummeditupaccurately,“nobodygetsfiredforchoosingARM.” WithagoodoverviewoftheARMarchitectureandthekeyfeaturesofARMcores, thenextchapterwillfocusonhardware/softwareco-verificationandhowbothhardwareengineersandsoftwareengineerscanbetterunderstandhowtoimprovethe processofintegratingsoftwarewithhardwarebeforeadesigniscommittedtofabrication.Therearemanywaystodothisandwewillexaminemanyofthemincluding theprosandconsofeach.
118
CHAPTER
4
Hardware/Software Co-Verification Althoughhardware/softwareco-verificationhasbeenaroundformanyyears,over thelastfewyears,ithastakenonincreasedimportanceandhasbecomeaverificationtechniqueusedbymoreandmoreengineers.Thetrendtowardgreatersystem integration,suchasthedemandforlow-cost,high-volumeconsumerproducts, hasledtothedevelopmentofthesystem-on-a-chip(SoC).InChapter1,theSoC wasdefinedasasinglechipthatincludesoneormoremicroprocessors,application specificcustomlogicfunctions,andembeddedsystemsoftware.IncludingmicroprocessorsandDSPsinsideachiphasforcedengineerstoconsidersoftwareaspartofthe chip’sverificationprocessinordertoensurecorrectoperation.Thetechniquesand methodologiesofhardware/softwareco-verificationallowprojectstobecompletedina shortertimeandwithgreaterconfidenceinthehardwareandsoftware.Inthe EETimes“2003SalaryOpinionSurvey,”agoodnumberofengineersreportedspendingmorethanone-thirdoftheirdayonsoftwaretasks,especiallyintegratingsoftware withnewhardware.Thisstatisticrevealsthatthedaysofthrowingthehardwareover thecubiclewalltothesoftwareengineersaregone.Inthefuture,hardwareengineers willcontinuetospendmoreandmoretimeonsoftwarerelatedissues.Thischapter presentsanintroductiontocommonlyusedco-verificationtechniques.
HistoryofHardware/SoftwareCo-Verification Co-verificationaddressesoneofthemostcriticalstepsintheembeddedsystem designprocess,theintegrationofhardwareandsoftware.Thealternativetoco-verificationhasalwaysbeentosimplybuildthehardwareandsoftwareindependently,
119
Chapter4 trythemoutinthelab,andseewhathappens.WhenthePCIbusbegansupporting automaticconfigurationofperipheralswithouttheneedforhardwarejumpers,the termplug-and-playbecamepopular.AboutthesametimeIwasworkingonprojects thatsimplybuilthardwareandsoftwareindependentlyanddifferenceswereresolved inthelab.Thistechniquebecameknownasplug-and-debug.Itisanexpensiveand verytime-consumingeffort.Forhardwaredesignsputtingoff-the-shelfcomponents onaboarditmaybepossibletodosomereworkontheboardorchangesomeprogrammablelogicifproblemswiththeinteractionofhardwareandsoftwarearefound. Ofcourse,thereisalwaysthe“softwareworkaround”toavoidaggravatinghardware problems.Asintegrationcontinuedtoincrease,somethingmorewasneededtoperformintegrationearlierinthedesignprocess.Thesolutionisco-verification. Co-verificationhasitsrootsinlogicsimulation.TheHDLlogicsimulatorhasbeen usedsincetheearly1990’sasthestandardwaytoexecutetherepresentationofthe hardwarebeforeanychipsorboardsarefabricated.Asdesignsizeshaveincreased andlogicsimulationhasnotprovidedthenecessaryperformance,othermethods haveevolvedthatinvolvesomeformofhardwaretoexecutethehardwaredesign description.Examplesofhardwaremethodsincludesimulationacceleration,emulation,andprototyping.Inthischapter,wewillexamineeachofthesebasicexecution enginesasamethodforco-verification. Co-verificationborrowsfromthehistoryofmicroprocessordesignandverification.In fact,logicsimulationhistoryismucholderthantheproductswethinkofascommerciallogicsimulatorstoday.Themicroprocessorverificationapplicationisnotexactly co-verificationsincewenormallythinkofthemicroprocessorasaknowngoodcomponentthatisputintoanembeddedsystemdesign,butnevertheless,microprocessor verificationrequiresalargeamountofsoftwaretestingfortheCPUtobesuccessfully verified.Microprocessordesigncompanieshavedonethislevelofverificationfor manyyears.Companiesdesigningmicroprocessorscannotcommittoadesignwithoutfirstrunningmanysequencesofinstructionsrangingfromsmalltestsofrandom instructionsequencestobootinganoperatingsystemlikeWindowsorUNIX.This levelofverificationrequirestheabilitytosimulatethehardwaredesignandhave methodsavailabletodebugthesoftwaresequenceswhenproblemsoccur.Aswewill see,thisisakindofco-verification.
120
Hardware/SoftwareCo-Verification Ibecameinterestedinco-verificationafterspendingmanyhoursinalabsetting tryingtointegratehardwareandsoftware.Ithinkitwasjusttoomanydaysoflogic analyzerprobesfallingoff,failedtriggerconditions,makingeducatedguessesabout whatmightbehappening,andsometimesjustplaintrial-and-error.Idecidedthere mustbeabetterwaytositinaquiet,air-conditionedcubicleandfigureoutwhatwas happening.Fortunatelyforme,therewerebetterwaysandIwasfortunateenoughto getjobsworkingonsomeofthem.
CommercialCo-VerificationToolsAppear Thefirsttwocommercialco-verificationtoolsspecificallytargetedatsolvingthe hardware/softwareintegrationproblemforembeddedsystemswereEagleifromEagle DesignAutomationandSeamlessCVEfromMentorGraphics.Theseproductsappearedonthemarketwithinsixmonthsofeachotherinthe1995-1996timeframe andbothwerecreatedinOregon.EagleDesignAutomationInc.wasfoundedin 1994andlocatedinBeaverton.TheEagleproductwaslateracquiredbySynopsys, becamepartofViewlogic,andwasfinallykilledbySynopsysin2001duetolackof sales.Incontrast,MentorSeamlessproducedconsistentgrowthandestablisheditself astheleadingco-verificationproduct.Othersfollowedthatwerebasedonsimilar principles,butSeamlesshasbeenthemostsuccessfulofthecommercialco-verificationtools.Today,Seamlessistheonlyproductlistedinmarketsharestudiesfor hardware/softwareco-verificationbyanalystssuchasDataquest. ThefirstpublishedarticleaboutSeamlesswasin1996,atthe7thIEEEInternationalWorkshoponRapidSystemPrototyping(RSP’96).Thetitleofthepaper was:“Miami:AHardwareSoftwareCo-simulationEnvironment.”Inthispaper,Russ Kleindocumentedtheuseofaninstructionsetsimulator(ISS)co-simulatingwithan event-drivenlogicsimulator.Aswewillseeinthischapterandthenext,thepaper alsodetailedaninterestingtechniqueofdynamicallypartitioningthememorydata betweentheISSandlogicsimulatortoimproveperformance. IwasfortunatetomeetRussafewyearslaterintheMinneapolisairportandhear thestoryofhowSeamless(ormaybeit’sMiami)wasoriginallyprototyped.When hefirstgottheideaforaproductthatcombinedtheISS(afamiliartoolforsoftware engineers)withthelogicsimulator(afamiliartoolforhardwareengineers)andused optimizationtechniquestoincreaseperformancefromtheviewofthesoftware,the 121
Chapter4 valueofsuchanideawasn’timmediatelyobvious.Toinvestigatetheideainmore detailhedecidedtocreateaprototypetoseehowitworked.Testingtheprototype requiredaninstructionsetsimulatorforamicroprocessor,alogicsimulationofa hardwaredesign,andsoftwaretorunonthesystem.HedecidedtocreatetheprototypebasedonhisoldCP/Mpersonalcomputerheusedbackincollege.CP/Mwas theoperatingsystemthatlaterevolvedintoDOSbackaround1980.Themachine usedtheZ80microprocessorandsoftwarelocatedinROMtostartexecutionand wouldlatermovetoafloppydisktoboottheoperatingsystem(muchliketoday’s PCBIOS).Ofcourse,noneofthesourcecodeforthesoftwarewasavailable,but RusswasabletoextractthedatafromtheROMandthefirstcoupleoftracksofthe bootfloppyusingprogramshewrote.Fromtherehewasabletogetitintoaformat thatcouldbeloadedintothelogicsimulator.Workingonthishome-brewsimulation,heperformedvariousexperimentstosimulatetheoperationofthePC,andin theendconcludedthatthiswasavalidco-simulationtechniquefortestingembeddedsoftwarerunningonsimulatedhardware.Eventuallythesimulationwasableto bootCP/MandusedamodelofthekeyboardandscreentorunaMicrosoftBasic interpreterthatcouldloadBasicprogramsandexecutethem.Incertainmodesof operation,thesimulationranfasterthantheactualcomputer! RussturnedhisworkintoaninternalMentorprojectthatwouldeventuallybecomeacommercialEDAproduct.Inparallel,Eagleproducedaprototypeofa similartool.WhileSeamlessstartedwiththepremiseofusingtheISStosimulate themicroprocessorinternals,Eaglestartedusingnative-compiledCprogramswith specialfunctioncallsinsertedformemoryaccessesintothehardwaresimulation environment.Atthetime,thisstrategywasthoughttobegoodenoughforsoftware developmentandeasiertoproliferatesinceitdidnotrequireafullinstructionset simulatorforeachCPU,onlyabusfunctionalmodel.ThefoundersofEagle,Gordon HoffmanandGeoffBunza,wereinterestedinlookingforlargerEDAcompaniesto marketandsellEaglei(andpossiblybuytheirstartupcompany).Aftertheypitched theproducttoMentorGraphics,Mentorwasfacedwithabuildversusbuydecision. ShouldtheycontinuewiththeinternaldevelopmentofSeamlessorshouldtheystop developmentandpartneroracquiretheEagleproduct?AccordingtoRuss,thedecisionwasnotaneasyoneandwentallthewaytoMentorCEOWallyRhinesbefore Mentorfinallydecidedtokeeptheinternalprojectalive.Theotherdifficultdecision
122
Hardware/SoftwareCo-Verification wastodecidetowhethertocontinuetheuseofinstructionsetsimulationorfollow Eagleintohost-codeexecutionwhenEaglealreadyhadaleadinproductdevelopment.Intheend,MentordecidedtoallowEagletointroducethefirstproductinto themarketandconfirmedtheircommitmenttoinstructionsetsimulationwiththe purchaseofMicrotecResearchInc.,anembeddedsoftwarecompanyknownforits VRTXRTOS,in1996.ThedecisionmeantSeamlesswasintroducedsixmonthsafterEagle,butMentorbetthattheuseoftheISSwouldbeadifferentiatorthatwould enablethemtowininthemarketplace. Anothercommercialco-verificationtoolthattookadifferentroadtomarketwas V-CPU.V-CPUwasdevelopedinsideCiscoSystemsaboutthesametimeas Seamless.ItwasengineeredbyBennySchnaider,whowasworkingforCiscoasa consultantindesignverification,forthepurposeofearlyintegrationofsoftwarerunningwithasimulationofaCiscorouter.DetailsofV-CPUwerefirstpublishedatthe 1996DesignAutomationConferenceinapapertitled“SoftwareDevelopmentina HardwareSimulationEnvironment.” AsV-CPUwasbeingadoptedbymoreandmoreengineersatCisco,thecompany wasstartingtoworryabouthavingaconsultantasthesinglepointoffailureona pieceofsoftwarethatwasbecomingcriticaltothedesignverificationenvironment. Ciscodecidedtosearchthemarketplacehopesoffindingacommercialproductthat coulddothejobandbesupportedbyanEDAvendor.Atthetimethereweretwo possibilities,MentorSeamlessandEaglei.Aftersomeevaluation,Ciscodecidedthat neitherwasreallysuitablesinceSeamlessreliedontheuseofinstructionsetsimulatorsandEagleirequiredsoftwareengineerstoputspecialCcallsintothecodewhen theywantedtoaccessthehardwaresimulation.Incontrast,V-CPUusedatechnique thatautomaticallycapturedthesoftwareaccessestothehardwaredesignandrequiredlittleornochangetothesoftware.Intheend,Ciscodecidedtopartnerwith asmallEDAcompanyinSt.Paul,MN,namedSimulationTechnologies(Simtech) andgavethemtherightstothesoftwareinexchangefordiscountsandcommercial support.DaveVonBankandIwerethetwoengineersthatworkedforSimtechand workedwithCiscotoreceivetheinternaltoolandmakeitintoacommercialcoverificationtoolthatwaslaunchedin1997attheInternationalVerilogConference (IVC)inSantaClara.V-CPUisstillinusetodayatCisco.OvertheyearsthesoftwarehaschangedhandsmanytimesandisnowownedbySummitDesign. 123
Chapter4
Co-VerificationDefined Definition AtthemostbasiclevelHW/SWco-verificationmeansverifyingembeddedsystem softwareexecutescorrectlyonembeddedsystemhardware.Itmeansrunningthe softwareonthehardwaretomakesuretherearenohardwarebugsbeforethedesign iscommittedtofabrication.Aswewillseeinthischapter,thegoalcanbeachieved usingmanydifferentwaysthataredifferentiatedprimarilybytherepresentationof thehardware,theexecutionengineused,andhowthemicroprocessorismodeled. Butmorethanthis,atrueco-verificationtoolalsoprovidescontrolandvisibilityfor bothsoftwareandhardwareengineersandusesthetypesoftoolstheyarefamiliar with,atthelevelofabstractiontheyarefamiliarwith.Aworkingdefinitionisgiven inFigure4-1.Thismeansthatforatechniquetobeconsideredaco-verification productitmustprovideatleastsoftwaredebuggingusingasourcecodedebuggerand hardwaredebuggingusingwaveformsasshowninFigure4-2.Thischapterdescribes manydifferentmethodsthatmeetthesecriteria. HW/SW Co- Verification is the process of verfying embedded system software runs correctly on the hardware design before the design is committed for fabrication.
Figure4-1:Definitionofco-verification Co-verificationisoftencalledvirtualprototypingsincethesimulationofthehardware designbehavesliketherealhardware,butisoftenexecutedasasoftwareprogram onaworkstation.Usingthedefinitiongivenabove,runningsoftwareonanyrepresentationofthehardwarethatisnotthefinalboard,chip,orsystemqualifiesas co-verification.Thisbroaddefinitionincludesphysicalprototypingasco-verificationaslongastheprototypeisnotthefinalfabricationofthesystemandisavailable earlierinthedesignprocess.
124
Hardware/SoftwareCo-Verification
Software Source Code Debugger
CPU Model
Hardware Debugging Tools
Hardware Execution Engine
Figure4-2:Co-verificationisaboutdebugginghardwareandsoftware Anarrowerdefinitionofco-verificationlimitsthehardwareexecutiontothecontextofthelogicsimulator,butaswewillsee,therearemanytechniquesthatdonot involvelogicsimulationandshouldbeconsideredco-verification.
BenefitsofCo-Verification Co-verificationprovidestwoprimarybenefits.Itallowssoftwarethatisdependenton hardwaretobetestedanddebuggedbeforeaprototypeisavailable.Italsoprovides anadditionalteststimulusforthehardwaredesign.Thisadditionalstimulusisuseful toaugmenttestbenchesdevelopedbyhardwareengineerssinceitisthetruestimulus thatwilloccurinthefinalproduct.Inmostcases,bothhardwareandsoftwareteams benefitfromco-verification.Theseco-verificationbenefitsaddressthehardwareand softwareintegrationproblemandtranslateintoashorterprojectschedule,alower costproject,andahigherqualityproduct. Theprimarybenefitsofco-verificationare: ■
Earlyaccesstothehardwaredesignforsoftwareengineers
■
Additionalstimulusforthehardwareengineers
ProjectScheduleSavings Forprojectmanagers,theprimarybenefitofco-verificationisashorterprojectschedule.Traditionally,softwareengineerssufferbecausetheyhavenowaytoexecutethe
125
Chapter4 softwaretheyaredevelopingifitinteractscloselywiththehardwaredesign.They developthesoftware,butcannotrunitsotheyjustsitandwaitforthehardware tobecomeavailable.Afteralongdelay,thehardwareisfinallyready,andmanagementisexcitedbecausetheprojectwillsoonbeworking,onlytofindoutthereare manybugsinthesoftwaresinceitisbrandnewandthisisthefirsttimeishasbeen executed.Co-verificationaddressestheproblemofsoftwarewaitingforhardwareby allowingsoftwareengineerstostarttestingcodemuchsooner.Bygettingallthetrivialbugsout,theprojectscheduleimprovesbecausetheamountoftimespentinthe labdebuggingsoftwareismuchless.Figure4-3showstheprojectschedulewithout co-verificationandFigure4-4showsthenewschedulewithco-verificationandearly accesstothehardwaredesign.
Requirements Architecture HW Design HW Build SW Design HW/SW Integration
SW waiting for HW
Project Time
Figure4-3:Projectschedulewithoutco-verification
Requirements Architecture HW Design HW Build SW Design HW/SW Integration
Time Savings
Project Time
Figure4-4:Projectschedulewithco-verification
126
Hardware/SoftwareCo-Verification
Co-VerificationEnablesLearningbyProvidingVisibility Anothergreatlyoverlookedbenefitofco-verificationisvisibility.Thereisno substituteforbeingabletorunsoftwareinasimulatedworldandseeexactlythecorrelationbetweenhardwareandsoftware.Weseewhatisreallyhappeninginsidethe microprocessorinanonintrusivewayandseewhatthehardwaredesignisdoing.Not onlyisthisusefulfordebugging,butitcanbeevenmoreusefulinprovidingaway tounderstandhowthemicroprocessorandthehardwarework.Wewillseeinfuture examplesthatco-verificationisanidealwaytoreallylearnhowanembeddedsystem works.Co-verificationprovidesinformationthatcanbeusedtoidentifysuchthings asbottlenecksinperformanceusinginformationaboutbusactivityorcachehitrates. Itisalsoagreatwaytoconfirmthehardwareisprogrammedcorrectlyandoperations areworkingasexpected.Whensoftwareengineersgetintoalabsettingandrun code,thereisreallynowayforthemtoseehowthehardwareisacting.Theyusually relyonsomeprintstatementstofollowexecutionandassumeifthesystemdoesnot crashitmustbeworking.
Co-VerificationImprovesCommunication Forsomeprojects,therealbenefitofco-verificationhasnothingtodowithearlyaccesstohardware,improvedhardwarestimulus,orevenashorterschedule.Sometimes therealbenefitofco-verificationisimprovedcommunicationbetweenhardwareand softwareteams.Manycompaniesseparatehardwareandsoftwareteamstotheextentthateachdoesnotreallycareaboutwhattheotheroneisdoing,akindof“not myproblem”attitude.Thisresultsinnegativeattitudesandfingerpointing.Itmay soundabitfarfetched,butsometimestheintroductionofco-verificationenables theseteamstoworktogetherinapositivewayandmakeapositiveimprovementin companyculture.Figure4-5showswhatBrianBailey,oneoftheearlyengineerson Seamless,hadtosayaboutcommunication:
127
Chapter4 "Software engineering for electronic systems is a very different culture, they have very different ways of doing things. We're just beginning to find ways that the two groups can communicate. It's getting to be a cliché now. When we first started going out and telling people about Seamless, we would insist on companies that we talked to having both hardware and software engineers there for our meeting. In many of those meetings, the hardware and software guys (from within the same potential customer) literally met for the first time and exchanged business cards." "There is still a big divide. We find there is no common boss until perhaps the vice-president level. And we are not seeing that change quickly." Brian Bailey, chief technologist, Mentor Graphics, December 2000
Figure4-5
Co-VerificationversusCo-Simulation Asimilartermtoco-verificationisco-simulation.Infact,thefirstpaperpublished aboutSeamlessusedthisterminthetitle.Co-simulationisdefinedastwoormore heterogeneoussimulatorsworkingtogethertoproduceacompletesimulationresult. ThiscouldbeanISSworkingwithalogicsimulator,aVerilogsimulatorworking withaVHDLsimulator,oradigitallogicsimulatorworkingwithananalogsimulator.Someco-verificationtechniquesinvolveco-simulationandsomedonot.
Co-VerificationversusCo-Design Oftenco-verificationislumpedtogetherwithco-design,buttheyarereallytwodifferentthings.InChapter1,verificationwasdefinedastheprocessofdetermining somethingworksasintended.Designistheprocessofdecidinghowtoimplement arequiredfunctionofasystem.Inthecontextofembeddedsystems,designmight involvedecidingifafunctionshouldbeimplementedinhardwareorsoftware.For software,designmayinvolvedecidingonasetofsoftwarelayerstoformthesoftwarearchitecture.Forhardware,designmayinvolvedecidinghowtoimplementa DMAcontrolleronthebusandwhatprogrammableregistersareneededtoconfigureaDMAchannelfromsoftware.Designisdecidingwhattocreateandhowto implementit.Verificationisdecidingifthethingthatwasimplementedisworking correctly.Someco-verificationtoolsprovideprofilingandotherfeedbacktotheuser
128
Hardware/SoftwareCo-Verification abouthardwareandsoftwareexecution,butthisalonedoesnotmakethemco-design toolssincetheycandothisonlyafterhardwareandsoftwarehavebeenpartitioned.
IsCo-VerificationReallyNecessary? Afterlearningthedefinitionofco-verificationanditsbenefits,thenextlogicalquestionasksifco-verificationisreallynecessary.Theoretically,ifthehardwaredesign hasnobugsandisperfectaccordingtotherequirementsandspecificationsthenit reallydoesnotmatterwhatthesoftwaredoes.Forthissituation,fromthehardware engineer’spointofview,thereisnoreasontoexecutethesoftwarebeforefabricating thedesign. Similarly,softwareengineersmaythinkthatearlyaccesstohardwareisapain,not abenefit,sinceitwillrequireextraworktoexecutesoftwarewithco-verification. Forsomesoftwareengineers,nohardwareequalsnoworktodo.Also,attheseearly stagesthehardwaremaybestillevolvingandhavebugs.Thereisnothingworsefor softwareengineersthantotrytorunsoftwareonbuggyhardwaresinceitmakesisolatingproblemsmoredifficult. Thepointisthatwhileindividualengineersmaythinkco-verificationisnotfor them,almosteveryprojectwithcustomhardwareandsoftwarewillbenefitfromcoverificationinsomeway.MostembeddedprojectsdonotgetthepublicityofanIntel microprocessor,butmostofusrememberthefamous(orinfamous)PentiumFDIV bugwheretheCPUdidnotdividecorrectly.Hardwarealwayshasbugs,software alwayshasbugs,andgettingridofthemisgood.
Co-VerificationMethods Mostco-verificationmethodscanbeclassifiedbasedontheexecutionengineusedto runthehardwaredesign.Asecondaryclassificationexistsbasedonthemethodused tomodeltheembeddedsystemmicroprocessor.Beforediscussingspecificco-verificationmethods,aquickreviewofsomeofthekeyingredientsinco-verificationis useful.
129
Chapter4
NativeCompilingSoftware Manysoftwareengineersprefertoworkasmuchaspossibleinthehostenvironment (onaPCorworkstation)beforemovingtotheembeddedsysteminalabsetting. Therearetwowaystodosoftwaredevelopmentandsoftwaresimulationinthehost environment.Thefirstistouseworkstationtoolstocompiletheembeddedsystem softwareforthehostprocessor(insteadoftheembeddedprocessor)andexecuteiton theworkstation.IftheembeddedsystemsoftwareiswritteninCorC++,hostcompiledsimulationworksverywellforfunctionaltesting.Theembeddedsystemsoftware nowbecomesaprogramthatrunsonaPCorworkstationandusesallofthecompilers, debuggers,profilers,andotheranalysistoolsavailableforwritingworkstationsoftware. Workstationtoolsaremoreplentifulandhigherqualitysincemoreprogrammersare makinguseofthem(remember,theembeddedsystemspaceisextremelyfragmented). Errorslikememoryleaksandbadpointersareajoytofixontheworkstationwhen comparedtothetoolsavailableonthetargetsysteminthelab.
InstructionSetSimulation Theinstructionsetsimulator(ISS)wasmentionedinChapter2asatypeofmicroprocessormodelthatisusedinco-verification.Thesecondmethodtoworkinthe hostenvironmentistocompiletheembeddedsystemsoftwareforthetargetprocessorusingacrosscompilerandsimulatethesoftwareusinganapplicationcalled aninstructionsetsimulator.TheISSisamodelofthetargetmicroprocessoratthe instructionlevel.Ithastheabilitytoloadprogramscompiledforthetargetinstructionset,itcontainsamodeloftheregisters,anditcandecodeandmodelallofthe processor’sinstructionset.Typically,thistypeoftoolisaccurateattheinstruction level.Itrunsthegivenprograminasequentialmanneranddoesnotmodelthe instructionpipeline,superscalarexecution,oranytimingofthemicroprocessorat thehardwarelevelintermsofaclockordigitallogic.Forthisreasonagood,fast, functionalsimulationisprovided,butdetailedtimingandperformanceestimationis notavailable.Mostinstructionsetsimulatorscomewithaninterfacetooneormore softwaredebuggers.Thesameembeddedsoftwaretoolcompaniesthatprovidedebuggersandcross-compilersmayalsoprovidetheinstructionsetsimulators.TheISSis alsousefulfortestingcompilersanddebuggerswithoutrequiringarealprocessorona workingboard.Whenanewprocessorisdeveloped,compilersmustbedevelopedin
130
Hardware/SoftwareCo-Verification parallelwithsilicon,andtheISSenablesacompilertobereadywhenthesiliconis readysosoftwarecanberunimmediatelyuponsiliconavailability.
HardwareStubs ThemajordrawbackofworkingonthehostwithnativecompiledcodeortheISSis thelackofamodeloftherestoftheembeddedsystemhardware.Muchoftheembeddedsystemsoftwareisdependentonthehardware.Softwaresuchasdiagnostics anddevicedriverscannotbetestedwithoutamodelofhowthehardwarewillreact. Thishardwaredependentsoftwareisusuallythemostimportantsoftwareduring thecrucialhardwareandsoftwareintegrationphaseoftheproject.Tocombatthis limitation,softwareengineersstartedusingCcodetoimplementsimplebehavioral models,orstubs,ofhowthetargethardwareisexpectedtobehave.Thesestubscan providetheexpectedresultsforsystemperipheralsandothersysteminterfaces.Some instructionsetsimulatorsalsostartedtoincorporatehardwarestubsthatcouldbeincludedinthesimulationbyprovidingaCinterfacetothememorymodeloftheISS. Peripheralssuchastimers,UARTs,andevenEthernetcontrollerscanbeincludedin thesimulation.ThenumberofhardwaremodelsneededtomaketheISSusefulwill determinewhetheritisworthinvestingincreatingCmodelsofthehardware.For alargesystem,itcanbemoreworktocreatethestubsthancreatingtheembedded systemsoftwareitself.Figure4-6showsadiagramofanISSwithamemorymodelinterfacethatallowstheusertoaddCcodetotakecareofthememoryaccesses.Figure 4-7showsafragmentofasimplestubmodelthatreturnstheIDregisterofaCPUso theexecutingsoftwaredoesnotgetanerrorwhenitreadsanexpectedIDcode. Software Debugger
MemoryRead();
Memory Model
Instruction Set Simulator
MemoryWrite();
Figure4-6:ISSwithmemorymodelinterface 131
Chapter4 static void Access(int nRW, unsigned long addr, unsigned long *data) { if (!nRW) /* read */ { if (addr == ID_REGISTER) { *data = 0x7926F; /* return ID value */ } } }
Figure4-7:Codeforasimplestub
Real-TimeOperatingSystem(RTOS)Simulator Forprojectsthatuserealtimeoperatingsystems(RTOS),itispossibletouseahostcompiledversionoftheRTOS.Somecommercialoperatingsystemvendorsprovide thehost-compiledversionthatcanberunonaworkstation.Forcustomorproprietaryoperatingsystems,theRTOScodecanusuallybe“ported”tothehost.The RTOSsimulatorisfastandmostusefulforhigherlevelsofsoftware.Itcanbeusedto testthecallstoRTOSlibrariesfortasking,mailboxes,semaphores,andsoforth.The RTOSsimulatorismoreabstractthentheISS,andusuallyrunsatahigherspeed. Sincethesoftwareiscompiledforthehostmachine,itdoesnotallowtheuseofany assemblylanguage.Again,itsuffersfromthesamelimitationoftheISSsincethe customhardwareisnotavailable. AnexampleofanRTOSsimulatorisVxSim,asimulationofthepopularRTOSVxWorksfromWindRiver.VxSimallowsdevicedriversandapplicationstobetestedin thehostenvironmentbeforemovingtotheembeddedsystem.Driversusuallyrequire hardwarestubstoprovidesimulatedresponses.
MicroprocessorEvaluationBoard Amongsoftwareengineers,themostpopulartoolusedforlearningaprocessorand testingcodebeforethetargetsystemisreadyisthemicroprocessorevaluationboard. Thisisaboardwiththetargetmicroprocessorandsomememory,thattypicallyuses anetworkconnectionoraserialporttocommunicatewiththehost.Itallowsinitial
132
Hardware/SoftwareCo-Verification codetobedeveloped,downloaded,andtested.Targettoolsareusedtodebugand verifythecode.Manysoftwareengineersprefertousetheevaluationboardsincethe toolsarethesameasthosethatwillbeusedwhenthesystemisreadyanditismost likeworkingwiththetrueproductbeingdeveloped.Everymicroprocessorvendor hasanevaluationboardforsalesoonaftertheprocessorisavailable,usuallyatavery reasonableprice.Vendorsalsoprovidesamplecodeandevenhardwareschematics fortheboard.Someembeddedsystemdesignsevengosofarastocopytheevaluationboardandjustaddasmallamountofcustomhardwareorevenbuyandusethe evaluationboardinaproductwithoutmodification.Thisisverytemptingtogeta hardwaredesignquickly,buttheboardsarenotusuallydesignedforhigherproductionvolumeproducts.Checkthecostandthereliabilityofthedesignbeforedirectly usinganevaluationboardaspartofaproduct. Iftheembeddedsystemcontainsafairamountofcustomhardware,theevaluation boardislessuseful.Dependingontheamountandnatureofthecustomhardware, itmaybepossibletomodifytheevaluationboardbyincludingextraprogrammable logicorothersemiconductordevicestomakeitlookandactmorelikethetarget systemdesign.
Waveforms,LogFiles,andDisassembly ForSoCdesigns,manysoftwareengineersareforcedtodoearlysoftwareverification withfull-functionallogicsimulationmodelsandwaveformsinahardwaredesign environment.Engineerswhoareskilledinbothsoftwaredevelopmentandhardware designmaybeabletodebugthisway,butitisnotthemostcomfortabledebugging environmentformostsoftwareengineers.AsourceleveldebuggerwithCcodeis preferredtobuswaveformsandlargelogfilesfromaVerilogorVHDLsimulator. Ionceintroducedco-verificationtoaprojectteamworkingonacomplexvideochip withfourARMCPUcores.Afterpreachingthebenefitsofco-verificationandthe abilitytodebugsoftwareusingasourceleveldebuggerthesoftwareengineersshook theirheadsandseemedtounderstand.Theircurrentsetupinvolvedtheuseofthe RTLcodefortheARMcoresrunninginalogicsimulator.Aspartofthisenvironment,theyincludedamodelthatmonitoredtheexecutionoftheARMcoresand outputalogfilewiththedisassemblyoftheexecutingsoftwareasawaytotracksoft-
133
Chapter4 wareprogress.Sincethetestsranveryslow,theywouldwaitpatientlyforsimulation tocompleteandthengetthislogfileandtrytocorrelateitwiththesourcecodeto seewhathappened.Whentheywenttostartco-verificationtheyimmediatelyasked iftheco-verificationtoolcouldoutputthesamekindoflogfilesotheycouldtrack executionafterthetestfinished.Ofcourse,itcould,butthistypeofdebuggingdoes notreallyimprovetheirsituation.Aftersomecoaxing,theyagreedtotryinteractive softwaredebuggingwithasource-leveldebuggerandwerepleasedtodiscoverthis typeofdebuggingwaspossible.
ASampleofCo-VerificationMethods Thissectionintroducessomeofthecommonlyusedco-verificationmethodsand architecturesusedtoverifyembeddedsoftwarerunningonthehardwaredesign.All ofthesehavesomeprosandcons.Thatiswhythereissomanyofthemanditcanbe difficulttosortoutthechoices.
Host-CodeModewithLogicSimulation Host-codemodeisatechniquetocompiletheembeddedsystemsoftware,notfor theembeddedprocessorinthehardwaredesign,butinsteadforthehostworkstation. Thisisalsoreferredtoasnativecompile.Toperformco-verificationtheresulting executableisrunonthehostmachine,anditconnectstoalogicsimulatorthat executesthehardwaredesign.Sometypeofinter-processcommunication(IPC)is requiredtoexchangeinformationbetweenthehost-compiledembeddedsoftware andthelogicsimulator.TheIPCimplementationcouldbeasocketthatallowseach ofthetwoprocessestobeondifferentmachinesonthenetworkorsharedmemory thatrunsbothprocessesonthesamemachine. Host-codemodeisnotlimitedtousingalogicsimulatorasthehardwareexecution engine.Anyhardwareexecutionenginecanbeused.Someothersthathavebeen usedwithhost-codemodeareanaccelerator/emulatorandaprototypingplatform. Withhost-codemode,abusfunctionalmodelisusedinthehardwareexecution enginetocreatebustransactionsforthebusinterfaceofthemicroprocessor.The combinationofthehost-compiledprogramplusthebusfunctionalmodelservesasa microprocessormodel.
134
Hardware/SoftwareCo-Verification Host-codemodeprovidesanattractiveenvironmentforbothsoftwareandhardware engineers.Softwareengineerscancontinuetousethesoftwaretoolstheyarealready using,includingsourcecodedebuggersandotherdevelopmentanddebugtoolson thehost.Hardwareengineerscanalsousethetoolstheyarealreadyusingaspartof thedesignprocess;aVerilogorVHDLlogicsimulatorandassociateddebugtools. Thisrequiresaminimalmethodologychangeforbothgroupsofengineersandcan benefitbothsoftwareandhardwareverification.Theabilitytodopre-siliconco-verificationisagreatbenefitwhentheprocessordoesnotyetexist.Figure4-8showsthe basicarchitecture. Host-codemodecanalsobeusedwhenthesoftwaredoesnotaccessthehardwaredesignviaamicroprocessorbus,butinsteadviaagenericbusinterfacelikePCI.Many chipsdonothaveanembeddedmicroprocessor,butaredesignedwiththePCIbus asaprimaryinterfaceintotheprogrammableregisters.Inthiscasethesoftwarecan berunonthehostandreadandwriteoperationsfromthesoftwarecanbetranslated intoPCIbustransactionsinthehardwareexecutionengine.Thisisagoodexample ofwhenitisusefultoabstractthesoftwareexecutiontothehostandlinkittohardwareexecutionatthePCIinterface.
Software Debugger Inter-Process Communication
Native Compiled Software
C API
BFM Read, Write, and Interrupt Messages
Process 1
Logic Simulation with Hardware Design
Process 2
Figure4-8:Host-codeexecutionwithlogicsimulation
135
Chapter4 Host-codemoderequirestheembeddedsoftwaretobemodifiedtoperformfunction callswhenitaccessesthehardwaredesignthroughthebusfunctionalmodel.This processofputtinginspecificfunctioncallscaneitherbeapainifalotofembedded softwarealreadyexistsorbelittleornoproblemifthecodeisbeingwrittenfrom scratchandallmemoryaccessesarecodedtogothroughacommonfunctioncall. ExamplesofClibrarycallsthatareusedforhost-codeexecutionareshownin Figure4-9. ret_val = CoverRead(address, &data, size, options); ret_val = CoverWrite(address, data, size, options);
Figure4-9:Host-codemodeexamplefunctioncalls InsertingtheseCcallsintothesoftwareiscalledexplicitaccessbecausetheuser mustexplicitlyputinthereferencestothehardwaredesign.Theotherwaytouse host-codemodeistouseimplicitaccess.Implicitaccessdoesnotrequiretheuserto putinspecialcalls,butautomaticallyfiguresoutwhenthesoftwareisaccessingthe hardwarebasedontheloadandstoreinstructionsbeingrun.Thistechniquewillbe coveredinmoredetailinthenextchapter,butwithimplicitaccess,theusercanuse ordinaryCcodetoaccesshardwareviapointersasshowninFigure4-10. unsigned long *ptr; unsigned long data; ptr = 0xff0000000 /* address of ASIC control registers */ data = *ptr; /* read the control register */ data |= 1; /* set bit 0 to 1 */ *ptr = data; /* write new value back to control register */
Figure4-10:Exampleofimplicitaccess
136
Hardware/SoftwareCo-Verification Host-codemodecanalsobeusedtointegrateanRTOSsimulatorsuchasVxSimas discussedabove.Adiagramofhost-codeexecutioninthecontextofanRTOSsimulatorisshowninFigure4-11. Software Debugger
Inter-Process Communication
Drivers and Applications
RTOS RTOS mem API model
C API
BFM Read, Write, and Interrupt Messages
Process 1
Logic Simulation with Hardware Design
Process 2
Figure4-11:RTOSSimulationandhost-codeexecution
InstructionSetSimulationwithLogicSimulation Anotherwaytoperformco-verificationistocompiletheembeddedsystemsoftware forthetargetprocessorandrunitonaninstructionsetsimulator.AnISSallowsnot onlyCcodebutalsoassemblylanguageofthetargetprocessortoberun.Thisallows morerealisticsimulationofthingsnormallycodedinassemblylanguagesuchasinitializationsequences,cacheandMMUconfigurationandsimulation,andexception handlers.Thismodeofoperationisreferredtoastarget-codemode. Aswithhost-codemode,sometypeofinter-processcommunication(IPC)isrequiredtoexchangeinformationbetweentheinstructionsetsimulatorandthelogic simulator. Target-codemodeisnotlimitedtousingalogicsimulatorasthehardwareexecution engine.Anyhardwareexecutionenginecanbeused,butsincetheinstructionset simulatorwilllikelyrunslowerthanahostcodeprogramitisimportanttomakesure thespeedoftheinstructionsetsimulatorisnottooslowtoseebenefitsfromahardwareexecutionenginesuchasanaccelerator.
137
Chapter4 ThebusfunctionalmodelsusedwithanISSarethesameorsimilartothoseusedin hostcodemode.ThemaindifferenceisthatwithanISSitmaybepossibletounderstandthecontextofthebustransactionsbetter.Inhostcodemode,onlyasinglebus transactionisconsideredatatime.Onabusthatsupportsaddresspipelining,such asAHB,thereisnowaytodeterminethenextbuscyclethatwillbedonebythe hostcodeprogram,soonlyasingletransactionwouldbesimulatedandthereisno pipelining.TheISScanutilizeknowledgeofwhatwillbethenextbustransactionto occurandcansupplythebusfunctionalmodelwiththenextaddresssothatitcan modeltheaddresspipeliningcorrectly.ThisisamajorbenefitofusingagoodISSfor co-verification.Target-codemodealsoenablesinstructionfetchestobeverified. Likehost-codemode,softwareengineerscandebugcodeinafamiliarenvironment. Intarget-codemode,thedebuggerisnotahostdebugger,butratheradebuggerthat canworkwiththeISSanddebugprogramscross-compiledfortheembeddedprocessor.Figure4-12showsthearchitecture.
Software Debugger Inter-Process Communication Instruction Set Simulator
C API
BFM Read, Write, and Interrupt Messages
Process 1
Logic Simulation with Hardware Design
Process 2
Figure4-12:Instructionsetsimulatorconnectedtologicsimulation
138
Hardware/SoftwareCo-Verification TointegrateanISSwithabusfunctionalmodel,thememoryinterfacetotheISS mustbemodifiedtorunlogicsimulationtosatisfythememoryaccesses.Instruction setsimulatorsasusedbysoftwareengineersnormallyhaveaflatmemorymodelthat isasimpleCmodelallowingtheprogramtobeloadedandrun.Someinstruction setsimulatorshavetheabilitytocustomizethismemorymodelsotheuserscanadd theirCmodels(stubs)toprovidesomerudimentarymodelofthehardware.Without atleasttheabilitytoputinstubmodels,mostembeddedsystemcodewillnotrun onaflatmemorymodelsinceitdealswithmemory-mappedhardwareregistersthat shouldhavenonzerovaluesafterreset.Doingco-verificationwithanISSisreallyjust asimpleextensionoftheuseofstubstoinsteadturnmemorytransactionsintocalls tothelogicsimulatorforexecutiononthebusfunctionalmodel.Theotherthing thatmustbereportedtotheISSisinterrupts.Whenaninterruptoccursonthebus, theISSmustknowaboutitsoitcanmodeltheexceptionprocessingandstartthe serviceroutine.Mostcommercialco-verificationtoolsprovidemanymorefeatures thatjustgluingthememorymodeloftheISStoabusfunctionalmodelandreportinginterrupts,butthisdescriptioniseasytounderstandandhasbeenusedbymany userstoconstructtheirownco-verificationenvironmentusinganISS. Someinstructionsetsimulatorskeepstatisticsandaccountforthesimulationcycles thathavebeenusedtosatisfymemoryrequests.Thisallowsusefulfeaturessuchas performanceestimationandprofilingtobeusedtofindoutdetailsofsoftwareexecution.InthesimpleISSintegrationdescriptionabove,thereadandwriteactivity wouldhavetoreportanumberofbusclocksthatwereconsumedtosatisfythe transaction.TheISSmaybeabletousethisclockcyclecountandupdateitsinternal notionoftime.Unfortunately,thisisnotalwayseasytodosincethetimedomainof theISSisnowout-of-stepwiththatofthelogicsimulator.Synchronizationbetween thesoftwareexecutionenvironmentandthehardwareexecutionenvironmentare discussedinthenextchapter,butthesetypesofissueshaveledtotheshiftfroma transaction-basedinterfacetoonethatiscyclebased.
139
Chapter4 Onewaytothinkofacycle-basedISSistosaythatitexchangespinvaluesbetween theISSandlogicsimulatoroneverybusclockcycle.Thisisequivalenttomoving thebusfunctionalmodelstatemachineintotheISSandjustapplyingthesignal valuesinlogicsimulation.Anotherwaytoviewitisasatransaction-basedinterface wherethelogicsimulatorhastheabilitytoreportwaitstatestotheISSandtheISS willreturnwiththesamememorytransactionuntilitcompletes.Thisapproachis bettersuitedforcaseswherebetteraccuracyisneeded.Itisalsobettersuitedfor multiprocessordesignssinceitcankeepallprocessorssynchronizedwiththelogic simulatoronacycle-by-cyclebasis.Figure4-13showsthearchitectureofacyclebasedinstructionsetsimulator. Software Debugger
Inter-Process Communication
Instruction Set Simulator
C BFM
Bus Signal Values
Process 1
HDL Bus Shell
Logic Simulation with Hardware Design
Process 2
Figure4-13:Cycle-basedinstructionsetsimulatorconnectedtologicsimulation
CSimulation Thelogicsimulationandaccelerationtechniquesdiscussedsofarevolvedfromthe hardwaresimulationdomain.Onecomplaintaboutco-verificationdevelopedby extendingthehardwaresimulationplatformtoincludesoftwareengineersincludes limitedavailabilityoftheplatform.Forexample,toperformco-verificationusing logicsimulationrequiresalogicsimulationlicenseforeachsoftwareengineerthatis runninganddebuggingsoftware.Mostcompaniespurchaselogicsimulationlicenses basedonthedemandforhardwareverificationanddon’thaveextrasavailablefor thepurposesofco-verification.Similarly,higherperformancehardwareexecution
140
Hardware/SoftwareCo-Verification enginessuchassimulationaccelerationandemulationareevenmoredifficultto acquireforsoftwaredevelopment.Mostcompanieshaveonlyoneortwosuchmachinesthatmustbesharedbyverificationengineersandsoftwareengineers.This limitedscalabilityoftenleavesengineerswonderingifthereisawaytodoco-verificationthatdoesn’trequiretraditionallogicsimulation. ThenaturalconclusionistothinkaboutusingaCorC++simulationenvironment toeliminatetheneedforlogicsimulation.Atthesametime,thereisaperception thatCsimulationisfasterthanVerilogandVHDLsimulation.SystemCisonesuch environmentthatisgainingmomentumasamodelinglanguagethatcanprovide C++simulationofthedesignwithoutrequiringlogicsimulation,andatthesame timecanalsoco-simulatewithanHDLsimulatorwhenneeded.SystemCbyitselfis notaco-verificationmethod,butratheranalternativehardwareexecutionenvironmentorevenanalternativemodelinglanguagetobeusedinsteadofVerilogand VHDL.Model-basedmethodsrequirealibraryofmodelstobecreated,andmissing modelsareacommonsourceofdifficulty. ThequestionwithanyCsimulationenvironment,SystemCorhomegrown,has alwaysbeenthedevelopmentofthedesignmodel.Liketheprimitivehardwarestub methodsusedbysoftwareengineers,somebodymustcreatethesimulationmodel ofthehardwaredesign.Sincethismodelcreationisnotyetamainstreampathto designimplementation,anyworktocreateanalternativemodelthatisnotinthe criticalpathofdesignimplementationisusuallyalowerprioritythatmaynever becomereality.ContrastthistologicsimulationwhereRTLcodeforthedesign mustbedevelopedforimplementationsousingthisRTLcodeinalogicsimulatoris alwaysamodelthatisreadilyavailable. ToolsarenowavailabletotaketheVerilogandVHDLcodeforthedesignandturn itintoaCmodelbytranslatingitintoCorSystemCorevendirectlytoanexecutableprogramthatisnotatraditionallogicsimulator.Ofcourse,suchtoolsmustdo morethanjusteliminatetheneedforthelogicsimulatorlicense;theyalsomust offersomeperformancegaintosatisfytheperceptionthatsomehowCshouldbe fasterthanVerilogorVHDL;atoughjobconsideringtheoptimizationalreadybeing donebytoday’slogicsimulators.Bydoingnothingmorethaneliminatingthelogic simulatorlicensethepricewouldhavetodramaticallylowerthanthatofasimulator
141
Chapter4 tobecompelling,whichisverydifficultsincethesimulationmarketismatureand priceswillonlycomedownastimeprogresses.TheapproachoftheseVerilogtoC translatorsistoturntheVerilogintoacyclebasedsimulationbyeliminatingtiming.Cycle-basedsimulationhasneverbeenamainstreammethodology,soitisnot clearthatconvertingVerilogcodeintoacycle-basedexecutablewillsucceed;only timewilltell.AcommonpostonnewsgroupsrelatedtoVerilogsimulationisfrom theengineerlookingfortheVerilogtoCtranslator.Therearemanyofthem,and acoupleofthemareshowninFigure4-14.Theanswerusuallycomesbackthatthe bestVerilogtoCtranslatoristheVCSlogicsimulator.Mostengineersaskingforthe translatorarenotclearonhowitwouldbenefitthem.Infact,manyoftheproducts mentionedarenolongeravailableascommercialproducts. > Was wondering if anyone could point me in the direction of a > Verilog to C translator.....if such a thing exists. > > Hi all, > I am looking for a Verilog to C converter.
Figure4-14:Verilog-to-Ctranslatorrequests TheonlyrealwaytogainhigherperformancefromCorSystemCsimulationisto raisetheabstractionlevelofthemodel.InsteadofmodelingthedesignatRTL,more abstractmodelsmustbedevelopedthateliminatethedetailofthemodelandasa resultenableittorunfaster.Thetheoryonhigh-levelmodelingisthatanengineer canmakeanabstractmodelinabout1/10thetimeittakestodevelopanRTLmodel andthemodelshouldrun100to1000timesfasterinaCorSystemCenvironment. Engineersarelookingforaminimumof100kHzperformance,and1MHzismore desirable.SometoolstranslatingHDLintoCarestartingtoshowabout10xperformancespeedupoverlogicsimulationbyeliminatingsomeofthedetailedtimingof logicsimulationwithoutrequiringtheusertomakeanychangestotheRTLcode. RaisingthelevelofabstractionholdspromiseforrunningsoftwarebeforetheRTL forthehardwaredesignisavailable.
142
Hardware/SoftwareCo-Verification Co-verificationutilizingCsimulationenvironmentsisverymuchthesameaswith traditionallogicsimulators.Instructionsetsimulatorsandhostcodeexecutionmethodscanbeusedtoruntheembeddedsystemsoftwareandperformsoftwaredebug. Thecompellingreasontolookintoco-verificationbasedonCsimulationistheabilitytoscaleco-verificationtomanysoftwareengineers.OnceaCmodelofthedesign isinplaceandco-verificationisavailable,theneverysoftwareengineercanuseit bysimplymakingcopiesofthesoftwaremodel.Thisalsomakesitpossibletogive themodelandenvironmenttosoftwareengineersthatareoutsidethecompanyto startdevelopingsoftwareanddoingsuchtasksasportinganRTOSwithoutwaiting forhardwareandwithouttheneedtouselogicsimulation.Ihaveneverconfirmed it,butIcanguessthatsoftwarecompaniessuchasWindRiverhaveaneedtoport vxWorkstonewprocessorsandcustomhardwaredesignsbeforechipsandboardsare available.Icanalsoguesstheydon’thaveaVerilogsimulatorandeveniftheycould getasimulatortheyprobablydon’twanttolearnhowtouseit. Companiesthatstartedoutdevelopingco-verificationtoolsthatallowuserstocreate theirownCmodelsandcombinethemwithmicroprocessormodelsanddebugging toolstoformarepresentationofthedesignfaceadifficultmodelingdilemmaabout whowillcreatethemodels.Toenablewideruseofthetechnologyandgobeyond focusingonthecreationofmodelsforcustomdesigns,someproductsshiftedtoward theuseofaCmodelasareplacementforthecommontoolthatallsoftwareengineersknowandlove,theevaluationboard.Theall-softwarevirtualevaluationboard isanalternativetobuyinghardware,cables,powersupplies,andJTAGtools.When manyengineersneedaccesstotheboard,itbecomesmuchmorecosteffectiveto deployasoftwareversionofit.Inadditiontobasicmicroprocessorevaluationboards, Cmodelscanbecreatedforreferencedesignsandplatformsthatareoftenusedas startingpointsforaddingcustomhardware.Thistypeofvirtualboardenablesdebuggingthatisnotpossibleonarealpieceofhardware.Valueisderivedfrombeingable tomonitorhardwarestatesandhaveeasyaccesstoperformanceinformation.By constrainingsupporttooff-the-shelfboardsitiseasiertoservethemarket,butdoes notaddresscustomdesigns.Modelbasedmethodsalwaysseemtofacemodelavailabilityquestions.
143
Chapter4 Co-verificationrevolvingaroundCsimulationisaninterestingareathatwillcontinuetoevolveasengineersstarttolookattopdowndesignmethodologythatcould leveragesuchamodelforhigh-speedsimulationandalsouseitforthedesignimplementation.
RTLModelofCPUwithSoftwareDebugging Aswehaveseen,therearebenefitsanddrawbacksofusingsoftwaremodelsofmicroprocessorsandotherhardware.Thissectionandthenextdiscusstechniquesthat avoidmodelcreationissuesbyusingarepresentationofthemicroprocessorthat doesn’tdependonanengineercodingamodelofitsbehavior. AstheworldofSoCdesignhasevolved,thedesignflowsusedformicroprocessor andDSPIPhavechanged.Inthebeginning,mostIPforcriticalblockssuchasthe embeddedmicroprocessorwereintheformofhardIP.ThecompanycreatingtheIP wantedtomakesuretheuserrealizedthemaximumbenefitintermsofoptimized performanceandarea.ThehardmacroalsoallowstheIPtobeusedwithoutrevealingallofthesourcecodeofthedesign.Asanexample,mostoftheARM7TDMI designsuseahardmacrolicensedfromARM.Today,mostSoCdesignsdon’tuse hardmacrosbutinsteaduseasoftmacrointheformofsynthesizableVerilogor VHDL.Softmacrosofferbetterflexibilityandeliminateportabilityissuesinthe physicaldesignandfabricationprocess. NowthattheRTLcodefortheCPUisavailableandcaneasilyberuninalogic simulatororemulationsystem,everybodywantstoknowthebestwaytoperform co-verification.Isaseparatemodelliketheinstructionsetsimulatorreallyneeded? Itdoesnotseemnaturaltomostengineers(especiallyhardwareengineers)toreplace thegoldenRTLoftheCPU,therepresentationofthedesignthatwillbeimplementedinthesilicon,withsomethingelse.TherealityisthattheRTLcodecanbeused forco-verificationandhassuccessfullybeenusedbyprojectteams. ThedrawbackofusingtheRTLcodeisthatitcanonlyexecuteasfastasthe hardwareexecutionengineitisrunningon.Sinceitistotallyinsidethehardwareexecutionengine,thereisnochancetotakeanysimulationshortcutsthatarepossible (orautomatic)withhost-codeexecutionorinstructionsetsimulation.Historically, logicsimulationhasalwaysbeentooslowtomaketheinvestigationofthistech-
144
Hardware/SoftwareCo-Verification niqueinteresting.Afterall,asimulationenvironmentforalargeSoCtypicallyruns lessthan100cycles/secandrunningatthisspeeditisnotpossibletouseasoftware debuggertoperforminteractivedebugging. Theprimaryareawherethistechniquehasseensuccessiswithsimulationaccelerationandemulationsystemsthatarecapableofrunningatmuchhigherspeeds.With ahardwareexecutionenginethatrunsafewhundredkHzupto1MHzitispossible tointeractivelydebugsoftwarerunningontheRTLmodeloftheCPU. Toperformco-verificationwithanRTLmodelofthemicroprocessor,asoftware debuggermustbeabletocommunicatewiththeCPURTL.Todebugsoftware programs,asoftwaredebuggerrequiresonlyafewprimitiveoperationstocontrol executionofamicroprocessor.ThiscanbestbeseeninasummaryoftheGNUdebugger(gdb)remoteprotocolrequirements.TocommunicatewithatargetCPUgdb requiresthetargettoperformthefollowingfunctions: ■
Readandwriteregisters
■
Readandwritememory
■
Continueexecution
■
Singlestep
■
Retrievethecurrentstatusoftheprogram(stopped,exited,andsoforth)
Infact,gdbprovidesaninterfaceandspecificationcalledtheremoteprotocolinterface thatimplementsacommunicationchannelbetweenthedebuggerandthetarget CPUtoimplementthenecessaryfunctionalitytoenablegdbtodebugaprogram. Onasilicontargetwhereachipisplacedonaboard,theonlywaytocommunicate withgdbtosendandreceivetheprotocolinformationisbyaddingsomespecial softwaretotheuser’ssoftwarerunningontheembeddedprocessorthatwillcommunicatewithgdbtosendinformationsuchastheregistercontentsandmemory contents.Thepieceofcodeaddedtothesoftwareiscalledagdbstub.Thestub (runningonthetarget)communicateswithgdbrunningonadifferentmachine(the host)usingaserialportoranEthernetconnection.Whilethismayseemcomplicated,itistheeasiestwaytodebugwithoutrequiringtheCPUtoprovideprovisions insiliconfordebugging.
145
Chapter4 Thegoodnewsisthatforsimulationaccelerationandemulationapplicationsthereis muchgreaterflexibilitysinceitisreallyasimulationoftheCPURTLcodeandnot apieceofsilicon.Thedifferenceisvisibility.Insiliconthereisnovisibility.Thereis nowaytoseethevaluesoftheregistersinsidewithouttheaidofsoftwaretoexport thevaluesorspecialpurposehardwaretoscanoutthevalues.Simulation,onthe otherhand,hasverygoodvisibility.Inasimulationaccelerationoremulationplatform,allofthevaluesoftheregistersandwiresarevisibleatalltimes.Thisvisibility makestheuseofthegdbremoteprotocolevenbetterthanitsoriginalintentsince aspecialstubisnolongerneededbytheuserintheembeddedsystemcode.Now thesolutionistotallytransparenttotheuser.Nowgdbcanusetheremoteprotocol specificationtotalktothesimulation,bothofwhichareprogramsrunningonaPC orworkstation.Thistechniquerequiresnochangestogdb,andtheworktoimplementitiscontainedinthesimulationenvironmenttobridgethegapbetweengdb andthedataitisrequestingfromthesimulation.Thearchitectureofusingthegdb remoteprotocolwithsimulationaccelerationandemulationisshowninFigure4-15.
Inter-Process Communication (socket)
gdb configured for the target processor
RTL C code CPU gdb remote protocol
Process 1
Logic Simulation with Hardware Design
Process 2
Figure4-15:gdbconnectedtotheRTLcodeofthemicroprocessor
146
Hardware/SoftwareCo-Verification
HardwareModelwithLogicSimulation Anotherwaytoeliminatetheissuesassociatedwithmicroprocessormodelsistouse theconceptofa“hardwaremodel.”AhardwaremodelusesthesiliconofthemicroprocessorasamodelforVerilogandVHDLsimulation.Acustomsocketholdsthe siliconandcapturestheoutputsfromthesilicon,sendsthemtoalogicsimulatorand appliestheinputsfromthesimulatortotheinputpinsofthesilicon.Thecommunicationmechanismbetweenthehardwaremodelerandthesimulatormustinvolve softwaretotalktothesimulatorsoanetworkconnectionismostnatural.Theconceptismuchlikethatofatesterwherethestimulusandresponseisprovidedbya logicsimulator.Thearchitectureofusingthehardwaremodelforco-verificationis showninFigure4-16. Softwaredebuggingwiththehardwaremodelcanbeaccomplishedinmultipleways. Intheprevioussection,thegdbstubwaspresented.Thisisatechniquethatcanbe usedonthehardwaremodeltodebugsoftware.UnliketheRTLmodelinasimulationenvironment,thehardwaremodelcannotprovidevisibilityoftheinternal registerssotheusermustintegratethestubwiththeothersoftwarerunningonthe microprocessor.TheothertechniquefordebuggingsoftwareisaJTAGconnection forthosemicroprocessorsthatsupportthistypeofdebuggingbyprovidingdedicated silicontoconnecttotheJTAGprobeanddebugger.Inbothcases,performanceof theenvironmentcanlimittheutilityofthehardwaremodelforsoftwaredebugging. Thehardwaremodelcanalsoprovidelocalmemoryinthehardwaretoservicesome memoryrequeststhatarenotrequiredtobesimulated.Forpuresoftwaredevelopment,softwareengineersareinterestedinhighperformanceandlessinterestedin simulationdetail.Byservicingsomeofthememoryrequestslocallyonthehardware modelerandavoidingsimulation,thesoftwarecanrunatamuchhigherspeed. Hardwaremodelerscanrunatspeedsofupto100kHzwhenrunningindependently ofthelogicsimulator.Ofcourse,inthelockstepmodetheywillonlyrunasfastas thelogicsimulatorandexchangepininformationeverycycle.
147
Chapter4 Inter-Process Communication (network connection)
Hardware Model
HDL shell Signal Values
Logic Simulation with Hardware Design
Figure4-16:Hardwaremodelofthemicroprocessor Withthehardwaremodel,co-verificationisnolongercompletelyvirtualsincea realsampleofthemicroprocessorisused,butforthoseengineersthathavenegativeexperienceswithpoorsimulationmodelsinthepast,theconceptisveryeasyto understandandveryappealing.Whatcouldbeabettermodelthanthechipitself? Clockinglimitationsareoneofthemaindrawbacksofthehardwaremodel.Todo interactivesoftwaredebugging,theCPUmustbecapableofrunningslowlyand maintainingitsstate.Earlyhardwaremodelingproductsweredevelopedatatime whenmanymicroprocessorchipsstartedusingphase-lockedloopsandcouldnotbe sloweddownbecausethePLLsdon’tworkatslowspeeds.Togetaroundthisproblem,thehardwaremodelerwouldresetthedeviceandreplaythepreviousnvectors togettovectorn+1.Thisallowedthedevicetobeclockedatspeedshighenough tosupportPLLoperation,butmadesoftwaredebuggingimpossible,exceptbyusing waveformsfromthelogicsimulator.Aswehaveseen,today’smicroprocessorscome intwoflavors,thehigh-performancevarietywithPLLsandthosemorefocusedon lowpower.Thehigh-performancevarietyusuallyhavemechanismstobypassthe PLLtoenablestaticoperationandthelow-powervarietyaremeantforstaticdesign andareveryflexibleintermsofslowclockingandevenstoppingtheclock.Unfortunately,experimentswithsuchprocessorshaverevealedthatwhenbypassingthePLL, devicebehaviorisnolonger100%identicaltobehaviorwiththePLL.Forlow-powercoreslikeARM,irregularclockingcanalsobetroublesinceitrequirestheclock inputtobetreatedmorelikeadatainputsinceitmustbesampledinsimulationand isnotrequiredtoberegular.
148
Hardware/SoftwareCo-Verification WiththeRTLcorebecomingmorecommon,therearenowproductsthatprovide anFPGAforthesynthesizableCPUandlinktothelogicsimulatorinthesameway asthemoretraditionalhardwaremodeler.UsingtheCPUinanFPGAgivessome benefitbyallowingJTAGdebuggingproductstobeused,butperformanceisstill likelytobeaconcern.IftheJTAGclockcanrunindependentlyofthelogicsimulator,highperformancecanbeobtainedforgoodJTAGdebugging.
EvaluationBoardwithLogicSimulation Themicroprocessorevaluationboardisapopularwayforsoftwareengineerstotest codebeforehardwareisavailable.Theseboardsarereadilyavailableforareasonable cost.Toextendtheuseoftheevaluationboardforco-verification,theboardcan serveasimilarpurposeastheinstructionsetsimulator.Sincemostboardshavenetworkingsupport,asocketconnectionbetweentheboardandthelogicsimulatorcan bedeveloped.Abusfunctionalmodelresidinginthelogicsimulatorcaninterface theboardtotherestofthehardwaredesign.Thearchitectureofusingtheevaluation boardforco-verificationisshowninFigure4-17. Inter-Process Communication (socket)
Microprocessor Evaluation Board
BFM Bus Transactions read/write
Logic Simulation with Hardware Design
Figure4-17:Microprocessorevaluationboardwithlogicsimulation ThiscombinationofaCPUboardconnectedtologicsimulationviaasocketconnectionandBFMismostappealingtosoftwareengineerssincetheperformanceof theboardisverygood.Sinceeachisrunningindependently,thereisnosynchronizationorcorrelationbetweenthetwotimedomainsoftheboardandthelogic simulator.
149
Chapter4 Thedrawbacktothistypeofenvironmentistheneedtoaddcustomsoftwaretothe coderunningontheCPUboardtohandlethesocketconnectiontothelogicsimulator.Somecommercialco-verificationvendorsprovidesuchalibrarythatmaybe suitable,butmustalwaysbemodifiedsinceeachboardisdifferentandthesoftware operatingenvironmentisdifferentfordifferentreal-timeoperatingsystems.Althoughthesolutionrequiresalotofcustomization,ithasbeenusedsuccessfullyon projects.
In-CircuitEmulation In-circuitemulationinvolvesusingexternalhardwareconnectedtoanemulation systemthatrunsatmuchhigherspeedsthanalogicsimulator.Emulationisanattractiveplatformtodoco-verificationsincethehigherspeedenablessoftwaretorun faster.Thissectiondiscussesthreedifferentwaystoperformco-verificationwithan emulationsystem. ThefirstmethodisusefulformicroprocessorcoresthatareavailableinRTLform.As wehaveseen,thereisatrendfortheIPvendorstoprovideRTLcodetotheuserfor thepurposesofsimulationandsynthesis.Ifthisisavailable,themicroprocessorcan bemappeddirectlyintotheemulationsystem.MostcoresusedinSoCdesigntoday supportsomekindofJTAGinterfaceforsoftwaredebugging.Toperformco-verificationasoftwareengineercanconnectaJTAGprobetotheI/Opinsoftheemulator andcommunicatewiththeCPUthatismappedinsidetheemulator.ThearchitectureofusingaJTAGconnectiontoanemulatorforco-verificationisshownin Figure4-18.
JTAG connection RTL CPU core
JTAG debugger and probe serial bitstream
Emulation System with Hardware Design
Figure4-18:JTAGconnectiontoanemulationsystem
150
Hardware/SoftwareCo-Verification Inthismodeofoperation,theCPUrunsatthespeedoftheemulationsystem,in lock-stepwiththerestofthedesign.Themainissuesinperformingco-verification aretheoverallspeedoftheemulatoranditsabilitytomaintaintheJTAGconnectionreliablyatspeedsthatarelowerthanmosthardwareboards. Asecondwaytoperformco-verificationwithanemulationsystemistouseaboard withthemicroprocessortestchipandconnectthepinsofthechiptotheI/Opinsof theemulator.ThistechniqueisusefulforhardmacromicroprocessorIPsuchasthe ARM7TDMIthatcannotbemappedintotheemulationsystem.JTAGdebugging canalsobedonebyconnectingtotheJTAGportonthechip.Thearchitectureof usingaJTAGconnectiontoanemulatorforco-verificationisshowninFigure4-19. Software Debugger JTAG
CPU Chip on a board
Signal Values
CPU Shell
Emulation System with Hardware Design
Figure4-19:JTAGconnectionwithtestchipandemulationsystem Likethepreviousmethod,theCPUcorewillrunatthespeedoftheemulationsystem.Signalvalueswillbeupdatedoneachclockcycle.Theresultisacycle-accurate simulationoftheconnectionbetweenthetestchipandtherestofthedesign.The cycle-accuratelock-stepsimulationisdesiredforhardwareengineersthatwantto modelthesystemexactlyandwanttorunfasterusingemulationtechnologyforlong softwaretestsandregressiontests.
151
Chapter4 Inbothoftheprevioustechniques,theusermustmakesuretoconfirmthatthe JTAGsoftwareandhardwarebeingusedfordebuggingcantolerateslowclock speeds.Mostemulationsystemsruninthe250kHzto1MHzrangedependingon theemulationtechnologyandthedesignbeingrunontheemulator.Whilethisis muchfasterthanalogicsimulator,itismuchslowerthanwhatthedevelopersofthe JTAGtoolsprobablyexpected.MostJTAGtoolshavebuiltintimeouts,eitherinthe hardwareorinthesoftwaredebugger(orboth)forsituationswhenthedesignisnot responding.Itiscrucialtoverifythatthesetimeoutscanbeturnedoff.Emulation, likesimulation,allowstheusertostopthetestbypressingCtrl+c,waitingforsome unspecifiedamountoftime,andthenrestartingoperation.Iftimeoutsexistinthe JTAGsolution,thiswillcertainlycauseadisconnectandresultinthelossofsoftware debugging.ThebestwaytoprovideastableJTAGconnectionistouseafeedback clocktotheJTAGhardwaretohelpitadaptitsspeedbasedonthespeedoftheemulationsystem. Thethirdco-verificationmethodcommonlyusedwithemulationistouseaspeed bridgebetweenhardwarecontainingamicroprocessordeviceandtheemulation system.Theclassiccaseforthisapplicationisforverificationofachipthatconnects tothePCIbus.Acommonsetupisforsoftwareengineersthataredevelopingdevice driversforoperatingsystemssuchasWindowsorLinuxandtheboardtheyarewritingthedriverforsitsonthePCIbus.SincethePCIboardisnotyetavailable,they canuseaPCtotestthesoftwareandtheemulationsystemprovidesaPCIboardthat plugsintothePCandbridgesthespeeddifferencesbetweentherealspeedofthePCI businthePC(33or66MHz)andtheslowerspeedoftheemulator.ThePCwillrun atfullspeeduntilthedevicedrivermakesamemoryorI/Oaccesstotheslotwith thehardwarebeingdeveloped.Whenthisoccurs,thebridgetotheemulatorwill detectthePCItransactionandsenditovertotheemulator.Whiletheemulatoris executingthePCItransaction,thebridgecardwillcontinuouslyrespondwitharetry responsetostallthePCuntiltheemulatorisready.Eventually,theemulatorwill completethePCItransactionandthebridgecardwillcompletethetransactionon thePC.ThismethodisshowninFigure4-20.
152
Hardware/SoftwareCo-Verification Native Software Debugger Cables
PC running Windows or Linux
PCI speed bridge board
PCI Signal Values
HDL PCI Shell
Emulation System with Hardware Design
Emulator
PC
Figure4-20:JTAGconnectionspeedbridgeandemulationsystem
Similarenvironmentsarecommonforembeddedsystemswhereaboardcontaining amicroprocessorcanrunanRTOSsuchasVxWorksandcommunicatewiththe emulatorthroughaspeedbridgeforabussuchasPCIorAHB.
FPGAPrototype IalwaysgetalaughwhentheFPGAprototypeisdiscussedasaco-verificationtechnique.Prototypingisreallyjustbuildingthesystemoutofprogrammablelogicand usingthedebuggerjustasifthefinalhardwarewasconstructed.Theonlydifference maybeASICsaresubstitutedforFPGA,andasaresulttheperformanceislower thanthefinalimplementation.Sincehardwaredebuggingisverydifficult,prototypingbarelyqualifiesasco-verification,butsincetherepresentationofthehardwareis notthefinalproductitisausefulwayforsoftwareengineerstogetearlyaccesstothe hardwaretodebugsoftware. RecentadvancesinFPGAtechnologyhavecausedmanyprojectstore-examine hardwareprototyping.WithFPGAsfromAlteraandXilinxnowexceeding250k to500kASICgates,customprototypinghasbecomeapossibilityforhardwareand softwareintegration.Untilnowdesignflowissues,toolissues,andthegreatdensity differencesbetweenASICandFPGAhavelimitedtheuseofprototyping.Withthe latestFPGAdevices,mostASICscannowbemappedintoasetofonetosixFPGAs. NewpartitioningtoolshavealsobeenintroducedthatworkattheRTlevelanddonot requirechangestotheRTLcodeordifficultgate-level,post-synthesispartitioning.
153
Chapter4 Althoughprototypingiseasierthanithaseverbeenitisstillnotatrivialtask. Prototypingissuesfallintotwocategories:FPGAresourceissuesandASIC/FPGA technologydifferences.CommonresourceissuescanbethelimitednumberofI/O pinsavailableontheFPGAorthenumberofclockdomainsavailableinanFPGA. Technologydifferencescanberelatedtodifferencesinsynthesistoolsforcingthe usertomodifythedesigntomaptotheFPGAtechnology.AnothercommontechnologyissueisgatedclocksthataredifficulttohandleinFPGAtechnology.If resourceandtechnologyissuescanbeovercome,prototypingcanprovidethehighestperformanceco-verificationsolutionthatisscalabletolargenumbersofsoftware engineers.Beforecommittingtoprototypingisitimportanttoclearlyunderstand theissuesaswellasthecost.Onthesurface,prototypingappearscheapcompared toalternatives,butlikeallengineeringprojectscostshouldbemeasurednotonlyin hardwarebutalsoinengineeringtimetocreateaworkingsolution.
Co-VerificationMetrics Manymetricscanbeusedtodeterminewhichco-verificationmethodsarebestfora particularproject.Followingisalistofsomeofthem: ■
Performance(speed)
■
Accuracy
■
Synchronization
■
Typeofsoftwaretobeverified
■
Abilitytodohardwaredebugging(visibility)
■
Abilitytodoperformanceanalysis
■
Specificvs.general-purposesolutions
■
Softwareonly(simulatedhardware)vs.hardwaremethods
■
Timetocreateandintegratemodels:businterface,cache,peripherals,RTOS
■
Timetointegratesoftwaredebugtools
■
Pre-siliconcomparedtopost-silicon
154
Hardware/SoftwareCo-Verification
Performance Itiscommontoseenumbersthrownoutaboutcycles/secandinstructions/secrelated toco-verification.Whilesomeprojectsmayindeedachieveveryhighperformance usingco-verification,itisdifficulttopredictperformanceofaco-verificationsolution.Ofcourseeveryvendorwillsaythatperformanceis“designdependent,”but withagoodunderstandingofco-verificationmethodsitispossibletogetagoodfeel forwhatkindofperformancecanbeachieved.Thegeneralunpredictabilityisaresultoftwofactors;first,manyco-verificationmethodsuseadual-processarchitecture toexecutehardwareandsoftware.Second,thesizeofthedesign,thelevelofdetail ofthesimulation,andtheperformanceofthehardwareverificationplatformresults inverydifferentperformancelevels.Thenextchapterwillprovidemoreinformation aboutco-verificationperformance.
VerificationAccuracy Whileperformanceissuesarethenumberoneobjectiontoco-verificationfrom softwareengineers,accuracyisthenumberoneconcernofhardwareengineers.Some commonquestionstothinkaboutwhenevaluatingco-verificationaccuracyarelisted here.Thekeytosuccessfulhardware/softwareco-verificationisthemicroprocessor model. ■
Howisthemodelverifiedtoguaranteeitbehavesidenticallytothedevicesilicon? Softwaremodelscanbeverifiedbyusingmanufacturingtestvectorsfrom themicroprocessorvendororrunningaside-by-sidecomparisonwiththe microprocessorRTLdesigndatabase.Metricssuchascodecoveragecanalso provideinformationaboutsoftwaremodeltesting.Alternatively,notall co-verificationtechniquesrelyonseparatelydevelopedmodels.Techniques basedonRTLcodefortheCPUcaneliminatethisquestionaltogether.Make surethemodelcomeswithadocumentedverificationplan.Anybodycan makeamodel,buttheeffortrequiredtomakeagoodmodelshouldnotbe underestimated.
155
Chapter4 ■
Doesthemodelcontaincompletefunctionality,includingallperipherals? Usingbusfunctionalmodelswasafeasiblemodelingmethodbeforesomany peripheralswereintegratedwiththemicroprocessor.Forchipswithhigh integration,itbecomesverydifficulttomodelalloftheperipherals.Evenif adeviceappearstohavenointegratedperipherals,lookforthingslikecache controllersandwritebuffers.
■
Isthemodelcycleaccurate? Doallpartsofthemodeltakeintoaccounttheinternalclockofthemicroprocessor?Thisincludesthingssuchasthemicroprocessorpipelinetiming andthecorrelationofbustransactiontimeswithinstructionexecution.This mayormaynotbenecessarydependingonthegoalsofco-verification.A noncycleaccuratemodelcanrunatahigherspeedandmoremaybemore suitableforsoftwaredevelopment.
■
Areallfeaturesofthebusprotocolmodeled? Manymicroprocessorsusemorecomplexbusprotocolstoimproveperformance.Techniquessuchasbuspipelining,bursting,out-of-ordertransaction completion,writeposting,andwritereorderingareusuallyasourceofdesign errors.Simplereadandwritetransactionsbythemselvesrarelybringout hardwaredesignerrors.Itisthesequenceofmanytransactionsofdifferent typesthatbringoutmostdesignerrors. ThereisnothingmorefrustratingthantryingtouseamodelforaCPUthat hasmultiplemodesofoperationonlytofindoutthatthemodethatisusedby thedesignsuffersfromthemostdreadedwordinmodeling,“unsupported.”
IonceworkedonaprojectinvolvingadesignwiththeARM920TCPU.Thiscore usesseparateclocksforthebusclock(BCLK)andtheinternalcoreclock(FCLK). Theclockinghasthreemodesofoperation: ■
FastBusMode:TheinternalCPUisclockeddirectlyfromthebusclock (BCLK)andFCLKisnotused.
■
SynchronousMode:TheinternalCPUisclockedfromFCLKwhichmustbe fasterandasynchronousintegermultipleofthebusclock(BCLK).
156
Hardware/SoftwareCo-Verification ■
AsynchronousMode:TheinternalCPUisclockedfromFCLKwhichcanbe totallyasynchronoustoBCLKaslongasitisfasterthanBCLK.
Asyoucanprobablyguess,theasynchronousmodecausedthisparticularproblem. TheARM920TstartsoffusingFastBusmodeafterresetuntilwhichtimesoftware canchangeabitincoprocessor15toswitchtooneoftheotherclockingmodesto gethigherperformance.Whentheappropriatebitincp15waschangedtoenable asynchronousmode,amysteriousmessagecomesout: “SettoAsynchmode,WARNINGthisisnotsupported”
Itisquitedishearteningtolearnthisinformationonlyafteralongcampaigntoconvincetheprojectteamthatco-verificationisuseful. Followingaresomeotherthingstopayattentiontowhenevaluatingmodelsusedfor co-verification: ■
Canperformancedatabegatheredtoensuresystemdesignmeetsrequirements? Ifthemodelisnotcycleaccurate,theanswerisNO.Bothhardwareandsoftwareengineersareinterestedinusingco-verificationtoobtainmeasurements aboutbusthroughput,cachehitrates,andsoftwareperformance.Amodel thatisnotcycleaccuratecannotprovidethisinformation.
■
Willthemodelaccuratelymodelhardwareandsoftwaretimingissues? Likethebusprotocol,themoredifficulttofindsoftwareerrorsarebrought outbythetiminginteractionsbetweensoftwareandhardware.Examples includeinterruptlatency,timers,andpolling.
Whenitcomestomodelingandaccuracyissuestherearereallytwodifferentgroups. Onesetofengineersismostlyinterestedinthevalueofco-verificationforsoftware developmentpurposes.Ifco-verificationprovidesawaytogainearlyaccesstothe hardwareandthecoderunsatahighenoughspeedthesoftwareengineerisquite happyandthedetailsofaccuracyarenotthatimportant.Theothergroupisthe hardwareengineersandverificationengineersthatinsistthatifthesimulationisnot exactthenthereisnoreasontoevenbothertorunit.Simulatingsomethingthatis notrealityprovidesnobenefittotheseengineers.Thefollowingexamplesdemonstratethedifficultyinsatisfyingbothgroups.
157
Chapter4
AHBArbitrationandCycleAccuracyIssues Ionceworkedwithaprojectthatusedasingle-layerAHBimplementationforthe ARM926EJ-S.Onewaytoperformco-verificationistoreplaceafull-functionallogic simulationmodeloftheARMCPUbyabusfunctionalmodelandaninstruction setsimulator.SincetheARM926busfunctionalmodelwillactuallyimplementtwo businterfaces,aquestionwasraisedbytheprojectteamaboutarbitrationandthe orderingofthetransfersonthebusbetweentheIAHBandtheDAHB.Theorder inwhichthearbiterwillgrantthebusisgreatlydependentonthetimingofthebus requestsignalsfromeachAHBmaster.FromthediscussionofAHBtheHREADY signalplaysakeyroleinarbitrationsincethebusisonlygrantedwhenboth HGRANTandHREADYarehigh.Theparticularbusfunctionalmodelbeingused decidedthatsinceHREADYisrequiredtobehighforarbitrationanddatatransfer itcanbeusedmorelikeanenableforthebusinterfacemodelstatemachinesince nothingcanhappenwithoutitbeinghigh.TooptimizeperformancethebusfunctionalmodeldecidedtodonothingduringthetimewhenHREADYwaslow.This assumptionaboutthefunctionofHREADYisnearlycorrect,butnotexactly.The CPUindicationtothebusinterfaceunitoftheARM926thatitneedstorequestthe bushasnothingtodowithHREADYorthebusinterfaceclock,itusestheCPU clock.ThisproducedasituationwheretheIHBUSREQandDHBUSREQwere artificiallylinkedtotheHREADYandthetimingofthesewasincorrect.Theresult totheuserwasthearbitergrantingtheIAHBtousethebusinsteadoftheDAHB. Sincethetwobussesareindependent,exceptforthefewexceptionswediscussed, thereisnoharminrunningthetransactionsinadifferentorderonthesingle-layer AHB.Functionally,thismakesnodifferenceandallverificationtestsandsoftware willexecutejustfine,buttohardwareengineersseekingaccuracythesituationisno good.Thiscasedoesbringupsomeinterestingquestionsrelatedtoperformance: ■
Doesarbitrationpriorityaffectsystemperformance?
■
Whataretheperformancedifferencesbetweensingle-layerAHBversus multilayerAHBversusseparateAHBinterfaces?
158
Hardware/SoftwareCo-Verification Figure4-21showsthecorrectbusrequesttiming.ThesequenceshowstheIAHB readingaddresses0x44and0x48followedbytheDAHBreadingfromaddress0x90. Itisdifficulttosee,butthenexttransactionistheIAHBreadingfromaddress0x4c. NoticethetimingofDHBUSREQatthestartofthewaveform.Ittransitionshigh beforethefirstIHREADYonthewaveform.Thisdemonstratesthatthetimingof DHBUSREQisnotrelatedtoHREADY.
Figure4-21:Correcttimingofbusrequest
Figure4-22showstheincorrectorderingofbustransferscausedbythedifferencein thetimingofDHBUSREQ.ThesequencestartthesamewaywithIAHBreadsfrom 0x44and0x48,butthereadfrom0x4ccomesbeforetheDAHBreadfromaddress 0x90.ThereasonisthetimingofDHBUSREQ.NoticeDHBUSREQtransitions highAFTERthefirstIHREADYonthewaveform.Thisdifferenceresultsinout-ofordertransactions.
159
Chapter4
Figure4-22:Incorrecttimingofbusrequest ContrastthispursuitofaccuracywithasoftwareengineerImetoncethatdidn’tcare anythingaboutthedetailofthehardwaresimulation.Skippingthemajorityofthe simulationactivityjusttorunfastwasthebestwaytogo.Hehadnodesiretoruna detailed,cycle-accuratesimulation.Actually,hewasinterestedinmakingsurethe softwareranonthecycle-accuratesimulation,butonceithadbeendebuggedusinganoncycleaccurateco-verificationenvironmentthefinalcheckofthesoftware wasbettersuitedforalongbatchsimulationusingtheARMRTLmodelandfarm ofworkstationsthatwasmaintainedbythehardwareengineers,nothim.Sincethe chanceoffindingasoftwarebugwaslowtherewasnoreasontoworryaboutthe problemofdebuggingthesoftwareinapurelogicsimulationenvironmentusing waveformsorlogfiles.
ModelingSummary Modelingisalwayspainful.Thereisnowayaroundit.Nomatterwhatkindof checksandbalancesareavailabletocomparethemodeltotheactualimplementation,therearealwaysdifferences.Oneofthecommondebatesisaboutwhat representsthegoldenviewoftheIP.InthecaseofARMmicroprocessors,thereare threepossiblerepresentationsthatareconsidered“golden”:
160
Hardware/SoftwareCo-Verification ■
RTLcode(forsynthesizableARMdesigns)
■
Designsign-offmodel(DSM)derivedfromtheimplementation
■
SiliconintheformofatestchiporFPGA
Engineersviewthesethreeasgolden,evenmoresothanthespecification.Co-verificationtechniquesthatusemodelsthatdonotcomefromoneofthesegoldensources areatadisadvantagesinceanyproblemsarealwaysblamedonthe“non-golden” model.Ihaveseencaseswheretheuser’sdesigndoesnotmatchthebusspecification, butdoesworkwhensimulatedwithagoldenmodel.Sinceaspecisnotexecutable, engineersfeelstronglythatthedesignworkingwiththegoldenmodelismostimportant,notthespec.Whenalternativemodelsareusedforco-verificationamodelthat conformstothespecisstillviewedasabuggymodelinanyplacesitdiffersfromthe goldenmodel.Itisnotalwayseasytoconvinceengineersthatadesignthatrunswith manydifferentmodelsandadherestothespecificationisbetterthanadesignthat runsonlywiththegoldenmodel.
Synchronization Mostco-verificationtoolsoperatebyhidingcyclesfromtheslowerlogicsimulation environment.Becauseofthis,issuesrelatedtosynchronizationofthemicroprocessorwiththerestofthesimulateddesignoftenarise.Thissituationisalsotrueusing in-circuitemulationwithaprocessorlinkedtotheemulatorviaaspeedbridge.In co-verificationtherearetwodistincttimedomains,themicroprocessormodelrunningoutsideofthelogicsimulatorandthelogicsimulatoritself.Understandingthe correlationofthesetwotimedomainsisimportanttoachievingsuccesswithco-verification.Co-verificationusesmainlythespatialmemoryreferencestodecidewhen softwaremeetsthehardwaresimulation.Synchronizationisdefinedbywhathappens tothelogicsimulatorwhenthereisnobustransactionoccurringinthelogicsimulator.Itcouldbestoppeduntilanewbustransactionisreceived.Itcouldjust“drift” forwardintimeexecutingotherpartsofthelogicsimulationhardware(evenwith anidlemicroprocessorbus).Ineithercasetheamountoftimesimulatedinthelogic simulatorandthemicroprocessormodelisdifferent.Anotheralternativeistoadvancethelogicsimulationtimethepropernumberofclockcyclestoaccountforthe hiddenbustransactionbutdon’trunthetransactiononthebus.Nowthecorrection
161
Chapter4 ofthetimedomainsismaintainedattheexpenseofperformance.Synchronization isalsoimportantforthosetemporalactivitieswherethehardwaredesigncommunicateswithsoftwaresuchasinterruptsandDMAtransfers.Withoutproper synchronization,thingslikesystemtimersandDMAtransfersmaynotworkcorrectly becauseofdifferencesinthetwotimedomains.
TypesofSoftware Thetypeofsoftwaretobeverifiedalsohasamajorimpactonwhichco-verification methodstodeploy.AswesawinChapter2,therearedifferenttypesofsoftwarethat engineersseeascandidatesforco-verification:systemdiagnostics,devicedrivers, RTOSandapplicationcode.Differentco-verificationmethodsarebettersuitedto differenttypesofsoftware.Usuallythelowerlevelsoftwarerequiresamoreaccurate co-verificationenvironmentandhigher-levelsoftwareislessinterestedinaccuracy andmorefocusedonperformancebecauseofthecodesize.RunninganRTOSsuchas VxWorkshasbeenshowntobeviablebymultipleco-verificationmethodsincluding in-circuitemulation,anISSandtheRTOSsimulator,VxSIM.Evenwiththemarketingclaimsthatsoftwaredoesnothavetobemodified,expectsomemodificationto optimizesuchthingslikelongmemorytestsandUARTaccesses.Themajorconfusion todayexistsbecauseofthemanytypesofsoftwareandthemanymethodsofhardware execution.Often,differentlevelsofperformancewillenabledifferentlevelsofsoftwaretobeverifiedusingco-verification.Aquicksanitychecktocalculatethenumber ofcyclesrequiredtorunagiventypeofsoftwareandthespeedoftheenvironment willensureengineerscanremainproductive.Ifittakes1hourtorunasoftware programtogettothenewsoftwarethismeansthesoftwareengineerwillhaveonlya handfulofchancesperdaytorunthecodeanddebuganyproblemsfound.
OtherMetrics Besidesperformanceandaccuracy,therearesomeothermetricsworththinking about.Projectteamsshouldalsodetermineifageneral-purposesolutionisimportant versusaprojectspecificsolution.General-purposesolutionscanbereusedonfuture projectsandonlyonesetoftoolsneedstobelearned.Unfortunately,general-purposesolutionsarenotgeneralifthemodelusedonthenextprojectisnotavailable. Methodsusingtheevaluationboardorprototypingaremorespecificandmaynotbe
162
Hardware/SoftwareCo-Verification applicableonthenextproject.Formanyengineers,especiallysoftwareengineers,a solutionthatconsistsofsimulationonlyispreferredoveronethatcontainshardware. Anotherimportantdistinctioniswhetherthesolutionisavailablepreorpostsilicon. Manyleadingedgeprojectsusemicroprocessorsthatarenotyetavailableandapresiliconmethodisrequired.Allofthesevariablesshouldbeconsideredwhendeciding onaco-verificationstrategy. Understandingallofthesemetricswillavoidcommittingtoaco-verificationsolutionthatwillnotmeettheprojectneeds.Remember,thegoalofco-verificationisto savetimeintheprojectschedule.Withtheunderstandingofco-verificationmethods andmetrics,thenextchapterwilllookintomoredetailsofhowco-verificationworks andhowtogetthemostbenefitfromit.
163
This page intentionally left blank
CHAPTER
5
AdvancedHardware/Software Co-Verification Hardware/softwareco-verificationismuchmorethanexecutingahardwaredesign beforefabricationandusinganinteractivesoftwaredebuggertodobasicoperations likebreakpoint,singlestep,andviewmemory.Usingco-verificationeffectively requiresfindingtherightmixofperformance,simulationdetail,anddebuggingviews forthejob.Unfortunately,inco-verificationthereisnoone-size-fits-allsolutionthat canbeappliedtoeveryproblem.Fartoooften,Iseeprojectswhereoneengineersets upaco-verificationmethodorconfiguration(probablybestsuitedforhisownwork) andotherengineersblindlyusethesameconfigurationfortasksthatwouldbebetter suitedtosomeothermethodorconfiguration.Oneofthecausesissoftwareengineers cannoteasilymodifythesimulationenvironmentwithoutsomehelpfromhardware orverificationengineers.Conversely,hardwareengineersdonotalwaysunderstand theneedsofsoftwareengineersandtrytoapplyaone-size-fitsallsolution.Thischapterdiscussessomeoftheadvancedtopicsofco-verificationtohelpusersgetthemost benefitfromco-verification.
DirectAccesstoSimulationMemories Oneofthebenefitsofsimulationisthelevelofvisibilityandcontrollabilitythatis available.Thisismostusefulforco-verificationintheareaofmemory.Embedded softwaremakesuseofamemorymapandmicroprocessoraddressestoaccessdifferent typesofmemoryandmemorymappedregistersinthedesign.Softwaredebuggingis alsomemoryintensive.Theprimaryoperationperformedbyasoftwaredebuggeris readingmemory.Scrollingthedebugger’smemorywindoworsourcecodewindow requiresquantitiesofmemorydata. 165
Chapter5 InChapter2,wediscussedaccessingsimulationmemorymodelsinwaysotherthan performingsimulationandchanginginterfacesignalstodoareadorawrite.Oneexampleofthisisthe$readmemhsystemtaskinVerilogthatcanbeusedloadmemory contentsfromatextfile.AnotherexampleistheuseofaCprogramminginterface toreadandwritesimulationmemorymodelsdirectlywithoutperformingsimulation. Thisfeatureisusefulinco-verificationbecauseasoftwaredebuggercannowreadand writememorydirectlywithoutperforminganysimulation.ExampleCAPIfunctions fromthexsimlogicsimulatorareshowninFigure5-1. /* get a verilog memory handle by the path name */ extern void* axisGetVmemHandle(char* path, int* left, int* right, int* top, int* bottom); /* read 'data' from verilog memory 'handle' at 'addr'. data is in verilog memory format */ extern int axisReadVmem(void* handle, int addr, unsigned char* data); /* write 'data' to verilog memory 'handle' at 'addr'. data should be in verilog memory format */ extern int axisWriteVmem(void* handle, int addr, unsigned char* data); /* tell the simulator to propagate verilog memory 'handle' */ extern int axisPropVmem(void* handle); /* like $readmemh, read verilog memory 'handle' from 'file' in hex */ extern int axisReadVmemh(void* handle, char* file); /* like $readmemb, read verilog memory 'handle' from 'file' in binary */ extern int axisReadVmemb(void* handle, char* file);
Figure5-1:ExampleC-APIforVerilogmemorymodels Beforeunderstandinghowdirectmemoryaccessisused,itisimportanttounderstand thetwotypesofmemoryaccessesthatarepresentwhendebuggingsoftware.Thefirst typeofmemoryaccesscomesfromthesoftwareinstructionsrunningontheCPU. Thiscouldbeinstructionfetches,ordataoperationssuchasmemoryloadandstore instructions.Evenifasoftwaredebuggerisnotused,thesewillalwaysoccuronthe buswhentheprogramisrun.ThistypeofmemoryaccessisshowninFigure5-2in theareamarked(1).Thesecondtypeofmemoryaccesscomesfromthedebugger.
166
AdvancedHardware/SoftwareCo-Verification SoftwaredebuggersreadbothCPUregistersandmemorydatatodisplaythecurrentcontextofthesoftware.Manydebuggercommandsrequireaccesstomemoryto displayusefulinformation.Thedebuggercanevenchangetheprogramordatavalues tochangesoftwareandsystembehavior.Memoryaccessesgeneratedbythedebugger areshownintheareamarked(2)inFigure5-2.
MEM
(2) memory values
Software Debugger
register values
CPU
(2) memory values
(1) loads stores
MEM
Figure5-2:Twotypesofmemoryaccesses Sincememoryoperationscausedbythedebuggerarenotpartoftheembeddedsoftwaretheyareintrusivebecausetheycauseextraactivitythatisnotnormallypartof thehardware/softwareinteraction.Mostsoftwareengineerscanrelatetoembedded systemproblemsthataretimingrelated.Acommonproblemisforsoftwaretofail whenrun,butwhenthesoftwareengineerusesadebuggertoputabreakpointona functionnearthefailuretheproblemdisappears,evenifnochangesaremade.The timingofthesoftwareandhardwareisaffectedbythebreakpointandtheproblem disappears.Similarly,inco-verificationisitadvantageoustoavoidintrusiveness causedbythesoftwaredebugger.Debuggergeneratedmemoryactivitycausestwo problems: ■
Theaccuracyofthesimulationischanged
■
Debuggerresponsetimecanbeslowifalldebugger-generatedmemory activityissimulated
167
Chapter5 Ifthememoryrequestsgeneratedbythedebuggercandirectlyaccessmemory withoutanysimulation,bothoftheproblemsareimmediatelysolved.Theonlyremaininghurdleistodescribetothedebuggerthememorymapfromtheviewofthe CPU.Inasystemdesign,therearemanyinstancesofsmallermemoriesthatfitinto theglobalCPUaddressrange.BetweentheCPUandindividualmemoryinstances therearememorycontrollers,decoders,chip-selects,write-enablesandotherlogic thatdecideshowaparticularCPUaddressendsupaccessingspecificmemoryinstances.Anexampleofhowmemoriesinasystemmaybeimplementedisshownin Figure5-3. Data
0xffffffff
D[31:24]
D[23:16]
D[15:8]
D[7:0]
8-bit ROM
8-bit ROM
8-bit ROM
8-bit ROM
Address
16-bit Flash RAM
8-bit SRAM
32-bit SDRAM 0
Figure5-3:Exampleofmemoryconfiguration
168
AdvancedHardware/SoftwareCo-Verification Adebuggercannotautomaticallyfigureoutwhichmemoriesneedtobeaccessedto retrievethedataforaparticularCPUaddress.Onewaytohelpitistoallowtheuser tospecifythemappingoftheCPUaddressestotheindividualmemoryinstances. Differenttoolshavedifferentwaystospecifytheinformation.Someofthewaysit canbedoneare: ■
Textfilecontaininginformation
■
Graphicaltooltodrawboxesintheaddressmap
■
Drag-and-dropfromaVeriloghierarchydescription
■
UsercanwriteCfunctionsusingprovidedhookstoimplement directmemoryaccess
Atextfileistheeasiestwaytospecifytheconfigurationinformation.Figure5-4 showsanexampleofspecifyingmemorymappingsothedebuggercandirectlyaccessmemorieswithoutanysimulationtomaintaincycleaccuracyandprovideafast debuggerresponse.
169
Chapter5 ; Global variables defining bus data_bus_width 32 bit_order = descending byte_order = littleendian ; Individual Memories in the design name TBplatform.uMemory.uBootROM0.Rom start_address 0x0 start_bit = 0 last_bit = 7 name TBplatform.uMemory.uBootROM1.Rom start_address 0x1 start_bit = 8 last_bit = 15 name TBplatform.uMemory.uBootROM2.Rom start_address 0x2 start_bit = 16 last_bit = 23 name TBplatform.uMemory.uBootROM3.Rom start_address 0x3 start_bit = 24 last_bit = 31 name TBplatform.uMemory.uRAM6.Ram start_address 0x28000000 start_bit 0 last_bit 31 name TBplatform.uTrickWrapper.uIntROM.Mem start_address 0xffff0000 start_bit 0 last_bit 31 name TBplatform.uPlatform.uProcSubSys.uProcCoreMod.uDRAM.mem start_address 0x40008000 start_bit 0 last_bit 31 name TBplatform.uPlatform.uProcSubSys.uProcCoreMod.uIRAM.mem start_address 0x40010000 start_bit 0 last_bit 31
Figure5-4:Examplememoryconfigurationfile
170
AdvancedHardware/SoftwareCo-Verification Thissectiontalkedaboutusingdirectmemoryaccessforthememoryreadsand writescausedbythedebugger.Thenextsectiontalksaboutusingdirectmemoryaccessformemoryactivitysuchasinstructionfetchesordataoperations.
MemoryOptimizationsandPerformance Softwareengineersalwaysrequestfasterperformance.It’sinterestingthatsoftware engineersmorenaturallytalkaboutperformanceintermsofslowdownfromthereal system,andhardwareengineerstalkaboutperformanceintermsofspeedupfrom logicsimulationspeed.Logicsimulationspeedrangesfrom10to500cycles/second forSoCprojects.Forhardwareengineers,a10xspeedupisusefulifitcanbeachieved withlittleornoeffort,a100xspeedupisworthsomeefforttoachieve,anda1000x speedupenablesanewsetofteststoberunthatwouldotherwisehavetobeskipped sinceittakesruntimesfromdaystominutes.Forsoftwareengineersa10xslowdown isnotaproblem,takinga100MHzdesigndownto10MHz.A100xslowdownto 1MHzisprobablystilluseful,buta1000xslowdownisnotusefulformanytypesof softwareincludingRTOSandapplications. Inco-verification,therearetwobasicwaystoprovidefasterperformance.Thefirstis tosimulatelesssothatfromtheviewofsoftwareengineer,theperformanceappears tobefaster,andthesecondistospeedupthehardwareexecutionengine. Waystoimproveco-verificationperformanceare: ■
Simulateless
■
Increaserawsimulationperformance
Achievinghigherperformancebysimulatinglesscanbedoneusingthesamedirectmemoryaccesstechniquesdescribedabove.Whenthesoftwarerunningonthe embeddedCPUmakesmemoryaccesses,eitherinstructionfetchesordatareadsand writes,co-verificationtoolscanbeconfiguredforlogicsimulationorsimplyusedirect memoryaccesstocompletethereadorwrite.Theabilitytoskipsimulationforcertainareasofthememorymapissometimescalledoptimization,optimizedmemory,or memoryaccessoptimization.Theabilitytobypasssimulationtoincreaseperformance isalsoanalternativedefinitionofco-verification.Thisabilityisprimarilywhatdifferentiatessomeco-verificationmethodsfrompureco-simulation.
171
Chapter5 MemoryoptimizationisreallytheonlyhopeforreasonableperformancewhencoverificationisperformedusinganISSandlogicsimulation.Withoutoptimization thesimulationwillrunatthespeedoflogicsimulation(rememberthat’sabout10to 500cycles/second).Host-codemodehassomeamountofmemoryoptimizationthat occursnaturallysinceitdoesnotfetchinstructionsfromthelogicsimulator,only dataaccessesaresimulated.Thiseliminatesmuchofthesimulationthatwouldbe performedusingtarget-codemodewithanISS. Althoughnotallco-verificationmethodsusethedual-processarchitecture,let’slook atthecommonISS+BFMsetuptogetaflavorforco-verificationperformance.The embeddedsystemmemorymapisusedtodecidewhichmemoryaccessesareoptimizedandwhicharesimulated.Performancedependsonhowoftensoftwareaccesses thehardwaredesign.Thisvariableisreferredtoashardwaredensity.Thinkofitmuch likethewayaninstructioncacheworks.Ahighnumberofcodefetcheshitthe cacheandareservicedatveryhighspeeds,andsomefetchesmissthecacheandwill takelongertogetfrommemory. Overallperformanceisdeterminedbythespeedofeachenvironmentandthenumberofmemoryaccessesthatarerequiredforeach.Followingisasimplifiedexample thatwillhelptounderstandco-verificationperformance.AssumeanISScanrun softwareat100,000instructionspersecond,andassumethatinstructionsthataccess thelogicsimulatorrunataspeedof50instructionspersecond(ips).Also,assume (forsimplicity)that1instructiontakes1clockcycle.Foragivendesign,assumethe hardwaredensityis10%,thismeans10%oftheprogram’sinstructionsmustaccess logicsimulation.Theother90%ofinstructionsarenotsimulatedduetomemory optimization.Torunaprogramwith100,000instructions,thetotalexecutiontimeis: 90,000instructions*(1inst/100kips)+10,000instructions*(1inst/50 ips)=200.9seconds Torunthis100,000instructionprogramexclusivelyinalogicsimulatorrunningat50ipswouldtake 100,000instructions*(1inst/50ips)=2,000seconds Torunthis100,000instructionprogramexclusivelyintheISSwouldtake 100,000instructions*(1inst/100kips)=1second
172
AdvancedHardware/SoftwareCo-Verification Co-verificationachieves10xperformanceincrease,oratwoorderofmagnitude performancedecreaseoftheISSspeeddependinghowyouviewit.Hardwaredensity ishighestforlower-leveltypesofsoftware,suchasdiagnosticsoftware,andwilldecreaseforhigher-leveltypesofsoftwaresuchasapplications.Thistypeofcalculation isnearlyidenticaltothoserequiredtodeterminehowmuchperformanceimprovementcanbeobtainedfromsimulationaccelerationwheresomeofthetestbenchruns onaworkstationandmostofthedesignismappedintotheaccelerationsystem. Manyco-verificationsolutionspromotethebenefitofincreasedsimulationperformance.Thisincreasedperformanceusuallycomeswithaprice,typicallysimulation detail.Whenusingmemoryoptimizations,thereisaperformanceversusdetail trade-off.Ideally,thistrade-offisconfigurablebytheuserdependingonthegoalsof aparticularsimulation.Sometimesengineersareinterestedinperformancefigures forthetargetsystem.Thingslikecachehitrates,datasizes,andinterruptlatencyare important.Iftheseissuescanbeuncoveredinasimulationearlyintheproject,the riskofthesystemdesignnotmeetingrequirementsisminimized.Othertimesthese detailsarenotnecessaryandperformanceforsoftwaredebuggingismostimportant. Mostco-verificationtoolsallowmemoryoptimizationstobechangeddynamically duringsimulationusingeitheraGUI,commandline,orevenprogrammatically (callingCfunctions)inthecaseofhost-codeexecution. Thebenefitofmemoryoptimizationiseasytounderstand.Thereisnoreasonfor softwareengineerstoexecuteinstructionfetchesagainandagainwhenthehardware designhasbeenproventofetchinstructionsfrommemorysuccessfully.Theeliminationofsimulatingfetchescaneasilyresultina10xspeedupanditkeepssoftware engineersfromwatchinguninterestingactivity. Traditionally,processorsusedforembeddedsystemsuseonlyasinglebusforbothinstructionanddataaccessestoandfrommemory.AswesawinChapter3,allARM7 processorshaveasinglememorybus.Coreswithmultiplebusseshavebeenintroducedtomeetincreasingsystemperformancerequirements.ThefirstpopularARM corewithmultiplebuseswastheARM926EJ-S.Itcontainsseparatebusesforinstructionanddataaccesses.Italsocontainstwomorebussesfortightlycoupledmemory (TCM).Oneofthefuturequestionsforco-verificationistheeffectivenessofmemory optimizationsasthenumberofbussesonaCPUcontinuestoincrease.Memory
173
Chapter5 optimizationtakesadvantageofthefactthatmemoryaccessesoccurseriallyonabus andsomecanbeskipped.Iftwobussesnowoperateinparallel,thechancesofboth bussesperforminganaccesstooptimizedmemoryismuchless,especiallywhenmany ofthedataaccessesareinterestingtosimulate. Thesecondwaytoachievefasterperformanceisbyspeedingupthehardwareexecutionengine.Thepreviousexampledemonstrateda10xspeedupbyusingmemory optimizations,butthisaloneisnothighenoughperformancetorunlongtests withhighsoftwarecontent.Usingsimulationaccelerationandin-circuitemulationisawaytoimprovethespeedofthehardwareexecutionengineandincrease performance.Co-verificationmethodsusingemulationandmodelssuchasRTLor testchipsmayachieve200kto400kcycles/sec.Thisisover1000xoverlogicsimulation.Ofcourse,thisprovidesa100%cycleaccurateenvironment,butmayhave somesoftwaredebugginglimitations. SimulationaccelerationbasedonanISSandanaccelerationsystemasthehardware executionenginecanadditionallyprovidehigh-performanceco-verification.Let’s lookatthesameexampleassumingsimulationaccelerationrunsat10kipsversusthe logicsimulatorof50ips.Let’susethesametechniqueformemoryoptimization. Again,theprogramlengthis100,000instructions.Thetotalexecutiontimeis: 90,000instructions*(1inst/100kips)+10,000instructions* (1inst/10kips)=1.9seconds Toruntheprogramexclusivelywithsimulationaccelerationrunningat 10kipswouldtake100,000instructions*(1inst/10kips)=10seconds RecalltheISSalonetook1secondtoruntheprogramat100kips.Ifsimulation accelerationusinganISScanprovidebeneficialdebuggingfeaturesbeyondwhat in-circuitemulationorprototypingcanprovideitcanbeavaluabletechniquefor co-verification. Wehaveseenthatdirectaccesstosimulationmemory,whetheritbeinalogic simulatororsimulationacceleration/emulationplatformisusefulforbothinteractive debuggingandincreasingperformance.Whentakingadvantageofsuchtechniques thesynchronizationbetweenmultipleprocessesmustalsobeconsidered.
174
AdvancedHardware/SoftwareCo-Verification
ModesofSynchronization Wheneverco-verificationusesmultipleprocesses(orthreads)suchaswithanISS andBFMinalogicsimulator,orwithhost-codeexecution,theremustbesynchronizationbetweentheprocesses.TheeasiestcasetoconsiderisanISSthatrunsin conjunctionwithalogicsimulator,asdiscussedinChapter4,Figure4-12.IfallmemoryaccessesaresimulatedinthelogicsimulatoritiseasytoimaginetheISSrunsand modelstheinternalworkingsoftheCPUandthenpassesinformationovertothe logicsimulatorforabustransactionandthenblockssothesimulatorcanrun.The logicsimulatorwillthenactivateandrunthebustransactiontocompletion.Once theresultisavailableitwillbestoredsomewherefortheISS,thelogicsimulator willblock,andtheISSwillbeactivatedagain.Assumingsomemethodtoexchange dataandawaytoactivatetheotherprocessandblockitselfthistogglingbackand forthbetweenISSandlogicsimulatorwillproduceacompletesimulationofboth hardwareandsoftware.InterruptsfromthehardwarecanbereportedbacktotheISS alongwiththebustransactionresults.ThereasonthissynchronizationisstraightforwardisbecauseweknowtheISSwillalwaysbesendingsomethingfortheBFMto do,evenifitjustidlecycles(AHBHTRANS[1:0]=0).Thisensuresallinterrupts willbecommunicatedbacktotheISSwithinafewclocksofwhentheyoccur. Ofcourse,wehaveseenthatrunningeverymemoryaccessinalogicsimulatoris slowerthanskippingsimulation.Whenco-verificationtechniquesthatmakeuseof memoryoptimizationorhost-codeoperationareused,thesynchronizationgetsmore complex.Insituationswherethereisnoguaranteethetwoprocesseswillconstantly exchangeinformation,thereisapossibilityforproblems.Itcanoccurthatthesoftwareiswaitingforaninterruptfromhardwarebutbecauseitisnotmakingany requeststothelogicsimulatorthereisnowayforittogetanynewinterruptmessages fromthesimulator.Thistypeofdeadlockisbestshownbyexample.Theexamplein Figure5-5showssomesoftwarethatsetsatimerandthenwaitsuntilthetimerhas expired.Whenaninterruptcomesfromthetimer,thesoftwareresumes.
175
Chapter5 ;; assembly function to wait for interrupt from timer EnterPause ROUT LDR R0, =Halt LDR R1, =Pause MOV R2, #4 STR R2, [R1] STR R2, [R0] 100 B %100 ; wait here forever until interrupt MOV PC, R14 ; function return EXPORT EnterPause END /* C funtion to set timer and wait */ void delay (int delay) { *Load = delay; /* set up period */ *Control = (Enable + Periodic + Prescale0); /* enable */ EnterPauseMode(); /* wait for interrupt */ *Control = Disable; /* disable timer */ }
Figure5-5:Exampleofsynchronizationproblem Thisexamplemayhangifthelogicsimulatordoesnothaveachancetorunlong enoughforthetimertoexpireandsendtheinterrupttosoftware.Thiscanoccurif techniquestoskipsimulationareusedtotrytoincreaseperformance. Co-verificationtoolstypicallyallowdifferentmodesofsynchronization.Withoutgoingintotoomuchdetailaboutthem,thefollowingmodesarecommonlyused: ■
LockStepmeansthelogicsimulatorwillblockuntilarequestisready.After therequestiscompleted,itwillblockagainuntilthenextrequest.
■
FreeRunningmeansthelogicsimulatorisalwaysrunning.Itwillservicerequestsfromsoftwareastheycomein.Testresultsmaynotberepeatableusing thistypeofsynchronization.
Asimplewaytothinkaboutthemodesisbylookingatthesimulationwaveform foreach.Forlockstep,thetransactionswillbeclosetogetherwithnogapbetween them.Freerunningmodeproducesawaveformwithgapsofidletimeonthebus sincethesimulatorwasrunningandnotransactionswerecomingfromsoftware.
176
AdvancedHardware/SoftwareCo-Verification Fordesignswithmultiplebusseswhereitcanbeguaranteedthateachsoftware programwillcontinuallysendbusrequests,amultibuslockstepmodecanbeused.If not,amultibusfreerunningmodeisbettertoavoidsynchronizationproblems. TheremayalsobeCAPIfunctionsthatcanbeusedforhost-codemodetoadvance simulationtime.Inthecontextofhost-codemode,thiscanbeusedtoprovidesynchronizationwhichissomethingbetweenlockstepandfreerunning.Thesimulator canblock,butevenifnomemoryaccessismadeaCfunctioncanbecalledperiodicallytoadvancethesimulation.Theonlyparametertothefunctionisthenumber ofbusclockstoadvance.Aneasywaytoautomatethesynchronizationistocreatea separatethreadthattakescareofthesimulationadvancement. Ofcourse,co-verificationtoolshideasmuchdetailaboutsynchronizationaspossible, butunderstandinghowsynchronizationworksallowsuserstobetterevaluatethe benefitsofmemoryoptimizationandhost-codeoperation.
InterprocessCommunication Interprocesscommunication(IPC)isusedtoexchangedatabetweentwoormore processesonthesameordifferentcomputers.TherearevariousformsofIPCsuch assockets,memorymappedfiles,sharedmemory,pipes,andmessagequeues.IncoverificationtherearetwomainplacesIPCisused.DebuggersoftenconnecttoCPU modelsviaIPC.ExamplesareshowninFigure5-6.
Software Debugger Process 1
Interprocess Communication ARM RDI or gdb remote
Instruction Set Simulator
Process 2
Figure5-6:IPCfromasoftwaredebuggertoamodel
177
Chapter5
SystemC Model of part of Hardware Design
Inter-Process Communication Signal Values
Thread 2
HDL Shell
Logic Simulation with Hardware Design
Thread 1
Figure5-7:ExamplesofIPCinco-simulation ThesecondplaceIPCisusedisbetweendifferentexecutionengines(simulation kernels)inco-simulation.ThemostcommonexampleisbetweenanISSandalogic simulator.AnotherexampleisbetweenaSystemCmodelandalogicsimulatoras showninFigure5-7.OtherplacesforIPCinco-verificationincludetheconnection betweenasoftwaredebuggerandanRTLmodel,betweenahardwaremodeleranda logicsimulator,andbetweenanevaluationboardandalogicsimulator. ThereisnoneedtogivedetailsabouthowvariousIPCmechanismsworkasthereare plentyofresourcesforengineerslookingtolearnaboutIPCandimplementsolutions.FollowingisasimplereviewofthedifferencesbetweenthreeIPCtechniques commonlyusedinco-verification: Socketsareusedwhencommunicationisneededbetweendifferentcomputers. SocketsworkusingIPaddressesandcanconnectmultiplecomputersofdifferent types.Thebasicprocesstocreateasocketandstarttoreceivedatafromanother programistousethreesystemcalls:socket(),bind(),andlisten().Theprogram thatcreatesthesocketandlistensforanotherprogramtosendsomethingisthe server.Theprogramthatwantstocommunicatewiththeserveristheclientand usessocket()andconnect().Thebeautyofsocketsisthattheycanconnectany machinestogether,evenoverlongdistancesviatheInternet.Themostcommon benefitsocketsprovideinco-verificationistheabilitytoallowsoftwareengineerstorunononetypeofplatformandhardwareengineersonanother.When
178
AdvancedHardware/SoftwareCo-Verification V-CPUwasoriginallydeveloped,thehardwareengineersranlogicsimulationon HPworkstationsandthesoftwareengineersallcodedandtestedsoftwareonSun workstations.Thesocketallowedcommunicationbetweenthenativecompiled softwareprogramonSunandtheVerilogsimulationonHP.Itisalsocommon forsoftwareengineerstoworkonWindowsplatformsandwanttorundebuggers onWindowsthatwillcommunicatewithahardwareexecutionengineonUNIX orLinux.Thedrawbackofsocketsistheperformance.Inapplicationswherethe timerequiredtosendandreceivedatabetweenprocessesiscritical,socketsare notthehighestperformanceIPCmethod. SharedmemoryisanotherIPCtechniqueusedinco-verificationtocommunicate betweenmultipleprocessesrunningonthesamemachine.Sharedmemoryisa specialblockofmemorythatiscreatedbytheoperatingsystemandthatallows differentprocessestomapitintotheiraddressspace.Oncemultipleprocesses mapit,theyuseordinarypointerstoaccessthememoryjustlikememorycreated withmalloc().Settingupsharedmemoryisdoneusingshmget()andshmat() byeachprocessthatwantstouseit.Sharedmemoryisfasterthansocketswhen twoprocessesonthesamemachineneedtocommunicate.Itisalsoeasiertouse, sinceitbehavesjustlikeregularmemory.Processescanunmapordetachfrom sharedmemoryusingshmdt().Caremustbetakentoremovesharedmemory segmentswithshmctl()whentheprogramsusingitterminate.Ifprocessesterminateabnormally,thesharedmemorymustbemanuallyremovedusingtheipcrm command. ThreadsarethethirdIPCmethodusedinco-verification.Threadsarenotseparate processes,butarepartofasingleprocessandcanexecuteinstructionsusingtheir ownprogramcounterandstack.Sincethreadsareallpartofthesameprocess, theycaneasilysharedatawithintheprocess.Withthreadsthereisnoneedfor sharedmemorysincethedatasegmentisvisiblefromeachthread.Threadsare themostefficientIPCforco-simulationapplicationssuchasanISSorCmodel co-simulatingwithalogicsimulator.Tousethreadsrequiressomecontrolover thesourcecodeofthemodelsorapplications.Forexample,iftherearetwo independentapplications,eachwithamain()function,itisnotpossibletomake themruninoneprocesswithtwothreadswithoutmodifyingthesourcecode
179
Chapter5 aroundthemain()function.Evenifthestartupcanbemodifiedtoruninasingle processwithmultiplethreads,itmaybebettertoleavetheprocessesseparatefor otherreasons.Multithreadedprogramssharethesamecontrollingterminaland I/Ostreams.Ifeachthreadneedsdifferentkeyboardinput,multithreadingmay notbethesolution.Inco-verificationandotherco-simulationapplications, standardthreadlibrariessuchasPosixpthreadsorSolaristhreadsarenotused becausetheyareimplementedintheoperatingsystemkernel.Systemcallsonly slowdowntheperformanceofthethreadlibrary.MoreoftenQuickThreads,a toolkitforbuildingthreadspackagesdevelopedattheUniversityofWashington, isusedtoimplementuser-levelthreading.AnexampleistheuseofQuickThreadsinOSCISystemC.
MixingHDLandCModels TherearetimeswhenmixingCmodelswithlogicsimulationisuseful,especially whenRTLforsomeblocksisnotavailable.Mostco-verificationtoolsallowamixtureofCandRTLtorepresentthehardware.Therearedifferentapproachesto mixingCandHDLsimulation.SomeapproachesaremoreCfocusedandconsistofa CenvironmentthatincludestheCPUmodelandCmodelsworkingtogetheroutside ofthelogicsimulator.Thepurposeistousethelogicsimulatoraslittleaspossible, sinceithasslowerperformancethantheCmodels.Inthiscase,theCsimulationis themasterandthelogicsimulatoractsmorelikeaslavethatisactivatedonlywhen necessarybythemaster.Apininterfaceortransactioninterfaceisnormallyusedto activatethelogicsimulator.Thistypeofinterfaceisusedwhendesignteamsstart fromaCenvironmentanditerativelymovetoHDLforeachblock.Figure5-8shows thisapproachwithapininterfaceandFigure5-9showsitwithatransactioninterface.Thebenefitofthetransactioninterfaceisperformanceinhardwareexecution enginessuchassimulationaccelerationandemulation.Theseenginesrunfasterif communicationisminimized.Toincreaseperformance,thebusis“mirrored”inthe HDLsimulatorandtransactionsarecapturedfromtheCsimulationandruninthe logicsimulatorusingabusfunctionalmodel.Totargetaccelerationapplications,the busfunctionalmodelmustbesynthesizable.
180
AdvancedHardware/SoftwareCo-Verification Model in C or SystemC
ARM CPU
HDL simulator
P1
P2
RTL Version of P2
Signal Values
DSP
P3
Master Process
Slave Process
Figure5-8:CenvironmentwithHDLco-simulationatpinlevel
Model in C or SystemC
ARM CPU
HDL simulator
P1 C-API for BFM
BFM
Bus Transactions
RTL Version of P2
DSP P3
Master Process
Slave Process
Figure5-9:EnvironmentwithHDLco-simulationattransactionlevel
181
Chapter5 Thesecondapproachistoco-simulateCmoretightlycoupledtothelogicsimulator.SimulatorshavealwaysprovidedwaystoincorporateCmodelsdirectlyinthe simulationprocessviaCinterfacessuchasVerilogPLI.Simulatorsnowprovideways toco-simulateSystemCdirectlyinthesimulation.SystemChasbecomejustanother languagethatiscompiledandsimulatedautomaticallywithinthesimulationprocess.WiththiscapabilitythesimulationcanmixVerilog,VHDL,andSystemCat anylevelofhierarchy.ThisapproachmaybeabitslowerthanusingCoutsideofthe logicsimulationprocess,butaswediscussedpreviouslythesynchronizationissuesare moreeasilyunderstoodinsidethecontextofthelogicsimulatorversustheschedulingandcallbacksrequiredwhenrunningCoutsideofthesimulator.Thisapproach toco-simulationiswellsuitedforusingCandSystemCformodelingspecificblocks ofthedesignaswellasfortestbenchpurposes.Figure5-10showstheclassicalco-verificationsetupfromFigure4-12withtheuseofaSystemCmodelbeingco-simulated withthelogicsimulator.ThisdesignrequiresthatallactivitybetweentheCPUand theSystemCmodelgothroughthelogicsimulator.Performanceisnotasgoodasthe previousapproaches,butwithSystemCco-simulationbuiltintologicsimulatorsand automaticsynchronization,itisveryeasytosetupanduse. Software Debugger Inter-Process Communication
Instruction Set Simulator
C API
BFM Read, Write, and Interrupt Messages
Process 1
Logic Simulation with Hardware Design SystemC Model
Process 2
Thread of Process 2
Figure5-10:Co-verificationusingco-simulationofHDLandSystemC
182
AdvancedHardware/SoftwareCo-Verification
ImplicitAccess Oneofthebenefitsofusingco-verificationwithhost-codemodeistheperformance ofthesoftwareprogram.Anativecompiledprogramrunningontoday’sworkstations isafewordersofmagnitudefasterthanonanISS.Thisgapmaycontinuetogrowas embeddedprocessorsgetmorecomplexandtheISSgetsmorecomplex.Themain complaintofsoftwareengineersabouthost-codeoperationistheneedtomodifythe softwareforworkstationcompilationversuscross-compilationforthetargetprocessor.Afterall,ifthesoftwareisnottheexactsoftwareengineerscannotbesurethey verifiedtherightcode.Intheworstcase,everylineofsoftwarehastobechanged andthenthesoftwareis100%somethingelse.Earlyco-verificationtoolsadvocated replacingallmemoryaccessestotheembeddedhardwarewithCfunctioncallsto activatelogicsimulation.Tomakethistransitioneasier,theyadvocatedthatall memoryaccessesfromsoftwareshouldcallasetofafewcommonCfunctionsand thenthechangesneededforco-verificationcouldbeconfinedtojustafewfunctions.Unfortunately,softwareisnotthateasy;inCmemoryaccessesareeverywhere becauseoftheeasyuseofpointers.Figure5-11showamemoryaccessusingpointers andhowitshouldbereplacedwithaCcallforco-verification. #include "cover-api.h"
/* include file with C API definitions */
void mem_fill(unsigned long addr, unsigned char pattern, unsigned long length) { unsigned int i; printf("Setting memory buffer to %x\n\n",pattern); #ifndef COVERIFICATION (void) memset((char *) addr, pattern, (size_t) length);*/ #else for (i = 0; i < length; i++) { coverWrite(addr, pattern, 1); /* explicity write 1 byte */ addr++; } #endif }
Figure5-11:Host-codeexampleofreplacingmemoryaccesseswithCcalls
183
Chapter5 Formanyprojects,codemodificationslikethisarenotpractical,andabettersolutionisneeded.Thebestsolutiontothisproblemistocapturethememoryaccesses automaticallyandsimulatetheminthelogicsimulatorwithoutrequiringtheuser tochangethesoftware.Thisautomaticcaptureofmemoryaccessesiscalledimplicit access. Implicitaccessispossiblebecauseembeddedsoftwareoftenaccessesmemorydirectly byuseofpointersinC.Whenwritingsoftwareforaworkstationthatisrunning anoperatingsystemlikeWindows,UNIX,orLinuxaprogramcannotdirectlyaccessspecificmemory.Doingsoisdangerous,andpoorlywrittenprogramswillcause theoperatingsystemtocrash.Toavoidcrashes,operatingsystemsprovidememory protectiontostopprogramsfromcashingtheOS.Programsmustusefunctionslike malloc()toallocatememoryandusethereturnedpointertoaccessthememory.Figure5-12showsthedifferencebetweenaworkstationprogramaccessingmemoryand anembeddedprogramaccessingmemorydirectly. /* On a workstation malloc must be used to create memory */ unsigned long *ptr; size_t buf_size; unsigned long data = 0xabcd1234; ptr = (unsigned long *) malloc(buf_size); *ptr = data; /* Write to allocated mem, no idea what the address is */ /* In embedded diagnostics C can directly access hardware */ unsigned long *ptr = 0xffff0000; unsigned long data = 0xabcd1234; *ptr = data;
/* Write address 0xffff0000 directly */
Figure5-12:Memoryaccessonworkstationvs.embedded
184
AdvancedHardware/SoftwareCo-Verification ThekeytoimplicitaccessiswhathappensiftheembeddedsoftwareinFigure5-12is compiledandrunonaworkstation.Evenwithoutactuallytryingityouknowitwill notwork,becausethepointerissettoaccessanaddressthatisspecifictotheembeddedsystem.Aworkstationprogramcannotdirectlyaccessthismemorybecause itwillbepreventedbytheoperatingsystem’smemoryprotection.Implicitaccess forco-verificationintervenesandcapturestheembeddedstylememoryaccessand sendstheaddress,data,andcontrolinformationtologicsimulationtocarryoutthe request.Oncethelogicsimulatorcarriesouttherequest,implicitaccesswillreceive theresultandputitintotheproperregisteroftheworkstationCPU.Thesoftware programdoesnotevenknowwhathappenedorthatsimulationwasusedtoexecute thislineofCcode. Implicitaccessallowslargeamountsofexistingcodetobeeasilyportedtoaworkstationplatformforco-verificationwithlimitedchanges.Ataminimum,asoftware librarymustbelinkedwiththeuser’scodetoenableimplicitaccesstowork,and someinitializationmustbedoneatstartuptoconfigureimplicitaccess,butoverall thisisverynonintrusivefortheuser. Oneofthemostinterestingthingsaboutimplicitaccessisitsimplementation.Those interestedincomputerarchitectureandinstructionsetswillfinditveryinteresting. WorkstationswithRISCprocessorssuchasSunSolarismachineshaveonlyafew loadandstoreinstructionsthataccessmemory.Memoryaccessesarealwaysbetween registersandmemory.Incontrast,IntelprocessorsrunningeitherWindowsorLinux havealargenumberofinstructionsthataccessmemorywithmorecomplexaddressingmodes. Therearetwothingstobeawareofwhenusingimplicitaccess.First,depending onthememoryareastobeaccessedbytheembeddedsoftware,theremaybesome conflictswiththememoryspacethatisaccessiblebythenativecompiledprogram. Considera32-bitaddressspaceofworkstationprogram.Thecodeforthisprogram mustbelocatedinsomeaddressesofthevirtualaddressspacesothiswillprevent somerangeofaddressesfrombeingaccessedasdata.Thisissuecanbesolvedbycompilerandlinkerswitchestomovetheprogramaddressestodifferentlocationsthat arecompatiblewiththeembeddedsystemmemorymap.
185
Chapter5 Thesecondthingtobeawareofarethedifferencesintheinstructionsetofthe workstationversustheembeddedprocessor.Thismaycausesomedifferencesinthe memorytransactionsthatareruninsimulation.Forexample,iftheembeddedCPU isARM,whichallows1,2,and4bytealignedmemoryaccesses,andtheworkstationCPUisSunSolariswhichallows1,2,4,and8bytealignedmemoryaccesses itispossibleforthenativecompilertogeneratea64-bitreadorwritethatwillnot occuronthetarget.SimilardifferencesmaybefoundintheimplementationofC libraryfunctionssuchasthosedealingwithstrings.Aworkstationcalltostrcpy() maybeimplementedasaseriesof1bytememoryaccessesandthecorrespondingC libraryfunctionfortheembeddedprocessormayusesomenumberof4bytememory accessesfollowedbyonetothreesinglebyteaccesses.Biggerdifferencesexistwhen thehostworkstationisanarchitecturethatallowsunalignedmemoryaccessessuch asPowerPC.Whenmixinganunalignedhostmachinewithanalignedtarget,more precautionsmustbetakentoensurethebustransactionspassedtotheBFMarelegalin thetargetsystem.Allinall,thesedifferencesareminoranddon’tnormallycauseany problemfortheembeddedsoftwareengineerusinghost-codemodeforco-verification.
SaveandRestart Oneofthemostcommonlyaskedaboutco-verificationfeaturesistheabilitytosave thestateofasimulationandrestartitatalatertimefromthesavedpoint.This functioniscommoninlogicsimulators,soitiseasytoimagineitextendedtoco-verification.Whetherornotsaveandrestartcanbedonedependsonthearchitectureof theco-verificationsolution. TechniquesusinganISStypicallycannotbesavedandrestarted,becausetheISS doesnotsupportsuchafeature.Technically,itshouldbepossibletoimplement,butI haveseenitimplementedonlyonce.TosaveandrestarttheISSrequirestheinternal stateofthemodelincludingregisters,cache,pipeline,andsoforth,tobesavedto diskforlaterreloading.Techniquesusinghost-codemodecanbesavedandrestarted sincethesoftwareisaCprogramthattheuserhascontrolofandtheco-verification toolcanprovideAPIfunctionstosaveandrestartconnectionsbetweenthehostcodeprogramandthelogicsimulator.
186
AdvancedHardware/SoftwareCo-Verification Techniquesusingactualphysicalhardware,suchasin-circuitemulation,cannotbe savedsincethereisnowaytotellachiptosavestatetoafile.Somewishfulthinking relatedtousingJTAGtoscanoutvaluesmaysoundinterestingbutisnotpossible. Saveandrestartispossibleonachip(orevenISS)withhelpfromsoftware.Laptops haveafeaturecalledhibernate,whichwritestheentirecontentsofmemorytodisk andthenshutsdown.Whenturnedbackonitknowstoreloadthememoryandstart fromwhereitleftoff.ThisrequiressoftwaretoberunontheCPUtotakecareof savingmemoryandusingwhateverpowermanagementfeaturesareintheCPUto shutdownandrestart.Forco-verificationitdoesn’tmakesensetoaddthistypeof softwareonlyforsimulationpurposes. Co-verificationtechniquesbasedonRTLmodelsisthebestwaytodothesave andrestartfeature.SincetheCPUmodelrunsinthelogicsimulator,thefeatureis automaticallyavailable.Thekeyistobeabletoreconnectthedebuggerafterthe stateisrestored.Thegoodnewsisthatadebuggerdoesnotneedtostoreanystate andcanbereconnectedatanytime.Debuggerreconnectioniseasytodemonstrate onaworkstation.Thedebuggercanbeconnectedatanytimetoarunningprogram. Figure5-13showsconnectinggdbtoalogicsimulationusingthe“attach”command. Thedebuggercanalsobedisconnectedusingthe“detach”command. Thesameistrueforembeddeddebuggers.IftheconnectionisgdbtoanRTLmodel, itcanbeconnectedatanytimeusingthe“targetremote”command.IftheconnectionisJTAGviain-circuitemulationthenitdependsmoreontheJTAGtooland debuggerused.SomeJTAGdebuggerswilltrytoidentifytheCPUandimmediately issuearesetasawaytotakecontroloftheCPU.ThiswillrestarttheCPUfromthe resetvectorandtrashthecurrentstate.OtherJTAGtoolswillnotissueanyreset, justreadtheregistersandfigureoutthesoftwarecontext.Connectingwithoutrequiringanyresetallowssaveandrestarttobeperformed.
187
Chapter5 sp8:7 % ps -ef | grep vlg jason 16606 16175 3 08:12:39 pts/3 0:01 vlg sp8:8 % gdb vlg GNU gdb 5.3 Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.7"... (gdb) attach 16606 Attaching to program `/tools/example/ARM926EJS/vlg', process 16606 0xff1969d8 in _semsys () from /usr/lib/libc.so.1 (gdb) where #0 0xff1969d8 in _semsys () from /usr/lib/libc.so.1 #1 0xff3647f4 in xSemLock (xShmemHandle=0xff2e0000) at xPli.c:353 #2 0xff364ac4 in xSuspend (xShmemHandle=0xff2e0000) at xPli.c:465 #11 0x00078ae8 in main () (gdb) det Detaching from program: /tools/example/ARM926EJS/vlg LWP 1 (gdb) q
Figure5-13:Attachinggdbtoarunningprogram AnexampleofwhensaveandrestartisusefulisforRTOSbootthatisthesameeverytime.Awirelessprojectmanageronceestimatedthattheoperatingsystemfora mobilephonewouldtakeabout18minutestobootwithanemulationsystem.Allof thesoftwareengineersneededtotestsoftwarethatstartsrunningafterthe18minutestartup.Savingstartuptimebytheuseofsaveandrestartwouldallowsoftware engineerstorunmanymoretimesperdayastheymakesoftwarechanges.Thereis nothingworseforsoftwaredevelopmentthanhavingtowaitahalfhourforeachnew testwhensoftwareengineersareaccustomedtotakingonlyminutesorsecondsto runsuchtests.
Post-ProcessingSoftwareDebuggingTechniques Wehavetalkedaboutthedifferentexpectationsofsoftwareandhardwareengineers intermsofperformance.Co-verificationtoolsoftenpromotethefactthathardware engineerscontinuetoworkinthefamiliarsimulationenvironment,andsoftware
188
AdvancedHardware/SoftwareCo-Verification engineerscontinuetoworkwithfamiliardebuggingtoolsanddoinginteractive debuggingjustliketheynormallydo.Analternativetopromoting“nochange”and “familiarenvironment”istopromotetheuseofpost-processingdebuggingtechniquesforsoftwareengineers.Hardwareengineershavealwaysoperatedwiththe mentalitythatsimulationisslowsothebestwaytocopeistorunbatchjobsandsave logfilesandwaveforms.Iftherearetestswithfailures,thesecanbeanalyzedwhile thenexttestisrunning.Thisisthemostefficientwaytokeepthesimulators(or emulators)busyanddothemanualdebugginginparallel.Contrastthistosoftware engineersthathaveahistoryofinteractivedebugging.Thewaytodebugsoftwareis toobservethefailure,makeeducatedguessesaboutwheretoputbreakpoints,and iterativelyrerunthetest,movingthebreakpointsaroundandinspectingthesystem untiltheproblemisfound.Acombinationofbreakpointsandprintf()statements willusuallyfindtheproblem. Sinceco-verificationoftendependsonhardwareexecutionenginesthatarenotas fastasrealhardware,thisgreatlyslowsdowntherun-debug-run-debugloop.Postprocessinganalysisofsoftwareisanareathatholdspromiseforimproveddebugging techniques.Today,engineersusecrudemethodsforpost-processingdebugwithlogic simulation.Mostsimulationenvironmentsuseabusmonitortoprintaddressanddata valuesastheyoccuronthebuswiththesimulationtimestamp.ManyCPUmodelsalso allowthesoftwareengineertoviewregistersinawaveformtool.Withacombination ofthebusmonitorlog,registersandsoftwarelistings,itispossibletofigureoutmanuallywherethesoftwareisexecutingandtraceexecution.Unfortunately,thisisavery tediousprocess.Figure5-14isamessageIreceivedfromasoftwareengineertryingto tracehissoftwareusingregistersandlogfilesfromsimulation. I seem to be getting aborts or other fatal exits near entry to main() in dhry_1.c, and my current methodology of tracking PC through my map/symbol table isn't yielding perfect results.
Figure5-14:Asoftwareengineertryingtodebug
189
Chapter5 Somemodelsandbusmonitorshaveadisassemblyfeaturesotheycanprintoutnot onlythebusaddressanddataactivity,butalsotheassemblylanguageinstructionthe datarepresents.Whilethismakescorrelationbacktosoftwaresourcecodealittle easierbecauseasoftwareengineercanmatchtheassemblyinstructionseasierthan justdatavalues,itstilldoesnotprovideanautomatedlinkbacktosourcecode.For softwaredebugging,agoodautomatedlinkbacktotheoriginalCorassemblycodeis valuabletofindandfixproblems. Anotherhardware-centricutilitythatcanbeusefulisaprogramthatcanread waveformfilesinVCDorNovasFSDBformatandprintoutalogofbustransactionsand/orregistervaluesfromit.Formorecomplexbusseswithaddresspipelining suchasAHB,weknowthatitisnotthateasyforasoftwareengineerorverification engineertolookatthewaveformandinstantlyseetheaddressanddatavaluesfor thebustransactionsthatareoccurringonthebus.Infact,misreadingwaveformscan leadtowasteddebuggingtimewhenincorrectassumptionsaremade.Autilitythat canreliablytakeawaveformandgenerateasimplelogfileofbusactivityisusefulto tracksoftwareproblems.Thisutilityishandywhentestsarenotrunwitharegular busmonitor.Anexampleapplicationisforsimulationaccelerationandemulation. Whenusinganemulationsystem,performanceisthemostcriticalfactor.Adding abusmonitorthatuses$displaystatementstoalogfileeveryfewclockswillslow downoverallperformance.Mostemulationsystemshavewaystoextractwaveforms forthedesignafterthetestiscompletewithoutre-runningthetest.Thesewaveforms canbereadbyautilitytoconvertthemintoalogfilethatcandrivepost-processing softwaredebuggingtools. ThenextlogicalevolutionfordebuggingsoftwarerunningonmodelsliketheDSM orRTLcodeforanARMcoreistoextendthehardwaredebuggingtoolstodisplay softwarecontextalso.HardwaredebuggingtoolsareverygoodatdisplayingVerilog andVHDLsourcecodeandthevaluesofsignalsduringsimulation.Theyarealso goodatcapturingthemainelementsofanembeddedsystemrequiredforsoftware debugging,CPUregistersandmemorycontents.Embeddedsoftwarecompilation toolscanalsobeusedtofindoutthelineofsoftwarethatwasexecutingforaparticularprogramcountervalueautomaticallybyreadingtheELFfileforthesoftware.This automationgivessoftwareandhardwareengineersacorrelatedviewofwhichline ofsoftwarewasexecutingduringaspecificsimulationtimestamp.Someprojectsend 190
AdvancedHardware/SoftwareCo-Verification upcustomizinghardwaredebuggingtoolsandextendingthemtoprovideasoftware viewofthedesign. Forco-verificationmethodsbasedoninstructionsetsimulation,aninterestingpostprocessingdebuggingtechniqueisavailablethatcapturesbustransactionsduringa testandreplaysthemusingonlytheISS.Recallthesoftwareengineer’sviewofthe world.TheonlythingthatmattersishowtheARMCPUexecutestheinstructions. Considerascenariowherealongdiagnostictesthasbeendeveloped.Onanormal logicsimulator,thetestmayrunforhours.Evenwithsimulationacceleration,this testmayrunforalongertimethanasoftwareengineercanstandtowaitforit.If thetestfails,thediagnosticdeveloperwillnotbetookeenonrestartingthetest, guessingwheretoputabreakpointinthecodeandrestartingthetestandtryingto interactivelydebugthetest.Thisprocessofrestartingthetestandmovingthebreakpointswillbequitetedious. Asawaytoaddressthisproblem,theengineercanrunasinglesimulationandsave acompressedfilethatcontainsthebustransactionsattheprocessorinterface.The memorytransactions,includingaddress,data,andsimulationtimestampalongwith interruptinformation,are“recorded”intothisfile.Afterthesimulationiscomplete, softwareengineerscanstarttheinstructionsetmodelandsoftwaredebuggerand “re-run”thesoftwareexecutionsequence.However,thistimeinsteadofinteracting withhardwaresimulation,theresultsarereadfromtherecordedfile.This“playback” ofthebusinterfacereplicatestheexactsequenceofsoftwareexecutionasshownin Figure5-15. Software Debugger
Instruction Set Simulator
C API
Read, Write, and Interrupt Messages transaction database
Figure5-15:SoftwareplaybackusingISSandrecordedbustransactions 191
Chapter5 BecausethesimulationnowrunsatthespeedofthestandaloneISS,softwareengineerscanre-runthesoftwareasmanytimesasneededtofindtheproblem.The simulationtimestampisalsoprovidedatanytimetohelpcorrelatethesoftwareand hardwareexecution.Thisrecordandplaybackmethodologyisagoodwaytodebug longsimulationteststhatmakeinteractivedebuggingunproductive. Forengineersthatunderstandbothhardwareandsoftware,orwhensoftwareand hardwareengineerswanttositdowntogetheranddebug,theuseoftheISSplaybackcanbeenhancedtoprovideaviewofthehardwaredesignaswell.Sincethe simulationtimestampsweresavedwiththestimulustotheISSthetimestampcan beusedtoautomatethehardwareview.Forhardwareengineers,theywouldliketo usewaveformstoseewhatishappeninginthedesign.Hardwaredebuggingtools haveaCAPIthatcanbeusedtoperformmostanycommandthatcanbedonewith themouse.ByusingtheCAPItocontrolthecursorandcenteritonthescreen, acorrelatedviewofbothhardwareandsoftwaresynchronizedbythetimestampis available.Thisisthemostpowerfulviewofwhatishappeninganditcanbedone withoutinteractivedebugging.Figure5-16showshowthedebuggerandwaveform aresynchronizedbythesimulationtimestamp.Asthesoftwaresteps,thecursoron thewaveformisupdatedtothenewsimulationtimeautomatically. Changingsoftwaredebuggingfrominteractivetopost-processingmaynotbethe easiestchange,butitisaco-verificationtrendthatholdspromiseasdesignsgetlarger andinteractivedebuggingbecomesmoredifficult.
192
AdvancedHardware/SoftwareCo-Verification
Correlates Software Source Code to Simulation Timestamp
transaction database
Figure5-16:Correlatedviewofhardwareandsoftware
EmbeddedSoftwareToolIssues Oneoftheembeddedsystemissuesstillunsolvedistheincompatibilitybetween tools.Writingworkstationsoftwareisnodifferent.Anybodythathaswrittensoftwareandcompileditwithgccknowsthatgdbisthebestwaytodebugit.Similarly, aSundebuggerisbesttodebugaprogramcompiledwiththeSuncompiler,andthe MicrosoftdebuggerisbestforaprogramcompiledwithaMicrosoftcompiler.The dependencybetweencompileranddebuggercanplagueco-verificationsinceonlya limitednumberofdebuggerscanbesupportedforaspecificsolution.Anothervariableistheembeddedoperatingsystem.Eachoperatingsystemsupportsonlyalimited numberofcompilerssothesoftwareengineersmustusetoolstheoperatingsystem supports.
193
Chapter5 Tobestexplainthis,let’slookatanexample.Allco-verificationsolutionsthatsupportARMcoresnaturallysupportARMdebuggersandARMcompilers.Aslongas projectsuseARMtoolsthereisnoproblemwithefficientco-verificationdebugging. OneoftheoperatingsystemsgainingpopularityforembeddedapplicationsisLinux. Duetoitsfreesoftwareroots,theLinuxkernelincludessomeassemblylanguagecode writteninGNUassembler(gas)format,andtheCcodeincludesGNUextensions. ThismeansusersmustuseGNUtoolstouseLinuxforanembeddedapplicationfor anARMcore.Infact,ARMdoesnotrecommendanybodyeventrytouseARM compilationtoolsforLinux.Thereareeffortstostandardizeapplicationbinary interfacesandcompilerconventionssothatdifferentcompilationtoolscanmixand matchonLinux,butsuchsolutionswilltaketimetocompleteandmature.This resultsinthepossibilityoftherebeingnosoftwaredebuggerforco-verificationtools todebugembeddedLinux. ASICandASSPprojectsoftenmustsupportmanydifferentoperatingsystemsdependingonwhichonethesystemdesigncompanywantstouseandwhatarethede factostandardsforaparticularapplication.Ioncemetateamofsoftwareengineers designingamultimediachipwithanembeddedARM.Theyhandedmethefollowinglistofenvironmentsvarioussoftwareengineerswereworkingwith: ■
SupportforVxWorksandTornado(basedongcc3.2)
■
SupportforLinux(compiledwithgcc2.95)
■
SupportforpSOS2.5andDiabcompiler
■
SupportfordiagnosticscompiledwithARMcompiler
Maybesomedayallofthesethingswillbecompatibleandsoftwareengineerscaneasilychangetoolswithoutcompatibilityproblems,butfornowitpaystopayattention.
DebuggingCo-VerificationIssues Co-verificationenvironmentscanbedifficulttodebugwhenthingsgowrong.Based ontheadvancedtopicsinthischapterandthedifficultyofcreatingmodelsthatare notderivedfromthemicroprocessordesigndatabaseaswediscussedinChapter4,it iscleartherearemanyvariablesinvolved.Oneoftheknocksagainstco-verificationis thatitisdifficulttouse.Thedifficultystemsmorefromthefactthatwhensomething
194
AdvancedHardware/SoftwareCo-Verification goeswrongitisnoteasytofindoutwhatitis.Theproblemmayormaynotberelated toco-verification.Wehavediscussedhowmodelingdifferencesmaycauseatesttofail, eventhoughthemodelmatchesthespecificationoftheCPUbusprotocol.Wehave discussedtheconnectionsbetweensoftware,theembeddedsystemmemorymap,and thelogicsimulationmemorymodels.Anapplicationengineeroncee-mailedtotellme aco-verificationmodelwasnotmatchingtheresultsoftheARMDSM.Afterrepeated effortstoblametheCPUmodel,theproblemturnedouttobesomethingwrongina memorymodelthatwasonlyexposedbytheco-verificationmodel.Wetalkedabout howsynchronizationissuescancauseteststofailbecauseoftimingissues.Thelist ofthingsthatcangowrongislong.Theonlywaytoavoidsuchpitfallsistobecome educatedinhowco-verificationworksandtrytointroducenewvariablesinalogical way.Ashappenswithmostengineeringprojects,changingtoomanyvariablesatonce iscauseforconfusionwhenthingsgowrong.
195
This page intentionally left blank
CHAPTER
6
HardwareVerificationEnvironment andCo-Verification Thehardwareverificationenvironmentandtestbenchesarecloselyrelatedtocoverification.ToverifyanSoCtakesacombinationoftoolsandtechniquesthatmust worktogethertobeeffective.Thischapterdiscussestherelationshipbetweencoverificationandthehardwareverificationenvironment.Thefirstpartofthechapter talksaboutpassiveverificationactivitiesthataiddebuggingandmonitoringofthe designandthelatterparttalksaboutactivetechniquesthatprovidestimulusto createactivityinthedesign.Sincethefocusisonco-verification,thediscussionis primarilyrelatedtothemicroprocessorbusinterface.Theconceptofsoftwareverificationisalsointroduced.
BusMonitor Oneofthemostbasicthingsthatnearlyeverydesignusesisabusmonitorsimply todisplaybusactivityasitoccurs.ItisbesttoseparatethemonitorfromanyspecificCPUorbusfunctionalmodelsthatwillbeusedsothatmodelscaneasilybe interchangedduringdifferentphasesofverification.AsimpleexampleofanAHB monitorisshowninFigure6-1.
197
Chapter6 /* * Synthesizable AHB Bus Montor * * This monitor works with Xcite and Xtreme and monitors AHB activity. * * The following features are implemented: * * 1. Display Address and Data information for bus activity * When all accesses are not required the peek feature can be used * to selectively display activity. * * 2. Provide Address and Data breakpoints to stop simulation based on * given address and read/write data values. * * To enable the monitor to display all address and data information * use +EN_AHB_MON on the vlg command line * * To use peek capability: if you do not want to monitor all the time * use +EN_AHB_PEEK, which turns on monitoring for some * cycles (ahb_peek_duration) every few cycles (ahb_peek_interval) * (both numbers can be changed on command-line). * * To use simple address and data breakpoints: * To use address breakpoint: * from CLI: * $scope(); * #1 $stop;. * ab_v=; // set address breakpoint value * ab_m=; // mask, 1 means that bits is not compared. * ab_en=1; // enables the address breakpoint. * To use read data breakpoint: * from CLI: * $scope(); * #1 $stop;. * rdb_v=; // set read data breakpoint value * rdb_m=; // mask, 1 means that bits is not compared. * rdb_en=1; // enables the read data breakpoint. * To use write data breakpoint: * from CLI: * $scope(); * #1 $stop;. * wdb_v=; // set write data breakpoint value * wdb_m=; // mask, 1 means that bits is not compared. * wdb_en=1; // enables the write data breakpoint. * * Note that, the you will need to do #1 so as not to race * with the intial block in the code for the monitor. * * For simulations with multiple monitors use the NAME parameter to * easily identify which messages below to which AHB interface. * * */
198
HardwareVerificationEnvironmentandCo-Verification module ahbmon ( HCLK, HRESETn, HGRANT, HADDR, HTRANS, HWRITE, HSIZE, HBURST, HPROT, HREADY, HRESP, HRDATA, HWDATA ); parameter DATA_BUS_WIDTH = 32; parameter NAME = "AHB"; input input input input input input input input input input input input input
[31:0] [1:0] [2:0] [2:0] [3:0] [1:0] [DATA_BUS_WIDTH-1:0] [DATA_BUS_WIDTH-1:0]
parameter parameter parameter parameter
IDLE BUSY NONSEQUENTIAL SEQUENTIAL
HCLK; HRESETn; HGRANT; HADDR; HTRANS; HWRITE; HSIZE; HBURST; HPROT; HREADY; HRESP; HRDATA; HWDATA; = = = =
2'b00; 2'b01; 2'b10; 2'b11;
parameter READ parameter WRITE
= 1'b0; = 1'b1;
parameter parameter parameter parameter parameter parameter parameter parameter
= = = = = = = =
SINGLE INCR WRAP4 INCR4 WRAP8 INCR8 WRAP16 INCR16
3'b000; 3'b001; 3'b010; 3'b011; 3'b100; 3'b101; 3'b110; 3'b111;
199
Chapter6 parameter parameter parameter parameter
OKAY ERROR RETRY SPLIT
= = = =
2'b00; 2'b01; 2'b10; 2'b11;
parameter parameter parameter parameter parameter parameter parameter parameter
BYTE HALFWORD WORD DOUBLEWORD WORD_LINE_4 WORD_LINE_8 BITS_512 BITS_1024
= = = = = = = =
3'b000; 3'b001; 3'b010; 3'b011; 3'b100; 3'b101; 3'b110; 3'b111;
parameter parameter parameter parameter parameter parameter parameter
IDLE_STATE ADDR_STATE DATA_STATE ERROR_STATE RETRY_STATE SPLIT_STATE UNKNOWN_STATE
= = = = = = =
4'h0; 4'h1; 4'h2; 4'h3; 4'h4; 4'h5; 4'h6;
reg [3:0] reg
state; wait_reg;
reg
latched_hready;
reg [1:0]
latched_htrans;
reg reg reg reg reg
latched_haddr; latched_hwrite; latched_hsize; latched_hburst; latched_hprot;
[31:0] [2:0] [2:0] [3:0]
reg
HGRANT_d;
reg [4:0] reg
beat_count; burst_in_progress;
reg [31:0] reg reg
error_retry_or_split_address; retry_pending; split_pending;
reg
en_ahb_mon;
// enable AHB monitor
reg
ab_en, rdb_en, wdb_en, ab_hit, rdb_hit, wdb_hit;
// // // // // //
address breakpoint enable read data breakpoint enable write data breakpoint enable address breakpoint hit read data breakpoint hit; write data breakpoint hit;
200
HardwareVerificationEnvironmentandCo-Verification reg [31:0] data value reg [31:0] data mask
ab_v, rdb_v, wdb_v; // address value, read data value, write ab_m, rdb_m, wdb_m; // address mask, read data mask, write
wire
trig_bkpt_stop;
reg
trig_addr_disp, trig_reset_disp, trig_data_disp;
reg reg integer integer
en_ahb_peek; peek_on; ahb_peek_interval; ahb_peek_duration;
reg [13*8:1] htrans_name; always @(latched_htrans) begin case (latched_htrans) IDLE : htrans_name BUSY : htrans_name NONSEQUENTIAL : htrans_name SEQUENTIAL : htrans_name default : htrans_name endcase end
= = = = =
"idle"; "busy"; "nonsequential"; "sequential"; "unknown";
reg [7*8:1] hwrite_name; always @(latched_hwrite) begin case (latched_hwrite) READ : hwrite_name = "read"; WRITE : hwrite_name = "write"; default : hwrite_name = "unknown"; endcase end reg [7*8:1] hburst_name; always @(latched_hburst) begin case (latched_hburst) SINGLE : hburst_name = "single"; INCR : hburst_name = "incr"; WRAP4 : hburst_name = "wrap4"; INCR4 : hburst_name = "incr4"; WRAP8 : hburst_name = "wrap8"; INCR8 : hburst_name = "incr8"; WRAP16 : hburst_name = "wrap16"; INCR16 : hburst_name = "incr16"; default : hburst_name = "unknown"; endcase end
201
Chapter6 reg [7*8:1] hresp_name; always @(HRESP) begin case (HRESP) OKAY : hresp_name ERROR : hresp_name RETRY : hresp_name SPLIT : hresp_name default : hresp_name endcase end
= = = = =
"okay"; "error"; "retry"; "split"; "unknown";
reg [8*8:1] hsize_name; always @(latched_hsize) begin case (latched_hsize) BYTE : hsize_name HALFWORD : hsize_name WORD : hsize_name DOUBLEWORD : hsize_name WORD_LINE_4 : hsize_name WORD_LINE_8 : hsize_name BITS_512 : hsize_name BITS_1024 : hsize_name default : hsize_name endcase end
= = = = = = = = =
"byte"; "half"; "word"; "64bits"; "128bits"; "256bits"; "512bits"; "1024bits"; "unknown";
/* ** Initializaton */ initial begin $timeformat(-9,3," ns"); $display("AHB monitor instantiated at %m with name %s",NAME); state = IDLE_STATE; /* ** Monitor enable for all address and data */ if($test$plusargs("EN_AHB_MON")) begin $display("%0t: [%s] monitor enabled",$time,NAME); en_ahb_mon = 1; end else begin $display("%0t: [%s] monitor disabled",$time,NAME); en_ahb_mon = 0; end /* ** Peek enable */ if($test$plusargs("EN_AHB_PEEK"))
202
HardwareVerificationEnvironmentandCo-Verification en_ahb_peek = 1; else en_ahb_peek = 0; ahb_peek_interval = 10000; // Peek every 1000 clocks ahb_peek_duration = 500; // Peek for 500 clocks ab_en = 0; rdb_en = 0; wdb_en = 0; ab_hit = 0; rdb_hit = 0; wdb_hit = 0; ab_m = 32'h0; rdb_m = 32'h0; wdb_m = 32'h0; trig_addr_disp = 0; trig_data_disp = 0; trig_reset_disp = 0; wait_reg = 0; end always @(posedge HCLK) HGRANT_d