Protein Bioinformatics: An Algorithmic Approach to Sequence and Structure Analysis [1 ed.] 0470848391, 9780470848395

Genomics and bioinformatics play an increasingly important and transformative role in medicine, society and agriculture.


English Pages 376 [373] Year 2004


Ingvar Eidhammer, Inge Jonassen, William R. Taylor

PROTEIN BIOINFORMATICS: An Algorithmic Approach to Sequence and Structure Analysis

Ingvar Eidhammer and Inge Jonassen, Department of Informatics, University of Bergen, Norway
William R. Taylor, Division of Mathematical Biology, National Institute for Medical Research, London, UK

John Wiley & Sons, Ltd


Contents

Part I

SEQUENCE ANALYSIS

Pairwise Global Alignment of Sequences

1.r a s6nns schmefd rheModcl 1.4 Fiidìig HighcrsontrgAriemts wnnDynmic r.4.r rr.mir.&,j 1.4.2 UEofEahic*

ii ì'6sÓno8GÀps:c,pPefu|ljs 1.7 rrymic Pmgrunninglof Gcndîl cap Penrq, ì.3 Dyrantu rmglffing fof af6necap FeD.rry l.e ar4rreÍ scoreaid s.qE@ Dishne

Pairwise Local Alignment and Database Search
2.1 The Basic Operation: Comparing Two Sequences
2.2.2 Repeating segments
2.3.2 Finding the best local alignments
2.3.3 Scoring matrices and gap penalties
2.4 Database Search: BLAST

r.l

HyF,ú6is Testinatù seqùem Homology

r.2.r

r! r.j

Poìsson prcbùnny dÌfdbtrtior

ftobabfiry Dnriburiotrsfor cappedalieEds asesinc md Cohpcdn! PósJ

MultiDlèGlobrl Alismenr and Phylogenetic T.€6 .1.1.2 a pBnìDgaìeon66 îor ùe Dp soLurion 4',Mul'ipìeA|igihdhmdPh'|oFreÌicTtds l,r.lTteDmhe'ofdiflÙù''Gbpoloei$ 4.r.2 Moìecutù.lockÌheory 4.r.,1

DifÍeEnr.pprùchestd lmmhdjns

4.3ó Roorincorlé t6t: bmMpping 4.3.7 Sbtisricsl 4.4.1 Aligrinst{o subsrùsnmnb '1.1'sseqEr@vè|glB

5.r s.ùrirg Múi€! ss€d onFùY 5.2 PAMscùns MdÙic$ slhsùtuiìornùn 5.2.2 calcuìare 5:.r

MàtricsrorAtn$ dol!úon!4 Ljme

5.2.6 ScorìigMties (ìos oddrms@t 5'2'7Estìrdinglheevo|Ùlionij'dtbl.Ò

Conìpdìis BLoSUM úd PAMMdn.è

62

6 1.2 Rènovi'g Ns úd corms 6.1.3 Pdsidonvùshb 6.r.4 scqEnc wcrls T ó.1.5 rerùc grps seNhìtrs Dd,bses wirhPrÒ6les IdTSd BLAST:PSI-BLAST 6'3'lM*jnguì.mÙlljPlea]ìgrmeÚ 6.32 cdsùcddg rh. Èofile

,

ó..1.2 Cdnnrudiry, Èrrìle HMM rof a prcreùfmìly

?.r 7.?

Îe PROSìî!,ragù4Ò E\ac,/aprminaÈMÍrhins

7 4.?

S.ónnsFlftefr

77

CompùnÒDBaedMcrh0ls ?.7.1 Piwl rllsd nertrods

r3

Pfrem Dfirn MdhodsrPruh

Part II

STRUCTURE ANALYSIS

StùcturesandStùctù.€Dcsc.iptioÌs 3 r unns ofslncrun Der;P.ions 1l 85.r

Linè$smen6(ficks)

3.5.1 srmdted sheb (roPs) 3.5.5 ropolosyorÈo'einsrrucruE 3.6 rdsrryins i\e ssEs 3 6.2

DènÉseÒr&4 shcM orPndis (Dss?l

ssFnn.wùkfolPaiÚjssn4uftconpùjsor

Sùpè.posiiiotrùal Dymmic ProgrMming

9.2 93

9 r 3 U$nsRMSDÀsoriisofsdctuEsiDiìeìries rnd Alignmed AlEmtins Sùpcrsìsirion DoùblèDynmic Pner3lmine 9.r.1 Ldlevel soring naùies 9'3.3lbdEddohtdylani.P,lgrmriig

l0 ì0 I I îvd.dirúsionl seonètic Mshins rorshdùrc.onpùison 10.12 GamcùichÀsbiis

10.2.r Meaqjq rhesinìlùiry or dirù.è ($b)mries

Il

Cluste.irg: CombiùineL@alSiùúlùiti6 aodCo*nhcy 111 compdibirirr rl.2 s.ùchirsrorS.ù1Mtuhà\ ll.3'2orefìappinccla..6

ll.5.l Compffins rtrtomaÌioDs ll.s.2 cd.ul ii! drercv hoslohdiòn rr ó Crlrdig by U.corRetarions 1r.62 Ceondfi.fchrion

t2 SignifcanceandAssm€Ìr r2l

of SrruclùreConparisons

coishritrg R.1.dom shdua ModÈlN 11.2.1 cÒnsndinenorEduidadrbsG r2.2.2 Den!úloD ti'r for.imì1 l,

1l Mùltiple Sùuctùre Conpafi son

rinding r conmon corc fior ! MuìriptcAtignncn!

Part

135.2 Diso(nne PrcrinsPúem l3.5.3 lhe aPPúmh 135'4 scrirg ùs Prckns úods rots 13.7 Biblìo$.Phic

la Ptut€tnStnctúre Cla$ilcatiotr r1.2 An lsinsModeltof DoMi! rddúficfior 14.31 MdnìY{donl?B r4.r.2 Miinry-, domdns d , dorùis lari 14.5 Aúomtic APprcshs b Cl,ssi6cati0n rj.6 DibhMs lor Sh.M cì6sific'rion r4.7 FSsP-DdiDomhDidioúJY Ia3.1 Domains r.r.3.4 ToF,losy(roìdlibÍv) lmili6 143ó sequcdce ddsìrì.adonpD*dÙE Th. CAIII 14.3.? BNed m sùcLs r1.9 clssificarior

Part III

SEQUENCE-STRUCTURE ANALYSIS

15 Srmctùrehealt tro!: Th@ding 15., PFb! Ssonddf, SÙtrctuEPtdicúoD 15.1.3 Ac.ú4v in $cond.4 shctuE predicÌior ts.r

MdhodsBucd d sequde ausmst rri.l Tr.3D rD ratchiic n.ìrod '5,:]'2]xcFUCU€ú.úod 15.4.1 PotúiolsofFd

rorc

15j2

DoubÈDyimi Pmsn'rming

Pre

AppeDdixA Brsic in Mathematics,PrcbÀbitity..d

A42Prcbabilil}di|dbulioN

B tntmduction to Moleotar Biolo$,

Preface

r probiobioiDfomaiìG,rcuiins or mdhÒds

!lms'gúnlberelbaisforch6inglbgtish.plogDmndtb{iglipaEn46od oprions,aÍd liniuy be&ne noÈ conÉ rhe onpúr rcicd (o! q,npúú sico@ rodcú). on 'he dhq hand,nly lìDd aid n or sh. mry edi úc údèóbdiis ics$ry b mmc iftú!ìng probrè'ns, ónnù

bsis for ólabon'ivc prcja 6 nodùe igbt balanemakinsùe

ùis by l$uiry on rheideÀ or rhemrhdrlprcgEft lhire sryiis mry r scnrd 'Ih. ider {bou(heir ùù of appri.ltior úcùons ùo .lso d$nb.n rorn{lly $

'ÀeonginilÉ*afthpapcsalwdlsÈv r.o

ahoú ùe abjcd i EoE daù.

etve.xene desdFio$ onheivanabbdrbbrss forDNA aodprccinsquEncs, Tr4e hool! tpìdlly 3oì! y.e lirlè d Md ii ú. pmgrumsor hk. îrrc a

my rì'd sonc sdios

hardro foìtor lwhi.h oi bc 'kipped). ri . simitrnEDq

s we Fdìdc biolo3nar ndidioi

er

ùd rÀ.

What is bioinformatics anyway?

.vùbsddodfmflEbokpìi.ed rndmofDNAbyw{sialdcfi

mjÒ! bE*rhrouehsitr rlÈ 1910s J r95

x n rhcdeveìopnst ii rres fó

iigmaÍyrldentrndcsÒdhcÀrìlmr]oùfieúmruùtiÒn,lsd'h9bù]o!ia|

i. rùvcr duì !ùh aos ùd od (bìr).DNA trles(rùdodid$) dìd comein foùr dijìcrú

ldopnm'olhighlhrcuchpuln.lh ih. appì'6ùù 0f 6mpúh

h h6Yd

rhis

pi., bu.rilì oreof rheho$ widclyusd :inilÙ6agiqqu4s.qu.i..'Ti.q ru be*prÈd rohapp.nby ch!n.è)si rero rhGcorrriòdaúbs. prorin.areoirhms dì*ú$n in dePlhìl chaP€n r-3. úd, anddudy rhcarìsmsd ro s,in infomrion abournherclarionshìps bdwes 6ùaìaldsfurudprcpefiesommonblhc h.E n\e sme dioo aid). 'ìe aino aid rir. sdc( îcspdivcry) 'bù nry bchifu úNr rhcdnrrc. or.cùdiry rhdtr€ riomfy felarionship beNen a s. or prcbns ror seret. sorh tor pmEin runriona mryss, i' is .ùciaì 'bd rhemùlÌipìealism !fles rc rrcúedin chlpÎeB,r 7. !ìmtl foreuùrd smd€s dd 'hemjtrwùkn's hù*s (pedomjJìe,rordddq tuúbdkm rndsignalìitre) ii livinscel d€*), b bdh ùdddshd rheevolùrimùrpóbìls Gìn.Òrhcsúùcrùie013Èdeir rhdg6 froÉ slowlyin cvolùtionúù d@si$ $qúrc4, aid.o id.idry rb.ommor da$i6.!rìoi or rho úivù*

of prc!ìr 3hdrca. A coúnon appfúh.o doirg t r . , q ,\ n 4 r G t o , l r o n , t r . . , r u . o n

oi prcied rhd(s.

rn Pld n (chlpÌen 3_r4) Ne defnb.

frudùr grcnrssqù.i!Ò('hcodtrorrmjnomidsrtÒrs

ctuinG).rìoEqmos

!c.rc .tìa i o.ù. rru4re diíion

No This Qn !s sonc

d rhùdine in pin n (chapref l5)

ind porh rrudres Mo{ofùci'lg rrpìicrbìe 'o iúcrùÒ.ide(DNA ù RNÀ) squcr.ù\ lrde e. bÒ\$€( a nùnbú

Íl,/pjdeiJìbìÙfonficr

Notation

. o. ,. (, . . . dedÒksuD$ecifred úìno &ids. . a, ., D,. . . (úc oràr..Èf codd is us.dtur spdi8l Àùiro ùjds. . c, Ìd c" e boÌl ù*d fùr rhcblcrbonea{arhon arom. i e is a geHrr aiphabd,mÒsdyusd fd rì. id or mino dds.

tsr.r,,...,r'tfo.!!c!ors.qùo.aj.

. sr' r Nd rórrh.sr Ir,.rr. ..., Í I. . { is ùsedror a (qùery)*queftq ud a for a dÍib.e seqúna. ! 4r..J;rìcruh6.qu.nccGuùsùing)or4fim4 rÒ4r. . . isùedfùrsèrcBrBidue. . Jsd S'ùc us.drore.orìnsffirìy b r I is usd for o $dir€ mrix, Rd,is rhcsoins beMeer. ud à.

r ur,4,ar,rj.4,4....

GN.drùnsìdusvh.nffiyrdnsstururcs.

. , 4 r . A r ,A , , r / . & , r . . . . . f t N e d . ? is ùsd asa prh in dynmi. prcsGnnirg.

. ( 5 ,P )= r R ( r ,r . . . . ) . ùsìlgrhe$o is na|nxr?.s is rhe$oline,hd t


Acknowledgements 'ftis bmk is builronletoE mb for llm6 in bioirlmrùd aìsonìàms bughr r 'n. 0niv.6iryof Bà-ger.we rcknowredle en studedb whùhM torlow€d rhe drarr oysbìnrùfod Elsdsm ùd iiromaliE ardiúpinns.on!ffirions *iù ReinA,srúdiDdKjerrPtuen, Sone of rh. d.itur ov.rd buirdronjoinrwr únhavh Brumr md DrvidGtb.r eI ùd èpdùUy RurhNusimr ud omù Drcffor vrubìe he1p. rinally,rbinrsroou fmiliG fork cpirgupvirh usdufìns

Part I

SEQUENCE ANALYSIS

1

Pairwise Global Alignment of Sequences

.h.."" '","

llll:11'":l''-) 6"' d.r r prrif we

eqrRÙ'ch!fo5jI4|^;\ràels.ongJ!'1

*"p'"

ùèotú-cD6k

bhtr'Ùehmdrdtrid-m'

"lgollmbÒJ'n!T!r.em'.xed

Pù'h

bbúJqùqk'

dt dù1ùùaqtu@r

016 d \ r" &.",. i

nB^. -.N"i.r

1.1 Alignment and Evolution

drhÉ5 6uúr

Nhsc

rhcrclrìoNhìpby a/irrù8 'he qEEs. nre iliemù' strùtd ippencdinòccldlurioiof rhcnqìsLqucnces.

(ùoded by rrd,f) nEms delèrilji d iirÒrión (t/rn.

onc q rrsr

owi Grd, ir iÒrkióNn).rsilLnrì$mcnr (onryone hlLrpscd Hidtrechrqe ìn eùb muturioù. rheaùsnnenr b€rnenr rnd


:1.t1:!.11: :t oi.4db, : too. ftncon.

Ì1..d,

Rod I ooeoiúleBl eú$brliii6.

l..ql6.-*-*.,.

s à t r r d i r n . r , , ' or . o o ! . , n e m r

sd o- \ ne . hpr. moder.r r ù!d. q b m rÈc ù" ,, Àù ù mnmc

rbuns r*o ildeb. onè hiíoy cÒutdbe

qf\ù.h.erc|JdmJryIil'ol)hîjel|'gnmÙ'ùL'D\okh\ò"!bùt'

m\ mùr .ì " f - e r h o r t dh , e o ú o @ú a r t Ò E h r i r s " n ; . , u r n 6 "^.o ro pcú rr s.rcrn q s ,\ oir' mer, sfu b, .!!roc_ d ,,4',.s;/" prce n

n'Ù""noluuonfJ"ub1'"."'.hji€g'e1'

1.2 What is an Alignment?

1.3 A Scoring Scheme for the Model

Gddili\e somg $heme).

. À , ,- l f u 4 = , , 0 r o | , + , i


1.4 Finding Highest-Scoring Alignments with Dynamic Programming

'*;ffiTliÈi':r-if ;,y,#.r'f.:;hBi*r

Using dynamic programming, find the highest possible score.

hish*r ff iltl*hidìns,rc sùcbv6 ns,,e To.xpìainrie m..hodee ùbdE

"r"Íi:tì,:tr&*J '

sorÈ rohior

Pffi :T::Tr1'1 ilfi#:H',Ì

r 4, rh i!\ synbolùf4,lj ì.1în syDboì orr. "

'nbnk orq rr era0Pìe.r, î =:th. ! rr , r rhesequenc! or 'nc jì6.I lmbok or l.

.,9'JndÈhigh.s]uft{hthlinbe

Ndclhir,,lvillbe'hÈbìgh*soc u* of dÉ Eîami. pfd!Òmmùs wa r,.j by siig oie or mof or rr.J. 0 < f < i 0 c % r b e n D ij = ' r ù d
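The dynamic programming idea of this section can be sketched in a few lines of Python. This is a minimal illustration and not the book's own algorithm listing: it assumes a linear gap penalty, and the names `global_alignment_score`, `identity_score` and `gap` are chosen here for illustration only. The cell `H[i][j]` holds the highest score of any alignment of the first `i` symbols of `q` with the first `j` symbols of `d`, and is computed from three previously filled neighbouring cells (the recurrence usually attributed to Needleman and Wunsch).

```python
def identity_score(a, b):
    """Hypothetical scoring function: 1 for identical symbols, 0 otherwise."""
    return 1 if a == b else 0


def global_alignment_score(q, d, score=identity_score, gap=1):
    """Highest global alignment score of q and d (linear gap penalty assumed)."""
    m, n = len(q), len(d)
    # H[i][j]: best score for aligning the prefixes q[:i] and d[:j].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = -i * gap          # q[:i] aligned against blanks only
    for j in range(1, n + 1):
        H[0][j] = -j * gap          # d[:j] aligned against blanks only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + score(q[i - 1], d[j - 1]),  # symbol against symbol
                H[i - 1][j] - gap,                            # q[i-1] against a blank
                H[i][j - 1] - gap,                            # d[j-1] against a blank
            )
    return H[m][n]
```

Only the score is returned; recovering the alignment itself would additionally require storing, for each cell, which of the three alternatives gave the maximum.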

2.2.2 Repetingseenenrs dr{ r+r ftùè eirl bddos onù. .ubdirgonirriù (i. j) '0 (i + i, j + À)(r.c

s\

.:

2.3 Dynamic Programming

F+' ùi .

b$' 0 ghrd \conrs) lo(L rtis'ftnr

Òodi

ilr /6. TrÈ ber slisDmenr. ftom ùreb.gintriis, rndiie ú rbrhe* ftsjdùàsthc bsi 6)n nMd itr Fisùr 2!k). Nrh i sor of 0 4:
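For local alignment the same table can be filled with one extra alternative: a cell may take the value 0, meaning that the best local alignment ending there starts afresh. The sketch below follows the standard Smith-Waterman recurrence under a linear gap penalty; the function name, the default scoring values and the parameter `gap` are assumptions made for this illustration.

```python
def local_alignment_score(q, d, match=2, mismatch=-1, gap=2):
    """Highest local alignment score of q and d (linear gap penalty assumed)."""
    m, n = len(q), len(d)
    # H[i][j]: best score of a local alignment ending at q[i-1] and d[j-1].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment here
                H[i - 1][j - 1] + s,    # extend with a symbol pair
                H[i - 1][j] - gap,      # extend with a blank in d
                H[i][j - 1] - gap,      # extend with a blank in q
            )
            best = max(best, H[i][j])
    return best
```

The best local alignment can end anywhere in the table, which is why the maximum over all cells is tracked instead of reading the last cell.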

dt.r


2,3.

sis d1''

2.32

TneexamprelhoNrúútprc6x6 Gonins 66tìn 'he alisinot)

mùlr(4ri /J ,) r!t

{

ch(uc

nlg!

rhf2 tù€D peùLm lyan1i. rmsnnùùt

a,.tùt HsP

2.5 Exercises Tryùisonlldsquen$d=DAEAD

l,l (n. i, bc 'he fln' rnd li$ posidon i' (b), ùd (r, jr) bc rhc úme fo, d. s

!,1

EXIRCrSÈS andr ror n\e sappàìiry (ììncù).rw Íqr@s

*c givcn:BRRîRî and

(a) Find'hc highcr ndn! lodt !ìilnmsnh (yotr$oùtd findroùt. (b) Youhxlr rh$ly d foundrheaùgmÈú

cfler?rize Equîrion ( r.2) ro bÈyiìd iir rìldine a l&aì:ù4mn

{idì e.i.nr

aIA (noft€ 'haÌ only 'lìe imino xcds

A . I .L . s c r l

u s ú = 2 ,? = 5

(r) Mr]re3hbleteiúali|osibrevords(=4r= 16Ìord!) (b) Exr ncq, úd roroeh woln' ir 4 (o lxdinc l, rtul ror.ùb wd. iìndm (d) Youwil iow iii,r 'hd 'hcr ir x *o lh*conhinssLf@m,atrdhNs$rcsi hichHsP(virìì $ùe) woùìdyotr rbùcùr.oflYlt!ù (Lro)! Bci\arc (a)Nov debilk,m.orùc sùbmùdri

2.6

{hl r,ìsc

irc rbrtu bib d rhcsùù di

par (Ìoe àroded roHsr)

2.6 Bibliographic notes

Dot matrices are also found in Argos (1987) and Vingron and Argos (1991).

sqf

MP/www.g.h,sdq'!rc'aJúùhi lurion!d DfotnsedGeesÈohenI qql).

\'

, .'dMità rd3ò

r00t. FAsTxn dscnbedir pàMi ( 1990),,nd cippeJ BLASTjn Atrs.hùrerj

J

3

Statistical Analysis

(homorogous)tsqrqo, Í mui hc fÒh

ÈrfolsiEJfrlmeofîa|ienmm'o||qo

3.1 Hypothesis Testing for Sequence Homology

hypodÉsis.Él'À.1$'qiH.

r giEn rhEshdd{qc o 0l (2s]), rher is cNon ror rcj€'iis H0 Gùhc 1* rcvrDùd r.cpriig ,r :n!ndù (4,/) sigfirì.aú.i.e.irrher -,v

ùrrttúútt

údpdhrn

::ihn@ql!i!núdlNlRì!}Lú

qqitur


rh. hìlher $or (hì-!h6' nsni6ene).

ùd rheiÈ!sld PUfsrìlr (4./) vrh

gùi.ollJ rquur!$ (\ccscdu :. t. )

Dder!.e

ùe rjcdu

ÈEr for ,!ro.fo

Fshm dÉ A|e im.ú.hosedù (:) fioddE leftnr

piir f,!ù (.r JJ \nf

n! rherî lr o' higlú, !iru, to G$ ,hc p'obrbiìiqdisribúur nuoduboo. rilr onrìnrc{iÌhrheq{úo. ted rof

3.1.1 Random generation of sequences

a$ùf!

ù'Lphlbdútoùslmbots IÀ,.

oosiioi (or úe nndor

!, Et ù'l rrroirqus!$

l/, =0rì../i=0r /r=0r. r,=04t.Drnìbedm+rwnhr!tubrrrrnl'\i!!

ndù .he cure Fom ó b @ is rtu piÒb$illry

rcÌdsin rhemtualsqEie5 (4 d)i\ u*d or onc ({tr bÒrh)or úè slùedÈs h dor

pr&dcÀ

Ior úe pnbabili$ dìstib'rior.

jlù8,ú.{q[mcisdìvidedùbcej


! I dL beiî ù.s Ddirios (bÙrùshut['d ofdr]

GÍ|gl5iiglesfNscÙafl

3.1.2 Use of Z values for estimating the statistical significance

ú.7skfds'hh.]llThj\lsudlesúd].

3.2 Statistical Distributions

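3.2.1 Poisson probability distribution

For the Poisson distribution with parameter $\lambda$, the probability that the stochastic variable $X$ will have the value $x$, and the probability that it is at least $x$, are

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad P(X \ge x) = \sum_{i \ge x} \frac{e^{-\lambda}\lambda^{i}}{i!}.$$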
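The Poisson probabilities are easy to evaluate numerically. The following sketch uses only Python's standard `math` module; the function names are chosen here for illustration, and the tail is taken as $P(X \ge x)$, computed as one minus the sum of the probabilities of the smaller values.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson-distributed X with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_tail(x, lam):
    """P(X >= x): one minus the probability of the values 0 .. x-1."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(x))
```

For instance, `poisson_tail(1, lam)` is the probability of at least one occurrence and equals `1 - exp(-lam)`.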

3.2.2 Extreme value distributions

irs . r, tbe iodeperLrtù,

nrùcrioiroi ri i\ ùen (:incedì! r uo indlrlfrenr of eiú orhù) ,l= Pt\i

I

=t.

I I

qhn'io! ofr fr of (oven:ppiigrslneds ! (QDdon)qFcd

ofd. iid r1 rhoI

'bcf,@IlliiicpÒtÙliiLh|úje

\ÍrúI+!drioro$us(úrrestìoq.(ri

'

rr

î!F)brhiirr

'

f):

l.l

ì (d.ishyldn'nbúFnof f is

1bc 6mof tr(r) d.p:.d!.ì r urd(

PtY>rl=t

r'ìr)=Ì-erpt.

ohen.., rr) htrrk,NlhcoccLhLre s

, 41

6r)

îùc t5 ! turfbi be's*ùù'.d; a'd,.i,tr (0.5rr di\aihùrior is Enìe1s coarinr)

rhe

r|.|'

3.3 Theoretical Analysis of Statistical Significance

Karlin and Altschul (1990) have done the

r.{-k.r.

.Iis'heiìphibdor'hermi0om,ds

tEq@.ùr lp,l.liJ

(orter, = t).

E= L

P,,ir,Rt,.

I ,""r"a"= l be)otrdrhr $ÒPoo. rhisbúk).

{ed rmn l,rd, ùd I oo{ rhjsis doir

t lrr ) nd {tr I rrc sùfi.ieiLry imikr

ed { ùe 8Nd?'1

ùùhtt

o! ylnIl

3J.1

t (.:,i3 urdbùiùscof 1h.lPrxjmde

By sritr! Ì = 1 iDEluim c.o *. 3 Lsttf ptubtt itúr elIùùùg 4t kai otu r , { s ! = P ( r M> s ) = P ( z r ,> r ) È r - e ! = r = I sp(_E(s) Noèrhdexp(r)is.,rùivrùrk, !,.

qp( r4,e /51 l1j)

. By dpmdiis Equdjoi (1.7) inb r p

P ( s r =L e \ p (E ( r ! r = - l r - = - + : + - +

)=

wchlÉtgoseqkrùsrúd/oflère'l r. wc hndrìc b4 ìGaì (unslpFd) ar

3.3.1 The P value has an extreme value distribution

If the highest segment pair score found by comparing two sequences of lengths $m$ and $n$ is $S'$, the probability $P(S') = P(S_M \ge S')$ follows from Equation (3.7):

$$P(S_M \ge S') \approx 1 - \exp\left(-Kmn\,e^{-\lambda S'}\right).$$

Setting $u = (\ln Kmn)/\lambda$ gives

$$P(S_M \ge S') \approx 1 - \exp\left(-e^{-\lambda(S' - u)}\right),$$

which has the form of an extreme value distribution; hence we have the two parameters $\lambda$ and $u$.
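The relation between the expected number of segment pairs and the P value can be checked numerically. The sketch below assumes the usual Karlin-Altschul form $E(S) = Kmn\,e^{-\lambda S}$ and $P = 1 - e^{-E}$; the function names are hypothetical, and any values passed for `K` and `lam` in an experiment are placeholders, since the real parameters depend on the scoring matrix and the background frequencies.

```python
import math


def e_value(s, m, n, K, lam):
    """Expected number of segment pairs scoring at least s (assumed form K*m*n*exp(-lam*s))."""
    return K * m * n * math.exp(-lam * s)


def p_value(s, m, n, K, lam):
    """Probability of at least one segment pair scoring at least s."""
    return 1.0 - math.exp(-e_value(s, m, n, K, lam))


def p_value_evd(s, m, n, K, lam):
    """The same probability written as an extreme value distribution with location u."""
    u = math.log(K * m * n) / lam
    return 1.0 - math.exp(-math.exp(-lam * (s - u)))
```

Both `p_value` and `p_value_evd` compute the same quantity; the second form only makes the location parameter `u` explicit.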

3.3.2 Theoretical analysis for database search

The analysis for ungapped local alignments is explained in Altschul et al. (1997), and the following description is based on that article. For a score $S'$, the $E$ value is the expected number of sequences with scores of at least $S'$.

3.4 \hùe io honoìosousr,ìù.ncs di:Ù. NorerlDr dr , \.rue eG^ b Ìhe luruq sgmcih, bu ibrvory smrrr t {lùs úì

o,

s n\Ìssor *hci . squeiccs (indep€ndùi !ìd or shr rcoBù14rc conìpmJ wi'r rhuqmfi n , mhjptied by ùn p eìue. asùr'ingin 'rìc sùc Equ{iù (r.e) Gin N = 7_,). 'hiscqullirysÈnslohddlollulsrs Ùb 0.0r mùn úc t vi'Luebesiosroinqcca fan4 úlltr rlr ? vdtr. usd.îÙdlheo,'.,pfob.biìirylolct|t riro î.id (brcksìuid prcb.biljri*) NheDdifr.ri' sonrs naùns GM brk!Òuid prebrbìtiÌi6) tueùsd. ,rhcEr{re, r s, ido r róiultizr

sm s,, sch dìarrhè

(r! Òarbc rùNi ún úis nÒhrntc'l k disribùrior *irh ! =0.1= I (se Exercis6).) T.e nomaljzedsoEs is deroredby Dri Fron Éqú'iÒ! G.r3) we tuìd s" = In P, whùh i. ú' iofrdizn rón rcqù FÒfcfhsúi'gnaùixúdlypicdsnino rhRrorobecrr.ùhcdby F4ùrdÒr(3.rr). $d 'hoÌ v,luc roùid byEqùa'ior(3 ì3).

3.4 Probability Distributions for Gapped Alignments

rorany$onrs núii. îiis ii ro' (yr) p

ÍJ'

ror(crpped)BL{ST. A dnúek

Ílilgdúoms'!Ùacslrcnrly'ictrlniD rirs r.r..3ppìr Er.mi. pîrrun03o. isfibúion of rhc soú. n órdè, hd ùc rhd lill be usc!. Thisfl.eduÉ n u$n n ftí sEnsncar signi6aDc .md besiver if

hbissqÙeDce'rcnoEnANmilgúil 4 is honoìogousroa mùìmun ot fd quLry{id r *t ÒrdùdomionhÒ'nolo Pe6or (1993)ha invstiefted $venl Es (z = (Y - r)/'). sinirîdtymrcs ,qnM ùú ? 0 or r$ rlìb

tt

3.5 Assessing and Comparing Programs for Database Search

dkr. no{!!ù

ùt Fo&rì mrgr cr rir),

!)2e ?

(ned i ùÀ!4in? \qrcn( a sequ.m. r4dr qBre irn n ouhndÈou (o 4J:dlhrei5cnr !rrùì r t ,J.4.s'df

• TP(T): the number of true positive sequences.
• FP(T): the number of false positive sequences.
• FN(T): the number of false negative sequences.

aMd..rùt FP1arn{ua ù F!ú.r.r(i)l

3.5,t I

../'..

..':

3RmP2ú úc q@pr. O sols P ud h-.'6 t bù MN ú! utuc 1. (b)fu ssn\4 or ù.

i r 5 35 7 1 5 5 . 1 5 2 ! s5r € . 1 ? 1 6 , 1 5 1 1 1 r 1 2 1 0 1 9 * l ! l3Z5r ól ! n : : Ì1 2 :!re! !ìe6v3ó 35t9 73?37573?l6e 6e6665É1ó3636rrio!958r!5r49t

Pr : HrsnrqnMwnqnxHsnHnin P' ì HHúffiHmmffiHHlnrhrnHn.n..

fte l nù s Nhe€FP€) = FN(î) GÈ s{tioi 3.52) Fi3uE3 rG) \hoqr ho{ FP.nd FN de

3.5.1 Sensitivity and specificity

r.ror@nmea o'heF (N4r simiìlnúcn. hy prcponio rnd T/(rP + FN).'he

. tr. /: N/rrN + rìPl r s/4j

TP/('|P+ FP).drcPrrPùri
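The two measures can be computed directly from a scored database search. The sketch below assumes the definitions sensitivity = TP/(TP + FN) and specificity = TP/(TP + FP) that appear in this section (note that many other texts call the second quantity precision). The function names and the example data are invented for illustration; a sequence is predicted to be homologous when its score is at least the threshold `T`.

```python
def counts_at_threshold(scores, is_homolog, T):
    """Return (TP, FP, FN) when sequences scoring at least T are predicted homologous."""
    tp = sum(1 for s, h in zip(scores, is_homolog) if s >= T and h)
    fp = sum(1 for s, h in zip(scores, is_homolog) if s >= T and not h)
    fn = sum(1 for s, h in zip(scores, is_homolog) if s < T and h)
    return tp, fp, fn


def sensitivity(tp, fn):
    """Proportion of the true homologs that were found."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Proportion of the predicted homologs that are true homologs (as defined in this section)."""
    return tp / (tp + fp)
```

Lowering `T` can only increase the sensitivity, usually at the cost of more false positives, which is why the two measures are reported together.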

ùbí. trr d ùqu\ ns î 57(?lhsrs rherr o(ùr. r rùvi f hg!rc r th) Nor.rtì

,i / n i iord{€ri,ìg

iiDdùn

'vr r ue rìì dor ro1.ikr srr i n

+r(oorrlc)kì

L(ioù!e) mdèdeFnd on rheòLùlbtd f iguu r rlh)

3.5.2 Discrimination power

(of dr)r/n(d,r

/o1q ir hN ldr i d\(nni

r^ (r ) (r r tilud oumhcFois.quarc- \tuns y lJriLld !ftrruc \quoì!$ n {ed). r,Lndfsieoodjobindiúimùrù:heneaì

ru{rutri i F,gK:.r(! ùFir(i). $her rr = r^ rdúrJ: fE qoFdig!hfl({ùic(Rocl

a^Rocancn!$ls

ii!ú

p..irì.i'

l-

=7

ii rior @vq. la) ft NÀe!d

trLÈ'oN.s

.r rEsìrrtr.

i . {pfopùÈ

EUI d 1v. *!4r prosm5

€?) s ùc ùrc$o

s i i Í q P Ls d 6 s 6 f

.rcs to r) f eqùaL'o 20. FÒf'h. splcjri!ìry. b se 'h.Jrlv rdirn? zk. wlìlch is

a \hich m dasirìed rs hondosout. Nore iùib fc dósified a bonorùgoN)Th .uidea|cmv!{ouidbulLafnliliyil]o

.Icù.ùnbqofmnhonoloeùss

dìe Roc diaenoì(r'ora dÍrbre or j b or0.0r).1xùdoc,!d r!.'l ! n'Ìi {hi!h umbcfd homorosous f+qes enmpkl ard rP, rhenù'iber or ho,no

(10ii our

rslc rsnirc5. Roc, is d(rìncda\ nPI

j,,*""-,,1=ff = ,rfc*o* t t '.t,o -9

riÒs rÒ 'I. fùÍriùa í EqufiD c 14) Ndc rrrd RocÌ n ! sù! 10,r l, {idì 0 s wonr úd I îs b.r
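A small sketch of the ROC_n measure may help. It assumes the commonly used definition $ROC_n = \frac{1}{nP}\sum_{i=1}^{n} TP_i$, where $P$ is the number of homologous sequences in the database and $TP_i$ is the number of homologs ranked above the $i$-th false positive; this reading, the function name and the padding rule for lists with fewer than $n$ false positives are assumptions of the example, not taken from the text.

```python
def roc_n(ranked_is_homolog, n, total_homologs):
    """ROC_n for a hit list sorted from best to worst score.

    ranked_is_homolog: booleans, True where the hit is a true homolog.
    n: number of false positives to consider.
    total_homologs: number of homologs in the whole database (P).
    """
    tp = 0      # homologs seen so far
    fp = 0      # false positives seen so far
    area = 0    # running sum of TP_i
    for is_hom in ranked_is_homolog:
        if is_hom:
            tp += 1
        else:
            fp += 1
            area += tp          # TP_i for the i-th false positive
            if fp == n:
                break
    # Assumption: if the list holds fewer than n false positives,
    # the missing ones are placed after all listed hits.
    area += (n - fp) * tp
    return area / (n * total_homologs)
```

With this definition the value lies between 0 and 1, with 0 as the worst and 1 as the best result.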

ù rbc urMt

turdir d P Fisro 3 f4 \hoqsrhi ùe kr (0 ..tr,o. r) urdd Lhccúrc. $ irhrrud i{tr Pr ìl Fìsùor5G) tuf

ùc nmbe or ÙÒddd3otr\(d ù. qrcfy) $quoirc\

3.5.3 Using more sequences as queries

3.6 Exercises

t-

{E !of!rFdù!.

b 'hc Roc {nc.

'.

(b) r

.h 0rrh. rhumLdqlcicc\

pmnde^ ( (he modi. ù ch{adefnnc. {ìm) mqsufe or decry confùr)j by rhr .q

r found

ind r (ùe vùiamc

ing i! q sd 3 tudÒm sL!@rce, by uug Eqùdion (3 4)

h,honoloeùsb4,wÈù.scFq Ìotrrìo'ioldgous rÒ4 suppo{ 'hr .bc

G) Ld '.ek $bcto cr.h or rbÒprcs.J P': !!Fn!nsn I nnnHns....

Fild rhevihe a = FP(7)- FN(î) ro r$ù {iù whd Nis n{nd in (a)

]\lhaì\i['n.vùcdj'lR'ld.t

4

(r'solr59)ùìdas.qr.ì!ci,,swn\prl

if rsi'iyirr,Àp(.iLorr rlr(;4siiurr on,hch{toúd dit c[óo\clppf)p Dr!ù$ !hi!h progfn twil|] rricrN$n p{r]d. t )'ouretud roberhr b$r

riqùrior(r.1)ii suhsLdòn I 1.: tReùùúq dúrù(!') =r ) r r(J') = r qo{ r4re

P}

rr ). Thh

{ = (n(,(uN)),// coNiLlùLrr(RrLlqD!d rnF4L!ùon c.r rr. slìN Lh! P(r ) \ | úr$ hr! r mrl izederkù rruc di!ù rrùiontr = u 4u

3.7 Bibliographic notes

(nrir.0d akrdrurfreeo)trd (rú 0d bna r{hùl errl (ree6,ree+)^iny's

conùsda (re33)(rfiisolrlr3).ik5dìùrúa (ree6.lnr).r!,6on(j996.19931 i \r{ rrrri6e(r99t

Nebk lnd Bldor (1001Ieri,ù,tuI ql!r$ tu siùd

ri ir ro.aeo ú j (:000rr) îd r_indil údFonsoì(:1000)useorRoc!úr$n$or!ii|cribrho!d,rRol,i$o!(ree6)

4

Multiple Global Alignment and Phylogenetic Trees

o rir r whote rrdiy (n h5 bd srid rhd iùtrir e aìisnftùrhoúrùùdty). ri odÉr ùdJ rvo (oi r jc\9 d Lhsmir rhe xi rnry .rcc

(or supedùir,

I. ùis Na)l 'lF arlrsi'

nid Do?lF ard ptu /ó dc\d

4.1 Dynamic Programming

ú a nùìriptc s.,lùe!d djsinqr

qiuN, lyps k (fe'ne'nbe.'hf ùiuns

vnh ùdy bbiks r€ forbiddsl

hd! an bctn (l d'Ilùer lrs onLqus!ò gnenby Equinn (4 r ) ilì. ùùnìbùorcdis

riisr€ {.,

{n ùs!rc ihÈrfúg 'h{ ricono, h (b)

rqucrus (o(,,ii) roreqml scqncc lcr-ellN,r, ùid ior phúid \orudoisor such . T'IÌotdùeÌhe rumiu rimebyNirÌgpruniisr.hîiqùs (.ù!oft vrúchf ì

aid rhe be{ (tr coreo sohúioris n

4.1.1 SP score of multiple alignments

e:Ùenr.i|ldsi'nplysnlhembobhi|'he

iNen.d itr n. L.'s(i',ir)

bc ùepai

ise

(J, ;r ) * ùc so€ or3 rlfrubre Pri$ù. hrmlsìnrbeprct-'id re*ha€m rf *c tr\Ltiie{ lap cosb,rhcsp foa crn at$ b. calcuhreds a sùn .f (orum

ìr dì3nmm'.ii is'r. r,rr symrroì of r lrd ^

1.1.2

(riur 3!p |€mll],). úd (memher 0 ùsirgEq@'ioi(4.2)is (crlcurr'cdDs-Nìs) 0 + ( r)+( l)=-2,aMbyurirs r)= _. E u a ' i o n { 1 j )n i r ( ! o r ! m i * i s e )( 1 ) + ( r ) + 3 + ( , 1 ) + ( Nù'hdrofigivenscolilg{htme'l
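The sum-of-pairs (SP) score can be sketched as a column-wise sum over all pairs of rows. The code below is an illustration under stated assumptions: a linear gap penalty, a blank aligned with a blank scoring 0, and a simple match/mismatch scoring function. The names `sp_score` and `simple_score`, and the default values, are hypothetical.

```python
from itertools import combinations


def simple_score(a, b, match=2, mismatch=-1):
    """Hypothetical residue scoring: match or mismatch."""
    return match if a == b else mismatch


def sp_score(alignment, score=simple_score, gap=1, blank="-"):
    """Sum-of-pairs score of a multiple alignment given as equal-length strings."""
    total = 0
    for column in zip(*alignment):               # one column at a time
        for a, b in combinations(column, 2):     # every pair of rows
            if a == blank and b == blank:
                continue                         # blank against blank scores 0 (assumed)
            if a == blank or b == blank:
                total -= gap                     # symbol against blank
            else:
                total += score(a, b)             # symbol against symbol
    return total
```

With a linear gap penalty the column-wise sum equals the sum, over all pairs of rows, of the scores of the induced pairwise alignments, so either order of summation can be used.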

Thisfolss ftún dd ri[r rhii J(r . Jr) i\ úr hishsnsùe rhieúbrc byalrgoinc

rk sùe or 'rreFrjsdiom or 'be 'yo iiÀl *quses ìn

vÈishd 4@ù! rheE m biotdgiùr (

elìtredii &ùadon (4 n a[ lcluercs

ùe

rulf$du[ÚoLdbeÚmtdb'gìYù3ùehigher!cj!lùÙùtùd'nohcd

4.1.2 A pruning algorithm for the DP solution

d úc,, ;eluenes (r r is ùf knNr i u d 6 ' ì d i 8 s Ì r u )c o n s i d ù r . eùr rÉ ( n . i l . , , r o r ù e D F ' ú r i x . l r d l d ù d $orcofrhebsr p.rhGLisinentiÌo'nthefú !Òrb ldr , bss. (sccFieùf4.:rG). < F îcn e ! kio! rtu I )e$orc or L mùf be< ú + .1 ú: 'heEfore,sr + ,., < ( , wckno{ I

K ùJ 4rR rlc ccttI = (r,I l){ìtlrher dÉ hjeh.r súto nÌ!isnns aRe, as. R (sG.Ì ). wc rho hreio ddùmiie rì ler uppùbdid (iì , ,) for rheirieiùh

t 4'!i7 r.usior j\ ùsd iirúd ÒfD,lrlad

j

'l G) r4ue 4.3 | shrhs rrìÈfo*d prudDs (b) îE s@N!

(b) d!

I i. Jr aa ri

eìls Gs in bek*ùd dtr$iÒo). a vrl D(r, ú). rher úc uùe r, + r(r, D) h sc b u: rhc$orcor drebsr pÍh rÒu

by sc ol a queue. wìÈi I .Òllis vnncd.ir pllcesib rùsùd ncighbou6(o phich i' shdùrdind ujuet in dÉ qùnc, n! oc

j\nji! cùnft. i:) ìh rofld deiehb.ù!.d! s h o u l d b e p ù s h . d i ' ì r h c o d s ( irL) . (i :a++ r , i r l . ( , + r , i r + D . Algodúf 4 r $ovs rìretoruird f.ù\iùr

snh prunirg

DA.

AND PIN'TNGENENCTREES aìso ùn 41, FoNad-lffiion

wth prunin&

ar :iconún rordoirg der muìripteati Fonùd lmaion is ùsd, {idì poDìtrsor.è[s t0'herúcrror'rrcDPnarú(rq0, .0) ù, rhefld cenor rrc DP mút {ar,r 4 . ,i,) rhelhole alism.nr s(r) ?(!) D(!.,) 0

rheber slm of m atìsmetr' (pú) frcm/lo .o ! rhcsÒrcorrh.b6tirisnnqrrmnÀob!rÒu soiT 'he$oE ror dbdìns thealiEE a sbck of ùe ceìlsI for whi.h a yaìuero p(ú) rsfoù'd

Flr, /ix) ÀpM.duÉ rhÈh frndsm ùppú bÒud ol rhesG ofúè rlisntur tom r ceuI rorh. èid-.ètr, fl r = /'oì P(,) := 0i push(', 0) pushdresúÉ cf onrh. quce p o p r , . o rJ:r , ) : - P , i ,

h6sorr .J,-..!r

irr(r) +.(ù.À,) > ,( .hen

6úis.@L!

tor auhNod neìEhbÒú, u ofr ànù ú. 4ht otd.l Push(u, O):P(ù),= s(ù)+ D(u,ù) P(ú) i- nd(P(,), s(,) +D(,.u)

ainding upp.r linit ro. soru Forùy úgm{r

,{ or sqùdcs f,1, l, . . . , r'1, rheEqùúios (dr) 3i/ (a.a)c

s("{) < I I rr}.rr). ldùheúser (n.i:,...,t).îÉir ro.rherLisnmón!o.rh.suhÈquenc*r1+L ",' "i+1,..

.. rl-',..

nrs mb.

Jotrcby ùsinsEqud.n (46) $

.-I I ''r,,,,,1,., "," r f{ù!d ù rimeo (,1, úeÙ d.iors f mprcrity fof6ndiiEr is o(rr,1.

sw andaRlR. , = (3, 2, 2), aid I soirg

i.vjsgali'!ú]enÌsorcs

s r " l . ., . " í * ,

",r.

r = 0 . . . t r r - 1 .i , = 0 . . . a - r no ! ,ts.,. :. .'

u.i id \c,o, .-

r)

conpL*jty ol rhendbod ror lindiner lLppùbouùd\is ú.rîoft oo:n1 spondiEronoviig lod ù rou (DG, u)) n ùbùraredby úe vùih' Éq@'in (4.3)or rhesP soF. Nor rhf rhctumbcror

rbodJd uscd,aÍd mrnyof rhemus (rcush efimaÈ o0 phyroseúic (orflorùrioisJr) b! riir heh wnhdìeaìignins.

4.2 Multiple Alignments and Phylogenetic Trees

and rhe 'nEnor

ros. sù.h I r4 tr ciììed J phylolonúic (r .vorùúùrfy) ùco.ind sfiuy '11iib (bmnctr$) o$Lnr sqùems, $d ùe ed,ees e tn i rK cùmrurL\i f.om prcrein{of rJN^)

lioiÒlamuìlipklìj!Ù[4sd\N}nsJ

co^ldsr s'orseqùeocsl^Rl_,ARrr.aRs'.ansl.awrl,a\yr] ùù4rlEl

oùùrL\

b $irrudattud) prryk,!

o|Ío'nR'ov,h6ccuftdin'h.pa rdúryhrùn î nùùriotrfftn s bî

e^ trro 4ueNes. one*qkNcJd

rGd)

cr)arLcmur, ofbds.ùr 1{o Gul\d

whor Dcihq u fru, rdirìe ililmÉnr.or i (lirc) ltìrogqrrd! a!u i! tN\( (L r ph![]genà ! re. (plr rr! orh{ mdhod1 tva) hy dhq mdhùL clmhùùg rh

.Ùlp[]ro:uldi!to$ud0d'ielcJ]sF rerhod!lo1ÓNndiigplìylogÙtù.ft4

.LL ri$ ld$cu, úrn trhorrl'3r), rq, u'!$kr0lnsrhd

cdg$ u Lhc'(o

ln ùyìosi€ric rudies.'heoL (6 tur uú,torLn h ou($! úr {,bjLd\!

ud


a prryroa{eri. rÈ cdsrudd ùyòc Eisù@cjohiq Dr'hoduis ù N ù (ùr sùc 4\ red ro.Fie!€ LD. rhe Nnen ioss ù. Ésh orsirg bobhPpùg Gq .ieEr.5

ows,cìvúDdgìú|sqEo$'qlimde diphy|ocsdi.hc,wÒÓNidùoly . Theùe ha r l.trimr nÒdes oeavct. otrcfof flch ùiein.ì squence.

dnediotris dÉidcd.(rnùrf@Èd ei

úc difarion is undsided.)

rvc ryo chitdcn;rheùrenat mda ofm unrcord re haverbE comsrd edges(btr.h.9 ! Ai ircdd ioò.ù

('nsomc@t

r+

r+ ofPddeÈùFgurc4'5

nús bM rwÒiDstin sqq. lrìis fte rùùefor eivesùr rlc Òepúniry roìlludraé rbeconcephor ,n ,r,s hd rdrdr4. dÙpucaliù'í wejitiptì

lh. fuìl

ìes,bÙlpmlÙgsirúcy{edEnvennlmgeoe FigE4j s tbeevoluliotr shNi in FictrE'1.6,

. rNsr(MoNe) d nrs2(Mose) m p8dogq . n{sl(Mose)a

nrsl(Rra ft onnobs.ì

. rNs r(MoNe)Ìd rNs2(tun ft odrybùúÒrÒsi . rNsl(MGe) :ld Ns(cliìo) ùc onhorocs.

4.3.2 l

speiès.í rhemw.ùpy f cÒpìde normrive(tudibed) rheyft calìedpr.trdd listlÙÒkrtúcdtrdbàofdìfi@íftebpoìogies.

4.3.1 The number of different tree topologies

An unrooted tree (of the type we consider) has $m - 2$ internal nodes, and a rooted tree has $m - 1$. The number of unrooted topologies for $m \ge 3$ distinct sequences is

$$T_{unrooted}(m) = \prod_{i=3}^{m} (2i - 5).$$

For example, $T_{unrooted}(10) = 2\,027\,025$, so even for quite small $m$ the number of possible topologies is very large.
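The count of topologies is a simple product and can be verified with a few lines of Python. The function names are chosen here for illustration; the rooted count uses the standard fact that a root can be placed on any of the $2m - 3$ edges of an unrooted binary tree with $m$ leaves.

```python
def unrooted_topologies(m):
    """Number of unrooted binary tree topologies for m >= 3 distinct sequences."""
    if m < 3:
        raise ValueError("m must be at least 3")
    count = 1
    for i in range(3, m + 1):
        count *= 2 * i - 5      # adding leaf i can be done on any of 2i-5 edges
    return count


def rooted_topologies(m):
    """Number of rooted topologies: one for each edge of each unrooted topology."""
    return unrooted_topologies(m) * (2 * m - 3)
```

Calling `unrooted_topologies(10)` reproduces the value 2 027 025 quoted in the text.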

4,3,3 )

"---i A :-;

:A

/LA/L^^ Plyiry 7i,ihù(!) hy h

r.rcsùù8ù

4.3.2 Molecular clock theory

frúdio crnb€sriiùrd (sedu 9)m ù60mdr r 42 oúfioi\ rrrEhNú

4.3.3 Additive and ultrametric trees

GrÀ!ldd

k,eca^!

addrqùc

È edg* .ÒD!4'hg

rhe nods. Flgùe 4.31i)

ded iion rrì! dirù..rs io Fisùc 4 3lb) rrc

o\ùrìir!

rk.qr!!)irlidodyir

ir{y6uror'lÈnwecarhbel'tunr.

j.r.

4i.4

wcchr!b!r rhurods!\rnFis!rc49(orndúl squenesshoMi.Fgùrc4.7(Ì) elrr5eúdEqÉfùe.r0)núbe (srìd$ing úúÈqú rú (4 ú) inpf$ {k|niq h beyoùd dìe$opeof rhisb.ùr )

G$unii! nrcù!\I)ac)rf dorrr-itrorerùyrnpbi.j r


j

rieoi i.e

(ù F+r. Íù idùe

'hr iddiiny d tu inkrc$ d toù objrds (br rÈ dtueobi4r fre ùùudd! 4qùiPnd, tr

Fisùrca.eo) iìrunr{* rhisrEqu ior (r.Il) is $úsfiedror aI rlÈi,ives or the ry rhatEqùÍion (4 11)impÙ* rhd ìÌ h Eùa'ior (a.l l) inìpliestr!ùrtion (1.r0),hei.e ùlknÈúi.ny iúprì* xddidvlr (*

4.3.4 Different approaches for reconstructing

r,mlÙmolinullipkarig'ùaÍ'Ù i:ì