English Pages 376 [373] Year 2004
Ingvar Eidhammer | Inge Jonassen | William R. Taylor
PROTEIN BIOINFORMATICS: An Algorithmic Approach to Sequence and Structure Analysis
Ingvar Eidhammer and Inge Jonassen, Department of Informatics, University of Bergen, Norway
William R. Taylor, Division of Mathematical Biology, National Institute for Medical Research, London, UK
John Wiley & Sons, Ltd
Contents
Part I
SEQUENCE ANALYSIS
1 Pairwise Global Alignment of Sequences
1.3 A Scoring Scheme for the Model
1.4 Finding Highest-Scoring Alignments with Dynamic Programming
1.4.2 Use of matrices
1.6 Scoring Gaps: Gap Penalties
1.7 Dynamic Programming for General Gap Penalty
1.8 Dynamic Programming for Affine Gap Penalty
1.9 Alignment Score and Sequence Distance
2 Pairwise Local Alignment and Database Search
2.1 The Basic Operation: Comparing Two Sequences
2.2.2 Repeating segments
2.3.2 Finding the best local alignments
Scoring matrices and gap penalties
Database Search: BLAST
3.1 Hypothesis Testing for Sequence Homology
3.2.1 Poisson probability distribution
3.4 Probability Distributions for Gapped Alignments
3.5 Assessing and Comparing Programs for Database Search
4 Multiple Global Alignment and Phylogenetic Trees
4.1.2 A pruning algorithm for the DP solution
4.2 Multiple Alignments and Phylogenetic Trees
4.3.1 The number of different tree topologies
4.3.2 Molecular clock theory
4.3.4 Different approaches for reconstructing …
4.3.6 Rooting of trees
4.3.7 Statistical test: bootstrapping
4.4.1 Aligning two subset alignments
Sequence weights
5.1 Scoring Matrices Based on …
5.2 PAM Scoring Matrices
5.2.2 Calculate …
5.2.6 Scoring matrices (log-odds matrices)
5.2.7 Estimating the evolutionary distance
Comparing BLOSUM and PAM Matrices
6.1.2 Removing …
6.1.3 Position weights
6.1.4 Sequence weights
6.1.5 Treating gaps
Searching Databases with Profiles
Iterated BLAST: PSI-BLAST
6.3.1 Making the multiple alignment
6.3.2 Constructing the profile
6.4.2 Constructing a profile HMM for a protein family
The PROSITE language
Exact/approximate matching
Scoring patterns
Comparison-Based Methods
7.7.1 Pivot-based methods
Pattern-Driven Methods: Pratt
Part II
STRUCTURE ANALYSIS
StùcturesandStùctù.€Dcsc.iptioÌs 3 r unns ofslncrun Der;P.ions 1l 85.r
Linè$smen6(ficks)
3.5.1 srmdted sheb (roPs) 3.5.5 ropolosyorÈo'einsrrucruE 3.6 rdsrryins i\e ssEs 3 6.2
DènÉseÒr&4 shcM orPndis (Dss?l
Framework for Pairwise Structure Comparison
Superposition and Dynamic Programming
9.2 93
9 r 3 U$nsRMSDÀsoriisofsdctuEsiDiìeìries rnd Alignmed AlEmtins Sùpcrsìsirion DoùblèDynmic Pner3lmine 9.r.1 Ldlevel soring naùies 9'3.3lbdEddohtdylani.P,lgrmriig
l0 ì0 I I îvd.dirúsionl seonètic Mshins rorshdùrc.onpùison 10.12 GamcùichÀsbiis
10.2.r Meaqjq rhesinìlùiry or dirù.è ($b)mries
Il
Cluste.irg: CombiùineL@alSiùúlùiti6 aodCo*nhcy 111 compdibirirr rl.2 s.ùchirsrorS.ù1Mtuhà\ ll.3'2orefìappinccla..6
ll.5.l Compffins rtrtomaÌioDs ll.s.2 cd.ul ii! drercv hoslohdiòn rr ó Crlrdig by U.corRetarions 1r.62 Ceondfi.fchrion
12 Significance and Assessment of Structure Comparisons
coishritrg R.1.dom shdua ModÈlN 11.2.1 cÒnsndinenorEduidadrbsG r2.2.2 Den!úloD ti'r for.imì1 l,
13 Multiple Structure Comparison
Finding a common core from a multiple alignment
Part
135.2 Diso(nne PrcrinsPúem l3.5.3 lhe aPPúmh 135'4 scrirg ùs Prckns úods rots 13.7 Biblìo$.Phic
la Ptut€tnStnctúre Cla$ilcatiotr r1.2 An lsinsModeltof DoMi! rddúficfior 14.31 MdnìY{donl?B r4.r.2 Miinry-, domdns d , dorùis lari 14.5 Aúomtic APprcshs b Cl,ssi6cati0n rj.6 DibhMs lor Sh.M cì6sific'rion r4.7 FSsP-DdiDomhDidioúJY Ia3.1 Domains r.r.3.4 ToF,losy(roìdlibÍv) lmili6 143ó sequcdce ddsìrì.adonpD*dÙE Th. CAIII 14.3.? BNed m sùcLs r1.9 clssificarior
Part III
SEQUENCE-STRUCTURE ANALYSIS
15 Srmctùrehealt tro!: Th@ding 15., PFb! Ssonddf, SÙtrctuEPtdicúoD 15.1.3 Ac.ú4v in $cond.4 shctuE predicÌior ts.r
MdhodsBucd d sequde ausmst rri.l Tr.3D rD ratchiic n.ìrod '5,:]'2]xcFUCU€ú.úod 15.4.1 PotúiolsofFd
rorc
15j2
Double Dynamic Programming
Pre
Appendix A Basic … in Mathematics, Probability and …
A.4.2 Probability distributions
B Introduction to Molecular Biology
Preface
r probiobioiDfomaiìG,rcuiins or mdhÒds
!lms'gúnlberelbaisforch6inglbgtish.plogDmndtb{iglipaEn46od oprions,aÍd liniuy be&ne noÈ conÉ rhe onpúr rcicd (o! q,npúú sico@ rodcú). on 'he dhq hand,nly lìDd aid n or sh. mry edi úc údèóbdiis ics$ry b mmc iftú!ìng probrè'ns, ónnù
bsis for ólabon'ivc prcja 6 nodùe igbt balanemakinsùe
ùis by l$uiry on rheideÀ or rhemrhdrlprcgEft lhire sryiis mry r scnrd 'Ih. ider {bou(heir ùù of appri.ltior úcùons ùo .lso d$nb.n rorn{lly $
'ÀeonginilÉ*afthpapcsalwdlsÈv r.o
ahoú ùe abjcd i EoE daù.
etve.xene desdFio$ onheivanabbdrbbrss forDNA aodprccinsquEncs, Tr4e hool! tpìdlly 3oì! y.e lirlè d Md ii ú. pmgrumsor hk. îrrc a
my rì'd sonc sdios
hardro foìtor lwhi.h oi bc 'kipped). ri . simitrnEDq
s we Fdìdc biolo3nar ndidioi
er
ùd rÀ.
What is bioinformatics anyway?
.vùbsddodfmflEbokpìi.ed rndmofDNAbyw{sialdcfi
mjÒ! bE*rhrouehsitr rlÈ 1910s J r95
x n rhcdeveìopnst ii rres fó
iigmaÍyrldentrndcsÒdhcÀrìlmr]oùfieúmruùtiÒn,lsd'h9bù]o!ia|
i. rùvcr duì !ùh aos ùd od (bìr).DNA trles(rùdodid$) dìd comein foùr dijìcrú
ldopnm'olhighlhrcuchpuln.lh ih. appì'6ùù 0f 6mpúh
h h6Yd
rhis
pi., bu.rilì oreof rheho$ widclyusd :inilÙ6agiqqu4s.qu.i..'Ti.q ru be*prÈd rohapp.nby ch!n.è)si rero rhGcorrriòdaúbs. prorin.areoirhms dì*ú$n in dePlhìl chaP€n r-3. úd, anddudy rhcarìsmsd ro s,in infomrion abournherclarionshìps bdwes 6ùaìaldsfurudprcpefiesommonblhc h.E n\e sme dioo aid). 'ìe aino aid rir. sdc( îcspdivcry) 'bù nry bchifu úNr rhcdnrrc. or.cùdiry rhdtr€ riomfy felarionship beNen a s. or prcbns ror seret. sorh tor pmEin runriona mryss, i' is .ùciaì 'bd rhemùlÌipìealism !fles rc rrcúedin chlpÎeB,r 7. !ìmtl foreuùrd smd€s dd 'hemjtrwùkn's hù*s (pedomjJìe,rordddq tuúbdkm rndsignalìitre) ii livinscel d€*), b bdh ùdddshd rheevolùrimùrpóbìls Gìn.Òrhcsúùcrùie013Èdeir rhdg6 froÉ slowlyin cvolùtionúù d@si$ $qúrc4, aid.o id.idry rb.ommor da$i6.!rìoi or rho úivù*
of prc!ìr 3hdrca. A coúnon appfúh.o doirg t r . , q ,\ n 4 r G t o , l r o n , t r . . , r u . o n
oi prcied rhd(s.
rn Pld n (chlpÌen 3_r4) Ne defnb.
frudùr grcnrssqù.i!Ò('hcodtrorrmjnomidsrtÒrs
ctuinG).rìoEqmos
!c.rc .tìa i o.ù. rru4re diíion
No This Qn !s sonc
d rhùdine in pin n (chapref l5)
ind porh rrudres Mo{ofùci'lg rrpìicrbìe 'o iúcrùÒ.ide(DNA ù RNÀ) squcr.ù\ lrde e. bÒ\$€( a nùnbú
Íl,/pjdeiJìbìÙfonficr
Notation
. o. ,. (, . . . dedÒksuD$ecifred úìno &ids. . a, ., D,. . . (úc oràr..Èf codd is us.dtur spdi8l Àùiro ùjds. . c, Ìd c" e boÌl ù*d fùr rhcblcrbonea{arhon arom. i e is a geHrr aiphabd,mÒsdyusd fd rì. id or mino dds.
tsr.r,,...,r'tfo.!!c!ors.qùo.aj.
. sr' r Nd rórrh.sr Ir,.rr. ..., Í I. . { is ùsedror a (qùery)*queftq ud a for a dÍib.e seqúna. ! 4r..J;rìcruh6.qu.nccGuùsùing)or4fim4 rÒ4r. . . isùedfùrsèrcBrBidue. . Jsd S'ùc us.drore.orìnsffirìy b r I is usd for o $dir€ mrix, Rd,is rhcsoins beMeer. ud à.
r ur,4,ar,rj.4,4....
GN.drùnsìdusvh.nffiyrdnsstururcs.
. , 4 r . A r ,A , , r / . & , r . . . . . f t N e d . ? is ùsd asa prh in dynmi. prcsGnnirg.
. ( 5 ,P )= r R ( r ,r . . . . ) . ùsìlgrhe$o is na|nxr?.s is rhe$oline,hd t
Acknowledgements 'ftis bmk is builronletoE mb for llm6 in bioirlmrùd aìsonìàms bughr r 'n. 0niv.6iryof Bà-ger.we rcknowredle en studedb whùhM torlow€d rhe drarr oysbìnrùfod Elsdsm ùd iiromaliE ardiúpinns.on!ffirions *iù ReinA,srúdiDdKjerrPtuen, Sone of rh. d.itur ov.rd buirdronjoinrwr únhavh Brumr md DrvidGtb.r eI ùd èpdùUy RurhNusimr ud omù Drcffor vrubìe he1p. rinally,rbinrsroou fmiliG fork cpirgupvirh usdufìns
Part I
SEQUENCE ANALYSIS
1
Pairwise Global Alignment of Sequences
.h.."" '","
llll:11'":l''-) 6"' d.r r prrif we
eqrRÙ'ch!fo5jI4|^;\ràels.ongJ!'1
*"p'"
ùèotú-cD6k
bhtr'Ùehmdrdtrid-m'
"lgollmbÒJ'n!T!r.em'.xed
Pù'h
bbúJqùqk'
dt dù1ùùaqtu@r
016 d \ r" &.",. i
nB^. -.N"i.r
1.1 Alignment and Evolution
drhÉ5 6uúr
Nhsc
rhcrclrìoNhìpby a/irrù8 'he qEEs. nre iliemù' strùtd ippencdinòccldlurioiof rhcnqìsLqucnces.
(ùoded by rrd,f) nEms delèrilji d iirÒrión (t/rn.
onc q rrsr
owi Grd, ir iÒrkióNn).rsilLnrì$mcnr (onryone hlLrpscd Hidtrechrqe ìn eùb muturioù. rheaùsnnenr b€rnenr rnd
:1.t1:!.11: :t oi.4db, : too. ftncon.
Ì1..d,
Rod I ooeoiúleBl eú$brliii6.
l..ql6.-*-*.,.
s à t r r d i r n . r , , ' or . o o ! . , n e m r
sd o- \ ne . hpr. moder.r r ù!d. q b m rÈc ù" ,, Àù ù mnmc
rbuns r*o ildeb. onè hiíoy cÒutdbe
qf\ù.h.erc|JdmJryIil'ol)hîjel|'gnmÙ'ùL'D\okh\ò"!bùt'
m\ mùr .ì " f - e r h o r t dh , e o ú o @ú a r t Ò E h r i r s " n ; . , u r n 6 "^.o ro pcú rr s.rcrn q s ,\ oir' mer, sfu b, .!!roc_ d ,,4',.s;/" prce n
n'Ù""noluuonfJ"ub1'"."'.hji€g'e1'
1.2 What is an Alignment?
mN \djsrythchrr'^linsoiqdúú (resdu6)inq andI h! . Aìl synbols
1.3 A Scoring Scheme for the Model
Gddili\e somg $heme).
. À , ,- l f u 4 = , , 0 r o | , + , i
1.4 Finding Highest-Scoring Alignments with Dynamic Programming
Using dynamic programming, find the highest possible score.
hish*r ff iltl*hidìns,rc sùcbv6 ns,,e To.xpìainrie m..hodee ùbdE
"r"Íi:tì,:tr&*J '
sorÈ rohior
Pffi :T::Tr1'1 ilfi#:H',Ì
r 4, rh i!\ synbolùf4,lj ì.1în syDboì orr. "
'nbnk orq rr era0Pìe.r, î =:th. ! rr , r rhesequenc! or 'nc jì6.I lmbok or l.
.,9'JndÈhigh.s]uft{hthlinbe
Ndclhir,,lvillbe'hÈbìgh*soc u* of dÉ Eîami. pfd!Òmmùs wa r,.j by siig oie or mof or rr.J. 0 < f < i 0 c % r b e n D ij = ' r ù d
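The recursion behind Section 1.4 can be sketched in a few lines. This is a minimal illustration, not the book's own code: it fills a matrix H in which H[i][j] is the highest score for aligning the first i symbols of q with the first j symbols of d. The function names, the linear gap penalty and the +1/-1 scoring function are assumptions made for the example.

```python
def identity_score(a, b):
    """Toy scoring function (an assumption): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def global_alignment_score(q, d, score=identity_score, gap=1):
    """Highest score of a global alignment of q and d with a linear gap penalty."""
    m, n = len(q), len(d)
    # H[i][j]: best score for aligning the first i symbols of q
    # with the first j symbols of d.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = -i * gap          # prefix of q aligned to blanks only
    for j in range(1, n + 1):
        H[0][j] = -j * gap          # prefix of d aligned to blanks only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + score(q[i - 1], d[j - 1]),  # align q_i with d_j
                H[i - 1][j] - gap,                            # q_i against a blank
                H[i][j - 1] - gap,                            # d_j against a blank
            )
    return H[m][n]
```

For instance, aligning ACGT with AGT under these toy scores gives three matches and one blank, so the score is 2.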
2.2.2 Repeating segments
dr{ r+r ftùè eirl bddos onù. .ubdirgonirriù (i. j) '0 (i + i, j + À)(r.c
s\
.:
2.3 Dynamic Programming
ììebc{ srob{r!|anmsr. Dyi'ri!
F+' ùi .
b$' 0 ghrd \conrs) lo(L rtis'ftnr
Òodi
ilr /6. TrÈ ber slisDmenr. ftom ùreb.gintriis, rndiie ú rbrhe* ftsjdùàsthc bsi 6)n nMd itr Fisùr 2!k). Nrh i sor of 0 4:
dt.r
2,3.
sis d1''
2.32
TneexamprelhoNrúútprc6x6 Gonins 66tìn 'he alisinot)
mùlr(4ri /J ,) r!t
{
ch(uc
nlg!
rhf2 tù€D peùLm lyan1i. rmsnnùùt
a,.tùt HsP
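The change from global to local alignment in Section 2.3 can be illustrated as follows. This is a hedged sketch under stated assumptions (linear gap penalty, a toy +2/-1 scoring function, and function names chosen for the example): every cell may restart at score 0, and the best local alignment score is the largest value anywhere in the matrix.

```python
def toy_score(a, b):
    """Toy scoring function (an assumption): +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def local_alignment_score(q, d, score=toy_score, gap=2):
    """Best local alignment score of q and d with a linear gap penalty."""
    m, n = len(q), len(d)
    # H[i][j]: best score of a local alignment ending at q_i and d_j
    # (0 means that starting afresh is at least as good).
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,                                            # start a new alignment
                H[i - 1][j - 1] + score(q[i - 1], d[j - 1]),  # extend diagonally
                H[i - 1][j] - gap,                            # blank in d
                H[i][j - 1] - gap,                            # blank in q
            )
            best = max(best, H[i][j])
    return best
```

A traceback from the cell holding the maximum back to the first cell with value 0 would recover the aligned segments; it is omitted here to keep the sketch short.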
2.5 Exercises
Tryùisonlldsquen$d=DAEAD
l,l (n. i, bc 'he fln' rnd li$ posidon i' (b), ùd (r, jr) bc rhc úme fo, d. s
!,1
EXIRCrSÈS andr ror n\e sappàìiry (ììncù).rw Íqr@s
*c givcn:BRRîRî and
(a) Find'hc highcr ndn! lodt !ìilnmsnh (yotr$oùtd findroùt. (b) Youhxlr rh$ly d foundrheaùgmÈú
cfler?rize Equîrion ( r.2) ro bÈyiìd iir rìldine a l&aì:ù4mn
{idì e.i.nr
aIA (noft€ 'haÌ only 'lìe imino xcds
A . I .L . s c r l
u s ú = 2 ,? = 5
(r) Mr]re3hbleteiúali|osibrevords(=4r= 16Ìord!) (b) Exr ncq, úd roroeh woln' ir 4 (o lxdinc l, rtul ror.ùb wd. iìndm (d) Youwil iow iii,r 'hd 'hcr ir x *o lh*conhinssLf@m,atrdhNs$rcsi hichHsP(virìì $ùe) woùìdyotr rbùcùr.oflYlt!ù (Lro)! Bci\arc (a)Nov debilk,m.orùc sùbmùdri
2.6
{hl r,ìsc
irc rbrtu bib d rhcsùù di
par (Ìoe àroded roHsr)
2.6 Bibliographic notes
dornmrrds\ arsofoù'dii afgos(1e37) hd\lnsMùd^4os0eer)
sqf
MP/www.g.h,sdq'!rc'aJúùhi lurion!d DfotnsedGeesÈohenI qql).
\'
, .'dMità rd3ò
r00t. FAsTxn dscnbedir pàMi ( 1990),,nd cippeJ BLASTjn Atrs.hùrerj
3
Statistical Analysis
compdns r qucD\cqu0ncc la) $rh É.h 5ùrudoct(d) in ! dú!b*. sNc! fúc ro k{d ,iFEcdl naiy G/,.r),e$h
(homorogous)tsqrqo, Í mui hc fÒh
ÈrfolsiEJfrlmeofîa|ienmm'o||qo
3.1 Hypothesis Testing for Sequence Homology
drsist$i.eHypúhsjsqùgatrúidly
hypodÉsis.Él'À.1$'qiH.
r giEn rhEshdd{qc o 0l (2s]), rher is cNon ror rcj€'iis H0 Gùhc 1* rcvrDùd r.cpriig ,r :n!ndù (4,/) sigfirì.aú.i.e.irrher -,v
ùrrttúútt
údpdhrn
::ihn@ql!i!núdlNlRì!}Lú
qqitur
rh. hìlher $or (hì-!h6' nsni6ene).
ùd rheiÈ!sld PUfsrìlr (4./) vrh
gùi.ollJ rquur!$ (\ccscdu :. t. )
Dder!.e
ùe rjcdu
ÈEr for ,!ro.fo
Fshm dÉ A|e im.ú.hosedù (:) fioddE leftnr
piir f,!ù (.r JJ \nf
n! rherî lr o' higlú, !iru, to G$ ,hc p'obrbiìiqdisribúur nuoduboo. rilr onrìnrc{iÌhrheq{úo. ted rof
3.1.1 Random generation of sequences
e rNro dd r^ sù
a$ùf!
ù'Lphlbdútoùslmbots IÀ,.
oosiioi (or úe nndor
!, Et ù'l rrroirqus!$
l/, =0rì../i=0r /r=0r. r,=04t.Drnìbedm+rwnhr!tubrrrrnl'\i!!
ndù .he cure Fom ó b @ is rtu piÒb$illry
rcÌdsin rhemtualsqEie5 (4 d)i\ u*d or onc ({tr bÒrh)or úè slùedÈs h dor
pr&dcÀ
Ior úe pnbabili$ dìstib'rior.
jlù8,ú.{q[mcisdìvidedùbcej
! I dL beiî ù.s Ddirios (bÙrùshut['d ofdr]
GÍ|gl5iiglesfNscÙafl
3.1.2 Use of Z values for estimating the statistical significance
ú.7skfds'hh.]llThj\lsudlesúd].
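The idea of a Z value can be illustrated with a small experiment: score the real pair, score the query against shuffled versions of the database sequence, and express the real score in standard deviations above the random mean. The sketch below is an illustration only. The ungapped segment-pair score, the +1/-1 scoring, the number of shuffles and all function names are assumptions, not the book's procedure.

```python
import random
import statistics


def best_segment_pair_score(q, d, match=1, mismatch=-1):
    """Highest score of any ungapped segment pair (toy +1/-1 scoring, an assumption)."""
    best = 0
    # offset = i - j identifies one diagonal of the comparison matrix
    for offset in range(-(len(d) - 1), len(q)):
        run = 0
        for i in range(max(0, offset), min(len(q), len(d) + offset)):
            s = match if q[i] == d[i - offset] else mismatch
            run = max(0, run + s)      # restart when the running score drops below 0
            best = max(best, run)
    return best


def z_value(q, d, score_fn=best_segment_pair_score, n_shuffles=200, seed=0):
    """Z = (observed score - mean of random scores) / standard deviation."""
    rng = random.Random(seed)
    observed = score_fn(q, d)
    letters = list(d)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)           # keeps the composition, destroys the order
        random_scores.append(score_fn(q, "".join(letters)))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Shuffling preserves the amino acid composition of d, which is one common way to build the random model; drawing symbols from background frequencies would be an alternative.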
3.2 Statistical Distributions
3.2.1 Poisson probability distribution
Poksoldistibutioniclbenoniqrnlir|4d',nìe (úe prcbibiliry Ìhat 'he rochÀ,io ujrble
=.r=i" " Prx Plx>.r=, i1" ;,,
.
{ vill hre
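As a numerical companion to the Poisson distribution, the following sketch evaluates the probability of exactly x events and of at least x events. The parameter name `mean` and the function names are assumptions for the example; the formulas are the standard Poisson ones.

```python
import math


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson-distributed X with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def poisson_at_least(x, mean):
    """P(X >= x), computed as 1 minus the probability of fewer than x events."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(x))
```

The special case P(X >= 1) = 1 - exp(-mean) is the form that reappears when a P value is derived from an expected number of high-scoring segment pairs.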
3.2.2 Extreme value distributions
LertrL.
irs . r, tbe iodeperLrtù,
nrùcrioiroi ri i\ ùen (:incedì! r uo indlrlfrenr of eiú orhù) ,l= Pt\i
I
=t.
I I
qhn'io! ofr fr of (oven:ppiigrslneds ! (QDdon)qFcd
ofd. iid r1 rhoI
'bcf,@IlliiicpÒtÙliiLh|úje
\ÍrúI+!drioro$us(úrrestìoq.(ri
'
rr
î!F)brhiirr
'
f):
l.l
ì (d.ishyldn'nbúFnof f is
1bc 6mof tr(r) d.p:.d!.ì r urd(
PtY>rl=t
r'ìr)=Ì-erpt.
ohen.., rr) htrrk,NlhcoccLhLre s
, 41
6r)
îùc t5 ! turfbi be's*ùù'.d; a'd,.i,tr (0.5rr di\aihùrior is Enìe1s coarinr)
rhe
r|.|'
3.3 Theoretical Analysis of Statistical Significance
Karlin and Altschul (1990) have shown that
r.{-k.r.
.Iis'heiìphibdor'hermi0om,ds
tEq@.ùr lp,l.liJ
(orter, = t).
E= L
P,,ir,Rt,.
I ,""r"a"= l be)otrdrhr $ÒPoo. rhisbúk).
{ed rmn l,rd, ùd I oo{ rhjsis doir
t lrr ) nd {tr I rrc sùfi.ieiLry imikr
ed { ùe 8Nd?'1
ùùhtt
o! ylnIl
3J.1
t (.:,i3 urdbùiùscof 1h.lPrxjmde
By sritr! Ì = 1 iDEluim c.o *. 3 Lsttf ptubtt itúr elIùùùg 4t kai otu r , { s ! = P ( r M> s ) = P ( z r ,> r ) È r - e ! = r = I sp(_E(s) Noèrhdexp(r)is.,rùivrùrk, !,.
qp( r4,e /51 l1j)
. By dpmdiis Equdjoi (1.7) inb r p
P ( s r =L e \ p (E ( r ! r = - l r - = - + : + - +
)=
wchlÉtgoseqkrùsrúd/oflère'l r. wc hndrìc b4 ìGaì (unslpFd) ar
3.3.1 The P value has an extreme value distribution
rr ùe bìgher$gmentprir KoE foundbycompÍie! olúo squctr€s bcs . Fón qùa'ior (3 ?),!c 3.r 'heprcbabiìiiyfq ?(J') = P(s$ > J')r r
è "= r
dp( ,(aze !3)
P(rM > sJ I | - *P(-ehú'De-!r). P(SM>S!!r 3!qinsi
cip(
= I md! - 0!lr-Ì)/4 P(sM>rl
*èsd
- r -exP(-e-nr-d)
(r..1)rbencewe hale rbeso punebB (I and d rli.h is similu ro EtìuaÌÌoD
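The relation between the expected number of segment pairs and the P value, P = 1 - exp(-E(S)), can be checked numerically. The sketch below assumes the usual Karlin-Altschul form E(S) = K m n exp(-lambda S); the values of K and lambda used in any call are placeholders, since in practice they depend on the scoring matrix and the background frequencies.

```python
import math


def expected_hsps(score, m, n, K, lam):
    """E(S): expected number of segment pairs scoring at least `score`.

    m and n are the sequence lengths; K and lam are statistical
    parameters (placeholders here, an assumption of this sketch).
    """
    return K * m * n * math.exp(-lam * score)


def p_value(score, m, n, K, lam):
    """P(S_M >= S) = 1 - exp(-E(S)), the chance of at least one such pair."""
    e = expected_hsps(score, m, n, K, lam)
    return -math.expm1(-e)   # numerically stable form of 1 - exp(-e)
```

For small E the P value and the E value are almost identical, which is why the two are often used interchangeably for highly significant hits.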
3.3.2 Theoretical analysis for database search
The analysis for ungapped local alignments is explained in Altschul et al. (1997); the following description is based on that article. For a score S, the E value (the expected number of sequences with scores of at least S) is given in Equation (3.5).
3.4 \hùe io honoìosousr,ìù.ncs di:Ù. NorerlDr dr , \.rue eG^ b Ìhe luruq sgmcih, bu ibrvory smrrr t {lùs úì
o,
s n\Ìssor *hci . squeiccs (indep€ndùi !ìd or shr rcoBù14rc conìpmJ wi'r rhuqmfi n , mhjptied by ùn p eìue. asùr'ingin 'rìc sùc Equ{iù (r.e) Gin N = 7_,). 'hiscqullirysÈnslohddlollulsrs Ùb 0.0r mùn úc t vi'Luebesiosroinqcca fan4 úlltr rlr ? vdtr. usd.îÙdlheo,'.,pfob.biìirylolct|t riro î.id (brcksìuid prcb.biljri*) NheDdifr.ri' sonrs naùns GM brk!Òuid prebrbìtiÌi6) tueùsd. ,rhcEr{re, r s, ido r róiultizr
sm s,, sch dìarrhè
(r! Òarbc rùNi ún úis nÒhrntc'l k disribùrior *irh ! =0.1= I (se Exercis6).) T.e nomaljzedsoEs is deroredby Dri Fron Éqú'iÒ! G.r3) we tuìd s" = In P, whùh i. ú' iofrdizn rón rcqù FÒfcfhsúi'gnaùixúdlypicdsnino rhRrorobecrr.ùhcdby F4ùrdÒr(3.rr). $d 'hoÌ v,luc roùid byEqùa'ior(3 ì3).
3.4 Probability Distributions for Gapped Alignments
lìc soijsú.,r tnùy $orc is dcvclÒlolror merppedìocir dislmnb. No prur anpuuliom|qPcfìnqkfuiglys y rhcncrhoddcsribrd È!ùìÈ..aìÒulatiig Pdl1jgnmo*ùscNb.Ibhdmlyú.y
rorany$onrs núii. îiis ii ro' (yr) p
ÍJ'
ror(crpped)BL{ST. A dnúek
Ílilgdúoms'!Ùacslrcnrly'ictrlniD rirs r.r..3ppìr Er.mi. pîrrun03o. isfibúion of rhc soú. n órdè, hd ùc rhd lill be usc!. Thisfl.eduÉ n u$n n ftí sEnsncar signi6aDc .md besiver if
hbissqÙeDce'rcnoEnANmilgúil 4 is honoìogousroa mùìmun ot fd quLry{id r *t ÒrdùdomionhÒ'nolo Pe6or (1993)ha invstiefted $venl Es (z = (Y - r)/'). sinirîdtymrcs ,qnM ùú ? 0 or r$ rlìb
tt
3.5 Assessing and Comparing Programs for Database Search
dkr. no{!!ù
ùt Fo&rì mrgr cr rir),
!)2e ?
(ned i ùÀ!4in? \qrcn( a sequ.m. r4dr qBre irn n ouhndÈou (o 4J:dlhrei5cnr !rrùì r t ,J.4.s'df
• TP(T): the number of true positive sequences.
• FP(T): the number of false positive sequences.
• FN(T): the number of false negative sequences.
aMd..rùt FP1arn{ua ù F!ú.r.r(i)l
3.5,t I
../'..
..':
3RmP2ú úc q@pr. O sols P ud h-.'6 t bù MN ú! utuc 1. (b)fu ssn\4 or ù.
i r 5 35 7 1 5 5 . 1 5 2 ! s5r € . 1 ? 1 6 , 1 5 1 1 1 r 1 2 1 0 1 9 * l ! l3Z5r ól ! n : : Ì1 2 :!re! !ìe6v3ó 35t9 73?37573?l6e 6e6665É1ó3636rrio!958r!5r49t
Pr : HrsnrqnMwnqnxHsnHnin P' ì HHúffiHmmffiHHlnrhrnHn.n..
fte l nù s Nhe€FP€) = FN(î) GÈ s{tioi 3.52) Fi3uE3 rG) \hoqr ho{ FP.nd FN de
3.5.1 Sensitivity and specificity
r.ror@nmea o'heF (N4r simiìlnúcn. hy prcponio rnd T/(rP + FN).'he
. tr. /: N/rrN + rìPl r s/4j
TP/('|P+ FP).drcPrrPùri
ùbí. trr d ùqu\ ns î 57(?lhsrs rherr o(ùr. r rùvi f hg!rc r th) Nor.rtì
,i / n i iord{€ri,ìg
iiDdùn
'vr r ue rìì dor ro1.ikr srr i n
+r(oorrlc)kì
L(ioù!e) mdèdeFnd on rheòLùlbtd f iguu r rlh)
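The counts TP, FP and FN for a threshold T, and the ratios built from them, can be made concrete with a short sketch. It assumes a list of (score, is_homolog) pairs and treats a sequence as predicted homologous when its score is at least T. Definitions of specificity differ between texts, so both TN/(TN + FP) and TP/(TP + FP) are shown; which one a given passage means should be checked against its own formula.

```python
def confusion_counts(scored, threshold):
    """Return (TP, FP, FN, TN) for hits scored as (score, is_homolog) pairs."""
    tp = fp = fn = tn = 0
    for score, is_homolog in scored:
        predicted = score >= threshold
        if predicted and is_homolog:
            tp += 1
        elif predicted and not is_homolog:
            fp += 1
        elif not predicted and is_homolog:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summary(scored, threshold):
    tp, fp, fn, tn = confusion_counts(scored, threshold)
    return {
        "sensitivity": ratio(tp, tp + fn),            # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),            # TN / (TN + FP)
        "positive_predictive": ratio(tp, tp + fp),    # TP / (TP + FP)
    }
```

Lowering the threshold can only increase TP and FP together, which is the trade-off a plot of FP and FN against T makes visible.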
3.5.2 Discrimination power
a aii.[ piogfri s /6dirùúria
(of dr)r/n(d,r
/o1q ir hN ldr i d\(nni
r^ (r ) (r r tilud oumhcFois.quarc- \tuns y lJriLld !ftrruc \quoì!$ n {ed). r,Lndfsieoodjobindiúimùrù:heneaì
ru{rutri i F,gK:.r(! ùFir(i). $her rr = r^ rdúrJ: fE qoFdig!hfl({ùic(Rocl
a^Rocancn!$ls
ii!ú
p..irì.i'
l-
=7
ii rior @vq. la) ft NÀe!d
trLÈ'oN.s
.r rEsìrrtr.
i . {pfopùÈ
EUI d 1v. *!4r prosm5
€?) s ùc ùrc$o
s i i Í q P Ls d 6 s 6 f
.rcs to r) f eqùaL'o 20. FÒf'h. splcjri!ìry. b se 'h.Jrlv rdirn? zk. wlìlch is
a \hich m dasirìed rs hondosout. Nore iùib fc dósified a bonorùgoN)Th .uidea|cmv!{ouidbulLafnliliyil]o
.Icù.ùnbqofmnhonoloeùss
dìe Roc diaenoì(r'ora dÍrbre or j b or0.0r).1xùdoc,!d r!.'l ! n'Ìi {hi!h umbcfd homorosous f+qes enmpkl ard rP, rhenù'iber or ho,no
(10ii our
rslc rsnirc5. Roc, is d(rìncda\ nPI
j,,*""-,,1=ff = ,rfc*o* t t '.t,o -9
riÒs rÒ 'I. fùÍriùa í EqufiD c 14) Ndc rrrd RocÌ n ! sù! 10,r l, {idì 0 s wonr úd I îs b.r
ù rbc urMt
turdir d P Fisro 3 f4 \hoqsrhi ùe kr (0 ..tr,o. r) urdd Lhccúrc. $ irhrrud i{tr Pr ìl Fìsùor5G) tuf
ùc nmbe or ÙÒddd3otr\(d ù. qrcfy) $quoirc\
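A ROC_n value can be computed directly from a ranked hit list. The sketch below assumes the common definition: for each of the first n false positives, count the true positives ranked above it, sum these counts, and divide by n times the total number of homologs. The padding rule for lists with fewer than n false positives is an assumption of this example.

```python
def roc_n(ranked_labels, n, total_positives):
    """ROC_n for a hit list sorted by decreasing score.

    ranked_labels: booleans, True where the hit is a true homolog.
    n: number of false positives to consider.
    total_positives: number of homologs in the whole database.
    """
    tp_seen = 0
    fp_seen = 0
    area = 0
    for is_homolog in ranked_labels:
        if is_homolog:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen            # true positives ranked above this false positive
            if fp_seen == n:
                break
    # Assumption: missing false positives are treated as ranked below all listed hits.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_positives)
```

The result lies between 0 and 1: a list with all homologs ranked first gives 1, and a list whose first n hits are all false positives gives 0.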
3.5.3 Using more sequences as queries
3.6 Exercises
t-
{E !of!rFdù!.
b 'hc Roc {nc.
'.
(b) r
.h 0rrh. rhumLdqlcicc\
pmnde^ ( (he modi. ù ch{adefnnc. {ìm) mqsufe or decry confùr)j by rhr .q
r found
ind r (ùe vùiamc
ing i! q sd 3 tudÒm sL!@rce, by uug Eqùdion (3 4)
h,honoloeùsb4,wÈù.scFq Ìotrrìo'ioldgous rÒ4 suppo{ 'hr .bc
G) Ld '.ek $bcto cr.h or rbÒprcs.J P': !!Fn!nsn I nnnHns....
Fild rhevihe a = FP(7)- FN(î) ro r$ù {iù whd Nis n{nd in (a)
]\lhaì\i['n.vùcdj'lR'ld.t
4
(r'solr59)ùìdas.qr.ì!ci,,swn\prl
if rsi'iyirr,Àp(.iLorr rlr(;4siiurr on,hch{toúd dit c[óo\clppf)p Dr!ù$ !hi!h progfn twil|] rricrN$n p{r]d. t )'ouretud roberhr b$r
riqùrior(r.1)ii suhsLdòn I 1.: tReùùúq dúrù(!') =r ) r r(J') = r qo{ r4re
P}
rr ). Thh
{ = (n(,(uN)),// coNiLlùLrr(RrLlqD!d rnF4L!ùon c.r rr. slìN Lh! P(r ) \ | úr$ hr! r mrl izederkù rruc di!ù rrùiontr = u 4u
3.7 Bibliographic notes
,\xsììnì(r99i).ìl rdjlr$ roÈifrd
(nrir.0d akrdrurfreeo)trd (rú 0d bna r{hùl errl (ree6,ree+)^iny's
conùsda (re33)(rfiisolrlr3).ik5dìùrúa (ree6.lnr).r!,6on(j996.19931 i \r{ rrrri6e(r99t
Nebk lnd Bldor (1001Ieri,ù,tuI ql!r$ tu siùd
ri ir ro.aeo ú j (:000rr) îd r_indil údFonsoì(:1000)useorRoc!úr$n$or!ii|cribrho!d,rRol,i$o!(ree6)
1.7
4
Multiple Global Alignment and Phylogenetic Trees
1ru|'iplealigrn@'iltmtÌmlùrci5
o rir r whote rrdiy (n h5 bd srid rhd iùtrir e aìisnftùrhoúrùùdty). ri odÉr ùdJ rvo (oi r jc\9 d Lhsmir rhe xi rnry .rcc
(or supedùir,
I. ùis Na)l 'lF arlrsi'
nid Do?lF ard ptu /ó dc\d
4.1 Dynamic Programming
Is&dbmuìúpl.ljg'ìÌjÙ'l.PÍod
ú a nùìriptc s.,lùe!d djsinqr
qiuN, lyps k (fe'ne'nbe.'hf ùiuns
vnh ùdy bbiks r€ forbiddsl
hd! an bctn (l d'Ilùer lrs onLqus!ò gnenby Equinn (4 r ) ilì. ùùnìbùorcdis
riisr€ {.,
{n ùs!rc ihÈrfúg 'h{ ricono, h (b)
rqucrus (o(,,ii) roreqml scqncc lcr-ellN,r, ùid ior phúid \orudoisor such . T'IÌotdùeÌhe rumiu rimebyNirÌgpruniisr.hîiqùs (.ù!oft vrúchf ì
aid rhe be{ (tr coreo sohúioris n
4.1.1 SP score of multiple alignments
e:Ùenr.i|ldsi'nplysnlhembobhi|'he
iNen.d itr n. L.'s(i',ir)
bc ùepai
ise
(J, ;r ) * ùc so€ or3 rlfrubre Pri$ù. hrmlsìnrbeprct-'id re*ha€m rf *c tr\Ltiie{ lap cosb,rhcsp foa crn at$ b. calcuhreds a sùn .f (orum
ìr dì3nmm'.ii is'r. r,rr symrroì of r lrd ^
1.1.2
(riur 3!p |€mll],). úd (memher 0 ùsirgEq@'ioi(4.2)is (crlcurr'cdDs-Nìs) 0 + ( r)+( l)=-2,aMbyurirs r)= _. E u a ' i o n { 1 j )n i r ( ! o r ! m i * i s e )( 1 ) + ( r ) + 3 + ( , 1 ) + ( Nù'hdrofigivenscolilg{htme'l
Thisfolss ftún dd ri[r rhii J(r . Jr) i\ úr hishsnsùe rhieúbrc byalrgoinc
rk sùe or 'rreFrjsdiom or 'be 'yo iiÀl *quses ìn
vÈishd 4@ù! rheE m biotdgiùr (
elìtredii &ùadon (4 n a[ lcluercs
ùe
rulf$du[ÚoLdbeÚmtdb'gìYù3ùehigher!cj!lùÙùtùd'nohcd
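The two ways of computing an SP (sum-of-pairs) score, pair by pair or column by column, can be compared in a small sketch. The scoring values (+1 match, -1 mismatch, -1 for a symbol against a blank, 0 for blank against blank) and the function names are assumptions for the example. With such a per-position gap score the two computations give the same total.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of one aligned pair of symbols; '-' denotes a blank."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score_by_pairs(alignment):
    """Sum, over all pairs of rows, of the score of the induced pairwise alignment."""
    return sum(
        sum(pair_score(a, b) for a, b in zip(row1, row2))
        for row1, row2 in combinations(alignment, 2)
    )


def sp_score_by_columns(alignment):
    """Sum, over all columns, of the scores of all symbol pairs in the column."""
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )
```

With gap penalties that depend on gap length (for example affine penalties) the column-wise form no longer applies directly, because a gap's cost is not a sum over independent columns.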
4.1.2 A pruning algorithm for the DP solution
d úc,, ;eluenes (r r is ùf knNr i u d 6 ' ì d i 8 s Ì r u )c o n s i d ù r . eùr rÉ ( n . i l . , , r o r ù e D F ' ú r i x . l r d l d ù d $orcofrhebsr p.rhGLisinentiÌo'nthefú !Òrb ldr , bss. (sccFieùf4.:rG). < F îcn e ! kio! rtu I )e$orc or L mùf be< ú + .1 ú: 'heEfore,sr + ,., < ( , wckno{ I
K ùJ 4rR rlc ccttI = (r,I l){ìtlrher dÉ hjeh.r súto nÌ!isnns aRe, as. R (sG.Ì ). wc rho hreio ddùmiie rì ler uppùbdid (iì , ,) for rheirieiùh
t 4'!i7 r.usior j\ ùsd iirúd ÒfD,lrlad
j
'l G) r4ue 4.3 | shrhs rrìÈfo*d prudDs (b) îE s@N!
(b) d!
I i. Jr aa ri
eìls Gs in bek*ùd dtr$iÒo). a vrl D(r, ú). rher úc uùe r, + r(r, D) h sc b u: rhc$orcor drebsr pÍh rÒu
by sc ol a queue. wìÈi I .Òllis vnncd.ir pllcesib rùsùd ncighbou6(o phich i' shdùrdind ujuet in dÉ qùnc, n! oc
j\nji! cùnft. i:) ìh rofld deiehb.ù!.d! s h o u l d b e p ù s h . d i ' ì r h c o d s ( irL) . (i :a++ r , i r l . ( , + r , i r + D . Algodúf 4 r $ovs rìretoruird f.ù\iùr
snh prunirg
Algorithm 4.1. Forward recursion with pruning
ar :iconún rordoirg der muìripteati Fonùd lmaion is ùsd, {idì poDìtrsor.è[s t0'herúcrror'rrcDPnarú(rq0, .0) ù, rhefld cenor rrc DP mút {ar,r 4 . ,i,) rhelhole alism.nr s(r) ?(!) D(!.,) 0
rheber slm of m atìsmetr' (pú) frcm/lo .o ! rhcsÒrcorrh.b6tirisnnqrrmnÀob!rÒu soiT 'he$oE ror dbdìns thealiEE a sbck of ùe ceìlsI for whi.h a yaìuero p(ú) rsfoù'd
Flr, /ix) ÀpM.duÉ rhÈh frndsm ùppú bÒud ol rhesG ofúè rlisntur tom r ceuI rorh. èid-.ètr, fl r = /'oì P(,) := 0i push(', 0) pushdresúÉ cf onrh. quce p o p r , . o rJ:r , ) : - P , i ,
h6sorr .J,-..!r
irr(r) +.(ù.À,) > ,( .hen
6úis.@L!
tor auhNod neìEhbÒú, u ofr ànù ú. 4ht otd.l Push(u, O):P(ù),= s(ù)+ D(u,ù) P(ú) i- nd(P(,), s(,) +D(,.u)
ainding upp.r linit ro. soru Forùy úgm{r
,{ or sqùdcs f,1, l, . . . , r'1, rheEqùúios (dr) 3i/ (a.a)c
s("{) < I I rr}.rr). ldùheúser (n.i:,...,t).îÉir ro.rherLisnmón!o.rh.suhÈquenc*r1+L ",' "i+1,..
.. rl-',..
nrs mb.
Jotrcby ùsinsEqud.n (46) $
.-I I ''r,,,,,1,., "," r f{ù!d ù rimeo (,1, úeÙ d.iors f mprcrity fof6ndiiEr is o(rr,1.
sw andaRlR. , = (3, 2, 2), aid I soirg
i.vjsgali'!ú]enÌsorcs
s r " l . ., . " í * ,
",r.
r = 0 . . . t r r - 1 .i , = 0 . . . a - r no ! ,ts.,. :. .'
u.i id \c,o, .-
r)
conpL*jty ol rhendbod ror lindiner lLppùbouùd\is ú.rîoft oo:n1 spondiEronoviig lod ù rou (DG, u)) n ùbùraredby úe vùih' Éq@'in (4.3)or rhesP soF. Nor rhf rhctumbcror
rbodJd uscd,aÍd mrnyof rhemus (rcush efimaÈ o0 phyroseúic (orflorùrioisJr) b! riir heh wnhdìeaìignins.
4.2 Multiple Alignments and Phylogenetic Trees
rte lè*as {lemì!!Lnqret,
and rhe 'nEnor
ros. sù.h I r4 tr ciììed J phylolonúic (r .vorùúùrfy) ùco.ind sfiuy '11iib (bmnctr$) o$Lnr sqùems, $d ùe ed,ees e tn i rK cùmrurL\i f.om prcrein{of rJN^)
lioiÒlamuìlipklìj!Ù[4sd\N}nsJ
co^ldsr s'orseqùeocsl^Rl_,ARrr.aRs'.ansl.awrl,a\yr] ùù4rlEl
oùùrL\
b $irrudattud) prryk,!
o|Ío'nR'ov,h6ccuftdin'h.pa rdúryhrùn î nùùriotrfftn s bî
e^ trro 4ueNes. one*qkNcJd
rGd)
cr)arLcmur, ofbds.ùr 1{o Gul\d
whor Dcihq u fru, rdirìe ililmÉnr.or i (lirc) ltìrogqrrd! a!u i! tN\( (L r ph![]genà ! re. (plr rr! orh{ mdhod1 tva) hy dhq mdhùL clmhùùg rh
.Ùlp[]ro:uldi!to$ud0d'ielcJ]sF rerhod!lo1ÓNndiigplìylogÙtù.ft4
.LL ri$ ld$cu, úrn trhorrl'3r), rq, u'!$kr0lnsrhd
cdg$ u Lhc'(o
ln ùyìosi€ric rudies.'heoL (6 tur uú,torLn h ou($! úr {,bjLd\!
ud
a prryroa{eri. rÈ cdsrudd ùyòc Eisù@cjohiq Dr'hoduis ù N ù (ùr sùc 4\ red ro.Fie!€ LD. rhe Nnen ioss ù. Ésh orsirg bobhPpùg Gq .ieEr.5
ows,cìvúDdgìú|sqEo$'qlimde diphy|ocsdi.hc,wÒÓNidùoly . Theùe ha r l.trimr nÒdes oeavct. otrcfof flch ùiein.ì squence.
dnediotris dÉidcd.(rnùrf@Èd ei
úc difarion is undsided.)
rvc ryo chitdcn;rheùrenat mda ofm unrcord re haverbE comsrd edges(btr.h.9 ! Ai ircdd ioò.ù
('nsomc@t
r+
r+ ofPddeÈùFgurc4'5
nús bM rwÒiDstin sqq. lrìis fte rùùefor eivesùr rlc Òepúniry roìlludraé rbeconcephor ,n ,r,s hd rdrdr4. dÙpucaliù'í wejitiptì
lh. fuìl
ìes,bÙlpmlÙgsirúcy{edEnvennlmgeoe FigE4j s tbeevoluliotr shNi in FictrE'1.6,
. rNsr(MoNe) d nrs2(Mose) m p8dogq . n{sl(Mose)a
nrsl(Rra ft onnobs.ì
. rNs r(MoNe)Ìd rNs2(tun ft odrybùúÒrÒsi . rNsl(MGe) :ld Ns(cliìo) ùc onhorocs.
4.3.2 l
speiès.í rhemw.ùpy f cÒpìde normrive(tudibed) rheyft calìedpr.trdd listlÙÒkrtúcdtrdbàofdìfi@íftebpoìogies.
4.3.1 The number of different tree topologies
An unrooted tree (of the type we consider) has m − 2 internal nodes, and a rooted tree has m − 1. The number of unrooted topologies for m ≥ 3 distinct sequences is
$$T_{\mathrm{unrooted}}(m) = \prod_{i=3}^{m} (2i - 5).$$
For example, T_unrooted(10) = 2 027 025, so even for quite small m it would be a large number if all possible topologies had to be …
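The growth in the number of topologies is easy to verify numerically. The sketch below uses the standard counting formulas for bifurcating trees; the function names and the symbol m for the number of sequences are assumptions of this example.

```python
def unrooted_topologies(m):
    """Number of unrooted bifurcating tree topologies for m >= 3 labelled leaves."""
    if m < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 leaves")
    count = 1
    for i in range(3, m + 1):
        count *= 2 * i - 5     # adding leaf i: one choice per existing edge
    return count


def rooted_topologies(m):
    """Number of rooted bifurcating tree topologies for m >= 2 labelled leaves."""
    if m < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 leaves")
    count = 1
    for i in range(2, m + 1):
        count *= 2 * i - 3
    return count
```

For ten sequences this gives 2 027 025 unrooted and 34 459 425 rooted topologies; the rooted count equals the unrooted count multiplied by 2m − 3, the number of edges on which a root can be placed.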
4,3,3 )
"---i A :-;
:A
/LA/L^^ Plyiry 7i,ihù(!) hy h
r.rcsùù8ù
4.3.2 Molecular clock theory
frúdio crnb€sriiùrd (sedu 9)m ù60mdr r 42 oúfioi\ rrrEhNú
4.3.3 Additive and ultrametric trees
Fiqc{.3
GrÀ!ldd
k,eca^!
addrqùc
È edg* .ÒD!4'hg
rhe nods. Flgùe 4.31i)
ded iion rrì! dirù..rs io Fisùc 4 3lb) rrc
o\ùrìir!
rk.qr!!)irlidodyir
ir{y6uror'lÈnwecarhbel'tunr.
j.r.
4i.4
wcchr!b!r rhurods!\rnFis!rc49(orndúl squenesshoMi.Fgùrc4.7(Ì) elrr5eúdEqÉfùe.r0)núbe (srìd$ing úúÈqú rú (4 ú) inpf$ {k|niq h beyoùd dìe$opeof rhisb.ùr )
G$unii! nrcù!\I)ac)rf dorrr-itrorerùyrnpbi.j r
iI!LIlPLt
CTOBAI ALIGNNIENIA
j
rieoi i.e
(ù F+r. Íù idùe
'hr iddiiny d tu inkrc$ d toù objrds (br rÈ dtueobi4r fre ùùudd! 4qùiPnd, tr
Fisùrca.eo) iìrunr{* rhisrEqu ior (r.Il) is $úsfiedror aI rlÈi,ives or the ry rhatEqùÍion (4 11)impÙ* rhd ìÌ h Eùa'ior (a.l l) inìpliestr!ùrtion (1.r0),hei.e ùlknÈúi.ny iúprì* xddidvlr (*
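Whether a distance matrix is additive or ultrametric can be tested mechanically. The sketch below assumes the standard conditions: the four-point condition for additivity (among the three pairwise sums for any four objects, the two largest are equal) and the three-point condition for ultrametricity (among the three distances for any three objects, the two largest are equal). That these correspond exactly to the equations in this section is an assumption; the function names are illustrative.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Three-point condition: the two largest of d_ij, d_ik, d_jk are equal."""
    for i, j, k in combinations(range(len(D)), 3):
        a, b, c = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(b - c) > tol:
            return False
    return True


def is_additive(D, tol=1e-9):
    """Four-point condition: the two largest of the three pairwise sums are equal."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted((
            D[i][j] + D[k][l],
            D[i][k] + D[j][l],
            D[i][l] + D[j][k],
        ))
        if abs(sums[1] - sums[2]) > tol:
            return False
    return True
```

A matrix derived from a tree with a constant molecular clock passes both tests; a matrix from a tree with unequal branch lengths to the leaves can be additive without being ultrametric.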
4.3.4 Different approaches for reconstructing
r,mlÙmolinullipkarig'ùaÍ'Ù i:ì