SQL Server Data Mining: Plug-In Algorithms


SQL Server Data Mining: Plug-In Algorithms (Microsoft SQL Server 9.0 Technical Articles)


Raman Iyer and Bogdan Crivat
Microsoft Corporation

July 2004

Note This information is preliminary and subject to change. This document will be updated for future Betas and for the final release of Microsoft SQL Server 2005 Analysis Services. The updated document will include additional information, as well as upcoming interface changes. (17 printed pages)

Applies to:
  • Microsoft SQL Server
  • Microsoft SQL Server 2005
  • C++ programming language

Summary: Describes how SQL Server 2005 Data Mining allows aggregation directly at the algorithm level. Although this restricts what the third-party algorithm developer can support in terms of language and data types, it frees the developer from having to implement data handling, parsing, metadata management, session, and rowset production code on top of the core data mining algorithm implementation.

Contents

Overview
Requirements
Architecture

Overview

Microsoft SQL Server Analysis Services 2000 Service Pack 1 allows the plugging in ("aggregation") of third-party OLE DB for Data Mining providers on Analysis Server. Because this aggregation is at the OLE DB level, third-party algorithm developers using SQL Server 2000 SP1 have to implement all the data handling, parsing, metadata management, session, and rowset production code on top of the core data mining algorithm implementation.

By contrast, SQL Server 2005 Data Mining allows aggregation directly at the algorithm level. Although this restricts what the third-party algorithm developer can support in terms of language and data types, it frees the developer from implementing all the additional layers described above. It also allows for much deeper integration with Analysis Services, including the ability to build OLAP mining models and data mining dimensions.

We use the term "plug-in algorithms" to describe third-party algorithms that plug into the SQL Server 2005 Analysis Server (hereafter referred to as "Analysis Server") and appear, in all respects, like native algorithms to users.

Requirements

Analysis Server communicates with third-party algorithm providers via a set of COM interfaces. We group these interfaces into two categories: those that need to be implemented by an algorithm provider, and those that are implemented by Analysis Server objects and consumed by algorithm providers.

Interfaces Implemented by Algorithm Providers

Method definitions and descriptions of parameters are to be supplied. Refer to dmalgo.h for the method definitions for these interfaces.

IDMAlgorithmFactory
This is the entry point into a plug-in algorithm provider. Analysis Server requests this interface upon instantiating an algorithm provider, and uses it to create new algorithm instances that will be bound to corresponding mining models in the server space. IDMAlgorithmFactory can also be queried for the IDMAlgorithmMetadata interface described below.

IDMAlgorithmMetadata
This interface is used by Analysis Server to interrogate an algorithm provider's capabilities. This includes attributeset validation.

IDMAlgorithm
This is the core algorithm interface that provides access to the various functions of an algorithm instance, including training, prediction, and browsing.

IDMCaseProcessor
This interface supplies formatted cases to the algorithm provider for training.

IDMAlgorithmNavigation : IDMDAGNavigation
This interface exposes a trained model's algorithm content to Analysis Server for browsing.

IDMPullCaseSet
This interface will be consumed by Analysis Server for sample case generation.

IDMPersist
Analysis Server invokes this interface for loading and saving algorithm-specific content into a stream provided by the server.

IDMCaseIDIterator [optional]
This may be implemented by an algorithm provider for filtering and controlling case generation.

http://msdn.microsoft.com/library/en-us/dnsql90/html/ssdmpia.asp?frame=true

09/03/2005 17:07:02


IDMMarginalStat [optional]
Marginal statistics are required by Analysis Server during prediction query processing. They may be gathered either by Analysis Server during case generation or by the algorithm provider itself during training. If an algorithm provider indicates (through a method in the IDMAlgorithmMetadata interface) that statistics will be gathered and exposed by the algorithm, the provider must support this interface. Otherwise, Analysis Server will initialize the algorithm with its own implementation of this interface. Even if the algorithm does not internally use Analysis Server's implementation of IDMMarginalStats, it must save and return the interface when queried for it.

IDMClusteringAlgorithm [optional]
Clustering algorithms can optionally support this interface so that Analysis Server's query processor can successfully return results for queries that invoke algorithm-specific functions, such as Cluster().

IDMSequenceAlgorithm [optional]
Sequence Clustering algorithms can optionally support this interface so that Analysis Server's query processor can successfully return results for queries that invoke algorithm-specific functions, such as Sequence().

IDMTimeSeriesAlgorithm [optional]
Time series algorithms can optionally support this interface so that Analysis Server's query processor can successfully return results for queries that invoke algorithm-specific functions, such as Time().

IDMCustomFunctionInfo [optional]
Plug-in algorithms may support custom functions. Metadata for such functions is exposed by the plug-in algorithm through this interface, which can be obtained from the algorithm's metadata object.

IDMDispatch [optional]
If a plug-in algorithm supports custom functions and exposes metadata for them through the IDMCustomFunctionInfo metadata interface, it must also support the IDMDispatch interface on its algorithm object to enable Analysis Server to call these functions.

IDMTableResult [optional]
If a custom function returns a table result, it must be returned as an IDMTableResult interface pointer. This interface allows Analysis Server to navigate the result (in a forward-only manner) and fetch the data rows.

Interfaces Consumed by Algorithm Providers

Method definitions and descriptions of parameters are to be supplied. Refer to dmalgo.h for the method definitions for these interfaces.

IDMPushCaseSet
This is used for passing case processing information between Analysis Server and the algorithm instance.

IDMAttributeSet
This interface encapsulates information about the attributes contained by input cases.

IDMAttributeGroup
Attributes can be grouped together based on certain criteria (for example, related attributes or nested tables). IDMAttributeGroup provides a way to iterate over such groups of attributes.

IDMPersistenceWriter
This is an abstract interface for a stream that algorithms can save their content into. The stream is implemented by Analysis Server over its own storage system for the algorithm's parent mining model, and passed to the algorithm in the IDMPersist::Save method.

IDMPersistenceReader
This is an abstract interface for a stream that algorithms can load their previously saved content from. The stream is implemented by Analysis Server over its own storage system for the algorithm's parent mining model, and passed to the algorithm in the IDMPersist::Load method.

IDMServices
This is the base interface for passing shared information from the server space to the algorithm. It exposes services like memory allocators, string and variant handling, persistence to files, and transactions. See DMContext Object for a description of how this interface will be used through the context object.

IDMContextServices
This is the context interface that will be passed to most algorithm calls from Analysis Server. It derives from the IDMServices interface described above and provides access to locale, memory allocators, and other information specific to the current request.

IDMModelServices
This is the context interface that will be passed when an algorithm instance is created. It can be used to access model-specific information, as well as allocators whose lifetime is tied to the server model object. It includes methods for:
  • Getting the model's locale information.
  • Firing progress notifications.
  • Getting the model's content map for updated node captions.
  • Parsing and rendering PMML content.

IDMMemoryAllocator
This is the interface that allows the plug-in algorithm to allocate and free memory in the server's memory space. See Memory Management for details.

IDMStringHandler
This interface provides access to Analysis Server's internal string data type. Pointers to server strings that are passed to algorithm methods will be treated as opaque handles that can be decoded by IDMStringHandler methods. See Access to Shared Data Types for more information about the usage of this interface.

IDMVariantPtrHandler
This interface provides access to Analysis Server's internal variant data type. Pointers to server variants that are passed to algorithm methods will be treated as opaque handles that can be decoded by IDMVariantPtrHandler methods. See Access to Shared Data Types for more information about the usage of this interface.

IDMContentMap
This interface provides access to updateable parts of the algorithm content that are maintained on the algorithm's behalf by the Analysis Server framework. Currently this includes node captions that users are allowed to update using DMX. See the section on User-Updateable Algorithm Content for details on how this interface must be used by the algorithm navigator.

Memory Management

Algorithm providers must use Analysis Server's memory allocation interfaces and make memory reservations using the memory quota interface. Aggregator algorithms must use memory allocators from either the context or model services, depending upon the lifetime of the allocated objects. To allow Analysis Server to manage and control memory efficiently, all memory allocations in the plug-in algorithm provider must be made using these memory management interfaces.

Access to Shared Data Types

The following internal server types will be exposed to the plug-in algorithm provider implementers:
  • String (as a handle, DMString)
  • Variant (as a handle, DMVariantPtr)
  • XMLWriter (as a handle, DMXMLWriterPtr)
  • MiningFunctionsInfo (as a handle, DMFunctionRecPtr)
  • ExecutionContext (to be discussed separately in the DMContext Object section)

DMString and DMVariantPtr

DMString and DMVariantPtr are used in various interfaces to pass strings and values to the server code, and to fetch strings and values from the server. The plug-in algorithm will operate over string and variant handles using two interfaces: IDMStringHandler and IDMVariantPtrHandler. These interfaces are thread-safe and stateless; therefore they can be cached. Implementations of these interfaces (handlers) can be obtained via IDMServices, both from the context and at initialization time.

Note Any service obtained from the context is allocated in the context's memory; therefore, it must be released before the end of the function that brings the context handle inside the plug-in algorithm space.

The set of memory allocators used by a handler is determined by the way the handler was obtained. Likewise, the memory allocators used in handling strings and variants are determined by the handler.

Examples:
  • A handler is obtained from the IDMServices pointer passed in Initialize (that is, from the model's service provider). All the operations performed by that handler will use the model's allocators. Therefore, a string handle created with this handler can be safely cached between calls.
  • A handler is obtained from the IDMServices of the execution context (that is, from the execution context's service provider). All the operations performed by that handler will use the execution context's allocators. Therefore, a string handle created with this handler cannot be safely cached between calls.

Methods in IDMStringHandler

HRESULT CreateNewHandle(DMString** out_phString)

Creates a new string handle that can be used later inside the algorithm space. The out_phString parameter is the location to store the newly created string handle.

HRESULT CopyHandleToBuffer(DMString* in_hString, WCHAR* out_pchBuff, UINT* io_pcAllocated)

Copies the content of a string handle into a character buffer. It behaves like Platform SDK functions: on return, io_pcAllocated contains either the required size on failure (if the supplied buffer size is less than the required size), or the actual size if the function succeeded.
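The size-reporting convention described for CopyHandleToBuffer is typically consumed with a two-call pattern: ask for the required size, allocate, then copy. The following is a minimal sketch of that pattern, not the real interface. MockStringHandler, the DMString payload, the kOk and kBufferTooSmall codes, and the choice to count sizes in characters without a terminator are all assumptions made for illustration; the actual declarations are in dmalgo.h.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Portable stand-ins (assumptions): the real types come from the Platform SDK and dmalgo.h.
using HRESULT = long;
constexpr HRESULT kOk = 0;               // plays the role of S_OK
constexpr HRESULT kBufferTooSmall = -1;  // hypothetical failure code

struct DMString { std::wstring value; }; // hypothetical payload behind the opaque handle

// Hypothetical mock that follows the size-reporting convention described above.
class MockStringHandler {
public:
    HRESULT CopyHandleToBuffer(const DMString* in_hString, wchar_t* out_pchBuff,
                               unsigned* io_pcAllocated) const {
        assert(in_hString != nullptr && io_pcAllocated != nullptr);
        const unsigned required = static_cast<unsigned>(in_hString->value.size());
        if (out_pchBuff == nullptr || *io_pcAllocated < required) {
            *io_pcAllocated = required;  // failure: report the size the caller needs
            return kBufferTooSmall;
        }
        std::wmemcpy(out_pchBuff, in_hString->value.data(), required);
        *io_pcAllocated = required;      // success: report the size actually copied
        return kOk;
    }
};

// Two-call pattern: ask for the required size, allocate, then copy.
std::wstring ReadString(const MockStringHandler& handler, const DMString& hString) {
    unsigned count = 0;
    handler.CopyHandleToBuffer(&hString, nullptr, &count);  // first call only sizes
    if (count == 0) return std::wstring();
    std::vector<wchar_t> buffer(count);
    if (handler.CopyHandleToBuffer(&hString, buffer.data(), &count) != kOk)
        return std::wstring();
    return std::wstring(buffer.data(), count);
}
```

Where a copy is not needed, GetConstStringFromHandle (described below) avoids the second call and the allocation entirely.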

HRESULT CopyBufferToHandler(DMString* in_hString, WCHAR* in_pchBuff, UINT in_pcLength)

Copies the content of a buffer into a string handle.

HRESULT GetConstStringFromHandle(DMString* in_hString, const WCHAR** out_ppchBuff)

For performance purposes, returns a const pointer to the internal string buffer contained by the handle, instead of copying to a user-supplied buffer.

HRESULT AttachHandleToBSTR(DMString* in_hString, BSTR in_bstrBuffer)

For performance purposes, attaches the handle to the input BSTR, which saves the copy time. After this function returns, the lifetime of in_bstrBuffer is controlled by in_hString.

HRESULT CopyHandleToHandle(DMString* in_hString, DMString* out_hString)

Copies a string from one string handle to another.

Methods in IDMVariantPtrHandler

HRESULT CreateNewHandle(DMVariantPtr** out_phVariant)

Creates a new variant handle that can be used later inside the algorithm space. The out_phVariant parameter is the location to store the newly created variant handle.

HRESULT CopyVariantToHandle(DMVariantPtr* in_hVariant, VARIANT* in_pVar)

Copies the input variant into the handle variant.

HRESULT GetVariantCopyFromHandle(DMVariantPtr* in_hVariant, VARIANT* out_pVar)

Copies the handle variant into the out_pVar location.

HRESULT DetachHandleVariant(DMVariantPtr* in_hVariant, VARIANT* out_pVar)

Detaches the handle variant and returns the address of the detached variant. This is a high-performance version of GetVariantCopyFromHandle that can be used where an explicit copy is not required.

HRESULT AttachHandleVariant(DMVariantPtr* in_hVariant, VARIANT* out_pVar)

Attaches the handle variant. This is a high-performance version of CopyVariantToHandle that can be used where an explicit copy is not required.

DMContext Object

The DMContext object contains information for the currently executing request, including locale and model access. The DMContext object will be exposed as an interface (IDMContextServices) inheriting from IDMServices.

Service Provider Architecture

The plug-in algorithm will access server components via an IDMServices mechanism. Two different service providers will be available most of the time to the plug-in algorithm:

  • A model service provider, which is logically owned by an Analysis Server mining model object. This model service provider provides access to the following services:
      • Per-model allocators, which can be used to create objects that will exist between calls (i.e., objects with the same lifetime as the algorithm).
      • Model information, such as the name of the model and its locale.
      • Progress notifications.
      • PMML rendering and parsing.
  • A context service provider, which is logically owned by an Analysis Server request or "execution context". This context service provider provides access to the following services:
      • Context (per-EC) allocators, which can be used to create objects that will exist only for the lifetime of the current context.
      • Locale information.
      • Methods for updating memory estimates.
      • Polling for cancellation status.

The plug-in algorithm will receive the model service provider as a parameter of the IDMAlgorithm::Initialize method. The model service provider (as well as any service obtained through it) can be safely cached for the lifetime of the algorithm (until it gets unloaded or destroyed). Each service exposed by the model service provider will be documented as being thread-safe or not.

Shared Definitions and Enumerations

These are the definitions of data types and enumerations that are used to communicate case and attribute information to algorithms. The dmalgo.h file contains types and structs such as DM_Attribute and DM_STATE_STAT. Descriptions to be supplied.

Algorithm Registration

Each data mining algorithm available to an instance of Analysis Server will have an entry in the server's INI file. This includes both Analysis Server's native algorithms and third-party algorithms. The entry will have the following information:
  • Algorithm name (such as Microsoft_Decision_Trees).
  • ProgID (this is optional and will be provided only for third-party providers).
  • Flag indicating whether the algorithm is enabled or not.

Here is the entry in "\Program Files\Microsoft SQL Server\MSSQL.1\OLAP\bin\msmdsrv.ini" for our sample plug-in algorithm provider:

...

...

...

1 Microsoft.DataMining.SamplePlugInAlgorithm.Factory

...

...

...

In the above configuration entry, Microsoft_Sample_PlugIn_Algorithm is the name used to identify the algorithm in DDL statements sent to the server. At startup, Analysis Server verifies that this name is the same as the algorithm name returned by the algorithm in IDMAlgorithmMetadata::GetServiceName(). If the names don't match, the server does not load the algorithm provider and logs an error in the Microsoft Windows event log.

If a previously created model instance is accessed by a user and the algorithm is currently disabled, Analysis Server will fail the request and report that the model's algorithm is not available to the current server instance.
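The startup check described above can be pictured as a small decision function. This is only a sketch of the rule as stated in the text: RegisteredAlgorithm, MockProvider, LoadResult, and TryLoad are hypothetical names, and the exact, case-sensitive comparison is an assumption, since the article does not say how the server compares the two names.

```cpp
#include <cassert>
#include <string>

// Hypothetical view of one algorithm entry in the server configuration.
struct RegisteredAlgorithm {
    std::string configuredName;  // name used in DDL statements
    bool enabled;                // the enabled flag from the entry
};

// Hypothetical stand-in for a provider's IDMAlgorithmMetadata::GetServiceName().
struct MockProvider {
    std::string serviceName;
    std::string GetServiceName() const { return serviceName; }
};

enum class LoadResult { Loaded, Disabled, NameMismatch };

// Disabled entries are never loaded; enabled entries are loaded only when the
// configured name matches the name the provider reports for itself.
LoadResult TryLoad(const RegisteredAlgorithm& entry, const MockProvider& provider) {
    if (!entry.enabled) return LoadResult::Disabled;
    if (entry.configuredName != provider.GetServiceName())  // exact match is an assumption
        return LoadResult::NameMismatch;                    // the server would log an error here
    return LoadResult::Loaded;
}
```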

Algorithm Content Persistence

A plug-in algorithm's content is loaded and saved into the server model's space through the IDMPersist interface. This serialization mechanism allows a plug-in algorithm instance to persist and load its content to and from abstract streams, namely IDMPersistenceWriter and IDMPersistenceReader, provided by Analysis Server. Analysis Server is responsible for transactionally persisting and loading algorithm content via these interfaces to and from its stores, so you don't have to worry about handling errors that could occur if the algorithm content was partially loaded or saved.
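As a rough illustration of the save/load round trip, the sketch below serializes a toy algorithm's content into an abstract writer and restores it from a reader. MockWriter, MockReader, ToyAlgorithm, and their Write and Read methods are hypothetical stand-ins: the real IDMPersistenceWriter and IDMPersistenceReader methods are declared in dmalgo.h and are not reproduced here. The length-prefixed layout is likewise just one reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for the server-provided output stream.
class MockWriter {
public:
    void Write(const void* data, std::size_t size) {
        const auto* p = static_cast<const std::uint8_t*>(data);
        bytes_.insert(bytes_.end(), p, p + size);
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
private:
    std::vector<std::uint8_t> bytes_;
};

// Hypothetical stand-in for the server-provided input stream.
class MockReader {
public:
    explicit MockReader(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}
    bool Read(void* out, std::size_t size) {
        if (size > bytes_.size() - pos_) return false;  // not enough data left
        std::memcpy(out, bytes_.data() + pos_, size);
        pos_ += size;
        return true;
    }
private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// Toy algorithm content: per-state counts learned during training.
class ToyAlgorithm {
public:
    std::vector<std::uint32_t> counts;

    // Plays the role of IDMPersist::Save: element count first, then the elements.
    void Save(MockWriter& writer) const {
        const std::uint32_t n = static_cast<std::uint32_t>(counts.size());
        writer.Write(&n, sizeof n);
        if (n != 0) writer.Write(counts.data(), n * sizeof(std::uint32_t));
    }

    // Plays the role of IDMPersist::Load: content is replaced only on full success.
    bool Load(MockReader& reader) {
        std::uint32_t n = 0;
        if (!reader.Read(&n, sizeof n)) return false;
        std::vector<std::uint32_t> loaded(n);
        if (n != 0 && !reader.Read(loaded.data(), n * sizeof(std::uint32_t))) return false;
        counts = std::move(loaded);
        return true;
    }
};
```

In the real system the server handles transactional storage, so the plug-in only needs to read and write its own content in a consistent order.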

Algorithm-Specific Modeling Flags

Plug-in algorithms may support custom modeling flags, which they describe via the IDMAlgorithmMetadata methods. These flags will be validated and passed to the algorithm by Analysis Server.

Custom Functions

In addition to the standard functions that are part of the OLE DB for DM specification, plug-in algorithms can support custom functions. Metadata for the custom functions is exposed through the IDMCustomFunctionInfo interface. Based on this information, Analysis Server handles parsing and semantic validation of custom function calls in DMX queries issued by the user. At prediction time, Analysis Server obtains the IDMDispatch interface on the algorithm and calls the PrepareFunction and InvokeFunction methods to evaluate custom functions and obtain metadata and data for inclusion in the prediction result.

The control flow between the Analysis Server framework and the plug-in algorithm is as follows:

1. The plug-in algorithm advertises its custom functions by including them in the list of supported functions returned by the algorithm's IDMAlgorithmMetadata::GetSupportedFunctions method. The DM_SUPPORTED_FUNCTION enumeration in dmalgo.idl has a list of enumeration values for well-known functions published in the OLE DB for DM specification; custom functions have enum values equal to and above DMSF_CUSTOM_FUNCTION_BASE.
2. If custom functions are included in the list returned by the plug-in algorithm's IDMAlgorithmMetadata::GetSupportedFunctions method, the Analysis Server framework queries (through QueryInterface) its pointer to the plug-in algorithm's IDMAlgorithmMetadata interface for the IDMCustomFunctionInfo interface (which must be supported in this case) and obtains metadata information that it can use for parsing and validating custom function calls in users' DMX queries, as well as for returning in the MINING_FUNCTIONS schema rowset. This includes:
   a. Signature
   b. Description
   c. Whether the function is scalar or table-returning
   d. Parameters and flags accepted
3. When a DMX request is received that includes a custom function call, the Analysis Server framework queries (through QueryInterface) its pointer to the plug-in algorithm's IDMAlgorithm interface for the IDMDispatch interface.
4. IDMDispatch::PrepareFunction is called once at the beginning of query execution (at bind time) to obtain column metadata for the result. The column information would be an array for table results.
5. For each case, along with a prediction call to IDMAlgorithm::Predict, a call to IDMDispatch::InvokeFunction is made for each custom function in the DMX query.
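The five steps above can be sketched as a single server-side routine driving a mock plug-in. Everything here is illustrative: MockPlugIn folds the metadata and dispatch roles into one object (the real design splits them across IDMAlgorithmMetadata, IDMCustomFunctionInfo, and IDMDispatch), the method signatures are invented, and kCustomFunctionBase is a made-up value standing in for DMSF_CUSTOM_FUNCTION_BASE.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical value; the real DMSF_CUSTOM_FUNCTION_BASE is defined in dmalgo.idl.
constexpr int kCustomFunctionBase = 1000;

// Hypothetical plug-in that advertises one custom function.
struct MockPlugIn {
    int prepareCalls = 0;
    int invokeCalls = 0;

    // Step 1: advertise supported functions (two standard ids and one custom id).
    std::vector<int> GetSupportedFunctions() const { return {1, 2, kCustomFunctionBase}; }

    // Step 4: called once at bind time; returns the result column names.
    std::vector<std::string> PrepareFunction(int /*functionId*/) {
        ++prepareCalls;
        return {"Score"};
    }

    // Step 5: called once per case; this toy function doubles the input.
    double InvokeFunction(int /*functionId*/, double caseValue) {
        ++invokeCalls;
        return caseValue * 2.0;
    }
};

// Server-side flow: validate against advertised functions, bind once, invoke per case.
std::vector<double> RunCustomFunctionQuery(MockPlugIn& plugIn, int functionId,
                                           const std::vector<double>& cases) {
    std::vector<double> results;
    bool advertised = false;
    for (int id : plugIn.GetSupportedFunctions())
        if (id == functionId && id >= kCustomFunctionBase) advertised = true;
    if (!advertised) return results;        // steps 1-2: query fails validation

    plugIn.PrepareFunction(functionId);     // steps 3-4: bind time, once per query
    for (double c : cases)                  // step 5: once per case
        results.push_back(plugIn.InvokeFunction(functionId, c));
    return results;
}
```

The point of the split is cost: column metadata is computed once per query, while only the per-case evaluation runs inside the prediction loop.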

User-Updateable Algorithm Content

Analysis Server allows users to update parts of the algorithm content using DMX UPDATE statements. Currently only node captions are updateable. Plug-in algorithms can access the updated content through the IDMContentMap interface available from the model services object (IDMModelServices::GetContentMap). In order to return the correct (updated) node captions when the Analysis Server framework requests the DMNP_CAPTION property from a plug-in algorithm's content navigator, the navigator must fetch the caption from the model service object's content map. The code for this would look something like the sample shown below (error-handling code is omitted for simplicity):

    // TODO: Init caption string to empty string
    CComPtr<IDMContentMap> spContentMap;
    m_spModelServices->GetContentMap(in_pContext, &spContentMap);
    // S_FALSE from IsEmpty() means the map holds at least one updated caption
    if (S_FALSE == spContentMap->IsEmpty())
    {
        spContentMap->FindNodeCaption(&strNodeUniqueName, &pstrCaption);
        if (pstrCaption)
        {
            // TODO: Copy to caption string
        }
    }
    // TODO: Copy string to output variant

Architecture

This section explains the flow of data and control between Analysis Server and algorithm providers.

Server Startup

At startup, Analysis Server will instantiate all registered and enabled plug-in algorithm providers, and cache their IDMAlgorithmFactory interface pointers.

Mining Algorithm Information

In response to Discover requests for MINING_SERVICES, the server will iterate through the Algorithm Manager list of algorithm providers (represented by their corresponding cached IDMAlgorithmFactory interfaces), and obtain the relevant information through the IDMAlgorithmMetadata interface.

Mining Model Creation

When a mining model is created, a metadata object is instantiated for it in the server and saved to disk. At this point, it does not have an algorithm instance associated with it. However, it will be validated against information obtained from the corresponding algorithm provider's IDMAlgorithmMetadata interface.

Mining Structure Processing

After a mining model's parent mining structure is processed, the Attributeset object created by the server is validated against each child model's algorithm provider to confirm that the Attributeset is in a form that can be consumed by the algorithm. This is accomplished by invoking the ValidateAttributeset method on the algorithm provider's IDMAlgorithmMetadata interface.

Mining Model Training

When a processing request is carried out by the server for a mining model, the CreateAlgorithm method on the corresponding algorithm provider's IDMAlgorithmFactory is invoked to create a new algorithm instance associated with the model. The IDMAlgorithm interface on the algorithm instance and related interfaces obtained from it are used by the server to pass cases, train the algorithm instance, and save its content as detailed below:

1. The algorithm instance is first initialized with an Attributeset object (IDMAttributeSet), which the algorithm can query during training to obtain attribute information, and optionally with a marginal statistics object (IDMMarginalStats).
2. Then the server initiates training by calling IDMAlgorithm::InsertCases with an IDMPushCaseSet parameter. The algorithm provider is required to implement a callback object exposing the IDMCaseProcessor interface, which it passes back to Analysis Server, in response to the InsertCases request, via IDMPushCaseSet::StartCases.
3. After the server receives the algorithm's CaseProcessor object, it pushes cases to it for processing by invoking the ProcessCase method in the CaseProcessor for each case.
4. At the end of training, Analysis Server may make additional calls into the algorithm to build a Drillthrough store as described below in Mining Model Drillthrough. It may also make additional calls to build a data mining dimension as described below in Mining Dimensions.
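The push model in steps 2 and 3 is a callback handshake: the server calls into the algorithm, the algorithm hands back a processor, and the server then drives that processor once per case. A minimal sketch follows, with heavily simplified, hypothetical signatures; the Case type, the class names, and the toy mean-learning algorithm are assumptions, and the real methods in dmalgo.h take additional parameters such as the context.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Case = std::vector<double>;  // hypothetical case representation

// Stands in for IDMCaseProcessor, implemented by the algorithm provider.
struct CaseProcessor {
    virtual ~CaseProcessor() = default;
    virtual void ProcessCase(std::size_t caseId, const Case& c) = 0;
};

// Stands in for the server's IDMPushCaseSet.
class PushCaseSet {
public:
    explicit PushCaseSet(std::vector<Case> cases) : cases_(std::move(cases)) {}

    // Step 3: once it has the processor, the server pushes every case to it.
    void StartCases(CaseProcessor& processor) {
        for (std::size_t i = 0; i < cases_.size(); ++i)
            processor.ProcessCase(i, cases_[i]);
    }
private:
    std::vector<Case> cases_;
};

// Toy algorithm that learns the mean of the first attribute.
class MeanAlgorithm : public CaseProcessor {
public:
    // Step 2: the server calls InsertCases; the algorithm answers by passing
    // its case processor (here, itself) back through StartCases.
    void InsertCases(PushCaseSet& caseSet) { caseSet.StartCases(*this); }

    void ProcessCase(std::size_t /*caseId*/, const Case& c) override {
        if (!c.empty()) { sum_ += c[0]; ++count_; }
    }

    double Mean() const { return count_ ? sum_ / static_cast<double>(count_) : 0.0; }
    std::size_t CasesSeen() const { return count_; }
private:
    double sum_ = 0.0;
    std::size_t count_ = 0;
};
```

Because the server owns the loop, it can generate and format cases on its side while the algorithm only reacts to each ProcessCase call.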

Mining Model Prediction

In response to a prediction query, Analysis Server's query processor evaluates the prediction join using the Predict method on the model's cached algorithm interface (IDMAlgorithm::Predict). Custom function calls are evaluated using the metadata obtained through the algorithm provider's IDMAlgorithmMetadata interface. This metadata describes the calling conventions and parameters for provider-specific functions, and is later used to evaluate provider-specific function calls.

Mining Model Browsing

In response to Discover requests for MINING_MODEL_CONTENT, the server uses a call on the corresponding model's cached algorithm interface IDMAlgorithm to request a content navigation object that exposes the IDMAlgorithmNavigation interface. The server then builds the result rowset by traversing the nodes of the graph exposed by the navigator and querying each node for various properties.
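Because the navigator exposes a directed acyclic graph, a node can be reachable through more than one parent. The sketch below shows one way a consumer might turn such a graph into rows while emitting each node once. It is an assumption-laden illustration: Node, Row, and BuildContentRowset are hypothetical, the depth-first order is a choice made here, and the article does not say how the server actually orders or de-duplicates nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical content node; children are indices into the graph vector.
struct Node {
    std::string uniqueName;
    std::string caption;
    std::vector<std::size_t> children;
};

// Hypothetical rowset row holding two of the node properties.
struct Row {
    std::string uniqueName;
    std::string caption;
};

// Depth-first traversal that emits each node once, even when it is reachable
// through several parents.
std::vector<Row> BuildContentRowset(const std::vector<Node>& graph, std::size_t root) {
    assert(root < graph.size());
    std::vector<Row> rows;
    std::set<std::size_t> visited;
    std::vector<std::size_t> stack(1, root);
    while (!stack.empty()) {
        const std::size_t id = stack.back();
        stack.pop_back();
        if (!visited.insert(id).second) continue;  // already emitted via another parent
        rows.push_back(Row{graph[id].uniqueName, graph[id].caption});
        // Push children in reverse so they are visited in their declared order.
        for (auto it = graph[id].children.rbegin(); it != graph[id].children.rend(); ++it)
            stack.push_back(*it);
    }
    return rows;
}
```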

Mining Model Persistence

The core interface on an algorithm instance, IDMAlgorithm, can be queried using COM for the IDMPersist interface. This interface is used for loading and saving algorithm content from and into the storage space of the mining model object that owns the algorithm instance. Versioning and transactional updates of this storage are managed by Analysis Server.

Mining Model Drillthrough

While browsing a mining model's content in a viewer, a user may request to see the underlying cases that belong to a particular node in the content graph. If the algorithm supports this Drillthrough operation, Analysis Server builds an internal data structure associated with the model that maps training cases to corresponding nodes in the content graph. To build this structure for use during browsing, the server calls the IDMAlgorithm::GetNodeIDs method for each case at the end of training.
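One plausible shape for such a structure is an inverted index from node ID to case IDs, built by asking the algorithm about each training case in turn. The sketch below assumes this design. GetNodeIdsFn is a hypothetical callback standing in for IDMAlgorithm::GetNodeIDs (whose real COM signature differs), and BuildDrillthroughStore and ToyTreeNodeIds are names invented for the example. The article does not describe how Analysis Server actually stores the mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

using CaseId = std::size_t;
using NodeId = std::string;

// Hypothetical callback: given a case, return the IDs of the content
// nodes that the case belongs to.
using GetNodeIdsFn =
    std::function<std::vector<NodeId>(const std::vector<double>&)>;

// Server side: query the algorithm once per training case at the end of
// training, and invert the answers into a node -> cases index.
std::map<NodeId, std::vector<CaseId>> BuildDrillthroughStore(
        const std::vector<std::vector<double>>& trainingCases,
        const GetNodeIdsFn& getNodeIds) {
    std::map<NodeId, std::vector<CaseId>> store;
    for (CaseId id = 0; id < trainingCases.size(); ++id)
        for (const NodeId& node : getNodeIds(trainingCases[id]))
            store[node].push_back(id);
    return store;
}

// Toy one-split "tree": every case is in the root and in exactly one leaf.
std::vector<NodeId> ToyTreeNodeIds(const std::vector<double>& c) {
    assert(!c.empty());
    return {"root", c[0] < 30.0 ? "age<30" : "age>=30"};
}
```

With the index in place, a drillthrough request for a node becomes a single lookup that returns the IDs of the cases to display.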

Mining Dimensions

Analysis Server allows users to build a data mining dimension based on a mining model's content. This dimension can be included in an OLAP cube that uses the same dimensions that the mining model was built on, and its discovered hierarchy can be used to slice and dice the fact data in interesting ways. If the algorithm supports data mining dimensions and the user requested a data mining dimension to be created from a model, Analysis Server builds this special dimension by calling the IDMAlgorithm::GetNodeIDs method for each case with the DIMENSION_CONTENT flag at the end of training. Note that the algorithm provider may choose to present its content in a different form for the purpose of data mining dimensions than it would for regular model browsing. Therefore, both IDMAlgorithm::GetNodeIDs and IDMAlgorithm::GetNavigator support a flag that allows the server to specify how it will be using the content node information.

Sample Case Generation

If a user requests a sample case set by issuing a SELECT * FROM model.CASES WHERE IsInNode(xxxx) query, Analysis Server obtains this case set by requesting the IPullCaseSet interface from the model's algorithm instance. A sample case set is simply a hypothetical set of cases generated by the algorithm (using attribute values it has learned during training) that fit the rules represented by a particular node in the content graph.

Importing and Generating PMML

A user can request the content of a mining model in PMML form. Microsoft Analysis Services 2005 supports PMML 2.1. The request is of the form SELECT * FROM [Model].PMML. A user can also request that a model be loaded from a PMML 2.1 document. The syntax is CREATE MINING MODEL [ModelName] FROM PMML 'pmml here'.

The framework takes care of the structural information in the PMML (the data dictionary and, optionally, the model statistics), while the plug-in algorithm is responsible for parsing and rendering the content part of the PMML.

A plug-in that supports rendering of PMML 2.1 has to implement the RenderPMMLContent method of the IDMAlgorithm interface. If this method returns E_NOTIMPL, the framework considers that the plug-in does not support PMML. RenderPMMLContent takes an ISAXContentHandler interface as a parameter, and the plug-in must generate XML events on this interface to write into the output stream. The PMML 2.1 standard includes the model statistics and the model schema inside the model content element. The plug-in can delegate rendering of this information to the framework by calling RenderPMMLModelStatistics, RenderPMMLMiningSchema, and RenderPMMLModelCreationFlags on the IDMModelServices interface provided by the framework.

Overall, the operation of rendering a PMML 2.1 document is executed according to the diagram below, in which control alternates between the Analysis Server framework and the plug-in algorithm:

Analysis Server Framework:
- Creates a content handler to render the PMML into
- Starts rendering the PMML 2.1
- Renders the data dictionary
- Calls into the plug-in for the content

Plug-In Algorithm:
- RenderPMMLContent: starts rendering the model content (starts the XML element identifying the algorithm content)
- Renders the statistics OR calls into the framework for this

Analysis Server Framework:
- If RenderPMMLModelStatistics is called, renders the statistics, then returns control to the plug-in

Plug-In Algorithm:
- Renders the mining schema OR calls into the framework for this

Analysis Server Framework:
- If RenderPMMLMiningSchema is called, renders the mining schema, then returns control to the plug-in

Plug-In Algorithm:
- Renders the model creation flags (a Microsoft extension to PMML 2.1, developed according to the PMML 2.1 specification for extensions) OR calls into the framework for this

Analysis Server Framework:
- If RenderPMMLModelCreationFlags is called, renders the model creation flags, then returns control to the plug-in

Plug-In Algorithm:
- Renders the actual content of the mining model
- Closes the XML element that identifies the algorithm content (end of the RenderPMMLContent function)

Analysis Server Framework:
- Finalizes rendering the PMML document

For reading a PMML 2.1 stream (and creating a mining model from a PMML 2.1 document), a plug-in must implement two methods. If either of these methods returns E_NOTIMPL, the framework considers that the plug-in does not support parsing PMML 2.1.

PreInitializeForPMMLParsing: allows preparing the plug-in for reading the PMML. The framework parses the structural part of the PMML 2.1, then calls into this method of the plug-in, passing as a parameter a reduced attribute set implementation (IDMAttributeSet) that contains the structural information.

GetPMMLAlgorithmSAXHandler: the framework calls into this method of the plug-in to get a SAX handler for the content of the PMML document. The plug-in must parse all the algorithm-specific information from the PMML stream. The plug-in may parse the model statistics or delegate this task to the framework by invoking ParsePMMLModelStatistics on the IDMModelServices interface provided by the framework. Once the content parsing is completed, the plug-in must give control back to the framework by invoking ContinuePMMLParsing on the IDMModelServices interface. The operation of parsing a PMML 2.1 document is executed according to the diagram below:

Analysis Server Framework:
- Starts parsing the PMML 2.1 (creates an XML SAX content handler for this)
- Reads the data dictionary
- Pre-parses the PMML 2.1 document and creates the metadata for the mining model and mining structure and columns
- Calls into the plug-in for parsing the content

Plug-In Algorithm:
- GetPMMLAlgorithmSAXHandler: returns a SAX content handler that will handle the content

Analysis Server Framework:
- Redirects the XML parsing to the handler returned by the plug-in

Plug-In Algorithm:
- Handles the XML for reading the content of the mining model
- When the ModelStats element is encountered, the plug-in can parse the statistics OR delegate this task to the framework

Analysis Server Framework:
- If ParsePMMLModelStatistics was called, loads the statistics into an IDMMarginalStats implementation, then returns control to the plug-in

Plug-In Algorithm:
- When the content part of the PMML is completed, returns control to the framework by invoking ContinuePMMLParsing

Analysis Server Framework:
- When ContinuePMMLParsing is invoked, redirects the XML parsing to the original handler
- Finalizes parsing the PMML document
- Saves the newly created objects
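The handler redirection in the parsing sequence above can be sketched as follows. This is a heavily simplified model, not the actual API: Event, Handler, ModelServices, PluginHandler, and Framework are hypothetical types, events are reduced to element start and end notifications, and the real ISAXContentHandler and IDMModelServices interfaces carry many more methods and parameters. The element names used in the usage example (PMML, DataDictionary, TreeModel, Node) are ordinary PMML elements chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified XML event: an element either starts or ends.
struct Event { bool start; std::string name; };

struct Handler {                         // stands in for a SAX content handler
    virtual ~Handler() = default;
    virtual void OnEvent(const Event& e) = 0;
};

struct ModelServices {                   // stands in for IDMModelServices
    virtual ~ModelServices() = default;
    virtual void ContinuePMMLParsing() = 0;
};

// Plug-in handler: consumes events inside its model element and hands
// control back to the framework when that element closes.
class PluginHandler : public Handler {
public:
    PluginHandler(std::string modelElement, ModelServices& services)
        : modelElement_(std::move(modelElement)), services_(services) {}

    void OnEvent(const Event& e) override {
        if (!e.start && e.name == modelElement_) {
            services_.ContinuePMMLParsing();
            return;
        }
        if (e.start) seen.push_back(e.name);
    }

    std::vector<std::string> seen;       // elements the plug-in parsed

private:
    std::string modelElement_;
    ModelServices& services_;
};

// Framework: parses structural elements itself and redirects the model
// content to the handler supplied by the plug-in.
class Framework : public ModelServices {
public:
    // pluginHandler plays the role of the handler returned by
    // GetPMMLAlgorithmSAXHandler.
    void Parse(const std::vector<Event>& events, const std::string& modelElement,
               Handler& pluginHandler) {
        assert(!modelElement.empty());
        redirected_ = false;
        for (const Event& e : events) {
            if (redirected_) { pluginHandler.OnEvent(e); continue; }
            if (e.start) structural.push_back(e.name);
            if (e.start && e.name == modelElement) redirected_ = true;
        }
    }

    void ContinuePMMLParsing() override { redirected_ = false; }
    bool redirected() const { return redirected_; }

    std::vector<std::string> structural; // elements the framework parsed

private:
    bool redirected_ = false;
};
```

In this sketch, parsing a stream of PMML, DataDictionary, TreeModel, and Node events with "TreeModel" as the model element leaves the framework holding the three structural element names and the plug-in holding only Node. The framework resumes handling events once the plug-in invokes ContinuePMMLParsing.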

Error Handling

Algorithm providers must raise standard errors and populate IErrorInfo objects.

© 2005 Microsoft Corporation. All rights reserved.