Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005


Table of contents:
Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005
Introductions
Agenda
Business Intelligence Platform
Overview
Business Intelligence Challenges
What Is a Cube?
Enterprise BI Today
Relational vs. OLAP Reports
Agenda
The Unified Dimensional Model: The Best of Relational and OLAP
UDM’s Role
Enterprise BI with UDM
Scalable, High Performance UDM Server
Analysis Server as UDM Server
Streamlined BI Infrastructure
BI Development Studio
Performance
MOLAP, ROLAP, and HOLAP
MOLAP Caching
Agenda
UDM and The BI Studio
UDM Data Sources
Data Source Views
Dimensions and Hierarchies
Cubes
Perspectives
Categorization
Time
Translations
Attribute Semantics
Key Performance Indicators
Closing the Loop
ProClarity Business Intelligence Analytics
ProClarity Key Differentiators
Agenda
CRoss Industry Standard Process for Data Mining (CRISP)
Data Mining Algorithms
Microsoft Mining Models
When To Use What
Decision Trees
Decision Trees (cont.)
Decision Trees (cont.)
Naïve Bayes
Naïve Bayes (cont.)
Cluster Analysis
Cluster Analysis (cont.)
Sequence Clustering
Sequence Clustering (cont.)
Microsoft Mining Models
Association Rules
Association Rules – Support
Association Rules – Confidence
Time Series
Neural Network
Back-Propagation

Building Business Intelligence and Data Mining Applications with Microsoft SQL Server 2005

Introductions

- Presenter
  - Javier Loria
  - Solid Quality Learning
  - [email protected]

Agenda

- Overview & BI Challenges
- Introducing the UDM
- The UDM in Detail
- Data Mining Overview


Business Intelligence Platform

Integrate
- Data acquisition from source systems and integration
- Data transformation and synthesis

Analyze
- Data enrichment, with business logic, hierarchical views
- Data discovery via data mining

Report
- Data presentation and distribution
- Data access for the masses

Overview

- Getting information from enterprise data
- Using BI across the enterprise as an integral part of doing business
- Capture and model all of your data
- Integration with business processes
- Relational reporting and OLAP converged through a single dimensional model

Business Intelligence Challenges

- Multiple Data Models
- Multiple Data Sources
- Multiple APIs
- Duplication of Data

What Is a Cube?

[Figure: a cube with three dimensions. Markets Dimension: Atlanta, Chicago, Denver, Dallas. Product Dimension: Grapes, Cherries, Melons, Apples. Time Dimension: Q1, Q2, Q3, Q4.]
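To make the cube picture concrete, here is a minimal Python sketch, assuming the cube is held as a plain dictionary keyed by (market, product, quarter). The member names come from the figure; the sales numbers and the helper names `aggregate` and `slice_cube` are hypothetical. Rolling up sums the measure over the dimensions you drop; slicing fixes one dimension to a single member.

```python
from collections import defaultdict

DIMENSIONS = ("market", "product", "quarter")

# Hypothetical fact cells: (market, product, quarter) -> units sold
facts = {
    ("Atlanta", "Apples", "Q1"): 10,
    ("Atlanta", "Grapes", "Q1"): 5,
    ("Chicago", "Apples", "Q1"): 7,
    ("Chicago", "Apples", "Q2"): 3,
    ("Denver", "Melons", "Q3"): 8,
    ("Dallas", "Cherries", "Q4"): 4,
}

def aggregate(facts, keep):
    """Roll up: sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in facts.items():
        totals[tuple(coords[p] for p in positions)] += value
    return dict(totals)

def slice_cube(facts, dimension, member):
    """Slice: keep only the cells where `dimension` equals `member`."""
    p = DIMENSIONS.index(dimension)
    return {coords: v for coords, v in facts.items() if coords[p] == member}

print(aggregate(facts, ["product"]))
# {('Apples',): 20, ('Grapes',): 5, ('Melons',): 8, ('Cherries',): 4}
print(slice_cube(facts, "quarter", "Q1"))
```

An OLAP server typically precomputes and stores many such roll-ups as aggregations instead of scanning every fact cell per query; the dictionary scan here is only for illustration.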

Enterprise BI Today

[Figure: columns labeled Data Sources, Data Models and Tools. Data Sources: Datamart, Datamart, DW. Data Models: MOLAP, MOLAP. Tools: OLAP Browser, Reporting Tool (1), Reporting Tool (2), Reporting Tool (3).]

Relational vs. OLAP Reports

Feature                              Relational   OLAP
Flexible schema                      Yes          No
Real-time data access                Yes          No
Single data store                    Yes          No
Simple management                    Yes          No
Detail reporting                     Yes          No
High performance                     No           Yes
End-user oriented                    No           Yes
Ease of navigation and exploration   No           Yes
Rich analytics                       No           Yes
Rich semantics                       No           Yes

Agenda

- Overview & BI Challenges
- Introducing the UDM
- The UDM in Detail
- Data Mining Overview

The Unified Dimensional Model: The Best of Relational and OLAP

Relational Reporting
- Multiple fact tables
- Full richness of the dimensions’ attributes
- Transaction-level access
- Star, snowflake, 3NF…
- Complex relationships
- Recursive self joins
- Slowly changing dimensions

OLAP Cubes
- Multidimensional navigation
- Hierarchical presentation
- Friendly entity names
- Powerful MDX calculations
- Central KPI framework
- Multiple perspectives
- Partitions
- Aggregations
- Distributed sources

UDM’s Role

- Allows the User Model to be Enriched
- Provides High-Performance Queries
- Allows the Capture of Business Rules to Support Analysis
- Supports “Closing the Loop,” Where the User Acts Upon the Data

Enterprise BI with UDM

[Figure: Datamart, Datamart, DW and MOLAP stores; a single UDM layer; and the OLAP Browser, Reporting Tool and BI Applications that use it.]

Scalable, High Performance UDM Server

[Figure: Analysis Services hosts the UDM with MOLAP storage over Datamart, Datamart and DW sources; the OLAP Browser, Reporting Tool and BI Applications connect through XML/A or OLE DB/OLAP.]

Analysis Server as UDM Server

- Optimized SQL to all major RDBMS platforms
- XML/A client API
  - SOAP-based Web service API
  - Supported by all major BI vendors
- Managed and native providers
  - ADOMD.NET
  - OLE DB for OLAP
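Because XML/A is a SOAP-based API, a client request is just an XML document. The sketch below uses Python's standard `xml.etree.ElementTree` to build an XML/A `Execute` request carrying an MDX statement. The catalog `SalesDW`, the cube `[Sales]` and the measure name are hypothetical, and sending the request over HTTP to a server endpoint is deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("xmla", XMLA_NS)

def build_execute_request(statement, catalog):
    """Wrap an MDX statement in a SOAP envelope as an XML/A Execute call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    # The command to run
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement
    # Connection-level properties: which database, and the result format
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Format").text = "Multidimensional"
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical catalog, cube and measure, for illustration only
request = build_execute_request(
    "SELECT [Measures].[Sales Amount] ON COLUMNS FROM [Sales]", "SalesDW")
print(request)
```

Any client that can send this document over HTTP and parse the XML reply can talk to the server, which is why the same API works across vendors' tools; ADOMD.NET and OLE DB for OLAP wrap this kind of exchange for managed and native code.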

Streamlined BI Infrastructure

- Unified logical model for both relational and OLAP, with superb performance and scalability
- One data store to manage, ensuring data consistency and low TCO
- Rich user experience with many Microsoft and third-party tools

BI Development Studio

- Complete, integrated tool for the development of BI applications
- Enterprise software development environment
- Integrated with Visual Studio
- Team development, source control, versioning, developer isolation, resource-independent coding

Performance

- Proactive caching
  - Automatic MOLAP cache creation and management
- MOLAP becomes transparent
  - No requirement to manage an OLAP store
- Relational reporting enjoys MOLAP-like performance

MOLAP, ROLAP, and HOLAP

MOLAP Caching

[Figure: a Data Source tier (Datamart, Datamart, DW) sends Cache Notifications to Analysis Services, where the UDM keeps MOLAP caches; the Tool tier (OLAP Browser, Reporting Tool, BI Applications) connects via XML/A or ODBO.]
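The caching idea can be illustrated with a toy model. This is a hypothetical sketch, not how Analysis Services implements proactive caching: queries are answered from a precomputed, MOLAP-style aggregate; a change notification from the source marks that cache stale; and queries fall back to the relational source until the cache is rebuilt. The class and method names are invented for illustration.

```python
class ToyProactiveCache:
    """Hypothetical, simplified model of notification-driven caching."""

    def __init__(self, source_rows):
        self.source_rows = list(source_rows)  # stands in for the relational source
        self.cache = {}
        self.stale = True
        self.rebuild()

    def rebuild(self):
        """Recompute the aggregate (the 'MOLAP cache') from the source."""
        totals = {}
        for key, value in self.source_rows:
            totals[key] = totals.get(key, 0) + value
        self.cache = totals
        self.stale = False

    def notify_change(self, new_row):
        """Simulate the source changing and notifying: the cache becomes stale."""
        self.source_rows.append(new_row)
        self.stale = True

    def query(self, key):
        """Return (total, path): from the cache when fresh, else from the source."""
        if self.stale:
            return sum(v for k, v in self.source_rows if k == key), "source"
        return self.cache.get(key, 0), "cache"

cache = ToyProactiveCache([("Q1", 10), ("Q2", 5)])
print(cache.query("Q1"))  # (10, 'cache')
cache.notify_change(("Q1", 3))
print(cache.query("Q1"))  # (13, 'source')
cache.rebuild()
print(cache.query("Q1"))  # (13, 'cache')
```

The toy exposes by hand the trade-off the slides describe as automatic: answering from a fast cache while it is current, and deciding when to pay for a rebuild after the source changes.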

Agenda

- Overview & BI Challenges
- Introducing the UDM
- The UDM in Detail
- Data Mining Overview

UDM and The BI Studio

UDM Data Sources

- Multiple Data Sources
  - OLTP
  - OLAP
  - XML

Data Source Views

- Tables
- Views
- Stored Queries

Dimensions and Hierarchies

- Dimensions
  - Attribute-Based
  - Consolidates all attributes of an entity
- Hierarchies Organize Data
- Custom hierarchies can be created from attributes

Cubes

- No More Limits
  - Limited only by addressable objects (2,147,483,647)
- Stored as XML
- Logical Grouping of Measures and Dimensions

Perspectives

- UDM Provides Subject Area Centric View of the Data Warehouse
- Perspectives Feature Allows User/Group Specific View of the Same Data

Categorization

- Semantically Meaningful Categories
  - Measures
  - Dimensions
  - Attributes
  - Hierarchies

Time

- UDM Has Built-In Knowledge of Time
  - Natural (Calendar)
  - Fiscal
  - Reporting
  - Manufacturing
  - ISO 8601
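Here is a minimal Python sketch of how one date maps onto several of these calendars. The fiscal calendar is an assumption (a year starting in July and named after the calendar year in which it ends), and the function name `time_attributes` is hypothetical. Reporting and manufacturing calendars, which follow organization-specific week patterns, are left out.

```python
import datetime

def time_attributes(day, fiscal_start_month=7):
    """Describe one date in natural (calendar), fiscal and ISO 8601 terms.

    Assumption: the fiscal year starts on the first day of fiscal_start_month
    and is named after the calendar year in which it ends.
    """
    # Natural (calendar) view
    calendar_quarter = (day.month - 1) // 3 + 1

    # Fiscal view: count months from the start of the fiscal year
    months_into_fiscal_year = (day.month - fiscal_start_month) % 12
    fiscal_quarter = months_into_fiscal_year // 3 + 1
    if fiscal_start_month > 1 and day.month >= fiscal_start_month:
        fiscal_year = day.year + 1
    else:
        fiscal_year = day.year

    # ISO 8601 view: weeks start on Monday and week 1 contains the year's first
    # Thursday, so early January can belong to the previous ISO year
    iso_year, iso_week, _ = day.isocalendar()

    return {"calendar_year": day.year, "calendar_quarter": calendar_quarter,
            "fiscal_year": fiscal_year, "fiscal_quarter": fiscal_quarter,
            "iso_year": iso_year, "iso_week": iso_week}

# 2005-01-01 falls in fiscal Q3 of FY2005 and in ISO week 53 of 2004
print(time_attributes(datetime.date(2005, 1, 1)))
```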

Translations

- UDM provides for multiple languages
- Metadata in BI Studio and Client Tool Displayed in Multiple Languages

Attribute Semantics

- Names vs. Keys
- Ordering
- Discretization
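Two of these semantics are easy to show in a few lines of Python. The sketch assumes a month attribute whose key is the month number and whose name is the month label, and it uses equal-frequency binning as one common discretization scheme. The data and the helper name `equal_frequency_bins` are hypothetical, and this is not necessarily the discretization method Analysis Services applies.

```python
# Names vs. keys: each member has a key (used for ordering) and a display name
months = [(3, "March"), (1, "January"), (2, "February")]
ordered_by_key = [name for key, name in sorted(months)]
ordered_by_name = sorted(name for key, name in months)
print(ordered_by_key)   # ['January', 'February', 'March']
print(ordered_by_name)  # ['February', 'January', 'March']

def equal_frequency_bins(values, n_bins):
    """Discretize continuous values into n_bins buckets holding roughly equal counts.

    Returns one bucket index (0 .. n_bins - 1) per value, in input order.
    Equal values may land in adjacent buckets when they straddle a boundary.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_bins // len(values)
    return buckets

incomes = [12000, 85000, 40000, 23000, 150000, 61000]
labels = ["Low", "Medium", "High"]
print([labels[b] for b in equal_frequency_bins(incomes, 3)])
```

Ordering by key keeps months chronological, while ordering by name sorts them alphabetically, which is why a dimension attribute records which one to use.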

Key Performance Indicators

- Actual Value
- Goal Value
- Status
- Trend
- Graphical Representation
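The five KPI elements can be sketched as one small Python function. All thresholds are hypothetical: higher values are assumed to be better, status and trend are reported on a -1/0/1 scale (bad, neutral, good), and a value within 10% below the goal counts as neutral. In Analysis Services these elements are defined as MDX expressions on the server; the function below only illustrates the logic.

```python
def evaluate_kpi(actual, goal, previous, tolerance=0.1):
    """Evaluate a KPI where higher values are better (an assumption).

    Status: 1 if the goal is met, 0 if within `tolerance` (a fraction of the
    goal) below it, -1 otherwise. Trend compares actual with the previous
    period. Assumes goal > 0.
    """
    ratio = actual / goal
    if ratio >= 1.0:
        status = 1
    elif ratio >= 1.0 - tolerance:
        status = 0
    else:
        status = -1

    if actual > previous:
        trend = 1
    elif actual < previous:
        trend = -1
    else:
        trend = 0

    # Graphical representation: map the status to a traffic-light indicator
    indicator = {1: "green", 0: "yellow", -1: "red"}[status]
    return {"actual": actual, "goal": goal, "status": status,
            "trend": trend, "indicator": indicator}

print(evaluate_kpi(actual=95, goal=100, previous=90))
# {'actual': 95, 'goal': 100, 'status': 0, 'trend': 1, 'indicator': 'yellow'}
```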

Closing the Loop

- Integrated Data Mining
- Writeback
  - The UDM is not read-only
- Actions

ProClarity Business Intelligence Analytics

[Figure: ProClarity components arranged around several OLAP cubes: Live Client (Excel based), Live Server, Web Client Bundle (includes Dashboard Viewer), Dashboard Server, Business Logic Server, Analytics Server, Selector and KPI Designer (all Professional clients), Web Standard (zero footprint), Web Professional (includes Business Reporter for Excel), and Desktop Professional (includes Business Reporter for Excel).]

ProClarity Key Differentiators

- Speed in decisions, real insight
- One version of the truth
- Analysis Platform
- ProClarity + Microsoft: total BI platform
- Super end-user-friendly environment
- All users own information
- Several visualizations for quick understanding
- Totally customizable platform
- Low Total Cost of Ownership & Flexible to implement

Agenda

- Overview & BI Challenges
- Introducing the UDM
- The UDM in Detail
- Data Mining Overview

Data Mining Architecture

[Figure: components: LOB Application, Model Browsing and Reporting; a Historical Dataset loaded via SQL, OLE DB or Text File; Web, .NET and Native access; Data Transform (SSIS); Mining Models and Cubes; Prediction; a New Dataset; Operations (SSIS).]

CRoss Industry Standard Process for Data Mining (CRISP)

http://www.crisp-dm.org

Microsoft Mining Model Algorithms

- Decision Trees (introduced in SQL Server 2000)
- Clustering (introduced in SQL Server 2000)
- Time Series
- Sequence Clustering
- Association
- Naïve Bayes
- Neural Net

Microsoft Mining Models

When To Use What

- Classification: Assign cases to predefined classes
  - Examples: credit risk analysis, churn analysis, customer retention
  - Algorithms: Decision Trees, Naive Bayes, Neural Nets
- Segmentation: Taxonomy for grouping similar cases
  - Examples: customer profile analysis, mailing campaign
  - Algorithms: Clustering, Sequence Clustering
- Association: Advanced counting for correlations
  - Examples: market basket analysis, advanced data exploration
  - Algorithms: Decision Trees, Association
- Time Series Forecasting: Predict the future
  - Examples: forecast sales, predict stock prices
  - Algorithms: Time Series
- Prediction: Predict a value for a new case based on values for similar cases
  - Examples: quote insurance rates, predict customer income
  - Algorithms: All
- Deviation analysis: Discover how a case or segment differs from others
  - Examples: credit card fraud detection, network intrusion analysis
  - Algorithms: All

Thank You

Javier Loría
Business Intelligence, Solid Quality Learning
[email protected]

Decision Trees

- Classify each case into one of a few discrete, broad categories of selected attributes
- The building process is recursive partitioning: splitting the data into partitions and then splitting them further
- Initially, all cases are in one big box

Decision Trees (cont.)

- The algorithm tries all possible breaks in classes using all possible values of each input attribute; it then selects the split that partitions the data into the purest classes of the searched variable
  - Several measures of purity exist
- Then it repeats the splitting for each new class
  - Again testing all possible breaks
- Branches of the tree that are not useful can be pre-pruned or post-pruned
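The split search and recursive partitioning described above can be sketched in Python. This sketch uses Gini impurity as its purity measure, which is one common choice and not necessarily the score the Microsoft Decision Trees algorithm uses; it only tries binary "value vs. rest" splits, and its stopping rule is a simple form of pre-pruning. The churn data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure partition, higher for mixed ones."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, target, attributes):
    """Try every value of every input attribute as a binary split (value vs.
    everything else); return (score, attribute, value) with the lowest
    weighted impurity of the target, or None if nothing can be split."""
    best = None
    for attr in attributes:
        for value in sorted({row[attr] for row in rows}):
            left = [row[target] for row in rows if row[attr] == value]
            right = [row[target] for row in rows if row[attr] != value]
            if not right:
                continue  # every row has this value, so nothing is separated
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best

def build_tree(rows, target, attributes, min_rows=2):
    """Recursive partitioning: split, then split each partition again."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop at pure or tiny nodes (a simple pre-pruning rule)
    if gini(labels) == 0.0 or len(rows) < min_rows:
        return majority
    split = best_split(rows, target, attributes)
    if split is None:
        return majority
    _, attr, value = split
    matching = [row for row in rows if row[attr] == value]
    rest = [row for row in rows if row[attr] != value]
    return {"split": (attr, value),
            "match": build_tree(matching, target, attributes, min_rows),
            "rest": build_tree(rest, target, attributes, min_rows)}

# Hypothetical churn data for illustration
customers = [
    {"contract": "monthly", "age": "young", "churn": "yes"},
    {"contract": "monthly", "age": "old", "churn": "yes"},
    {"contract": "monthly", "age": "young", "churn": "yes"},
    {"contract": "yearly", "age": "young", "churn": "no"},
    {"contract": "yearly", "age": "old", "churn": "no"},
    {"contract": "yearly", "age": "old", "churn": "no"},
]
print(build_tree(customers, "churn", ["contract", "age"]))
# {'split': ('contract', 'monthly'), 'match': 'yes', 'rest': 'no'}
```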

Decision Trees (cont.)

- Decision trees are used for classification and prediction
- Typical questions:
  - Predict which customers will leave
  - Help in mailing and promotion campaigns
  - Explain reasons for a decision
  - What movies are young female customers likely to buy?

Naïve Bayes

- Classification and Prediction Model
- Calculates probabilities for each possible state of the input attribute given each state of the predictable attribute
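The counting behind Naïve Bayes fits in a short Python sketch. It estimates P(class) and P(input state | class) from counts, then combines them under the "naïve" assumption that inputs are independent given the class. The add-one (Laplace) smoothing, the loan data and the function names are assumptions for illustration, not details from the slide.

```python
from collections import Counter

def train_naive_bayes(rows, target, inputs):
    """Count class frequencies and (input, value, class) co-occurrences."""
    class_counts = Counter(row[target] for row in rows)
    pair_counts = Counter((a, row[a], row[target]) for row in rows for a in inputs)
    n_values = {a: len({row[a] for row in rows}) for a in inputs}
    return class_counts, pair_counts, n_values

def predict_proba(model, case, inputs):
    """Return P(class | case), assuming inputs are independent given the class."""
    class_counts, pair_counts, n_values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for a in inputs:
            # P(input = value | class), with add-one smoothing so an unseen
            # combination does not force the whole product to zero
            score *= (pair_counts[(a, case[a], cls)] + 1) / (n_cls + n_values[a])
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}

# Hypothetical loan applications
loans = [
    {"income": "high", "employed": "yes", "approved": "yes"},
    {"income": "high", "employed": "no", "approved": "yes"},
    {"income": "low", "employed": "yes", "approved": "yes"},
    {"income": "low", "employed": "no", "approved": "no"},
    {"income": "low", "employed": "no", "approved": "no"},
]
model = train_naive_bayes(loans, "approved", ["income", "employed"])
print(predict_proba(model, {"income": "low", "employed": "no"}, ["income", "employed"]))
```

Training is a single counting pass over the data, which is why Naïve Bayes is useful for quickly getting a first view of a dataset.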

Naïve Bayes (cont.)

- Used for classification
  - Assign new cases to predefined classes
- Some typical questions:
  - Categorize bank loan applications
  - Determine which home telephone lines are used for Internet access
  - Assign customers to predefined segments
  - Quickly gain a basic understanding of the data

Cluster Analysis

- Grouping data into clusters
  - Objects within a cluster have high similarity based on the attribute values
  - The class label of each object is not known
- Several techniques:
  - Partitioning methods
  - Hierarchical methods
  - Density-based methods
  - Model-based methods, and more
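As an example of a partitioning method, here is a plain k-means sketch in Python. The slide does not say which technique the Microsoft Clustering algorithm uses, so treat this as a generic illustration. The customer points (age, yearly spend in thousands) are hypothetical, and seeding with the first k points is a simplification; practical implementations usually choose starting centroids more carefully.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (age, yearly spend in thousands)
customers = [(25, 2.0), (60, 9.0), (27, 2.5), (62, 8.5), (24, 1.5), (58, 9.5)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

Each cluster ends up as a group of similar customers without any class label having been given in advance, which is the point of the slide above.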

Cluster Analysis (cont.)

- Segments a heterogeneous population into a number of more homogeneous subgroups or clusters
- Some typical questions:
  - Discover distinct groups of customers
  - Identify groups of houses in a city
  - In biology, derive animal and plant taxonomies

Sequence Clustering

- Analyzes sequence-oriented data that contains discrete-valued series
  - The sequence attribute in the series holds a set of events with a specific order that can be considered as a model
- Typically used for Web customer analysis
  - Can be used for any other sequential data

Sequence Clustering (cont.): Click-Stream Analysis

User   Sequence
1      frontpage news travel travel
2      news news news news news
3      frontpage news frontpage news frontpage
4      news news
5      frontpage news news travel travel travel
6      news weather weather weather weather
7      news health health business business business
8      frontpage sports sports sports weather
9      weather
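Microsoft Sequence Clustering is generally described as combining clustering with Markov chain models of the sequences. The Python sketch below shows only the Markov-chain ingredient, estimated for all nine users of the click-stream table as a single group: it counts first-order page-to-page transitions and turns them into probabilities. A full sequence clustering algorithm would instead fit one such transition model per cluster and assign users to the cluster whose model best explains their sequence; the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Click-stream sequences from the table, one string per user (users 1 to 9)
clicks = [
    "frontpage news travel travel",
    "news news news news news",
    "frontpage news frontpage news frontpage",
    "news news",
    "frontpage news news travel travel travel",
    "news weather weather weather weather",
    "news health health business business business",
    "frontpage sports sports sports weather",
    "weather",
]

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        pages = sequence.split()
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for page, c in counts.items()}

probs = transition_probabilities(clicks)
print(probs["frontpage"])  # {'news': 0.8, 'sports': 0.2}
```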

Microsoft Mining Models

Association Rules

- For market basket analyses
  - Identify cross-selling opportunities
  - Arrange attractive packages
- Considers each attribute/value pair as an item
- An item set is a combination of items in a single transaction
- The algorithm scans through the dataset trying to find item sets that tend to appear in many transactions
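The scan for frequent item sets can be sketched by brute force in Python, using the five transactions from the Support slide. Every item set up to a chosen size is counted, and those appearing in at least a minimum fraction of transactions are kept. The function name and the 40% threshold are hypothetical; practical algorithms such as Apriori prune candidate sets instead of enumerating them all, and this is not the Microsoft algorithm's implementation.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {item set: support} for item sets meeting min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)  # one canonical order per item set
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

for itemset, support in sorted(frequent_itemsets(transactions, 0.4).items()):
    print(itemset, support)
```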

Association Rules – Support

- Support is the percentage of rows containing the item combination compared to the total number of rows:
  - Transaction 1: Frozen pizza, cola, milk
  - Transaction 2: Milk, potato chips
  - Transaction 3: Cola, frozen pizza
  - Transaction 4: Milk, pretzels
  - Transaction 5: Cola, pretzels
- The support for the rule “If a customer purchases Cola, then they will purchase Frozen Pizza” is 40%, because two of the five transactions (1 and 3) contain both items

Association Rules – Confidence

- What if 60% of customers buy milk, and 20% of all customers buy both milk and potato chips?
- The confidence of an association rule is the support for the combination divided by the support for the condition
- This gives a confidence for the rule “If a customer purchases Milk, they will purchase Potato Chips” of (20% / 60%) = 33%
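Both measures can be checked against the five example transactions from the Support slide with a short Python sketch; the helper names `support` and `confidence` are hypothetical. Milk appears in three of the five transactions (60%) and milk together with potato chips in one (20%), which reproduces the 33% confidence; cola with frozen pizza reproduces the 40% support.

```python
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]

def support(transactions, items):
    """Fraction of transactions containing every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, condition, result):
    """Support of condition plus result, divided by support of the condition."""
    combined = set(condition) | set(result)
    return support(transactions, combined) / support(transactions, condition)

print(support(transactions, {"cola", "frozen pizza"}))                  # 0.4
print(round(confidence(transactions, {"milk"}, {"potato chips"}), 2))   # 0.33
```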

Time Series

- Predicts continuous columns, such as product sales or stock performance, in a forecasting scenario
- Builds a model in two stages:
  - The first stage creates a list of optimal candidate input columns
  - The second stage investigates each candidate input column and determines whether it improves the model

Neural Network

- Data modeling tool that is able to capture and represent complex input/output relationships
- Neural networks resemble the human brain in the following two ways:
  - A neural network acquires knowledge through learning
  - A neural network's knowledge is stored within interneuron connection strengths known as synaptic weights
- It explores all possible data relationships
  - It can be slow

Back-Propagation

- Training a neural network means setting the best weights on the inputs of each of the units
- The back-propagation process:
  - Get a training example and calculate the outputs
  - Calculate the error: the difference between the calculated and the expected (known) result
  - Adjust the weights to minimize the error
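The three steps map directly onto a small Python sketch. It assumes a hypothetical network with two inputs, one hidden layer of three sigmoid units and one sigmoid output, trained one example at a time by gradient descent on squared error. The layer sizes, learning rate and the OR training data are illustrative choices, not the settings of the Microsoft Neural Network algorithm.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNetwork:
    """Inputs -> one hidden layer of sigmoid units -> one sigmoid output."""

    def __init__(self, n_inputs=2, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # One weight per input plus a bias weight (the last entry) for each unit
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]  # constant input feeding the bias weights
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        # 1. Get a training example and calculate the outputs
        hidden, output = self.forward(x)
        # 2. Calculate the error: expected minus calculated
        error = target - output
        # 3. Adjust the weights against the gradient of the squared error,
        #    propagating the output unit's error back to the hidden units
        delta_out = error * output * (1 - output)
        delta_hidden = [delta_out * self.w_out[j] * h * (1 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w_out[j] += lr * delta_out * h
        xb = list(x) + [1.0]
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                ws[i] += lr * delta_hidden[j] * v
        return error ** 2

def mean_squared_error(net, data):
    return sum((target - net.forward(x)[1]) ** 2 for x, target in data) / len(data)

# Logical OR as a tiny training set
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = TinyNetwork(seed=0)
before = mean_squared_error(net, data)
for epoch in range(2000):
    for x, target in data:
        net.train_step(x, target)
after = mean_squared_error(net, data)
print(f"error before: {before:.3f}, after: {after:.3f}")
```

Each pass over the data repeats the three steps for every example, which also illustrates why training can be slow on large datasets.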