Data Mining and Predictive Analytics for Business Decisions 9781683926757

With many recent advances in data science, we have many more tools and techniques available for data analysts to extract

English Pages 272 Year 2023

Table of contents :
Acknowledgments
Chapter 1: Data Mining and Business
Data Mining Algorithms and Activities
Data is the New Oil
Data-Driven Decision-Making
Business Analytics and Business Intelligence
Algorithmic Technologies Associated with Data Mining
Data Mining and Data Warehousing
Case Study 1.1: Business Applications of Data Mining
Case A – Classification
Case B – Regression
Case C – Anomaly Detection
Case D – Time Series
Case E – Clustering
Reference
Chapter 2: The Data Mining Process
Data Mining as a Process
Exploration
Analysis
Interpretation
Exploitation
Selecting a Data Mining Process
The CRISP-DM Process Model
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Selecting Data Analytics Languages
The Choices for Languages
References
Chapter 3: Framing Analytical Questions
How Does CRISP-DM Define the Business and Data Understanding Step?
The World of the Business Data Analyst
How Does Data Analysis Relate to Business Decision-Making?
How Do We Frame Analytical Questions?
What Are the Characteristics of Well-framed Analytical Questions?
Exercise 3.1 – Framed Questions About the Titanic Disaster
Case Study 3.1 – The San Francisco Airport Survey
Case Study 3.2 – Small Business Administration Loans
References
Chapter 4: Data Preparation
How Does CRISP-DM Define Data Preparation?
Steps in Preparing the Data Set for Analysis
Data Sources and Formats
What is Data Shaping?
The Flat-File Format
Application of Tools for Data Acquisition and Preparation
Exercise 4.1 – Shaping the Data File
Exercise 4.2 – Cleaning the Data File
Ensuring the Right Variables are Included
Using SQL to Extract the Right Data Set from Data Warehouses
Case Study 4.1: Cleaning and Shaping the SFO Survey Data Set
Case Study 4.2: Shaping the SBA Loans Data Set
Case Study 4.3: Additional SQL Queries
Reference
Chapter 5: Descriptive Analysis
Getting a Sense of the Data Set
Describe the Data Set
Explore the Data Set
Verify the Quality of the Data Set
Analysis Techniques to Describe the Variables
Exercise 5.1 – Descriptive Statistics
Distributions of Numeric Variables
Correlation
Exercise 5.2 – Descriptive Analysis of the Titanic Disaster Data
Case Study 5.1: Describing the SFO Survey Data Set
Solution Using R
Solution Using Python
Case Study 5.2: Describing the SBA Loans Data Set
Solution Using R
Solution Using Python
Reference
Chapter 6: Modeling
What is a Model?
How Does CRISP-DM Define Modeling?
Selecting the Modeling Technique
Modeling Assumptions
Generate Test Design
Design of Model Testing
Build the Model
Parameter Setting
Models
Model Assessment
Where Do Models Reside in a Computer?
The Data Mining Engine
The Model
Data Sources and Outputs
Traditional Data Sources
Static Data Sources
Real-Time Data Sources
Analytic Outputs
Model Building
Step 1: Framing Questions
Step 2: Selecting the Machine
Step 3: Selecting Known Data
Step 4: Training the Machine
Step 5: Testing the Model
Step 6: Deploying the Model
Step 7: Collecting New Data
Step 8: Updating the Model
Step 9: Learning – Repeat Steps 7 and 8
Step 10: Recommending Answers to the User
Reference
Chapter 7: Predictive Analytics with Regression Models
What is Supervised Learning?
Regression to the Mean
Linear Regression
Simple Linear Regression
The R-squared Coefficient
The Use of the p-value of the Coefficients
Strength of the Correlation Between Two Variables
Exercise 7.1 – Using SLR Analysis to Understand Franchise Advertising
Multivariate Linear Regression
Preparing to Build the Multivariate Model
Exercise 7.2 – Using Multivariate Linear Regression to Model Franchise Sales
Logistic Regression
What is Logistic Regression?
Exercise 7.3 – PassClass Case Study
Multivariate Logistic Regression
Exercise 7.4 – MLR Used to Analyze the Results of a Database Marketing Initiative
Where is Logistic Regression Used?
Comparing Linear and Logistic Regressions for Binary Outcomes
Case Study 7.1: Linear Regression Using the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 7.2: Linear Regression Using the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 7.3: Logistic Regression Using the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 7.4: Logistic Regression Using the SBA Loans Data Set
Solution in R
Solution in Python
Chapter 8: Classification
Classification with Decision Trees
Building a Decision Tree
Exercise 8.1 – The Iris Data Set
The Problem with Decision Trees
Classification with Random Forest
Using a Random Forest Model
Exercise 8.2 – The Iris Data Set
Classification with Naïve Bayes
Exercise 8.3 – The HIKING Data Set
Computing the Conditional Probabilities
Case Study 8.1: Classification with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 8.2: Classification with the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 8.3: Classification with the Florence Nightingale Data Set
Solution in Python
Reference
Chapter 9: Clustering
What is Unsupervised Machine Learning?
What is Clustering Analysis?
Applying Clustering to Old Faithful Eruptions
Examples of Applications of Clustering Analysis
A Simple Clustering Example Using Regression
Hierarchical Clustering
Applying Hierarchical Clustering to Old Faithful Eruptions
Exercise 9.1 – Hierarchical Clustering and the Iris Data Set
K-Means Clustering
How Does the K-Means Algorithm Compute Cluster Centroids?
Applying K-Means Clustering to Old Faithful Eruptions
Exercise 9.2 – K-Means Clustering and the Iris Data Set
Hierarchical vs. K-Means Clustering
Case Study 9.1: Clustering with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 9.2: Clustering with the SBA Loans Data Set
Solution in R
Solution in Python
Chapter 10: Time Series Forecasting
What is a Time Series?
Time Series Analysis
Types of Time Series Analysis
What is Forecasting?
Exercise 10.1 – Analysis of the US and China GDP Data Set
Case Studies
Case Study 10.1: Time Series Analysis of the SFO Survey Data Set
Solution in Excel
Case Study 10.2: Time Series Analysis of the SBA Loans Data Set
Solution in R
Solution in Python
Case Study 10.3: Time Series Analysis of a Nest Data Set
Solution in Python
Reference
Chapter 11: Feature Selection
Using the Covariance Matrix
Factor Analysis
When to Use Factor Analysis
First Step in FA – Correlation
FA for Exploratory Analysis
Selecting the Number of Factors – The Scree Plot
Example 11.1: Restaurant Feedback
Factor Interpretation
Summary Activities to Perform a Factor Analysis
Case Study 11.1: Variable Reduction with the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 11.2: Hunting Diamonds
Solution in R
Solution in Python
Chapter 12: Anomaly Detection
What is an Anomaly?
What is an Outlier?
The Case Studies for the Exercises in Anomaly Detection
Anomaly Detection by Standardization – A Single Numerical Variable
Exercise 12.1 – Outliers in the Airline Delays Data Set – Z-Score
Anomaly Detection by Quartiles – Tukey Fences – With a Single Variable
Comparing Z-scores and Tukey Fences
Exercise 12.2 – Outliers in the Airline Delays Data Set – Tukey Fences
Anomaly Detection by Category – A Single Variable
Exercise 12.3 – Outliers in the Airline Delays Data Set – Categorical
Anomaly Detection by Clustering – Multiple Variables
Exercise 12.4 – Outliers in the Airline Delays Data Set – Clustering
Anomaly Detection Using Linear Regression by Residuals – Multiple Variables
Exercise 12.5 – Outliers in the Airline Delays Data Set – Residuals
Case Study 12.1: Outliers in the SFO Survey Data Set
Solution in R
Solution in Python
Case Study 12.2: Outliers in the SBA Loans Data Set
Solution in R
Solution in Python
References
Chapter 13: Text Data Mining
What is Text Data Mining?
What are Some Examples of Text-Based Analytical Questions?
Tools for Text Data Mining
Sources and Formats of Text Data
Term Frequency Analysis
How Does It Apply to Text Business Data Analysis?
Exercise 13.1 – Case Study Using a Training Survey Data Set
Word Frequency Analysis Using R
Keyword Analysis
Exercise 13.2 – Case Study Using Data Set D: Résumé and Job Description
Keyword Analysis in Voyant
Term Frequency Analysis in R
Visualizing Text Data
Exercise 13.3 – Case Study Using the Training Survey Data Set
Visualizing the Text Using Excel
Visualizing the Text Using Voyant
Visualizing the Text Using R
Text Similarity Scoring
What is Text Similarity Scoring?
Exercise 13.4 – Case Study Using the Occupation Description Data Set
Analysis Using an Online Text Similarity Scoring Tool
Similarity Scoring Analysis Using R
Exercise 13.5 – Résumé and Job Descriptions Similarity Scoring Using R
Case Study 13.1 – Term Frequency Analysis of Product Reviews
Term Frequency Analysis Using Voyant
Term Frequency Analysis Using R
References
Chapter 14: Working with Large Data Sets
Using Sampling to Work with Large Data Files
Exercise 14.1 – Big Data Analysis
Case Study 14.1 Using the BankComplaints Big Data File
