Principles of Data Science: A beginner's guide to essential math and coding skills for data fluency and machine learning [3 ed.] 9781837636303

Transform your data into insights with must-know techniques and mathematical concepts to unravel the secrets hidden with

English Pages 433 Year 2024

Table of contents:
Principles of Data Science
Contributor
About the author
About the reviewer
Preface
Who is this book for?
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
1
Data Science Terminology
What is data science?
Understanding basic data science terminology
Why data science?
Example – predicting COVID-19 with machine learning
The data science Venn diagram
The math
Computer programming
Example – parsing a single tweet
Domain knowledge
Some more terminology
Data science case studies
Case study – automating government paper pushing
Case study – what’s in a job description?
Summary
2
Types of Data
Structured versus unstructured data
Quantitative versus qualitative data
Digging deeper
The four levels of data
The nominal level
Measures of center
The ordinal level
The interval level
The ratio level
Data is in the eye of the beholder
Summary
Questions and answers
3
The Five Steps of Data Science
Introduction to data science
Overview of the five steps
Exploring the data
Guiding questions for data exploration
DataFrames
Series
Exploration tips for qualitative data
Summary
4
Basic Mathematics
Basic symbols and terminology
Vectors and matrices
Arithmetic symbols
Summation
Logarithms/exponents
Set theory
Linear algebra
Matrix multiplication
How to multiply matrices together
Summary
5
Impossible or Improbable – A Gentle Introduction to Probability
Basic definitions
What do we mean by “probability”?
Bayesian versus frequentist
Frequentist approach
The law of large numbers
Compound events
Conditional probability
How to utilize the rules of probability
The addition rule
Mutual exclusivity
The multiplication rule
Independence
Complementary events
Introduction to binary classifiers
Summary
6
Advanced Probability
Bayesian ideas revisited
Bayes’ theorem
More applications of Bayes’ theorem
Random variables
Discrete random variables
Continuous random variables
Summary
7
What Are the Chances? An Introduction to Statistics
What are statistics?
How do we obtain and sample data?
Obtaining data
Observational
Experimental
Sampling data
How do we measure statistics?
Measures of center
Measures of variation
The coefficient of variation
Measures of relative standing
The insightful part – correlations in data
The empirical rule
Example – exam scores
Summary
8
Advanced Statistics
Understanding point estimates
Sampling distributions
Confidence intervals
Hypothesis tests
Conducting a hypothesis test
One-sample t-tests
Type I and Type II errors
Hypothesis testing for categorical variables
Chi-square goodness of fit test
Chi-square test for association/independence
Summary
9
Communicating Data
Why does communication matter?
Identifying effective visualizations
Scatter plots
Line graphs
Bar charts
Histograms
Box plots
When graphs and statistics lie
Correlation versus causation
Simpson’s paradox
If correlation doesn’t imply causation, then what does?
Verbal communication
It’s about telling a story
On the more formal side of things
The why/how/what strategy for presenting
Summary
10
How to Tell if Your Toaster is Learning – Machine Learning Essentials
Introducing ML
Example – facial recognition
ML isn’t perfect
How does ML work?
Types of ML
SL
UL
RL
Overview of the types of ML
ML paradigms – pros and cons
Predicting continuous variables with linear regression
Correlation versus causation
Causation
Adding more predictors
Regression metrics
Summary
11
Predictions Don’t Grow on Trees, or Do They?
Performing naïve Bayes classification
Classification metrics
Understanding decision trees
Measuring purity
Exploring the Titanic dataset
Dummy variables
Diving deep into UL
When to use UL
k-means clustering
The Silhouette Coefficient
Feature extraction and PCA
Summary
12
Introduction to Transfer Learning and Pre-Trained Models
Understanding pre-trained models
Benefits of using pre-trained models
Commonly used pre-trained models
Decoding BERT’s pre-training
TL
Different types of TL
Inductive TL
Transductive TL
Unsupervised TL – feature extraction
TL with BERT and GPT
Examples of TL
Example – Fine-tuning a pre-trained model for text classification
Summary
13
Mitigating Algorithmic Bias and Tackling Model and Data Drift
Understanding algorithmic bias
Types of bias
Sources of algorithmic bias
Measuring bias
Consequences of unaddressed bias and the importance of fairness
Mitigating algorithmic bias
Mitigation during data preprocessing
Mitigation during model in-processing
Mitigation during model postprocessing
Bias in LLMs
Uncovering bias in GPT-2
Emerging techniques in bias and fairness in ML
Understanding model drift and decay
Model drift
Data drift
Mitigating drift
Understanding the context
Continuous monitoring
Regular model retraining
Implementing feedback systems
Model adaptation techniques
Summary
14
AI Governance
Mastering data governance
Current hurdles in data governance
Data management: crafting the bedrock
Data ingestion – the gateway to information
Data integration – from collection to delivery
Data warehouses and entity resolution
The quest for data quality
Documentation and cataloging – the unsung heroes of governance
Understanding the path of data
Regulatory compliance and audit preparedness
Change management and impact analysis
Upholding data quality
Troubleshooting and analysis
Navigating the intricacy and the anatomy of ML governance
ML governance pillars
Model interpretability
The many facets of ML development
Beyond training – model deployment and monitoring
A guide to architectural governance
The five pillars of architectural governance
Transformative architectural principles
Zooming in on architectural dimensions
Summary
15
Navigating Real-World Data Science Case Studies in Action
Introduction to the COMPAS dataset case study
Understanding the task/outlining success
Preliminary data exploration
Preparing the data for modeling
Final thoughts
Text embeddings using pretrained models and OpenAI
Setting up and importing necessary libraries
Data collection – fetching the textbook data
Converting text to embeddings
Querying – searching for relevant information
Concluding thoughts – the power of modern pre-trained models
Summary
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book

Author: Sinan Ozdemir
