Table of contents: Principles of Data Science

Contributor; About the author; About the reviewer; Preface; Who is this book for?; What this book covers; To get the most out of this book; Download the example code files; Conventions used; Get in touch; Share Your Thoughts; Download a free PDF copy of this book

1 Data Science Terminology
What is data science?; Understanding basic data science terminology; Why data science?; Example – predicting COVID-19 with machine learning; The data science Venn diagram; The math; Computer programming; Example – parsing a single tweet; Domain knowledge; Some more terminology; Data science case studies; Case study – automating government paper pushing; Case study – what’s in a job description?; Summary

2 Types of Data
Structured versus unstructured data; Quantitative versus qualitative data; Digging deeper; The four levels of data; The nominal level; Measures of center; The ordinal level; The interval level; The ratio level; Data is in the eye of the beholder; Summary; Questions and answers

3 The Five Steps of Data Science
Introduction to data science; Overview of the five steps; Exploring the data; Guiding questions for data exploration; DataFrames; Series; Exploration tips for qualitative data; Summary

4 Basic Mathematics
Basic symbols and terminology; Vectors and matrices; Arithmetic symbols; Summation; Logarithms/exponents; Set theory; Linear algebra; Matrix multiplication; How to multiply matrices together; Summary

5 Impossible or Improbable – A Gentle Introduction to Probability
Basic definitions; What do we mean by “probability”?; Bayesian versus frequentist; Frequentist approach; The law of large numbers; Compound events; Conditional probability; How to utilize the rules of probability; The addition rule; Mutual exclusivity; The multiplication rule; Independence; Complementary events; Introduction to binary classifiers; Summary

6 Advanced Probability
Bayesian ideas revisited; Bayes’ theorem; More applications of Bayes’ theorem; Random variables; Discrete random variables; Continuous random variables; Summary

7 What Are the Chances? An Introduction to Statistics
What are statistics?; How do we obtain and sample data?; Obtaining data; Observational; Experimental; Sampling data; How do we measure statistics?; Measures of center; Measures of variation; The coefficient of variation; Measures of relative standing; The insightful part – correlations in data; The empirical rule; Example – exam scores; Summary

8 Advanced Statistics
Understanding point estimates; Sampling distributions; Confidence intervals; Hypothesis tests; Conducting a hypothesis test; One-sample t-tests; Type I and Type II errors; Hypothesis testing for categorical variables; Chi-square goodness of fit test; Chi-square test for association/independence; Summary

9 Communicating Data
Why does communication matter?; Identifying effective visualizations; Scatter plots; Line graphs; Bar charts; Histograms; Box plots; When graphs and statistics lie; Correlation versus causation; Simpson’s paradox; If correlation doesn’t imply causation, then what does?; Verbal communication; It’s about telling a story; On the more formal side of things; The why/how/what strategy for presenting; Summary

10 How to Tell if Your Toaster is Learning – Machine Learning Essentials
Introducing ML; Example – facial recognition; ML isn’t perfect; How does ML work?; Types of ML; SL; UL; RL; Overview of the types of ML; ML paradigms – pros and cons; Predicting continuous variables with linear regression; Correlation versus causation; Causation; Adding more predictors; Regression metrics; Summary

11 Predictions Don’t Grow on Trees, or Do They?
Performing naïve Bayes classification; Classification metrics; Understanding decision trees; Measuring purity; Exploring the Titanic dataset; Dummy variables; Diving deep into UL; When to use UL; k-means clustering; The Silhouette Coefficient; Feature extraction and PCA; Summary

12 Introduction to Transfer Learning and Pre-Trained Models
Understanding pre-trained models; Benefits of using pre-trained models; Commonly used pre-trained models; Decoding BERT’s pre-training; TL; Different types of TL; Inductive TL; Transductive TL; Unsupervised TL – feature extraction; TL with BERT and GPT; Examples of TL; Example – Fine-tuning a pre-trained model for text classification; Summary

13 Mitigating Algorithmic Bias and Tackling Model and Data Drift
Understanding algorithmic bias; Types of bias; Sources of algorithmic bias; Measuring bias; Consequences of unaddressed bias and the importance of fairness; Mitigating algorithmic bias; Mitigation during data preprocessing; Mitigation during model in-processing; Mitigation during model postprocessing; Bias in LLMs; Uncovering bias in GPT-2; Emerging techniques in bias and fairness in ML; Understanding model drift and decay; Model drift; Data drift; Mitigating drift; Understanding the context; Continuous monitoring; Regular model retraining; Implementing feedback systems; Model adaptation techniques; Summary

14 AI Governance
Mastering data governance; Current hurdles in data governance; Data management: crafting the bedrock; Data ingestion – the gateway to information; Data integration – from collection to delivery; Data warehouses and entity resolution; The quest for data quality; Documentation and cataloging – the unsung heroes of governance; Understanding the path of data; Regulatory compliance and audit preparedness; Change management and impact analysis; Upholding data quality; Troubleshooting and analysis; Navigating the intricacy and the anatomy of ML governance; ML governance pillars; Model interpretability; The many facets of ML development; Beyond training – model deployment and monitoring; A guide to architectural governance; The five pillars of architectural governance; Transformative architectural principles; Zooming in on architectural dimensions; Summary

15 Navigating Real-World Data Science Case Studies in Action
Introduction to the COMPAS dataset case study; Understanding the task/outlining success; Preliminary data exploration; Preparing the data for modeling; Final thoughts; Text embeddings using pretrained models and OpenAI; Setting up and importing necessary libraries; Data collection – fetching the textbook data; Converting text to embeddings; Querying – searching for relevant information; Concluding thoughts – the power of modern pre-trained models; Summary

Index; Why subscribe?; Other Books You May Enjoy; Packt is searching for authors like you; Share Your Thoughts; Download a free PDF copy of this book