Building Statistical Models in Python [1 ed.] 9781804614280

Make data-driven, informed decisions and enhance your statistical expertise in Python by turning raw data into meaningfu


Language: English; Pages: 702; Year: 2023

Table of contents:
Building Statistical Models in Python
Contributors
About the authors
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Part 1: Introduction to Statistics
1
Sampling and Generalization
Software and environment setup
Population versus sample
Population inference from samples
Randomized experiments
Observational study
Sampling strategies – random, systematic, stratified, and clustering
Probability sampling
Non-probability sampling
Summary
2
Distributions of Data
Technical requirements
Understanding data types
Nominal data
Ordinal data
Interval data
Ratio data
Visualizing data types
Measuring and describing distributions
Measuring central tendency
Measuring variability
Measuring shape
The normal distribution and central limit theorem
The Central Limit Theorem
Bootstrapping
Confidence intervals
Standard error
Correlation coefficients (Pearson’s correlation)
Permutations
Permutations and combinations
Permutation testing
Transformations
Summary
References
3
Hypothesis Testing
The goal of hypothesis testing
Overview of a hypothesis test for the mean
Scope of inference
Hypothesis test steps
Type I and Type II errors
Type I errors
Type II errors
Basics of the z-test – the z-score, z-statistic, critical values, and p-values
The z-score and z-statistic
A z-test for means
z-test for proportions
Power analysis for a two-population pooled z-test
Summary
4
Parametric Tests
Assumptions of parametric tests
Normally distributed population data
Equal population variance
T-test – a parametric hypothesis test
T-test for means
Two-sample t-test – pooled t-test
Two-sample t-test – Welch’s t-test
Paired t-test
Tests with more than two groups and ANOVA
Multiple tests for significance
ANOVA
Pearson’s correlation coefficient
Power analysis examples
Summary
References
5
Non-Parametric Tests
When parametric test assumptions are violated
Permutation tests
The Rank-Sum test
The test statistic procedure
Normal approximation
Rank-Sum example
The Signed-Rank test
The Kruskal-Wallis test
Chi-square distribution
Chi-square goodness-of-fit
Chi-square test of independence
Chi-square goodness-of-fit test power analysis
Spearman’s rank correlation coefficient
Summary
Part 2: Regression Models
6
Simple Linear Regression
Simple linear regression using OLS
Coefficients of correlation and determination
Coefficients of correlation
Coefficients of determination
Required model assumptions
A linear relationship between the variables
Normality of the residuals
Homoscedasticity of the residuals
Sample independence
Testing for significance and validating models
Model validation
Summary
7
Multiple Linear Regression
Multiple linear regression
Adding categorical variables
Evaluating model fit
Interpreting the results
Feature selection
Statistical methods for feature selection
Performance-based methods for feature selection
Recursive feature elimination
Shrinkage methods
Ridge regression
LASSO regression
Elastic Net
Dimension reduction
PCA – a hands-on introduction
PCR – a hands-on salary prediction study
Summary
Part 3: Classification Models
8
Discrete Models
Probit and logit models
Multinomial logit model
Poisson model
The Poisson distribution
Modeling count data
The negative binomial regression model
Negative binomial distribution
Summary
9
Discriminant Analysis
Bayes’ theorem
Probability
Conditional probability
Discussing Bayes’ Theorem
Linear Discriminant Analysis
Supervised dimension reduction
Quadratic Discriminant Analysis
Summary
Part 4: Time Series Models
10
Introduction to Time Series
What is a time series?
Goals of time series analysis
Statistical measurements
Mean
Variance
Autocorrelation
Cross-correlation
The white-noise model
Stationarity
Summary
References
11
ARIMA Models
Technical requirements
Models for stationary time series
Autoregressive (AR) models
Moving average (MA) models
Autoregressive moving average (ARMA) models
Models for non-stationary time series
ARIMA models
Seasonal ARIMA models
More on model evaluation
Summary
References
12
Multivariate Time Series
Multivariate time series
Time-series cross-correlation
ARIMAX
Preprocessing the exogenous variables
Fitting the model
Assessing model performance
VAR modeling
Step 1 – visual inspection
Step 2 – selecting the order of AR(p)
Step 3 – assessing cross-correlation
Step 4 – building the VAR(p,q) model
Step 5 – testing the forecast
Step 6 – building the forecast
Summary
References
Part 5: Survival Analysis
13
Time-to-Event Variables – An Introduction
What is censoring?
Left censoring
Right censoring
Interval censoring
Type I and Type II censoring
Survival data
Survival Function, Hazard and Hazard Ratio
Summary
14
Survival Models
Technical requirements
Kaplan-Meier model
Model definition
Model example
Exponential model
Model example
Cox Proportional Hazards regression model
Step 1
Step 2
Step 3
Step 4
Step 5
Summary
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts

Author / Uploaded
Huy Hoang Nguyen
Paul N Adams
Stuart J Miller

Similar Topics
Computers
Algorithms and Data Structures: Pattern Recognition

