Building Statistical Models in Python [1 ed.] 9781804614280

Make data-driven, informed decisions and enhance your statistical expertise in Python by turning raw data into meaningfu

English Pages 702 Year 2023

Table of contents:
Building Statistical Models in Python
Contributors
About the authors
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Part 1: Introduction to Statistics
1
Sampling and Generalization
Software and environment setup
Population versus sample
Population inference from samples
Randomized experiments
Observational study
Sampling strategies – random, systematic, stratified, and clustering
Probability sampling
Non-probability sampling
Summary
2
Distributions of Data
Technical requirements
Understanding data types
Nominal data
Ordinal data
Interval data
Ratio data
Visualizing data types
Measuring and describing distributions
Measuring central tendency
Measuring variability
Measuring shape
The normal distribution and central limit theorem
The Central Limit Theorem
Bootstrapping
Confidence intervals
Standard error
Correlation coefficients (Pearson’s correlation)
Permutations
Permutations and combinations
Permutation testing
Transformations
Summary
References
3
Hypothesis Testing
The goal of hypothesis testing
Overview of a hypothesis test for the mean
Scope of inference
Hypothesis test steps
Type I and Type II errors
Type I errors
Type II errors
Basics of the z-test – the z-score, z-statistic, critical values, and p-values
The z-score and z-statistic
A z-test for means
z-test for proportions
Power analysis for a two-population pooled z-test
Summary
4
Parametric Tests
Assumptions of parametric tests
Normally distributed population data
Equal population variance
T-test – a parametric hypothesis test
T-test for means
Two-sample t-test – pooled t-test
Two-sample t-test – Welch’s t-test
Paired t-test
Tests with more than two groups and ANOVA
Multiple tests for significance
ANOVA
Pearson’s correlation coefficient
Power analysis examples
Summary
References
5
Non-Parametric Tests
When parametric test assumptions are violated
Permutation tests
The Rank-Sum test
The test statistic procedure
Normal approximation
Rank-Sum example
The Signed-Rank test
The Kruskal-Wallis test
Chi-square distribution
Chi-square goodness-of-fit
Chi-square test of independence
Chi-square goodness-of-fit test power analysis
Spearman’s rank correlation coefficient
Summary
Part 2: Regression Models
6
Simple Linear Regression
Simple linear regression using OLS
Coefficients of correlation and determination
Coefficients of correlation
Coefficients of determination
Required model assumptions
A linear relationship between the variables
Normality of the residuals
Homoscedasticity of the residuals
Sample independence
Testing for significance and validating models
Model validation
Summary
7
Multiple Linear Regression
Multiple linear regression
Adding categorical variables
Evaluating model fit
Interpreting the results
Feature selection
Statistical methods for feature selection
Performance-based methods for feature selection
Recursive feature elimination
Shrinkage methods
Ridge regression
LASSO regression
Elastic Net
Dimension reduction
PCA – a hands-on introduction
PCR – a hands-on salary prediction study
Summary
Part 3: Classification Models
8
Discrete Models
Probit and logit models
Multinomial logit model
Poisson model
The Poisson distribution
Modeling count data
The negative binomial regression model
Negative binomial distribution
Summary
9
Discriminant Analysis
Bayes’ theorem
Probability
Conditional probability
Discussing Bayes’ Theorem
Linear Discriminant Analysis
Supervised dimension reduction
Quadratic Discriminant Analysis
Summary
Part 4: Time Series Models
10
Introduction to Time Series
What is a time series?
Goals of time series analysis
Statistical measurements
Mean
Variance
Autocorrelation
Cross-correlation
The white-noise model
Stationarity
Summary
References
11
ARIMA Models
Technical requirements
Models for stationary time series
Autoregressive (AR) models
Moving average (MA) models
Autoregressive moving average (ARMA) models
Models for non-stationary time series
ARIMA models
Seasonal ARIMA models
More on model evaluation
Summary
References
12
Multivariate Time Series
Multivariate time series
Time-series cross-correlation
ARIMAX
Preprocessing the exogenous variables
Fitting the model
Assessing model performance
VAR modeling
Step 1 – visual inspection
Step 2 – selecting the order of AR(p)
Step 3 – assessing cross-correlation
Step 4 – building the VAR(p,q) model
Step 5 – testing the forecast
Step 6 – building the forecast
Summary
References
Part 5: Survival Analysis
13
Time-to-Event Variables – An Introduction
What is censoring?
Left censoring
Right censoring
Interval censoring
Type I and Type II censoring
Survival data
Survival Function, Hazard and Hazard Ratio
Summary
14
Survival Models
Technical requirements
Kaplan-Meier model
Model definition
Model example
Exponential model
Model example
Cox Proportional Hazards regression model
Step 1
Step 2
Step 3
Step 4
Step 5
Summary
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
