Statistics Slam Dunk: Statistical analysis with R on real NBA data 9781633438682

Learn statistics by analyzing professional basketball data! In this action-packed book, you’ll build your skills in expl

Language: English. Pages: 951. Year: 2024.

Table of contents:
about the cover illustration

1 Getting started

1.1 Brief introductions to R and RStudio

1.2 Why R?

Visualizing data

Installing and using packages to extend R’s functional footprint

Networking with other users

Interacting with big data

Landing a job

1.3 How this book works

2 Exploring data

2.1 Loading packages

2.2 Importing data

2.3 Wrangling data

Removing variables

Removing observations

Viewing data

Converting variable types

Creating derived variables

2.4 Variable breakdown

2.5 Exploratory data analysis

Computing basic statistics

Returning data

Computing and visualizing frequency distributions

Computing and visualizing correlations

Computing and visualizing means and medians

2.6 Writing data

3 Segmentation analysis

3.1 More on tanking and the draft

3.2 Loading packages

3.3 Importing and viewing data

3.4 Creating another derived variable

3.5 Visualizing means and medians

Regular season games played

Minutes played per game

Career win shares

Win shares every 48 minutes

3.6 Preliminary conclusions

3.7 Sankey diagram

3.8 Expected value analysis

3.9 Hierarchical clustering

4 Constrained optimization

4.1 What is constrained optimization?

4.2 Loading packages

4.3 Importing data

4.4 Knowing the data

4.5 Visualizing the data

Density plots

Boxplots

Correlation plot

Bar chart

4.6 Constrained optimization setup

4.7 Constrained optimization construction

4.8 Results

5 Regression models

5.1 Loading packages

5.2 Importing data

5.3 Knowing the data

5.4 Identifying outliers

Prototype

Identifying other outliers

5.5 Checking for normality

Prototype

Checking other distributions for normality

5.6 Visualizing and testing correlations

Prototype

Visualizing and testing other correlations

5.7 Multiple linear regression

Subsetting data into train and test

Fitting the model

Returning and interpreting the results

Checking for multicollinearity

Running and interpreting model diagnostics

Comparing models

Predicting

5.8 Regression tree

6 More wrangling and visualizing data

6.1 Loading packages

6.2 Importing data

6.3 Wrangling data

Subsetting data sets

Joining data sets

6.4 Analysis

First quarter

Second quarter

Third quarter

Fourth quarter

Comparing best and worst teams

Second-half results

7 T-testing and effect size testing

7.1 Loading packages

7.2 Importing data

7.3 Wrangling data

7.4 Analysis on 2018-19 data

2018-19 regular season analysis

2019 postseason analysis

Effect size testing

7.5 Analysis on 2019-20 data

2019-20 regular season analysis (pre-COVID)

2019-20 regular season analysis (post-COVID)

More effect size testing

8 Optimal stopping

8.1 Loading packages

8.2 Importing images

8.3 Importing and viewing data

8.4 Exploring and wrangling data

8.5 Analysis

Milwaukee Bucks

Atlanta Hawks

Charlotte Hornets

NBA

9 Chi-square testing and more effect size testing

9.1 Loading packages

9.2 Importing data

9.3 Wrangling data

9.4 Computing permutations

9.5 Visualizing results

Creating a data source

Visualizing the results

Conclusions

9.6 Statistical test of significance

Creating a contingency table and a balloon plot

Running a chi-square test

Creating a mosaic plot

9.7 Effect size testing

10 Doing more with ggplot

10.1 Loading packages

10.2 Importing and viewing data

10.3 Salaries and salary cap analysis

10.4 Analysis

Plotting and computing correlations between team payrolls and regular season wins

Payrolls versus end-of-season results

Payroll comparisons

11 K-means clustering

11.1 Loading packages

11.2 Importing data

11.3 A primer on standard deviations and z-scores

11.4 Analysis

Wrangling data

Evaluating payrolls and wins

11.5 K-means clustering

More data wrangling

K-means clustering

12 Computing and plotting inequality

12.1 Gini coefficients and Lorenz curves

12.2 Loading packages

12.3 Importing and viewing data

12.4 Wrangling data

12.5 Gini coefficients

12.6 Lorenz curves

12.7 Salary inequality and championships

Wrangling data

T-test

Effect size testing

12.8 Salary inequality and wins and losses

T-test

Effect size testing

12.9 Gini coefficient bands versus winning percentage

13 More with Gini coefficients and Lorenz curves

13.1 Loading packages

13.2 Importing and viewing data

13.3 Wrangling data

13.4 Gini coefficients

13.5 Lorenz curves

13.6 For loops

Simple demonstration

Applying what we’ve learned

13.7 User-defined functions

13.8 Win share inequality and championships

Wrangling data

T-test

Effect size testing

13.9 Win share inequality and wins and losses

T-test

Effect size testing

13.10 Gini coefficient bands versus winning percentage

14 Intermediate and advanced modeling

14.1 Loading packages

14.2 Importing and wrangling data

Subsetting and reshaping our data

Extracting a substring to create a new variable

Joining data

Importing and wrangling additional data sets

Joining data (one more time)

Creating standardized variables

14.3 Exploring data

14.4 Correlations

Computing and plotting correlation coefficients

Running correlation tests

14.5 Analysis of variance models

Data wrangling and data visualization

One-way ANOVAs

14.6 Logistic regressions

Data wrangling

Model development

14.7 Paired data before and after

15 The Lindy effect

15.1 Loading packages

15.2 Importing and viewing data

15.3 Visualizing data

Creating and evaluating violin plots

Creating paired histograms

Printing our plots

15.4 Pareto charts

ggplot2 and ggQC packages

qcc package

16 Randomness versus causality

16.1 Loading packages

16.2 Importing and wrangling data

16.3 Rule of succession and the hot hand

16.4 Player-level analysis

Player 1 of 3: Giannis Antetokounmpo

Player 2 of 3: Julius Randle

Player 3 of 3: James Harden

16.5 League-wide analysis

17 Collective intelligence

17.1 Loading packages

17.2 Importing data

17.3 Wrangling data

17.4 Automated exploratory data analysis

Baseline EDA with tableone

Over/under EDA with DataExplorer

Point spread EDA with SmartEDA

17.5 Results

Over/under

Point spreads

18 Statistical dispersion methods

18.1 Loading a package

18.2 Importing data

18.3 Exploring and wrangling data

18.4 Measures of statistical dispersion and intra-season parity

Variance method

Standard deviation method

Range method

Mean absolute deviation method

Median absolute deviation method

18.5 Churn and inter-season parity

Data wrangling

Computing and visualizing churn

19 Data standardization

19.1 Loading a package

19.2 Importing and viewing data

19.3 Wrangling data

Treating duplicate records

Final trimmings

19.4 Standardizing data

Z-score method

Standard deviation method

Centering method

Range method

20 Finishing up

20.1 Cluster analysis

20.2 Significance testing

20.3 Effect size testing

20.4 Modeling

20.5 Operations research

20.6 Probability

20.7 Statistical dispersion

20.8 Standardization

20.9 Summary statistics and visualization

appendix More ggplot2 visualizations

index
