*English*
*Pages 516*
*Year 2023*

Table of contents:

The Statistics and Machine Learning with R Workshop

Contributors

About the author

About the reviewer

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share your thoughts

Download a free PDF copy of this book

Part 1: Statistics Essentials

Chapter 1: Getting Started with R

Technical requirements

Introducing R

Covering the R and RStudio basics

Common data types in R

Common data structures in R

Vector

Matrix

Data frame

List

Control logic in R

Relational operators

Logical operators

Conditional statements

Loops

Exploring functions in R

Summary

Chapter 2: Data Processing with dplyr

Technical requirements

Introducing tidyverse and dplyr

Data transformation with dplyr

Slicing the dataset using the filter() function

Sorting the dataset using the arrange() function

Adding or changing a column using the mutate() function

Selecting columns using the select() function

Selecting the top rows using the top_n() function

Combining the five verbs

Introducing other verbs

Data aggregation with dplyr

Counting observations using the count() function

Aggregating data via group_by() and summarize()

Data merging with dplyr

Case study – working with the Stack Overflow dataset

Summary

Chapter 3: Intermediate Data Processing

Technical requirements

Transforming categorical and numeric variables

Recoding categorical variables

Creating variables using case_when()

Binning numeric variables using cut()

Reshaping the DataFrame

Converting from long format into wide format using spread()

Converting from wide format into long format using gather()

Manipulating string data

Creating strings

Converting numbers into strings

Connecting strings

Working with stringr

Basics of stringr

Pattern matching in a string

Splitting a string

Replacing a string

Putting it together

Introducing regular expressions

Working with tidy text mining

Converting text into tidy data using unnest_tokens()

Working with a document-term matrix

Summary

Chapter 4: Data Visualization with ggplot2

Technical requirements

Introducing ggplot2

Building a scatter plot

Understanding the grammar of graphics

Geometries in graphics

Understanding geometry in scatter plots

Introducing bar charts

Introducing line plots

Controlling themes in graphics

Adjusting themes

Exploring ggthemes

Summary

Chapter 5: Exploratory Data Analysis

Technical requirements

EDA fundamentals

Analyzing categorical data

Summarizing categorical variables using counts

Converting counts into proportions

Marginal distribution and faceted bar charts

Analyzing numerical data

Visualization in higher dimensions

Measuring the central concentration

Measuring variability

Working with skewed distributions

EDA in practice

Obtaining the stock price data

Univariate analysis of individual stock prices

Correlation analysis

Summary

Chapter 6: Effective Reporting with R Markdown

Technical requirements

Fundamentals of R Markdown

Getting started with R Markdown

Getting to know the YAML header

Formatting textual information

Writing R code

Generating a financial analysis report

Getting and displaying the data

Performing data analysis

Adding plots to the report

Adding tables to the report

Configuring code chunks

Customizing R Markdown reports

Adding a table of contents

Creating a report with parameters

Customizing the report style

Summary

Part 2: Fundamentals of Linear Algebra and Calculus in R

Chapter 7: Linear Algebra in R

Technical requirements

Introducing linear algebra

Working with vectors

Working with matrices

Matrix vector multiplication

Matrix multiplication

The identity matrix

Transposing a matrix

Inverting a matrix

Solving a system of linear equations

System of linear equations

The solution to matrix-vector equations

Geometric interpretation of solving a system of linear equations

Obtaining a unique solution to a system of linear equations

Overdetermined and underdetermined systems of linear equations

Summary

Chapter 8: Intermediate Linear Algebra in R

Technical requirements

Introducing the matrix determinant

Interpreting the determinant

Connection to the matrix rank

Introducing the matrix trace

Special properties of the matrix trace

Understanding the matrix norm

Understanding the vector norm

Calculating the L1-norm of a vector

Calculating the L2-norm of a vector

Calculating the L∞-norm of a vector

Understanding the matrix norm

Calculating the L1-norm of a matrix

Calculating the Frobenius norm of a matrix

Calculating the infinity norm of a matrix

Getting to know eigenvalues and eigenvectors

Understanding scalar-vector multiplication

Defining eigenvalues and eigenvectors

Computing eigenvalues and eigenvectors

Introducing principal component analysis

Understanding the variance-covariance matrix

Connecting to PCA

Performing PCA

Summary

Chapter 9: Calculus in R

Technical requirements

Introducing calculus

Differential and integral calculus

More on functions

Vertical line test

Functional symmetry

Increasing and decreasing functions

Slope of a function

Function composition

Common functions

Understanding limits

Infinite limit

Limit at infinity

Introducing derivatives

Common derivatives

Common properties and rules of derivatives

Introducing integral calculus

Indefinite integrals

Indefinite integrals of basic functions

Properties of indefinite integrals

Integration by parts

Definite integrals

Working with calculus in R

Plotting basic functions

Working with derivatives

Using symbolic parameters

Working with the second derivative

Working with partial derivatives

Working with integration in R

More on antiderivatives

Evaluating the definite integral

Summary

Part 3: Fundamentals of Mathematical Statistics in R

Chapter 10: Probability Basics

Technical requirements

Introducing probability distribution

Exploring common discrete probability distributions

The Bernoulli distribution

The binomial distribution

The Poisson distribution

Poisson approximation to binomial distribution

The geometric distribution

Comparing different discrete probability distributions

Discovering common continuous probability distributions

The normal distribution

The exponential distribution

Uniform distribution

Generating normally distributed random samples

Understanding common sampling distributions

Common sampling distributions

Understanding order statistics

Extracting order statistics

Calculating the value at risk

Summary

Chapter 11: Statistical Estimation

Statistical inference for categorical data

Statistical inference for a single parameter

Introducing the General Social Survey dataset

Calculating the sample proportion

Calculating the confidence interval

Interpreting the confidence interval of the sample proportion

Hypothesis testing for the sample proportion

Inference for the difference in sample proportions

Type I and Type II errors

Testing the independence of two categorical variables

Introducing the contingency table

Applying the chi-square test for independence between two categorical variables

Statistical inference for numerical data

Generating a bootstrap distribution for the median

Constructing the bootstrapped confidence interval

Re-centering a bootstrap distribution

Introducing the central limit theorem used in t-distribution

Constructing the confidence interval for the population mean using the t-distribution

Performing hypothesis testing for two means

Introducing ANOVA

Summary

Chapter 12: Linear Regression in R

Introducing linear regression

Understanding simple linear regression

Introducing multiple linear regression

Seeking a higher coefficient of determination

More on adjusted R²

Developing an MLR model

Introducing Simpson’s Paradox

Working with categorical variables

Introducing the interaction term

Handling nonlinear terms

More on the logarithmic transformation

Working with the closed-form solution

Dealing with multicollinearity

Dealing with heteroskedasticity

Introducing penalized linear regression

Working with ridge regression

Working with lasso regression

Summary

Chapter 13: Logistic Regression in R

Technical requirements

Introducing logistic regression

Understanding the sigmoid function

Grokking the logistic regression model

Comparing logistic regression with linear regression

Making predictions using the logistic regression model

More on log odds and odds ratio

Introducing the cross-entropy loss

Evaluating a logistic regression model

Dealing with an imbalanced dataset

Penalized logistic regression

Extending to multi-class classification

Summary

Chapter 14: Bayesian Statistics

Technical requirements

Introducing Bayesian statistics

A first look into the Bayesian theorem

Understanding the generative model

Understanding prior distributions

Introducing the likelihood function

Introducing the posterior model

Diving deeper into Bayesian inference

Introducing the normal-normal model

Introducing MCMC

The full Bayesian inference procedure

Bayesian linear regression with a categorical variable

Summary

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share your thoughts

Download a free PDF copy of this book

- Author / Uploaded
- Liu Peng