R for Data Science: Import, Tidy, Transform, Visualize, and Model Data [2 ed.]
9781492097402
Learn how to use R to turn data into insight, knowledge, and understanding. Ideal for current and aspiring data scientis
English
Pages 600
Year 2023
Table of contents:
Introduction
Preface to the Second Edition
What You Will Learn
How This Book Is Organized
What You Won’t Learn
Modeling
Big Data
Python, Julia, and Friends
Prerequisites
R
RStudio
The Tidyverse
Other Packages
Running R Code
Other Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Online Edition
I. Whole Game
1. Data Visualization
Introduction
Prerequisites
First Steps
The penguins Data Frame
Ultimate Goal
Creating a ggplot
Adding Aesthetics and Layers
Exercises
ggplot2 Calls
Visualizing Distributions
A Categorical Variable
A Numerical Variable
Exercises
Visualizing Relationships
A Numerical and a Categorical Variable
Two Categorical Variables
Two Numerical Variables
Three or More Variables
Exercises
Saving Your Plots
Exercises
Common Problems
Summary
2. Workflow: Basics
Coding Basics
Comments
What’s in a Name?
Calling Functions
Exercises
Summary
3. Data Transformation
Introduction
Prerequisites
nycflights13
dplyr Basics
Rows
filter()
Common Mistakes
arrange()
distinct()
Exercises
Columns
mutate()
select()
rename()
relocate()
Exercises
The Pipe
Groups
group_by()
summarize()
The slice_ Functions
Grouping by Multiple Variables
Ungrouping
.by
Exercises
Case Study: Aggregates and Sample Size
Summary
4. Workflow: Code Style
Names
Spaces
Pipes
ggplot2
Sectioning Comments
Exercises
Summary
5. Data Tidying
Introduction
Prerequisites
Tidy Data
Exercises
Lengthening Data
Data in Column Names
How Does Pivoting Work?
Many Variables in Column Names
Data and Variable Names in the Column Headers
Widening Data
How Does pivot_wider() Work?
Summary
6. Workflow: Scripts and Projects
Scripts
Running Code
RStudio Diagnostics
Saving and Naming
Projects
What Is the Source of Truth?
Where Does Your Analysis Live?
RStudio Projects
Relative and Absolute Paths
Exercises
Summary
7. Data Import
Introduction
Prerequisites
Reading Data from a File
Practical Advice
Other Arguments
Other File Types
Exercises
Controlling Column Types
Guessing Types
Missing Values, Column Types, and Problems
Column Types
Reading Data from Multiple Files
Writing to a File
Data Entry
Summary
8. Workflow: Getting Help
Google Is Your Friend
Making a reprex
Investing in Yourself
Summary
II. Visualize
9. Layers
Introduction
Prerequisites
Aesthetic Mappings
Exercises
Geometric Objects
Exercises
Facets
Exercises
Statistical Transformations
Exercises
Position Adjustments
Exercises
Coordinate Systems
Exercises
The Layered Grammar of Graphics
Summary
10. Exploratory Data Analysis
Introduction
Prerequisites
Questions
Variation
Typical Values
Unusual Values
Exercises
Unusual Values
Exercises
Covariation
A Categorical and a Numerical Variable
Two Categorical Variables
Two Numerical Variables
Patterns and Models
Summary
11. Communication
Introduction
Prerequisites
Labels
Exercises
Annotations
Exercises
Scales
Default Scales
Axis Ticks and Legend Keys
Legend Layout
Replacing a Scale
Zooming
Exercises
Themes
Exercises
Layout
Exercises
Summary
III. Transform
12. Logical Vectors
Introduction
Prerequisites
Comparisons
Floating-Point Comparison
Missing Values
is.na()
Exercises
Boolean Algebra
Missing Values
Order of Operations
%in%
Exercises
Summaries
Logical Summaries
Numeric Summaries of Logical Vectors
Logical Subsetting
Exercises
Conditional Transformations
if_else()
case_when()
Compatible Types
Exercises
Summary
13. Numbers
Introduction
Prerequisites
Making Numbers
Counts
Exercises
Numeric Transformations
Arithmetic and Recycling Rules
Minimum and Maximum
Modular Arithmetic
Logarithms
Rounding
Cutting Numbers into Ranges
Cumulative and Rolling Aggregates
Exercises
General Transformations
Ranks
Offsets
Consecutive Identifiers
Exercises
Numeric Summaries
Center
Minimum, Maximum, and Quantiles
Spread
Distributions
Positions
With mutate()
Exercises
Summary
14. Strings
Introduction
Prerequisites
Creating a String
Escapes
Raw Strings
Other Special Characters
Exercises
Creating Many Strings from Data
str_c()
str_glue()
str_flatten()
Exercises
Extracting Data from Strings
Separating into Rows
Separating into Columns
Diagnosing Widening Problems
Letters
Length
Subsetting
Exercises
Non-English Text
Encoding
Letter Variations
Locale-Dependent Functions
Summary
15. Regular Expressions
Introduction
Prerequisites
Pattern Basics
Key Functions
Detect Matches
Count Matches
Replace Values
Extract Variables
Exercises
Pattern Details
Escaping
Anchors
Character Classes
Quantifiers
Operator Precedence and Parentheses
Grouping and Capturing
Exercises
Pattern Control
Regex Flags
Fixed Matches
Practice
Check Your Work
Boolean Operations
Creating a Pattern with Code
Exercises
Regular Expressions in Other Places
Tidyverse
Base R
Summary
16. Factors
Introduction
Prerequisites
Factor Basics
General Social Survey
Exercise
Modifying Factor Order
Exercises
Modifying Factor Levels
Exercises
Ordered Factors
Summary
17. Dates and Times
Introduction
Prerequisites
Creating Date/Times
During Import
From Strings
From Individual Components
From Other Types
Exercises
Date-Time Components
Getting Components
Rounding
Modifying Components
Exercises
Time Spans
Durations
Periods
Intervals
Exercises
Time Zones
Summary
18. Missing Values
Introduction
Prerequisites
Explicit Missing Values
Last Observation Carried Forward
Fixed Values
NaN
Implicit Missing Values
Pivoting
Complete
Joins
Exercises
Factors and Empty Groups
Summary
19. Joins
Introduction
Prerequisites
Keys
Primary and Foreign Keys
Checking Primary Keys
Surrogate Keys
Exercises
Basic Joins
Mutating Joins
Specifying Join Keys
Filtering Joins
Exercises
How Do Joins Work?
Row Matching
Filtering Joins
Non-Equi Joins
Cross Joins
Inequality Joins
Rolling Joins
Overlap Joins
Exercises
Summary
IV. Import
20. Spreadsheets
Introduction
Excel
Prerequisites
Getting Started
Reading Excel Spreadsheets
Reading Worksheets
Reading Part of a Sheet
Data Types
Writing to Excel
Formatted Output
Exercises
Google Sheets
Prerequisites
Getting Started
Reading Google Sheets
Writing to Google Sheets
Authentication
Exercises
Summary
21. Databases
Introduction
Prerequisites
Database Basics
Connecting to a Database
In This Book
Load Some Data
DBI Basics
dbplyr Basics
SQL
SQL Basics
SELECT
FROM
GROUP BY
WHERE
ORDER BY
Subqueries
Joins
Other Verbs
Exercises
Function Translations
Summary
22. Arrow
Introduction
Prerequisites
Getting the Data
Opening a Dataset
The Parquet Format
Advantages of Parquet
Partitioning
Rewriting the Seattle Library Data
Using dplyr with Arrow
Performance
Using dbplyr with Arrow
Summary
23. Hierarchical Data
Introduction
Prerequisites
Lists
Hierarchy
List Columns
Unnesting
unnest_wider()
unnest_longer()
Inconsistent Types
Other Functions
Exercises
Case Studies
Very Wide Data
Relational Data
Deeply Nested
Exercises
JSON
Data Types
jsonlite
Starting the Rectangling Process
Exercises
Summary
24. Web Scraping
Introduction
Prerequisites
Scraping Ethics and Legalities
Terms of Service
Personally Identifiable Information
Copyright
HTML Basics
Elements
Attributes
Extracting Data
Find Elements
Nesting Selections
Text and Attributes
Tables
Finding the Right Selectors
Putting It All Together
Star Wars
IMDb Top Films
Dynamic Sites
Summary
V. Program
25. Functions
Introduction
Prerequisites
Vector Functions
Writing a Function
Improving Our Function
Mutate Functions
Summary Functions
Exercises
Data Frame Functions
Indirection and Tidy Evaluation
When to Embrace?
Common Use Cases
Data Masking Versus Tidy Selection
Exercises
Plot Functions
More Variables
Combining with Other Tidyverse Packages
Labeling
Exercises
Style
Exercises
Summary
26. Iteration
Introduction
Prerequisites
Modifying Multiple Columns
Selecting Columns with .cols
Calling a Single Function
Calling Multiple Functions
Column Names
Filtering
across() in Functions
Versus pivot_longer()
Exercises
Reading Multiple Files
Listing Files in a Directory
Lists
purrr::map() and list_rbind()
Data in the Path
Save Your Work
Many Simple Iterations
Heterogeneous Data
Handling Failures
Saving Multiple Outputs
Writing to a Database
Writing CSV Files
Saving Plots
Summary
27. A Field Guide to Base R
Introduction
Prerequisites
Selecting Multiple Elements with [
Subsetting Vectors
Subsetting Data Frames
dplyr Equivalents
Exercises
Selecting a Single Element with $ and [[
Data Frames
Tibbles
Lists
Exercises
Apply Family
for Loops
Plots
Summary
VI. Communicate
28. Quarto
Introduction
Prerequisites
Quarto Basics
Exercises
Visual Editor
Exercises
Source Editor
Exercises
Code Chunks
Chunk Label
Chunk Options
Global Options
Inline Code
Exercises
Figures
Figure Sizing
Other Important Options
Exercises
Tables
Exercises
Caching
Exercises
Troubleshooting
YAML Header
Self-Contained
Parameters
Bibliographies and Citations
Workflow
Summary
29. Quarto Formats
Introduction
Output Options
Documents
Presentations
Interactivity
htmlwidgets
Shiny
Websites and Books
Other Formats
Summary
Index
About the Authors