AN R COMPANION TO POLITICAL ANALYSIS
Third Edition

Philip H. Pollock, University of Central Florida
Barry C. Edwards, University of Central Florida
FOR INFORMATION:

CQ Press
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Asia-Pacific Pte. Ltd.
18 Cross Street #10-10/11/12
China Square Central
Singapore 048423
Copyright © 2023 by CQ Press, an imprint of SAGE Publications, Inc. CQ Press is a registered trademark of Congressional Quarterly, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

All third party trademarks referenced or depicted herein are included solely for the purpose of illustration and are the property of their respective owners. Reference to these trademarks in no way indicates any relationship with, or endorsement by, the trademark owner.

Printed in the United States of America

Library of Congress Cataloging-in-Publication Data
Names: Pollock, Philip H., III., author. | Edwards, Barry C., author.
Title: An R companion to political analysis / Philip H. Pollock III, University of Central Florida, Barry C. Edwards, University of Central Florida.
Identifiers: LCCN 2022004782 | ISBN 9781071862414 (paperback) | ISBN 9781071862421 (epub) | ISBN 9781071862445 (epub) | ISBN 9781071862407 (pdf)
Subjects: LCSH: Political statistics--Computer programs--Handbooks, manuals, etc. | Analysis of variance--Computer programs--Handbooks, manuals, etc. | R (Computer program language)--Handbooks, manuals, etc.
Classification: LCC JA86 .P639 2023 | DDC 320.0285/5133--dc23
LC record available at https://lccn.loc.gov/2022004782

This book is printed on acid-free paper.
Acquisitions Editor: Leah Fargotstein
Editorial Assistant: Ivey Mellem
Production Editor: Astha Jaiswal
Copy Editor: Christina West
Typesetter: C&M Digitals (P) Ltd.
Cover Designer: Candice Harman
Marketing Manager: Victoria Velasquez
BRIEF TABLE OF CONTENTS

List of Figures
Preface
Acknowledgments
About the Authors
A Quick Reference Guide to R Companion Functions
Introduction: Getting Started with R
Chapter 1 • Using R for Data Analysis
Chapter 2 • Descriptive Statistics
Chapter 3 • Creating and Transforming Variables
Chapter 4 • Making Comparisons
Chapter 5 • Graphing Relationships and Describing Patterns
Chapter 6 • Random Assignment and Sampling
Chapter 7 • Making Controlled Comparisons
Chapter 8 • Foundations of Statistical Inference
Chapter 9 • Hypothesis Tests with One or Two Samples
Chapter 10 • Chi-Square Test and Analysis of Variance
Chapter 11 • Correlation and Bivariate Regression
Chapter 12 • Multiple Regression
Chapter 13 • Analyzing Regression Residuals
Chapter 14 • Logistic Regression
Chapter 15 • Doing Your Own Political Analysis
Appendix, Table A-1: Variables in the Debate Experiment Dataset
Appendix, Table A-2: Variables in the NES Dataset
Appendix, Table A-3: Variables in the States Dataset by Topic
Appendix, Table A-4: Variables in the World Dataset by Topic
CONTENTS

List of Figures
Preface
Acknowledgments
About the Authors
A Quick Reference Guide to R Companion Functions

Introduction: Getting Started with R
  I.1 About R
  I.2 Installing R
  I.3 A Quick Tour of the R Environment
  I.4 Installing the “RCPA3” Package
    • The Companion Datasets
  I.5 Troubleshooting Installation Problems
  I.6 Installing R Studio
  I.7 Instant Access to Tutorials and Resources

Chapter 1 • Using R for Data Analysis
  1.1 Interacting with R
  1.2 Objects
    1.2.1 Examples of Objects
    1.2.2 Creating Objects
    • A Closer Look: R Environments
    1.2.3 Accessing Some of an Object’s Values
  1.3 Functions
    1.3.1 Using a Function’s Arguments
    1.3.2 The Widget Factory Needs Your Help!
  1.4 Writing and Running Scripts
  1.5 Managing R Output: Graphics and Text
  1.6 Getting Help
  Chapter 1 Exercises

Chapter 2 • Descriptive Statistics
  2.1 Identifying Levels of Measurement
  2.2 Describing Nominal Variables
    2.2.1 Generating Tables of Descriptive Statistics
    2.2.2 Using Tables and Figures to Describe Distributions of Values
  2.3 Describing Ordinal Variables
    2.3.1 Describing Two Variables in One Table
    2.3.2 Assessing the Dispersion of Ordinal Variables
    • A Closer Look: Weighted and Unweighted Analysis: What’s the Difference?
  2.4 Describing Interval-Level Variables
    2.4.1 Using Statistics to Describe Interval Variables
    • A Closer Look: Additional Math and Statistics Functions
    2.4.2 Visualizing Dispersion with Histograms
    • A Closer Look: Editing Plots with Purpose
  2.5 Obtaining Case-Level Information
  Chapter 2 Exercises

Chapter 3 • Creating and Transforming Variables
  3.1 Applying Mathematical and Logical Operators to Variables
    3.1.1 Mathematical Operators
    3.1.2 Logical Operators
  3.2 Creating Dummy Variables
  3.3 Adding or Modifying Variable Attributes
    3.3.1 Labels
    3.3.2 Class Attributes
    3.3.3 Levels of a Variable’s Values
  3.4 Collapsing Variables into Simplified Categories
  3.5 Centering, Standardizing, and Ranking Numeric Variables
  3.6 Creating an Additive Index
  Chapter 3 Exercises

Chapter 4 • Making Comparisons
    • Analysis Guide
  4.1 Creating Cross-Tabulations to Make Comparisons
    • A Closer Look: Debugging R Code
  4.2 Mean Comparison Analysis
  4.3 Making Comparisons with Interval-Level Independent Variables
    • A Closer Look: Creating Plots from Summary Data
  Chapter 4 Exercises

Chapter 5 • Graphing Relationships and Describing Patterns
  5.1 Graphs for Binary Dependent Variables
    5.1.1 Simple Bar Charts with Nominal-Level Independent Variables
    5.1.2 Simple Line Plots with Ordinal-Level Independent Variables
    5.1.3 Graphs with Interval-Level Independent Variables
  5.2 Graphs for Nominal Dependent Variables
    5.2.1 Clustered Bar Charts with Nominal Independent Variables
    5.2.2 Multiple Line Plots with Ordinal Independent Variables
    5.2.3 Graphs with Interval-Level Independent Variables
  5.3 Graphs for Ordinal-Level Dependent Variables
    5.3.1 Using Bars or Lines to Represent Select Values
    5.3.2 Mosaic Plots for an Ordinal–Ordinal Relationship
    5.3.3 Graphs with Interval-Level Independent Variables
  5.4 Graphs for Interval-Level Dependent Variables
    5.4.1 Plotting Means with Bars or Lines
    5.4.2 Box Plots
    5.4.3 Scatterplots
    • A Closer Look: Creating Custom Graphs with R
  Chapter 5 Exercises

Chapter 6 • Random Assignment and Sampling
  6.1 Random Assignment
    6.1.1 Two Groups with Equal Probability
    6.1.2 Multiple Groups with Varying Probabilities
  6.2 Analyzing the Results of an Experiment
    6.2.1 Assessing Random Assignment
    6.2.2 Evaluating the Effect of Treatment
  6.3 Random Sampling
    6.3.1 Simple Random Samples with Replacement
    6.3.2 Simple Random Samples without Replacement
    6.3.3 Systematic Random Samples
    6.3.4 Clustered and Stratified Random Samples
  6.4 Selecting Cases for Qualitative Analysis
    6.4.1 Most Similar Systems
    6.4.2 Most Different Systems
  6.5 Analyzing Data Ethically: Clarity, Replication, and Transparency
    6.5.1 Ethical Issues in Data Analysis
    6.5.2 Ten Tips for Writing R Scripts
  Chapter 6 Exercises

Chapter 7 • Making Controlled Comparisons
  7.1 Cross-Tabulation Analysis with Control Variables
    7.1.1 Start with a Basic Cross-Tabulation
    7.1.2 Adding Control Variables
    7.1.3 Interpreting Controlled Cross-Tabulations
    7.1.4 Options for Graphing Controlled Cross-Tabulations
    • A Closer Look: The Legend Function
  7.2 Mean Comparison Analysis with Control Variables
    7.2.1 Start with a Basic Mean Comparison Table
    7.2.2 Adding Control Variables
    7.2.3 Interpreting a Controlled Mean Comparison
    7.2.4 Options for Graphing Controlled Mean Comparisons
  7.3 Controlled Comparisons with Interval-Level Control Variables
  Chapter 7 Exercises

Chapter 8 • Foundations of Statistical Inference
  8.1 Estimating a Population Proportion with R Simulation
  8.2 Estimating a Population Mean with R Simulation
    • A Closer Look: Using Probability Distributions to Simulate Raw Data
  8.3 Expected Shape of Sampling Distributions
    8.3.1 Central Limit Theorem and the Normal Distribution
    8.3.2 Normal Distribution of Sample Proportions
    8.3.3 Normal Distribution of Sample Means
    8.3.4 The Standard Normal Distribution
    8.3.5 The Empirical Rule (68-95-99 Rule)
  8.4 Confidence Intervals and Margins of Error
    8.4.1 Critical Values for Confidence Intervals
    8.4.2 Reporting the Confidence Interval for a Sample Proportion
    8.4.3 Reporting the Confidence Interval for a Sample Mean
  8.5 Student’s t-Distribution: When You’re Not Completely Normal
    8.5.1 The t-Distribution’s Role in Inferential Statistics
    8.5.2 Critical Values of t-Distributions
  Chapter 8 Exercises

Chapter 9 • Hypothesis Tests with One or Two Samples
  9.1 Role of the Null Hypothesis
  9.2 Testing a Hypothesis about One Sample Proportion
  9.3 Testing the Difference between Two Sample Proportions
    9.3.1 Testing the Difference of Proportions with z Scores and P-Values
    9.3.2 Confidence Interval for the Difference of Proportions
    • A Closer Look: Changing the Order of Groups 1 and 2
    9.3.3 Comparing Two Similarly Coded Variables (x1 vs. x2)
  9.4 Testing a Hypothesis about One Sample Mean
    • A Closer Look: Treating Census as a Sample
  9.5 Testing the Difference between Two Sample Means
    9.5.1 Variants of the Difference of Means Test
    9.5.2 Testing the Difference of Means with the testmeansC Function
    9.5.3 Confidence Intervals for the Difference of Means
    9.5.4 Testing the Difference of Means Assuming Equal Variances
    9.5.5 Testing the Difference of Means with Two Similarly Coded Variables
    9.5.6 Paired Difference of Means Test
  Chapter 9 Exercises

Chapter 10 • Chi-Square Test and Analysis of Variance
  10.1 The Chi-Square Test of Independence
    10.1.1 How the Chi-Square Test Works
    10.1.2 Conducting a Chi-Square Test
    10.1.3 Example with a Nominal Independent Variable
    10.1.4 R’s Chi-Square Distribution Functions
    • A Closer Look: Other Applications of Chi-Square Tests
  10.2 Measuring the Strength of Association between Categorical Variables
    10.2.1 Lambda
    10.2.2 Somers’ d
    10.2.3 Cramer’s V
  10.3 Chi-Square Test and Measures of Association in Controlled Comparisons
    10.3.1 Analyzing an Ordinal-Level Relationship with a Control Variable
    10.3.2 Analyzing a Nominal-Level Relationship with a Control Variable
  10.4 Analysis of Variance
    10.4.1 How ANOVA Works
    10.4.2 Single-Factor ANOVA
    10.4.3 Two-Factor ANOVA
    10.4.4 R’s F-Distribution Functions
  Chapter 10 Exercises

Chapter 11 • Correlation and Bivariate Regression
  11.1 Correlation Analysis
    11.1.1 Correlation between Two Variables
    11.1.2 Correlation among More Than Two Variables
    • A Closer Look: Other Types of Applications of Correlation Analysis
  11.2 Bivariate Regression Analysis
    11.2.1 Are Democratic Countries More Peaceful?
    • A Closer Look: R-Squared and Adjusted R-Squared: What’s the Difference?
    11.2.2 Turnout in Battleground States Example
    11.2.3 Using Regression Results to Make Informed Predictions
  11.3 Visualizing Bivariate Regression Analysis
    11.3.1 Scatterplots with Regression Lines
    11.3.2 Visualizing Regression Analysis with Categorical Independent Variables
    11.3.3 Alternatives to the Plot Function
    • A Closer Look: What If a Scatterplot Doesn’t Show a Linear Relationship?
  11.4 Regression Analysis with Weighted Observations
    • A Closer Look: Creating Tables of Regression Results
  Chapter 11 Exercises

Chapter 12 • Multiple Regression
  12.1 Estimating a Multiple Regression Model
    12.1.1 Voter Turnout Revisited
    12.1.2 Another Look at Democratic Peace
  12.2 Regression with Multiple Dummy Variables
    12.2.1 Estimating a Regression Model with Multiple Dummy Variables
    12.2.2 Interpreting Regression Results with Multiple Dummy Variables
    • A Closer Look: Changing the Reference Category
    12.2.3 Visualizing Regression Analysis with Multiple Dummy Variables
  12.3 Interaction Effects in Multiple Regression
    12.3.1 Regression Equation with an Interaction Term
    12.3.2 Estimating and Interpreting a Linear Model with an Interaction Term
    12.3.3 Visualizing Interactive Relationships
  12.4 Visualizing Multiple Regression Analysis
    12.4.1 One Categorical and One Interval Independent Variable
    12.4.2 Visualizing Regression with Two Interval Independent Variables
    12.4.3 Visualizing Regression with Two Categorical Independent Variables
    • A Closer Look: Visualizing Multiple Regression with Many Independent Variables
  Chapter 12 Exercises

Chapter 13 • Analyzing Regression Residuals
  13.1 Expected Values, Observed Values, and Regression Residuals
    13.1.1 Example from Bivariate Regression Analysis
    13.1.2 Residuals from Multiple Regression Analysis
  13.2 Squared and Standardized Residuals
    13.2.1 Squared Residuals
    13.2.2 Standardized Residuals
  13.3 Assumptions about Regression Residuals
  13.4 Analyzing Graphs of Regression Residuals
    13.4.1 Histogram of Regression Residuals
    13.4.2 Residual Diagnostic Plots
  13.5 Testing Regression Assumptions with Residuals
    13.5.1 Testing the Assumption That Residuals Are Normally Distributed
    13.5.2 Testing the Constant Variance Assumption
    13.5.3 Regression Diagnostics for Multiple Regression Analysis
    • A Closer Look: Other Regression Diagnostic Tests
  13.6 What If You Diagnose Problems with Residuals?
  Chapter 13 Exercises

Chapter 14 • Logistic Regression
  14.1 Odds, Logged Odds, and Probabilities
  14.2 Estimating Logistic Regression Models
    14.2.1 Logistic Regression with One Independent Variable
    14.2.2 Reporting and Interpreting Odds Ratios
    14.2.3 Evaluating Model Fit
  14.3 Logistic Regression with Multiple Independent Variables
    14.3.1 Estimating Model with Multiple Independent Variables
    14.3.2 Odds Ratios and Model Fit Statistics
    14.3.3 Logistic Regression Analysis with Weighted Observations
  14.4 Plotting Predicted Probabilities with One Independent Variable
    14.4.1 Interval-Level Independent Variables
    14.4.2 Categorical Independent Variables
    • A Closer Look: Marginal Effects and Expected Changes in Probability
  14.5 Plotting Predicted Probabilities with Multiple Independent Variables
    14.5.1 One Categorical and One Interval-Level Independent Variable
    14.5.2 Multiple Categorical Independent Variables
    14.5.3 Two Interval-Level Independent Variables
    14.5.4 Plotting Predicted Probabilities with Many Independent Variables
  Chapter 14 Exercises

Chapter 15 • Doing Your Own Political Analysis
  15.1 Doable Research Ideas
    15.1.1 Political Knowledge and Interest
    15.1.2 Self-Interest and Policy Preferences
    15.1.3 Economic Performance and Election Outcomes
    15.1.4 Electoral Turnout in Comparative Perspective
    15.1.5 Correlates of State Policies
    15.1.6 Religion and Politics
    15.1.7 Race and Politics
  15.2 Reading R-Format Datasets
  15.3 Importing Data into R
    15.3.1 SPSS- and Stata-Formatted Datasets
    15.3.2 Microsoft Excel Datasets
    15.3.3 Data from HTML Tables
  15.4 Writing It Up
    15.4.1 The Research Question
    15.4.2 Previous Research
    15.4.3 Data, Hypotheses, and Analysis
    15.4.4 Conclusions and Implications
  Chapter 15 Exercises

Appendix, Table A-1: Variables in the Debate Experiment Dataset
Appendix, Table A-2: Variables in the NES Dataset
Appendix, Table A-3: Variables in the States Dataset by Topic
Appendix, Table A-4: Variables in the World Dataset by Topic
LIST OF FIGURES

I-1 Downloading the R Program from the R Project’s Website
I-2 The R Console
I-3 Installing the RCPA3 Package
I-4 The RCPA3 Package’s Welcome Function
I-5 R Studio Screenshot
I-6 QR Codes for Getting Started and Chapters 1–15

1-1 Objects with Multiple Values (Vectors)
1-2 Creating Simple Objects
1-3 Creating Objects That Store Text
1-4 Accessing a Subset of an Object’s Values Using Square Brackets
1-5 Accessing a Dataset Variable by Name
1-6 How to Play Widget Factory
1-7 R Script Editor
1-8 Sample Text and Graphic Output from an R Function
1-9 Using Console Output in a Word Document
1-10 Generating Formatted Tables and High-Resolution Figures
1-11 Function Help File
1-12 Extended Search Results

2-1 Table of Descriptive Statistics for a Nominal Variable
2-2 Frequency Table and Bar Chart for a Nominal-Level Variable
2-3 Descriptive Statistics for Two Ordinal Variables
2-4 Describing an Ordinal Variable with High Dispersion
2-5 Describing an Ordinal Variable with Low Dispersion
2-6 Descriptive Statistics for an Interval-Level Variable
2-7 Frequency Table and Histogram for an Interval-Level Variable
2-8 Histogram of an Interval Variable with Optional Arguments
2-9 Case-Level Information in Tables
2-10 Table of Observations Sorted by Two Criteria

3-1 Calculating the COVID Case Rate per Thousand People
3-2 Calculating Per Capita Income in Thousands of Dollars by State
3-3 Inverting Sequential Rankings
3-4 Transforming a Peace Index to a Conflict Index
3-5 Applying a Logical Test to Create a New Variable
3-6 Use the Direction and Strength of Opinion to Create an Ordinal Variable
3-7 Creating a Dummy Variable for a Single Value
3-8 Creating a Dummy Variable Using Multiple Values
3-9 Viewing an Object’s Attributes
3-10 Labeling a Variable
3-11 Converting a Variable from Ordinal to Interval
3-12 Converting a Variable from Interval to Ordinal
3-13 Assigning Levels to a Nominal or Ordinal Variable
3-14 Reordering the Levels of a Nominal or Ordinal Variable
3-15 Transforming an Interval Variable with Cutpoints
3-16 Collapsing an Interval Variable into Equal-Size Groups
3-17 Collapsing an Ordinal Variable into Fewer Categories
3-18 Centering and Standardizing Interval Variables
3-19 Sequential and Percentile Rankings of Interval Variables
3-20 Additive Index Created from Dummy Variables
3-21 Creating an Additive Index from Ordinal Variables

4-1 Making a Comparison with a Cross-Tabulation and Bar Plot
4-2 Making a Comparison with a Mean Comparison Table and Plot of Means
4-3 Making a Comparison by Collapsing an Interval-Level Independent Variable
4-4 Creating a Plot from Summary Statistics

5-1 Simple Bar Chart for a Binary Dependent Variable and a Nominal Independent Variable
5-2 Simple Line Plot for a Binary Dependent Variable and an Ordinal Independent Variable
5-3 Noisy Line Chart with Too Much Detail
5-4 Line Chart Based on Grouped Independent Variable Values
5-5 Clustered Bar Chart for Nominal Dependent and Independent Variables
5-6 Multiple Line Chart for a Nominal Dependent Variable and an Ordinal Independent Variable
5-7 Multiple Line Chart with a Collapsed Interval-Level Independent Variable
5-8 Bar Chart Showing Percentages for a Select Value of the Dependent Variable
5-9 Line Chart of a Select Value of an Ordinal-Level Dependent Variable
5-10 Mosaic Plot of the Relationship between Two Ordinal Variables
5-11 Transformed Interval-Level Independent Variable in a Line Chart
5-12 Simple Bar Chart for a Mean Comparison
5-13 Line Chart for a Mean Comparison
5-14 Box Plot with a Nominal Independent Variable
5-15 Box Plot with an Ordinal Independent Variable
5-16 Scatterplot of the Relationship between Two Interval-Level Variables
5-17 Plotting a Subset of Observations on a Scatterplot
5-18 Strip Chart Based on Formula Syntax

6-1 Random Assignment to Treatment and Control Groups
6-2 Random Assignment to Multiple Groups with Varying Probabilities
6-3 Evaluating the Effectiveness of Random Assignments
6-4 Evaluating the Outcome of an Experiment
6-5 Random Sampling with Replacement
6-6 Random Sampling without Replacement
6-7 Systematic Random Sample from a Directory of Names
6-8 Sorting Observations to Select Cases for a Most Similar Systems Design
6-9 Sorting Observations to Select Cases for a Most Different Systems Design

7-1 Cross-Tabulation and a Bar Chart without a Control Variable
7-2 Making a Controlled Comparison with the crosstabC Function
7-3 Multiple Line and Mosaic Plots for Controlled Cross-Tabulation
7-4 Customizing Plot Legends
7-5 Mean Comparison without a Control Variable
7-6 Mean Comparison Analysis with a Control Variable
7-7 Controlled Mean Comparison with a Compact Table and Plot of Mean Points
7-8 Making a Controlled Cross-Tabulation with an Interval-Level Control Variable
7-9 Making a Controlled Mean Comparison with an Interval-Level Control Variable

8-1 Simulating Estimation of a Population Proportion
8-2 Distribution of Simulated Estimates of a Population Proportion
8-3 Simulating Estimation of a Population Mean
8-4 Distribution of Simulated Estimates of a Population Mean
8-5 Visualizing a Sampling Distribution of a Proportion with the sampdistC Function
8-6 Visualizing a Sampling Distribution of a Mean with the sampdistC Function
8-7 The Standard Normal Distribution
8-8 Critical Values for Confidence Intervals Based on the Normal Distribution
8-9 Generating Confidence Intervals for Sample Proportions
8-10 Generating Confidence Intervals for Sample Means
8-11 Visualizing t-Distributions with the sampdistC Function
8-12 Critical Values for Confidence Intervals Based on t-Distributions

9-1 Testing a Sample Proportion against a Hypothesized Value
9-2 Testing the Difference between Two Sample Proportions
9-3 Do Americans Perceive a Greater Threat from China or Russia?
9-4 Testing the Difference between a Sample Mean and a Hypothesized Value
9-5 Testing the Difference between Two Sample Means
9-6 Testing the Difference of Means Assuming Equal Variance
9-7 Comparing the Means of Trump and Republican Party Feeling Thermometers
9-8 Conducting a Paired Difference of Means Test

10-1 Comparing Observed and Expected Frequencies
10-2 Conducting a Chi-Square Test of Independence
10-3 Chi-Square Test with a Nominal Independent Variable
10-4 Measuring Cross-Tabulation Association with Lambda
10-5 Lambda Measure of Association with a Nominal-Level Independent Variable
10-6 Measuring Ordinal–Ordinal Association with Somers’ d
10-7 Using Cramer’s V to Measure Association with a Nominal-Level Variable
10-8 Chi-Square Test and Measures of Association in a Controlled Comparison
10-9 Chi-Square Test and Measures of Association for a Controlled Comparison with a Nominal-Level Independent Variable
10-10 Illustration of Variance between and within Groups in a Population
10-11 Using ANOVA to Test Differences among Multiple Means
10-12 Conducting Two-Factor ANOVA

11-1 Analyzing the Correlation between Two Variables
11-2 Correlation Analysis with Inferential Statistics and a Scatterplot
11-3 Analyzing Correlation among Multiple Variables
11-4 Bivariate Regression Analysis with the regC Function
11-5 Bivariate Regression Analysis with a Dummy Independent Variable
11-6 Scatterplot with a Bivariate Regression Line
11-7 Binary Independent Variable Doesn’t Work for a Scatterplot
11-8 Plot Depicting Results of Regression Analysis with a Dummy Independent Variable
11-9 Enhanced Scatterplot to Visualize Bivariate Regression Analysis
11-10 Regression Analysis with Weighted Observations
11-11 Creating an Academic-Style Table of Regression Results

12-1 Multiple Regression Analysis of State Voter Turnout
12-2 Multiple Regression Analysis of the Peacefulness of Countries
12-3 Regression Analysis with Multiple Dummy Variables
12-4 Changing the Reference Category for Regression Analysis with Multiple Dummy Variables
12-5 Expected Values from Regression with a Nominal-Level Independent Variable
12-6 Regression Analysis with an Interaction Term
12-7 Visualizing Multiple Regression with an Interaction Term
12-8 Multiple Regression Results with Interval and Nominal Independent Variables
12-9 Multiple Regression Results with Two Interval-Level Independent Variables
12-10 Three-Dimensional Scatterplot of Multiple Regression Results with Two Interval-Level Independent Variables
12-11 Three-Dimensional Representation of Multiple Regression Results with Two Interval-Level Independent Variables
12-12 Multiple Regression Analysis with Two Categorical Variables
12-13 Visualizing Multiple Regression Results with Two Categorical Independent Variables

13-1 Saving Results of Regression Analysis as a Named Object
13-2 Observed Values, Expected Values, and Regression Residuals
13-3 Visualizing Residual Values on a Scatterplot with Regression Line
13-4 Table of Residuals from Multiple Regression Analysis
13-5 Visualizing Multiple Regression Residuals on a Three-Dimensional Scatterplot
13-6 Squared Residual Values
13-7 Standardized Residual Values
13-8 Histogram of Regression Residuals
13-9 Residual Diagnostic Plots from the regC Function
13-10 Testing the Assumption That Residuals Follow a Normal Distribution
13-11 Testing the Constant Variance Assumption
13-12 Histogram of Residuals from Multiple Regression
13-13 Analyzing Residuals from a Multiple Regression Model with Diagnostic Plots and Statistical Tests

14-1 Relationship between Probabilities, Odds, and Logged Odds
14-2 Logistic Regression Analysis with One Independent Variable
14-3 Odds Ratios for Logistic Regression Coefficients
14-4 Model Fit Statistics and Proportional Reduction in Error
14-5 Logistic Regression Analysis with Multiple Independent Variables
14-6 Odds Ratios for Multiple Logistic Regression Coefficients
14-7 Model Fit Statistics and PRE for Logistic Regression Analysis with Multiple Independent Variables
14-8 Logistic Regression Analysis with Weighted Observations
14-9 Predicted Probabilities Curve with an Interval-Level Independent Variable
14-10 Logistic Regression Analysis with a Binary Independent Variable
14-11 Graphing Predicted Probabilities for Binary Values of an Independent Variable
14-12 Predicted Probabilities Curve with Nominal- and Interval-Level Independent Variables
14-13 Logistic Regression Analysis with Two Categorical Independent Variables
14-14 Graphing Predicted Probabilities with Two Categorical Independent Variables
14-15 Logistic Regression Analysis with Two Interval-Level Independent Variables
14-16 Visualizing Predicted Probabilities with a Three-Dimensional Figure
14-17 Logistic Regression Analysis with More Than Two Independent Variables
14-18 Plotting Predicted Probabilities with More Than Two Independent Variables

15-1 List of Datasets in Installed Packages
15-2 Details about a Dataset in an Installed Package
15-3 Reading R Datasets
15-4 Importing an SPSS-Formatted Dataset
15-5 Importing a Stata-Formatted Dataset
15-6 R-Unfriendly Excel Dataset
15-7 R-Friendly Excel Dataset
15-8 Creating an R-Friendly Excel Spreadsheet
15-9 Importing a Dataset Saved as Comma-Separated Values
15-10 Data from the Web in HTML Format
15-11 Converting HTML-Format Data into an R-Friendly File
PREFACE

In many ways, the third edition of An R Companion to Political Analysis follows the template of the books that preceded it. Like prior editions, this volume guides students in the use of R for constructing meaningful descriptions of variables and performing substantive analysis of political relationships, from bivariate cross-tabulation analysis to logistic regression. As before, all of the examples and exercises use research-quality data, including a survey dataset (the 2020 American National Election Study), an experiment dataset, and two aggregate-level datasets (one based on the 50 U.S. states, and one based on countries of the world). And, consistent with prior editions, each chapter is written as a tutorial, taking students through a series of guided examples that they then use to perform the analysis.

The third edition improves upon its predecessors in several ways. First, we’ve improved our suite of political analysis functions so students can do more analysis with fewer functions. Our “RCPA3” R package supersedes and replaces the “poliscidata” package that accompanied the second edition of this book. We’ve streamlined the Companion’s package by simplifying functions, using consistent syntax, and reducing dependencies.1 For example, there are no longer separate functions to do analysis with weighted observations (essential at times, but often unnecessary) as we’ve incorporated a simple “w” option into functions where it might be useful. We’ve also customized the functions to alert students to common mistakes and display results clearly. Students can simply install this book’s R package from the Comprehensive R Archive Network (CRAN), load it in R, and then jump right into executing commands and analyzing political science datasets.

Second, this edition features all new figures and updated examples. We’ve heavily annotated this edition’s figures in the style of text messages (they look neat, and students seem to enjoy reading them). The datasets have all been updated. We hope you find our screenshots of tables and figures reasonably attractive.

Third, the chapter content has been improved by bolding and defining key terms in the book’s margins. Each chapter concludes with five exercises that give students an opportunity to practice new skills.

Finally, this third edition has 15 chapters (prior editions had 11 chapters). The four “new” chapters develop themes from prior editions, allowing us to discuss essential methods of political analysis more carefully. Chapter 5 (Graphing Relationships and Describing Patterns) focuses on making comparisons visually. Chapter 6 (Random Assignment and Sampling) shows students how to control for rival explanations using random assignment and provides some much-needed discussion of research ethics. Chapter 8 (Foundations of Statistical Inference) shows students how we quantify the uncertainty of sample statistics. Chapter 13 (Analyzing Regression Residuals) shows students how to analyze regression residuals to test a linear model’s assumptions. This book’s new 15-chapter structure is more modular than the prior edition’s, which makes it easier to fit the material to a semester-length course and/or teach political analysis online. If you’ve taught political analysis with the second edition, these new chapters should not necessitate major syllabus revisions and will probably fit your syllabus better than the second edition did. A number of instructors have urged us to devote more attention to data visualization, the logic of statistical inference, and regression analysis. We listened to you and used your feedback to improve this book. Thank you!

Throughout the text, we emphasize simple solutions that can be adapted to solve more complicated problems. Where possible, we’ve revised our examples to make them easier to follow. We’ve also made a special effort to show how to use R to create publication-level tables and figures. Data visualization is an especially exciting field and a relative strength of R.

Of course, a book like this leaves out more than it covers. We don’t, for example, discuss writing functions with R, scraping webpages, or advanced packages for data analysis like the tidyverse. As students progress beyond the scope of this book, they will want to learn how to use new functions (or perhaps write their own functions), and we’re happy to help them begin that journey.
ADVICE FOR INSTRUCTORS

This book is intended to help students learn to apply political science research methods using the R program. We emphasize developing good writing habits, proper interpretation of statistics, and clear presentation of results. It isn’t a comprehensive reference to R’s data analysis functions. Instead, it is intended to serve as a companion to textbooks that emphasize the general concepts of political science research, helping students apply textbook and lecture concepts with R to solve problems and conduct research.

Those of us who teach political science research methods understand there are pros and cons to using different statistics programs. We think instructors should be aware of the advantages and disadvantages of using R and, if they choose R, work to maximize its advantages and minimize its potential problems.

The primary benefit of using R for teaching students to use political science research methods is that R is a free program that works well on both Windows and Mac OS platforms. Students don’t have to work on certain computers on campus or under an expiring software license. In our experience teaching this class, students like being able to work on their own laptops, even though we have computer labs on campus. Working with R gives students the option of working on or off campus, at times that fit in their schedules.

Although R is sometimes seen as a program reserved for hardcore quants, it may be more appropriate to view R, and R Studio, as programs made for everybody. We think it’s great that students can build a toolkit of R scripts over the course of a term and take it with them into other classes or the workplace. The only real limitation to using R is the willingness to learn how.

In this book, we try to identify the fundamental research methods used by political scientists and demonstrate the simplest ways of applying these methods using R. We’ve written simple functions to execute essential tasks using a consistent set of arguments and feature them in this book (these companion functions end with a capital C, like freqC and crosstabC). Of course, we recognize that there are many different ways to implement research methods in R. We don’t pretend that our functions are all students will ever need; far from it. We think it makes sense to teach students how R functions are called, demonstrate the simplest possible solution to a problem, and encourage students to demonstrate their creativity and initiative by refining the basic solution or trying other solutions to the problem. If you’ve mastered different approaches to some of the methods we discuss in this book, we’d encourage you to teach R functions that are familiar to you in place of, or as alternatives to, the functions we demonstrate here.

As noted in the prefaces to earlier editions, teaching students to use R presents some challenges. Students are accustomed to using consumer electronics that don’t require much thought. Chances are, they’ve never had to use an instruction manual to operate a computer or electronic device, so using a manual to operate a statistics program is an unfamiliar task. Our suggestion is to be frank with students about the pros and cons of R and explain why you’re using it to teach research methods. We’ve found that many students (although often reluctant to admit it) actually enjoy the challenge of learning a new skill that demands precision and attention to detail. When students learn that R is widely used in the private sector and that familiarity with R is a desirable skill to potential employers, they are likely to prefer using R to working with other statistics programs.

One specific suggestion for instructors who plan on using R is to devote at least part of one class session to helping students download and install R, the RCPA3 R package that bundles the functions and datasets used in this book, and R Studio. Encourage students to bring their personal laptops to this session to get them set up to work independently. If you’ve worked with R for a while, it’s easy to forget how confusing the R environment appears to a new user. Be wary of the curse of expertise.2 Help students get to the point where they can execute commands and interact with R. Have students execute our welcome function to verify their machine is set up properly. Make sure your students are prepared to start making mistakes and learning from them; trial and error is essential, so you don’t want students to get caught up on one-time setup issues.
If you think that learning how to use R is a learning objective in and of itself and not merely a means to other ends, consider incorporating some computer lab sessions into your course if time and facilities allow you to do so. One of us (Edwards) teaches research methods with equal parts lecture and lab sessions. In the lab sessions, students work on solving problem sets using R. When students have questions, they raise their hands and receive one-on-one instruction. Edwards has been fortunate to work with some excellent graduate teaching assistants who join the class lab sessions to work one-on-one with students. He has also recruited top students to return to lab sessions in subsequent terms to help other students learn to use the R program. It’s a lot of fun and the hands-on experience with R reinforces the general concepts from lectures and the textbook.
ACCOMPANYING CORE TEXT
Instructors will find that this book makes an effective supplement to any of a variety of methods textbooks. However, it is a particularly suitable companion to our own core text, The Essentials of Political Analysis (see Table P-1). The textbook’s substantive chapters cover basic and intermediate methodological issues and ideas: measurement, explanations and hypotheses, univariate statistics and bivariate analysis, controlled relationships, sampling and inference, statistical significance, correlation and linear regression, and logistic regression. For instructors who have used Pollock and Edwards’s political analysis textbooks before, here is how the contents of the third edition of the R Companion correspond with the current (sixth) edition of The Essentials of Political Analysis.
Table P-1: Essentials of Political Analysis and R Companion
As noted earlier, each chapter of this book has end-of-chapter exercises. Students can read the textbook chapters, do the exercises, and then work through the guided examples and exercises in An R Companion to Political Analysis. The idea is to get students in front of the computer, experiencing political research firsthand, as soon as possible. Theory and practice go great together. An instructor’s solutions manual, available for download online at edge.sagepub.com/Pollock and free to adopters, provides solutions for all of the textbook and workbook exercises.
1 See “A Quick Reference Guide to R Companion Functions” for a summary of featured functions and lists of additional and deprecated functions.
2 If you’re accustomed to using R for political analysis, it’s hard to see it the way a novice user does. For further reading on the curse of expertise and its implication for teaching, see Carl Wieman, “The ‘Curse of Knowledge,’ or Why Intuition About Teaching Often Fails,” American Physical Society News, November 2007, https://www.aps.org/publications/apsnews/200711/backpage.cfm; and Steven Pinker, The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century (New York: Penguin Books, 2014), Chapter 3. Working with students in lab sessions can help instructors identify what they need to learn. Students are good at teaching classmates what they’ve recently learned and enjoy active learning in lab sessions.
ACKNOWLEDGMENTS We would like to thank the wonderful editorial team at SAGE Publications for their continued support and encouragement. It’s a real pleasure to work with such a talented and professional group. We would also like to thank the R Development Core Team and the authors whose functions we use throughout this book. We also thank reviewers for pointing us in the right direction: Renato Corbetta, University of Alabama at Birmingham; Sarah Croco, University of Maryland, College Park; Ransford F. Edwards Jr., Nova Southeastern University; Frank Griggs, University of Connecticut— Avery Point; Frank M. Häge, University of Limerick; Md Mujahedul Islam, University of Toronto; Brad Lockerbie, East Carolina University; Kevin Mullinix, University of Kansas; Matthew B. Platt, Morehouse College; Kevin Reuning, Miami University of Ohio; and Duy Trinh, University of California San Diego.
ABOUT THE AUTHORS
Philip H. Pollock III is a professor of political science at the University of Central Florida. He has taught courses in research methods at the undergraduate and graduate levels for more than 30 years. His main research interests are American public opinion, voting behavior, techniques of quantitative analysis, and the scholarship of teaching and learning. His recent research has focused on the effectiveness of Internet-based instruction. He has served as co-editor of the Journal of Political Science Education. Pollock’s research has appeared in the American Journal of Political Science, Social Science Quarterly, and the British Journal of Political Science. Recent scholarly publications include articles in Political Research Quarterly, the Journal of Political Science Education, and PS: Political Science and Politics.
Barry C. Edwards is an associate lecturer in the Department of Political Science at the University of Central Florida. He received his B.A. from Stanford University, a J.D. from New York University, and a Ph.D. from the University of Georgia. His teaching and research interests include American politics, public law, dispute resolution, and research methods. His research has been published in the Journal of Politics, Political Research Quarterly, the Election Law Journal, the NYU Journal of Legislation and Public Policy, the Emory Law Journal, the Harvard Negotiation Law Review, the Georgia Bar Journal, American Politics Research, Presidential Studies Quarterly, Public Management Review, and State Politics and Policy Quarterly.
A QUICK REFERENCE GUIDE TO R COMPANION FUNCTIONS Featured Functions3
ADDITIONAL FUNCTIONS4
Colors (see R’s color palette with codes)
CramersV (calculate Cramer’s V from summary stats)
fit.svyglm (weighted logistic model fit stats)
inverse.logit (translate logged odds to probabilities)
lambda (calculate Lambda from table)
lineType (see plot line types with codes)
logregR2 (logistic model fit stats)
orci (odds ratios for logistic model)
pchisqC (compare full and restricted logistic regression models)
plotChar (see plotting character types with codes)
random.names (generates nationally representative last names)
scatterplot (pass-thru to car::scatterplot)
svyby (pass-thru to survey::svyby)
svydesign (pass-thru to survey::svydesign)
svyplot (pass-thru to survey::svyplot)
tablesomersDC (calculate Somers’ d for table)
welcome (RCPA3 package welcome function, with reset=T option)
widgetFactory (play Widget Factory to practice function calls)
wtd.cor (correlation with weights option)
wtd.hist (histogram with weights option)
wtd.kurtosis (kurtosis with weights option)
wtd.mean (mean with weights option)
wtd.median (median with weights option)
wtd.mode (mode with weights option)
wtd.quantile (quantiles with weights option)
wtd.sd (standard deviation with weights option)
wtd.skewness (skewness with weights option)
wtd.var (variance with weights option)
DEPRECATED FUNCTIONS5
AdjR2 (built into regC), CI95 and CI99 (use CIprop, CImean, or sampdistC instead), colPercents (built into crosstabC with z arg.), compADPQ (built into crosstabC with somers=T), compmeans (use compmeansC instead), crosstab (use crosstabC instead), csv.get (use getC instead), cut2 (use transformC with type="cut"), ddply (use compmeansC with z arg. instead), describe (use describeC instead), freq (use freqC instead), imeansC (use compmeansC with z arg. instead), iplotC (use compmeansC with z and plot args. instead), plotmeans and plotmeansC (use compmeansC with plot arg. instead), prop.testC (use testpropsC instead), SetTextContastColor (built into Colors), somersD (use crosstabC with somers=T instead), spss.get (use getC instead), stata.get (use getC instead), svyboxplot (use boxplotC with w arg. instead), svychisq and svychisqC (use crosstabC with w arg. and chisq=T), svyglm (built into logregC with w arg.), svymean (use describeC or wtd.mean with w arg.), svytable (use crosstabC with w arg.), wtd.boxplot (use boxplotC with w arg. instead), wtd.chi.sq (use crosstabC with w arg. and chisq=T), wtd.t.test and wtd.ttestC (use testmeansC with w arg.), xtabC (use crosstabC with z arg.), xtp (use crosstabC instead), and xtp.chi2 (use crosstabC with chisq=T instead).
For more detailed help files on these functions, enter ? followed by the function’s name or help(function_name) in R. Functions from other packages are not listed.
3 See Table 1-1 in Chapter 1 for a list of common arguments in our Companion functions and their meaning.
4 These functions aren’t featured in the text but provide useful reference information and perform special tasks. You can call them in RCPA3 and they are documented.
5 These functions were used in An R Companion to Political Analysis (2nd ed.) and distributed in our poliscidata package. They have been superseded by other functions in this edition. When called in RCPA3, they generate notes.
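As a small illustration of the help lookup mentioned in this guide, the sketch below uses mean, a base R function, so that it runs in any R session. The object name help_page is just an illustrative choice. The commented lines show how the same pattern would presumably apply to Companion functions such as freqC once the RCPA3 package is loaded.

```r
# At the console you would normally just type ?mean or help("mean").
# Storing the result lets us confirm that R located a help file.
help_page <- help("mean")
length(help_page) > 0   # TRUE when a matching help file was found

# The same pattern applies to Companion functions after library(RCPA3):
# ?freqC
# help("crosstabC")
```

Typing the ? form is quicker at the console; the help() form is convenient inside scripts.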
INTRODUCTION: GETTING STARTED WITH R
R Functions Used: install.packages, library, welcome
As you have learned about political research and explored techniques of political analysis, you have studied many examples of other people’s work. You may have read textbook chapters that present frequency distributions, or you may have pondered research articles that use cross-tabulation, correlation, or regression analysis to investigate interesting relationships between variables. As valuable as these learning experiences are, they can be enhanced greatly by performing political analysis firsthand: handling and modifying social science datasets, learning to use data analysis software, learning to describe variables, setting up the appropriate analysis for interesting relationships, and running the analysis and interpreting your results. This book will guide you as you learn these practical and creative skills. Using R, powerful data analysis software, to analyze research-ready datasets, you will learn to obtain and interpret descriptive statistics (Chapter 2), to collapse and combine variables (Chapter 3), to perform cross-tabulation and mean analysis (Chapter 4), and to show the relationship between variables using charts and graphs (Chapter 5). You will learn how to control for other factors that might be affecting your results using random assignment (Chapter 6) and controlled comparisons (Chapter 7). Chapter 8 addresses the fundamentals of statistical inference, which are then used to test
hypotheses about sample statistics (Chapter 9) and the relationship between variables (Chapter 10). On the somewhat more advanced side, this book introduces correlation and linear regression (Chapter 11), multiple regression analysis (Chapter 12), and diagnostic analysis of regression residuals (Chapter 13). Chapter 14 introduces logistic regression, an analytic technique that has gained wide currency in recent decades. Chapter 15 shows you how to import new datasets to conduct your own political analysis, and it provides guidance on writing up your results. Virtually every chapter in this book places special emphasis on the graphic display of data, an area of increasing interest to the scholarly community. To get started with this book, you will need access to a computer with an Internet connection. After you set up your computer with the right software and add-ons, you’ll be able to work offline. All of the necessary files are freely accessible on the Internet. Depending on your connection speed, completing the recommended installation process may take 15–20 minutes in all, so we encourage you to be patient and complete one step at a time.
I.1 ABOUT R
What is R? R is free, open-source software for analyzing data. You can run R on a variety of operating systems. The base version of R performs many of the statistical procedures you will learn in this book. In addition, hundreds of users have written a multitude of specialized packages for R, all of which are available from the Comprehensive R Archive Network (CRAN), a clearinghouse for R resources of all kinds.6
Base version of R: the components and packages included in a standard R installation.
Packages: collections of R functions and files developed by R users to extend the functionality of base R.
6 See https://cran.r-project.org.
In the world of computer software, R is something of a youthful upstart (version 1.0.0 was released in early 2000), but its user base has steadily expanded.7 Indeed, by 2014, R had an estimated 2 million regular users. Large corporations, such as Google and Facebook, use R for special applications, such as data visualization.
7 David Robinson, “The Impressive Growth of R,” The Overflow, October 10, 2017, https://stackoverflow.blog/2017/10/10/impressive-growth-r/.
Powerful, flexible, richly supported, increasingly popular, and free. What’s the downside? This: R is hard. The learning curve is steep. The R interface can be described as either retro or primitive, depending on how charitable you wish to be. Although a handful of promising graphical user interfaces (GUIs) for R exist, R’s core power is unlocked by the keyboard, not the mouse. (Yes, R is command line.) Because different programmers have contributed to R’s development, not all commands adhere to the same syntactical rules. Until you get the hang of it, you will find yourself frequently referring to the reference card provided with this book. Above all, and subsuming all these challenges, R’s approach to computing, its idea of computing, is almost certainly different from the approach you have grown accustomed to. The R statistical environment takes some getting used to. However, when you get comfortable working with objects and using functions, you’ll appreciate the program’s flexibility and the wealth of tools available for data analysis.
I.2 INSTALLING R
There is no substitute for practical experience with R. Let’s install R so we can see it running and try interacting with it. To install the base R program, follow these steps, illustrated in Figure I-1:
1. Open http://cran.r-project.org/, the homepage of the R Project for Statistical Computing.
2. Under the heading “Download and Install R,” select the link that corresponds to your computer’s operating system.
For Windows: Click “base” or “install R for the first time” to install the base version of the most recent release of R (4.1.2 at the time of this writing).8
For Mac: Select the most recent .pkg version of the R program your operating system can support. As of the time of this writing, the most recent version of R (4.1.2) requires Mac OS 10.13 or higher. If your Mac OS is older than that, select the R version appropriate for your system.
For Linux: Follow the instructions specific to your Linux distribution.
3. Follow normal installation procedures. Click through the installation dialogs. Accept the default settings.
8 Note to Windows users: The Windows installer should determine whether to install the 32-bit or the 64-bit version of R. However, if you need to determine your machine’s bit count, find help here: http://support.microsoft.com/kb/827218.
Figure I-1 Downloading the R Program from the R Project’s Website
I.3 A QUICK TOUR OF THE R ENVIRONMENT
Now that you’ve installed the R program, let’s run the program and see the R environment. Double-click the R icon. The window that opens on the left side of the screen is called the R Console. Above the R Console, at the top of the screen, you’ll see a row of drop-down menus (see Figure I-2). You can edit some settings to customize your R environment, but the base program’s drop-down menus are spare. If you’re running R on Mac OS or Linux, your R environment may look different from what is depicted here. (You have some options to customize the look and feel of your R environment with the “GUI preferences…” option under the Edit menu tab.)
R Console: the window in R that shows command line input and program output.
Did you notice the > sign on the last line of the R Console? R is awaiting your commands.
Figure I-2 The R Console
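To get a feel for the > prompt, you might try a few harmless commands like the ones sketched below. These use only base R, so they should work immediately after installation. The object name x is an arbitrary choice for illustration.

```r
# Simple arithmetic: R evaluates the expression and prints the result
2 + 2

# Report which version of R is running on this machine
R.version.string

# Store some values in an object, then pass the object to a function
x <- c(1, 2, 3)
mean(x)
```

Each line is typed after the > sign and run by pressing Enter. Lines that begin with # are comments, which R ignores.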
I.4 INSTALLING THE “RCPA3” PACKAGE
After following the instructions in Section I.2 for installing R, you should be able to open and run the base version of the R program on your computer. You must have the base version of R installed to use R, but one of the program’s best features is the ability to customize the base version. We developed specialized material for this book that will permit you to analyze, present, and interpret data. A specialized collection of R elements is called a package. The RCPA3 package we created for this third edition of the R Companion to Political Analysis contains the functions and datasets we use in this book and is available through an online repository. In this section, you will (1) install the RCPA3 package, (2) load the RCPA3 package in an R session, and (3) run the RCPA3 package’s welcome function to produce some basic information about your working environment.
RCPA3: the R package we developed for the third edition of An R Companion to Political Analysis.
To install the RCPA3 package, enter the following command at the > sign in the R Console.
install.packages("RCPA3")
This command will prompt you to select a repository from which to download the RCPA3 package. The default “0-Cloud [https]” option is a good choice.9
9 The CRAN repositories mirror each other so there’s no real difference among them.
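If you would rather skip the repository prompt, install.packages accepts a repos argument. The sketch below is one possible approach, not part of the RCPA3 package: install_if_missing is a hypothetical helper name, and https://cloud.r-project.org is assumed to be the address behind the 0-Cloud option. The helper installs a package only when it is not already present, so it is safe to rerun.

```r
# Hypothetical helper: install a package from the cloud mirror only if needed.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " is already installed.")
    return(invisible(FALSE))   # nothing was installed
  }
  install.packages(pkg, repos = repos)
  invisible(TRUE)              # an installation was attempted
}

# "stats" ships with R, so this call only prints a message.
install_if_missing("stats")

# Uncomment to install the Companion package the same way:
# install_if_missing("RCPA3")
```

The RCPA3 line is left commented out so that running the sketch does not trigger a download unless you want one.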
You can also download an R package by selecting Packages ► Install package(s)… from R’s drop-down menus (see Figure I-3). This method will also prompt you to select a repository to download the package from, and then select the “RCPA3” package from the long alphabetical list of R packages. You can’t do much data analysis with R’s drop-down menus, so we suggest you try writing and running a command in the R Console.
Figure I-3 Installing the RCPA3 Package
When you install the RCPA3 package, R will automatically install the functions and datasets you’ll be using as well as the packages that the RCPA3 package requires (and the packages those packages require, and so on). The installation process may take a couple of minutes. If you encounter problems, see Section I.5 for troubleshooting tips. You will need to install the RCPA3 package on each computer you use. You might wonder why the R program does not come with all the packages you need. There are thousands of different packages available to extend R’s capabilities. Chances are, you will use only a
fraction of them (even if you become a lifetime R user). So, the R Project keeps the base version of the program relatively light and allows users to add on functionality based on their individual preferences. It might be helpful to think about R packages like apps you download to your phone: your phone doesn’t come with all available apps pre-installed; it lets you pick and choose which ones you want.
Additional Instructions for Mac OS
If R asks whether you would like to install a package from a source file, rather than a binary file, enter “no” or else you will need additional tools for building packages from source files. You can update these packages when binary files are available. You may need to install the XQuartz utility to create some R graphics, but it should not be necessary for the methods discussed in this book. After you install R, you can move its installer to the trash.
Now that you’ve downloaded the RCPA3 package, you need to load the package in your current R session using the library command. When you download R packages, they aren’t automatically available every time you use R. The program allows you to selectively load installed packages so you can make efficient use of your computer’s memory. You can think of loading an installed package like running an app on your phone. After you’ve downloaded an app, you have to open it to use it; to keep your device working efficiently, all of your phone apps aren’t open and running all the time. To load a package that you’ve installed in an active R session, you execute the following command.10 This time, RCPA3 is not in quotation marks.
10 You can also load the RCPA3 package by selecting Packages ► Load package… from R’s drop-down menus.
library(RCPA3)
If R generates an error message when you try executing this command, you may need to troubleshoot issues in the installation process (see Section I.5, below, for troubleshooting tips). When you execute the library(RCPA3) command, it may look like R didn’t do anything. Actually, there is a lot going on behind the scenes, but R won’t output any messages to the console unless there is a problem. We created the RCPA3 package to make getting started with R as simple and as straightforward as possible. At this point, you should be ready to go. To acquaint you with the R working environment and the contents of the RCPA3 package, we’ve written a special function called welcome. This command will generate a welcome message, output some basic information about your R session, and list the objects and functions in the RCPA3 package (see Figure I-4).
welcome()
Figure I-4 The RCPA3 Package’s Welcome Function
The RCPA3 package may look complicated at first, but we’ll introduce it gradually and, with some practice, you’ll learn how to use all sorts of R functions to analyze politics.
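Because library(RCPA3) prints nothing when it succeeds, it can be reassuring to confirm that the package really is attached. The sketch below uses base R’s search function, which lists attached packages. The object name rcpa3_attached is an illustrative choice, and the ls call follows the form suggested in the package’s welcome message.

```r
# TRUE only after library(RCPA3) has been run in the current session
rcpa3_attached <- "package:RCPA3" %in% search()
rcpa3_attached

if (rcpa3_attached) {
  # Show the first few objects (functions and datasets) in the package
  print(head(ls("package:RCPA3")))
} else {
  message("RCPA3 is not attached yet. Run library(RCPA3) and try again.")
}
```

If the first result is FALSE even after running library(RCPA3), the troubleshooting tips in Section I.5 are the next place to look.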
The Companion Datasets
The RCPA3 package contains four datasets.
1. debate. This dataset comes from a laboratory experiment on the effect of televised images on our assessments of presidential candidates. We thank Jamie Druckman of Northwestern University for sharing this dataset with us. The names and basic descriptions of variables in the debate dataset appear in Appendix Table A-1.
2. nes. This dataset includes selected variables from the 2020 National Election Study, a random sample of 8,280 citizens of voting age, conducted by the University of Michigan’s Institute for Social Research. See Appendix Table A-2 for variable names and descriptions.
3. states. This dataset includes variables on the 50 states. These variables were compiled by the authors from various sources. The names and basic descriptions of variables in the states dataset appear in Appendix Table A-3.
4. world. This dataset includes variables on 169 countries. Many of these variables are based on data compiled by Pippa Norris (John F. Kennedy School of Government, Harvard University) and made available to the scholarly community through her web site.11 Other variables were compiled by the authors from various sources. The names and basic descriptions of variables in the world dataset appear in Appendix Table A-4.
11 http://www.pippanorris.com
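One quick way to get acquainted with the four datasets is to check how many rows and columns each one has. The sketch below assumes that the datasets become available by name once library(RCPA3) has run, which is how the package’s welcome message describes them; if RCPA3 is not installed, the code simply prints a reminder. The names dataset_names, nm, and d are illustrative.

```r
dataset_names <- c("debate", "nes", "states", "world")

if (requireNamespace("RCPA3", quietly = TRUE)) {
  library(RCPA3)
  for (nm in dataset_names) {
    if (exists(nm)) {
      d <- get(nm)   # look up the dataset by its name
      cat(nm, ":", nrow(d), "rows,", ncol(d), "columns\n")
    }
  }
} else {
  message("Install RCPA3 first to explore: ",
          paste(dataset_names, collapse = ", "))
}
```

Base functions such as names and head can then be applied to any one dataset to see its variable names and first few rows.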
The four datasets included in the R package that accompanies this book contain a wealth of information about political behavior and institutions. We’ll use these datasets to demonstrate a variety of research methods, but we hope your curiosity will be sparked to explore variables and relationships that we don’t address here. Datasets: data that have been compiled and organized for research purposes, usually in digital format. Here is an important note to commit to long-term memory: Every time that you open a new session to work with the RCPA3 package, you will need to execute the following command:
library(RCPA3)
Typing this command into the R Console is fine for now, but as soon as Chapter 1 we’ll show you how to create and save commands in a script, so you don’t have to type commands over and over.
I.5 TROUBLESHOOTING INSTALLATION PROBLEMS
We created the RCPA3 package so you can start analyzing real political science data with R quickly and easily. You may still encounter some problems or receive some unexpected warnings from the R program. You might, for example, see a warning message that one or more packages were built under an earlier version of R than the one you are running. This issue does not seem to pose any serious problem. When you install the RCPA3 package, R should automatically install all the packages that our package depends on (and the packages those packages depend on). We’ve found, however, that R sometimes fails to install all the required dependencies. If this happens, you will see an error message that you are missing a required package. Don’t panic. You can fix the missing package problem.
Missing package problem: fixable error that occurs when R fails to download and install all packages the RCPA3 package depends on directly or indirectly.
You just need to read R’s error message for the name of the missing package(s) and install the missing package(s) manually. You can select the “Install package(s)…” option from the Packages drop-down menu, select a repository near you, and select the missing package from the very long list of available packages. Or you can type the following on the Console command line, substituting the name of the missing package inside the quotation marks:
install.packages("insert_name_of_missing_package")
If R reports that it is missing another package when you try the library(RCPA3) command, keep installing missing packages until the missing package error messages go away. Once they do, you will be able to load RCPA3. You will not have to do all this each time you use R. It is simply a setup issue. Complete installation may take a few minutes. Some downstream packages contain a lot of files. We noticed that one dependency package, readr, took an especially long time to complete installation.
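If several packages turn out to be missing, it may help to keep a short list and check which ones still need attention. The following is a minimal sketch using base R only. The vector needed is a placeholder that you would fill in with the names reported in R’s error messages; readr appears here only because it is mentioned above.

```r
# Names copied from R's error messages (readr is used only as an example)
needed <- c("readr")

# requireNamespace() returns TRUE when a package is installed and loadable
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) == 0) {
  message("All listed packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)   # uncomment to install them
}
```

The install.packages line is commented out so the check itself never downloads anything; remove the # once you are ready to install.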
I.6 INSTALLING R STUDIO
In this section, we discuss the R Studio program, an integrated development environment (IDE), which makes R easier to use. As we’ve discussed, the R environment is relatively spare and sterile. Its graphical user interface is limited and little analysis can be conducted using its pull-down menus. Fortunately, some software developers are working to address this void and make R more intuitive and user-friendly. R Studio is an interface for R that is available for Windows, Mac OS, and Linux. It’s a free program (commercial enterprises may pay more for technical support). You can download R Studio and learn more about it from its website: https://www.rstudio.com/. R Studio is not a substitute for the R program; it works with the R program so you’ll need to download and install both programs to use R Studio.
R Studio for Mac OS
There is an open source (free) edition of R Studio’s desktop IDE for Mac OS. At the time of this writing, you download a .dmg file which will appear in your locations. After you open it, you can move R Studio into your Apps and move the .dmg file to the trash. This installation method bypasses Mac OS’s Installer utility.
R Studio makes it easier to use R and we strongly recommend using it (see Figure I-5). We particularly like R Studio’s ability to suggest and auto-complete code. R Studio can also alert you to common coding problems, like syntax errors and unpaired parentheses or quotation marks. You can use R Studio’s preference options to customize the user interface. Other nice features include an enhanced Editor with line numbers and smart text coloring, a command history pane, a help file pane, and some nice options for saving graphics. Our students say R Studio makes it easier to use R and would strongly encourage you to use it.
I.7 INSTANT ACCESS TO TUTORIALS AND RESOURCES To augment the step-by-step instructions in this text, we have compiled tutorial videos and supplemental resources for every chapter. The tutorial videos offer concise, focused demonstrations of the methods discussed in this book—plus some other topics of interest.
Figure I-5 R Studio Screenshot
To make supplemental resources as accessible as possible, we’ve embedded QR codes (“quick response” codes) in the text. You can simply point your smartphone camera at one of these codes and your phone will open a webpage with that chapter’s videos and supplemental resources. The numbers on our QR code graphics correspond to the chapter numbers. We encourage you to practice new skills on your computer as you watch tutorial videos. You can pause, rewind, and replay the video to follow along with the demonstration. If you want to watch a screencast on your computer rather than streaming it on a device, you can find and view links to
each chapter’s online resources on the book’s website: edge.sagepub.com/Pollock.
Figure I-6 QR Codes for Getting Started and Chapters 1–15
Descriptions of Images and Figures

The screenshot shows the homepage of the R Project for Statistical Computing. The cursor hovers over the link labeled download R. Text reads, Go to the r-project.org website. Click the link download R. The link leads to a page titled CRAN Mirrors. Text reads, The Comprehensive R Archive Network is available at the following URLs, please choose a location close to you. Some statistics on the status of the mirrors can be found here: main page, windows release, and windows old release. If you want to host a new mirror at your institution, please have a look at the CRAN Mirror HOW TO. The cursor hovers over the link https://cloud.r-project.org. Text reads, You’ll be asked to select download site. Select 0-Cloud option. Next, pick the version for your computer’s operating system. The link leads to another page titled The Comprehensive R Archive Network. Text reads, Download and Install R. Precompiled binary distribution of the base system and contributed packages, Windows and Mac users most likely want one of these versions of R: Download R for Linux (Debian, Fedora/Redhat, Ubuntu); Download R for macOS; Download R for Windows. R is part of many Linux distributions, you should check with your Linux package management system in addition to the link above. A page titled R for Windows shows four subdirectories: base, contrib, old contrib, and Rtools. The cursor hovers over the link labeled install R for the first time, next to base. Text reads, Windows Users: click install R for first time. Linux Users: follow link for your distribution. A page titled R for macOS gives the description of the package and latest release. The cursor hovers over the link for the latest release of the product. Text reads, Mac users: Select most recent release compatible with your OS version. There is no R version for Android devices.

The screenshot shows the welcome message in the R Console. Below the message, there is a right angle bracket (>) followed by the cursor. Text reads, Welcome to R! The user interface is simple and clean. You can type commands at the right angle bracket, cursor. We don’t recommend using R this way. You’ll want to use R Studio and type commands in script files, but this is a good place to start.

The screenshot shows the RGui window with various menu options on the top. The Packages menu is expanded, and the Install packages option is selected. In the Secure CRAN mirrors dialog box, 0-Cloud [https] is selected. In the Packages dialog box, RCPA3 is selected. There are two buttons, OK and Cancel, at the bottom of both dialog boxes. Text reads, To install the RCPA3 package, you can select Packages, Install packages… You can also enter a command to install the RCPA3 package. If you’re using R Studio, it will look different. Select 0-Cloud mirror and then scroll down the long list of packages to find RCPA3, select it, OK.

Text in the message reads, Welcome. The RCPA3 package bundles datasets and functions featured in An R Companion to Political Analysis, Third Edition, by Philip H. Pollock III and Barry C. Edwards. Your current working directory is C:/Users/Barry Edwards/Documents. Use the setwd() function to change your working directory. This package contains four datasets, debate, nes, states, and world, and many functions. To see a list of all objects in this package, enter ls("package:RCPA3"). You can enter welcome(reset = TRUE) to clear workspace objects and restore default graphical parameters. For help with this function, or any other R function, type ? followed by the function’s name, or help(function_name). If you want to play Widget Factory, just type widgetFactory(), then press enter. We hope you enjoy using the RCPA3 package! Text reads, If you see this welcome message, you’ve installed and loaded the RCPA3 package correctly. Be sure to read the message text. It shares useful info about working with R and the RCPA3 package. If you can’t complete the library(RCPA3) and welcome() commands to get this welcome message, see Section I.5 for help. Section I.5 tells you how to troubleshoot common installation problems.

There are four panes. The first pane shows a tab titled Untitled 1 and a cursor in an empty space. The second pane shows four tabs: Environment, History, Connections, and Tutorial. The Environment tab is selected. Text reads, Environment is empty. The third pane shows three tabs: Console, Terminal, and Jobs. Console is selected. The pane shows the welcome message. The fourth pane shows five tabs: Files, Plots, Packages, Help, and Viewer. Help is selected. The pane consists of details of the R Base Packages.
1 USING R FOR DATA ANALYSIS
Learning Objectives

In this chapter you will learn to:
Navigate the R environment
Extend R’s data analysis capabilities with packages
Generate results with R functions
Define and use objects in the R environment
Create and save R scripts
Identify academic-style tables and figures
Get additional help with R functions

R Functions Used

seq, install.packages, widgetFactory, freqC, printC, help
Briefly mentioned: getwd, setwd

By the time you read this chapter, you should have downloaded and installed the R program (required), the RCPA3 package (required), and R Studio (highly recommended). If you haven’t done all of this, please follow the instructions in the Introduction (“Getting Started With R”) of this book. You’ll learn this material more quickly if you follow along by replicating our examples on your personal computer.

Reading in Essentials

We cover the definition and measurement of political science concepts in Chapter 1 of the sixth edition of The Essentials of Political Analysis, pp. 1–33.

In this chapter, we introduce some R basics: interacting with R, using objects, executing functions, and writing/running scripts. We also discuss how R outputs the results of data analysis and how you can format R output for papers and presentations. It is not possible to cover everything about R in one chapter, or even one book, so we show you how to get help with R functions.
1.1 INTERACTING WITH R

Now that you’ve got R running and know where you can enter commands, let’s see what R can do. It can be helpful to think of R as an overgrown programmable calculator. Like a calculator, if you ask R to perform a calculation, like “2 + 2,” it will return the result, 4, to you.
# Example computation by R
2+2    # Enter "2+2" and R returns "4"
# R does not try executing text to right of # symbol.
# Statements to right of # symbols are called comments.
Notice that R’s response to the command “2 + 2” starts with [1]. Rather than clear your command, R indexes its answer, “[1]”, and returns it on the next line. In this case, the answer is just one number, but we’ll soon see that R can work with long series of numbers, in which case indexing helps us make sense of results.
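To see why the bracketed index is useful, it helps to print a result that spans more than one line. The following is a minimal sketch of our own, not an example from the RCPA3 package. It relies only on base R: the colon operator and the seq function, with argument values we chose for illustration.

```r
# A longer result: the integers 1 through 30.
# The colon operator (:) builds a sequence of whole numbers.
1:30

# seq() builds sequences with other step sizes; here, 10, 20, ..., 300.
seq(from = 10, to = 300, by = 10)

# Each printed line of output starts with a bracketed index giving the
# position of the first value on that line. Where the lines break depends
# on the width of your console, so your indexes may differ from ours.
```

Try narrowing or widening your R Console and running the commands again; the values stay the same, but the bracketed indexes at the start of each line change with the line breaks.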
Console Output from a Simple Command

In our simple 2 + 2 example, we see an operator used by the R program. The plus sign (+) is a mathematical operator that adds numbers together. As you might guess, R also uses familiar mathematical operators such as the hyphen (-) to subtract, the forward slash (/) to divide, the asterisk (*) to multiply, and the caret (^) to raise to a power. The equals sign (=) is particularly important in the R environment and we will focus on it in the next section.

Comparing R to a calculator helps us get started, but it only scratches the surface of what R can do. To start unlocking R’s potential, we need to learn about objects and functions.
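Before moving on, you can try each of the arithmetic operators at the console. The numbers in this short sketch are arbitrary values we picked for illustration; the results in the comments follow from ordinary arithmetic.

```r
# Arithmetic operators in R
7 + 3    # addition: 10
7 - 3    # subtraction: 4
7 * 3    # multiplication: 21
7 / 2    # division: 3.5
2 ^ 3    # raise to a power: 8

# R follows the usual order of operations; parentheses override it
2 + 3 * 4      # 14, because multiplication is done before addition
(2 + 3) * 4    # 20, because the parentheses are evaluated first
```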
To understand computations in R, two slogans from John Chambers (creator of the S language, on which R is based, and a member of the R core team) are helpful:
Everything that exists is an object.
Everything that happens is a function call.
1.2 OBJECTS

You can think of objects as all-purpose containers for information. When researchers define and measure concepts, they need some way to store and organize the data they collect. If you’re analyzing a long series of numbers, you can name the series and then use the object’s name whenever you want to recall it. Objects in the R environment serve much the same purpose as contacts saved in your phone. You could manually enter a friend’s number every time you want to call them, but it’s far more convenient to save their contact information and dial them by name. In the same way, you can store information as a named object and later reference that information by using its name.

Object: R’s way of storing data in its memory, comparable to containers for physical things. Everything that exists in R is an object.
1.2.1 Examples of Objects

To gain some firsthand experience with an R object that should look familiar, enter “pi” in the R Console.
# Example of object with one value
pi
Accessing an Object’s Values

The numerical value of pi (approximately 3.141593) is stored as an object named “pi” to make it easier to calculate values using pi. You can type the object’s name anytime you want to use its stored value. It’s similar to storing a value in a calculator’s memory and then recalling that value later, but you are not limited to storing one value at a time.

Objects are versatile storage containers. Some objects, like pi, encapsulate just one value; other objects store long series of values. Think about the contacts in your phone again. The contact may have several elements in addition to the stored phone number, such as an e-mail address, mailing address, and photo. Similarly, in the R environment, an object can be a list of other objects or a dataset organized into rows and columns (data frames). Objects in R come in different shapes and sizes, but they are all accessible in your workspace, ready to be called and employed whenever you need them.

Data frame: a specific type of R object; a set of values organized into rows and columns (has two dimensions). Datasets are stored as data frames.

Vector: a specific type of R object; a series of values connected in sequence, like cars on a train or links on a chain (has one dimension, length).

Let’s look at some objects in the base R environment that have multiple values (vectors). Enter “letters” and then “LETTERS” (without the quotation marks).
# Examples of objects with multiple values (vectors)
letters
LETTERS
Figure 1-1 Objects with Multiple Values (Vectors)

The 26 letters of the alphabet are stored as “letters” (the lowercase version) and “LETTERS” (the uppercase version). You can see their values in Figure 1-1. Both objects contain 26 values. The output in Figure 1-1 uses [19] to indicate that s/S is the nineteenth value (you may get a different line break depending on the width of your R Console). Because R is case sensitive, “letters” and “LETTERS” are different objects.

The RCPA3 package you installed and loaded contains several objects we’ll use throughout this book. The package’s datasets are objects in the R environment. The dataset objects (named states, world, debate, and nes) contain a lot of information organized by row and column; their contents would fill many pages. These datasets show the results of careful efforts to define and measure concepts that vary among states, countries, and individuals. In Section 1.2.3, we’ll show you how to access information stored in datasets.
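If you would like to explore these built-in objects a little further, the sketch below uses only base R functions (length, identical, toupper, data.frame, head, and dim). It is our own illustration rather than part of the RCPA3 package: the column names lower and upper are hypothetical labels we made up, and the square-bracket indexing is a preview of a technique you do not need to master yet.

```r
# Use the stored value of pi in a calculation:
# the area of a circle with radius 2 (pi times radius squared)
pi * 2^2

# How many values does each vector hold?
length(letters)    # 26
length(LETTERS)    # 26

# Square brackets pull out a value by its position in the vector
letters[19]        # "s"
LETTERS[19]        # "S"

# R is case sensitive, so the two objects are not the same
identical(letters, LETTERS)             # FALSE
identical(toupper(letters), LETTERS)    # TRUE

# Two vectors of equal length can be combined into a small data frame.
# Show the first three rows, then the number of rows and columns.
head(data.frame(lower = letters, upper = LETTERS), n = 3)
dim(data.frame(lower = letters, upper = LETTERS))    # 26 rows, 2 columns
```

The data frame here has one row per letter and one column per vector, which is the same row-and-column layout, on a much smaller scale, that the package’s datasets use.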
1.2.2 Creating Objects

R makes it easy to create objects. You don’t need to declare what type of information your object will hold or what size it will be. R will create objects as soon as you assign some values to a legal name. We use the assignment operator to assign values to objects. The equals sign (=) is the intuitive choice, although R traditionalists prefer the classic assignment operator (