Second Edition
R Packages
Organize, Test, Document, and Share Your Code
Hadley Wickham & Jennifer Bryan
R Packages

Turn your R code into packages that others can easily install and use. With this fully updated edition, developers and data scientists will learn how to bundle reusable R functions, sample data, and documentation together by applying the package development philosophy used by the team that maintains the “tidyverse” suite of packages. In the process, you’ll learn how to automate common development tasks using a set of R packages, including devtools, usethis, testthat, and roxygen2.

Authors Hadley Wickham and Jennifer Bryan from Posit (formerly known as RStudio) help you create packages quickly, then teach you how to get better over time. You’ll be able to focus on what you want your package to do as you progressively develop greater mastery of the structure of a package.

With this book, you will:
• Learn the key components of an R package, including code, documentation, and tests
• Streamline your development process with devtools and the RStudio IDE
• Get tips on effective habits such as organizing functions into files
• Get caught up on important new features in the devtools ecosystem
• Learn about the art and science of unit testing, using features in the third edition of testthat
• Turn your existing documentation into a beautiful and user-friendly website with pkgdown
• Gain an appreciation of the benefits of modern code hosting platforms, such as GitHub
DATA / DATA SCIENCE
US $65.99 CAN $82.99 ISBN: 978-1-098-13494-5
“A stellar reference book for package development beginners as well as more experienced folks who are curious about the fantastic devtools ecosystem.” —Maëlle Salmon
“R Packages is an excellent and comprehensive guide to making your R code easier for others to reuse– or for future self!” —Sam Lau
Author of Learning Data Science, assistant teaching professor in Data Science at UC San Diego
Hadley Wickham is chief scientist at Posit, winner of the 2019 COPSS award, and a member of the R Foundation. Jennifer Bryan is a software engineer at Posit, a member of the R Foundation, and part of the tidyverse team that maintains more than 150 R packages.
R Packages
by Hadley Wickham and Jennifer Bryan

Copyright © 2023 Hadley Wickham and Jennifer Bryan. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].
Acquisitions Editor: Aaron Black
Development Editor: Sara Hunter
Production Editor: Aleeya Rahman
Copyeditor: Kim Cofer
Proofreader: Piper Editorial Consulting, LLC
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

June 2023: Second Edition

Revision History for the Second Edition
2023-06-14: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098134945 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. R Packages, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-098-13494-5 [LSI]
Table of Contents

Preface

Part I. Getting Started

1. The Whole Game
    Load devtools and Friends
    Toy Package: regexcite
    Preview the Finished Product
    create_package()
    use_git()
    Write the First Function
    use_r()
    load_all()
    Commit strsplit1()
    check()
    Edit DESCRIPTION
    use_mit_license()
    document()
    NAMESPACE Changes
    check() Again
    install()
    use_testthat()
    use_package()
    use_github()
    use_readme_rmd()
    The End: check() and install()
    Review

2. System Setup
    devtools, usethis, and You
    Personal Startup Configuration
    R Build Toolchain
    Windows
    macOS
    Linux
    Verify System Prep

3. Package Structure and State
    Package States
    Source Package
    Bundled Package
    .Rbuildignore
    Binary Package
    Installed Package
    In-Memory Package
    Package Libraries

4. Fundamental Development Workflows
    Create a Package
    Survey the Existing Landscape
    Name Your Package
    Package Creation
    Where Should You create_package()?
    RStudio Projects
    Benefits of RStudio Projects
    How to Get an RStudio Project
    What Makes an RStudio Project?
    How to Launch an RStudio Project
    RStudio Project Versus Active usethis Project
    Working Directory and Filepath Discipline
    Test Drive with load_all()
    Benefits of load_all()
    Other Ways to Call load_all()
    check() and R CMD check
    Workflow
    Background on R CMD check

5. The Package Within
    Alfa: A Script That Works
    Bravo: A Better Script That Works
    Charlie: A Separate File for Helper Functions
    Delta: A Failed Attempt at Making a Package
    Echo: A Working Package
    Foxtrot: Build Time Versus Run Time
    Golf: Side Effects
    Concluding Thoughts
    Script Versus Package
    Finding the Package Within
    Package Code Is Different

Part II. Package Components

6. R Code
    Organize Functions Into Files
    Fast Feedback via load_all()
    Code Style
    Understand When Code Is Executed
    Example: A Path Returned by system.file()
    Example: Available Colors
    Example: Aliasing a Function
    Respect the R Landscape
    Manage State with withr
    Restore State with base::on.exit()
    Isolate Side Effects
    When You Do Need Side Effects
    Constant Health Checks

7. Data
    Exported Data
    Preserve the Origin Story of Package Data
    Documenting Datasets
    Non-ASCII Characters in Data
    Internal Data
    Raw Data File
    Filepaths
    pkg_example() Path Helpers
    Internal State
    Persistent User Data

8. Other Components
    Other Directories
    Installed Files
    Package Citation
    Configuration Tools

Part III. Package Metadata

9. DESCRIPTION
    The DESCRIPTION File
    Title and Description: What Does Your Package Do?
    Author: Who Are You?
    URL and BugReports
    The License Field
    Imports, Suggests, and Friends
    Minimum Versions
    Depends and LinkingTo
    An R Version Gotcha
    Other Fields
    Custom Fields

10. Dependencies: Mindset and Background
    When Should You Take a Dependency?
    Dependencies Are Not Equal
    Prefer a Holistic, Balanced, and Quantitative Approach
    Dependency Thoughts Specific to the tidyverse
    Whether to Import or Suggest
    Namespace
    Motivation
    The NAMESPACE File
    Search Path
    Function Lookup for User Code
    Function Lookup Inside a Package
    Attaching Versus Loading
    Whether to Import or Depend

11. Dependencies: In Practice
    Confusion About Imports
    Conventions for This Chapter
    NAMESPACE Workflow
    Package Is Listed in Imports
    In Code Below R/
    In Test Code
    In Examples and Vignettes
    Package Is Listed in Suggests
    In Code Below R/
    In Test Code
    In Examples and Vignettes
    Package Is Listed in Depends
    In Code Below R/ and in Test Code
    In Examples and Vignettes
    Package Is a Nonstandard Dependency
    Depending on the Development Version of a Package
    Config/Needs/* Field
    Exports
    What to Export
    Re-exporting
    Imports and Exports Related to S3

12. Licensing
    Big Picture
    Code You Write
    Key Files
    More Licenses for Code
    Licenses for Data
    Relicensing
    Code Given to You
    Code You Bundle
    License Compatibility
    How to Include
    Code You Use

Part IV. Testing

13. Testing Basics
    Why Is Formal Testing Worth the Trouble?
    Introducing testthat
    Test Mechanics and Workflow
    Initial Setup
    Create a Test
    Run Tests
    Test Organization
    Expectations
    Testing for Equality
    Testing Errors
    Snapshot Tests
    Shortcuts for Other Common Patterns

14. Designing Your Test Suite
    What to Test
    Test Coverage
    High-Level Principles for Testing
    Self-Sufficient Tests
    Self-Contained Tests
    Plan for Test Failure
    Repetition Is OK
    Remove Tension Between Interactive and Automated Testing
    Files Relevant to Testing
    Hiding in Plain Sight: Files Below R/
    tests/testthat.R
    testthat Helper Files
    testthat Setup Files
    Files Ignored by testthat
    Storing Test Data
    Where to Write Files During Testing

15. Advanced Testing Techniques
    Test Fixtures
    Create useful_things with a Helper Function
    Create (and Destroy) a Local useful_thing
    Store a Concrete useful_thing Persistently
    Building Your Own Testing Tools
    Helper Defined Inside a Test
    Custom Expectations
    When Testing Gets Hard
    Skipping a Test
    Mocking
    Secrets
    Special Considerations for CRAN Packages
    Skip a Test
    Speed
    Reproducibility
    Flaky Tests
    Process and Filesystem Hygiene

Part V. Documentation

16. Function Documentation
    roxygen2 Basics
    The Documentation Workflow
    roxygen2 Comments, Blocks, and Tags
    Key Markdown Features
    Title, Description, Details
    Title
    Description
    Details
    Arguments
    Multiple Arguments
    Inheriting Arguments
    Return Value
    Examples
    Contents
    Leave the World as You Found It
    Errors
    Dependencies and Conditional Execution
    Intermixing Examples and Text
    Reusing Documentation
    Multiple Functions in One Topic
    Inheriting Documentation
    Child Documents
    Help Topic for the Package

17. Vignettes
    Workflow for Writing a Vignette
    Metadata
    Advice on Writing Vignettes
    Diagrams
    Links
    Filepaths
    How Many Vignettes?
    Scientific Publication
    Special Considerations for Vignette Code
    Article Instead of Vignette
    How Vignettes Are Built and Checked
    R CMD build and Vignettes
    R CMD check and Vignettes

18. Other Markdown Files
    README
    README.Rmd and README.md
    NEWS

19. Website
    Initiate a Site
    Deployment
    Now What?
    Logo
    Reference Index
    Rendered Examples
    Linking
    Index Organization
    Vignettes and Articles
    Linking
    Index Organization
    Non-Vignette Articles
    Development Mode

Part VI. Maintenance and Distribution

20. Software Development Practices
    Git and GitHub
    Standard Practice
    Continuous Integration
    GitHub Actions
    R CMD check via GHA
    Other Uses for GHA

21. Lifecycle
    Package Evolution
    Package Version Number
    Tidyverse Package Version Conventions
    Backward Compatibility and Breaking Change
    Major Versus Minor Versus Patch Release
    Package Version Mechanics
    Pros and Cons of Breaking Change
    Lifecycle Stages and Supporting Tools
    Lifecycle Stages and Badges
    Deprecating a Function
    Deprecating an Argument
    Deprecation Helpers
    Dealing with Change in a Dependency
    Superseding a Function

22. Releasing to CRAN
    Decide the Release Type
    Initial CRAN Release: Special Considerations
    CRAN Policies
    Keeping Up with Change
    Double R CMD Checking
    CRAN Check Flavors and Related Services
    Reverse Dependency Checks
    Revdeps and Breaking Changes
    Update Comments for CRAN
    The Submission Process
    Failure Modes
    Celebrating Success

Index
Preface

Welcome!

Welcome to R Packages by Hadley Wickham and Jennifer Bryan. Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this book you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first, so start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.

If you’re familiar with the first edition of the book, this preface describes the major changes so that you can focus your reading on the new areas.

There are several main goals for this edition:

• Update to reflect changes in the devtools package, specifically, its “conscious uncoupling” into a set of smaller, more focused packages.
• Expand coverage of workflow and process, alongside the presentation of all the important moving parts that make up an R package.
• Cover entirely new topics, such as package websites and GitHub Actions (GHA).

All content has been completely revised and updated. Many chapters are new or reorganized and a couple have been removed:

• New Chapter 1, “The Whole Game” previews the entire package development process.
• New Chapter 2, “System Setup” has been carved out of the previous Introduction and gained more detail.
• The chapter formerly known as “Package Structure” has been expanded and split into two chapters, one covering package structure and state (Chapter 3) and another on workflows and tooling (Chapter 4).
• New Chapter 5, “The Package Within” demonstrates how to extract reusable logic out of data analysis scripts and into a package.
• The sections “Organizing Your Functions” and “Code Style,” from Chapter 6, “R Code” have been removed, in favor of an online style guide. The style guide is paired with the new styler package,¹ which can automatically apply many of the rules.
• The coverage of testing has expanded into three chapters: Chapter 13 for testing basics, Chapter 14 for test suite design, and Chapter 15 for various advanced topics.
• Material around the NAMESPACE file and dependency relationships has been re-organized into two chapters: Chapter 10 provides technical context for thinking about dependencies, and Chapter 11 gives practical instructions for using different types of dependencies in different settings.
• New Chapter 12, “Licensing” expands earlier content on licensing into its own chapter.
• The chapter on C/C++ has been removed. It didn’t have quite enough information to be useful, and since the first edition of the book, other materials have arisen that are better learning resources.
• The chapter on Git/GitHub has been reframed around the more general topic of software development practices (Chapter 20). This no longer includes step-by-step instructions for basic tasks. The use of Git/GitHub has exploded since the first edition, accompanied by an explosion of learning resources, both general and specific to R (e.g., the website Happy Git and GitHub for the useR). Git/GitHub still feature prominently throughout the book, most especially in Chapter 20.
• The very short inst chapter has been combined into Chapter 8, with all the other directories that can be important in specific contexts, but that aren’t mission critical to all packages.

¹ Kirill Müller and Lorenz Walthert, “Styler: Non-Invasive Pretty Printing of R Code,” 2018. http://styler.r-lib.org.
Introduction

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests and is easy to share with others. As of March 2023, there were over 19,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearinghouse for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem you’re working on, and you can benefit from their work by downloading their package.

If you’re reading this book, you already know how to work with packages in the following ways:

• You install them from CRAN with install.packages("x").
• You use them in R with library("x") or library(x).
• You get help on them with package?x and help(package = "x").

The goal of this book is to teach you how to develop packages so that you can write your own, not just use other people’s.

Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it, and learn how to use it.

But packages are useful even if you never share your code. As Hilary Parker says in her introduction to packages: “Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time.” Organizing code in a package makes your life easier because packages come with conventions. For example, you put R code in R/, you put tests in tests/, and you put data in data/. These conventions are helpful because:

• They save time: you don’t need to think about the best way to organize a project, you can just follow a template.
• Standardized conventions lead to standardized tools: if you buy into R’s package conventions, you get many tools for free.
It’s even possible to use packages to structure your data analyses (e.g., “Packaging Data Analytical Work Reproducibly Using R (and Friends)” in The American Statistician or PeerJ Preprints),² although we won’t delve deeply into that use case here.
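The directory conventions mentioned in the Introduction (R code in R/, tests in tests/, data in data/) can be sketched with base R alone. This is a minimal illustration, not the workflow the book recommends: the package name `mypkg` and the function `hello()` are hypothetical, everything is written to a temporary directory, and the result is not yet a valid package because it has no DESCRIPTION file.

```r
# Hypothetical package name, used only for illustration.
pkg_name <- "mypkg"

# Work in a throwaway location so no real project is touched.
pkg_root <- file.path(tempfile("pkg-sketch-"), pkg_name)

# Conventional locations named in the text: code, tests, and data.
conventional_dirs <- c("R", "tests", "data")
for (d in conventional_dirs) {
  dir.create(file.path(pkg_root, d), recursive = TRUE)
}

# By convention, function definitions live in .R files below R/.
# hello() is a made-up function for this sketch.
writeLines(
  c(
    "hello <- function(name) {",
    "  paste0(\"Hello, \", name, \"!\")",
    "}"
  ),
  file.path(pkg_root, "R", "hello.R")
)

# List the resulting layout (directories and files, relative to the root).
layout <- sort(list.files(pkg_root, recursive = TRUE, include.dirs = TRUE))
print(layout)
```

In practice you would probably let a helper such as create_package(), which Chapter 1 walks through, set up this structure along with the required metadata.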
Philosophy

This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.

This philosophy is realized primarily through the devtools package, which is the public face for a suite of R functions that automate common development tasks. The release of version 2.0.0 in October 2018 marked its internal restructuring into a set of more focused packages, with devtools becoming more of a metapackage. The usethis package is the subpackage you are most likely to interact with directly; we explain the devtools-usethis relationship in “devtools, usethis, and You” on page 27. As always, the goal of devtools is to make package development as painless as possible. It encapsulates the best practices developed by Hadley Wickham, initially from his years as a prolific solo developer.

More recently, he has assembled a team of developers at Posit (formerly known as RStudio), who collectively look after hundreds of open source R packages, including those known as the tidyverse. The reach of this team allows us to explore the space of all possible mistakes at an extraordinary scale. Fortunately, it also affords us the opportunity to reflect on both the successes and failures, in the company of expert and sympathetic colleagues. We try to develop practices that make life more enjoyable for both the maintainer and users of a package. The devtools metapackage is where these lessons are made concrete.

devtools works hand-in-hand with RStudio, which we believe is the best development environment for most R users. The most popular alternative to RStudio is currently Visual Studio Code (VS Code) with the R extension enabled. This can be a rewarding and powerful environment; however, it does require a bit more work to set up and customize.³
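To make the “do as much as possible with functions” idea concrete, here is a rough sketch of one development iteration expressed as function calls. The wrapper `dev_cycle()` is hypothetical and not part of devtools; it only strings together devtools functions that do exist (load_all(), document(), test(), check()), and it assumes devtools is installed and that `pkg` points at a source package.

```r
# A sketch of one development iteration, wrapped in a hypothetical helper.
# dev_cycle() is NOT a devtools function; it only chains real devtools calls.
dev_cycle <- function(pkg = ".") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Install devtools first: install.packages(\"devtools\")")
  }
  devtools::load_all(pkg)  # make the package's functions available for a test drive
  devtools::document(pkg)  # regenerate documentation from roxygen2 comments
  devtools::test(pkg)      # run the testthat test suite
  devtools::check(pkg)     # run R CMD check on the package
}
```

In day-to-day work you would more likely call these functions one at a time from the package’s root directory, reacting to the output of each step before moving on.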
² Ben Marwick, Carl Boettiger, and Lincoln Mullen, “Packaging Data Analytical Work Reproducibly Using R (and Friends),” The American Statistician 72, no. 1 (2018): 80–88, https://doi.org/10.1080/00031305.2017.1375986; Ben Marwick, Carl Boettiger, and Lincoln Mullen, “Packaging Data Analytical Work Reproducibly Using R (and Friends),” PeerJ Preprints 6 (2018): e3192v2, https://doi.org/10.7287/peerj.preprints.3192v2.

³ Users of Emacs Speaks Statistics (ESS) will find that many of the workflows described in this book are also available there. For those loyal to vim, we recommend the Nvim-R plugin.
RStudio

Throughout the book, we highlight specific ways that RStudio can expedite your package development workflow, in specially formatted sections like this.
Together, devtools and RStudio insulate you from the low-level details of how packages are built. As you start to develop more packages, we highly recommend that you learn more about those details. The best resource for the official details of package development is always the official Writing R Extensions manual.⁴ However, this manual can be hard to understand if you’re not already familiar with the basics of packages. It’s also exhaustive, covering every possible package component, rather than focusing on the most common and useful components, as this book does. Writing R Extensions is a useful resource once you’ve mastered the basics and want to learn what’s going on under the hood.

⁴ You might also enjoy the “quarto-ized” version at https://rstudio.github.io/r-manuals/r-exts/.

In This Book

The first part of the book is all about giving you the tools you need to start your package development journey, and we highly recommend that you read it in order. We begin in Chapter 1 with a run-through of the complete development of a small package. It’s meant to paint the big picture and suggest a workflow, before we descend into the detailed treatment of the key components of an R package. Then in Chapter 2 you’ll learn how to prepare your system for package development, and in Chapter 3 you’ll learn the basic structure of a package and how that varies across different states. Next, in Chapter 4, we’ll cover the core workflows that come up repeatedly for package developers. The first part of the book ends with another case study (Chapter 5), this time focusing on how you might convert a script to a package and discussing the challenges you’ll face along the way.

The remainder of the book is designed to be read as needed. Pick and choose between the chapters as the various topics come up in your development process.

First we cover key package components: Chapter 6 discusses where your code lives and how to organize it, Chapter 7 shows you how to include data in your package, and Chapter 8 covers a few less important files and directories that need to be discussed somewhere.

Next we’ll dive into the package metadata, starting with DESCRIPTION in Chapter 9. We’ll then go deep into dependencies. In Chapter 10, we’ll cover the costs and benefits of taking on dependencies and provide some technical background on package namespaces and the search path. In Chapter 11, we focus on practical matters, such as how to use different types of dependencies in different parts of your package. This is also where we discuss exporting functions, which is what makes it possible for other packages and projects to depend on your package. We’ll finish off this part with a look at licensing in Chapter 12.

To ensure your package works as designed (and continues to work as you make changes), it’s essential to test your code, so the next three chapters cover the art and science of testing. Chapter 13 gets you started with the basics of testing with the testthat package. Chapter 14 teaches you how to design and organize tests in the most effective way. Then we finish off our coverage of testing in Chapter 15, which teaches you advanced skills to tackle challenging situations.

If you want other people (including future-you!) to understand how to use the functions in your package, you’ll need to document them. Chapter 16 gets you started using roxygen2 to document the functions in your package. Function documentation is helpful only if you know what function to look up, so next in Chapter 17 we’ll discuss vignettes, which help you document the package as a whole. We’ll finish up documentation with a discussion of other important markdown files like README.md and NEWS.md in Chapter 18, and creating a package website with pkgdown in Chapter 19.

The book concludes by zooming back out to consider development practices, such as the benefit of using version control and continuous integration (Chapter 20). We wrap things up by discussing the lifecycle (Chapter 21) of a package, including releasing it on CRAN (Chapter 22).

This is a lot to learn, but don’t feel overwhelmed. Start with a minimal subset of useful features (e.g., just an R/ directory!) and build up over time. To paraphrase the Zen monk Shunryu Suzuki: “Each package is perfect the way it is, and it can use a little improvement.”
What’s Not Here

There are also specific practices that have little to no treatment here simply because we do not use them enough to have any special insight. Does this mean that we actively discourage those practices? Probably not, as we try to be explicit about practices we think you should avoid. So if something is not covered here, it just means that a couple hundred heavily used R packages are built without meaningful reliance on that technique. That observation should motivate you to evaluate how likely it is that your development requirements truly don’t overlap with ours. But sometimes the answer is a clear “yes,” in which case you’ll simply need to consult another resource.
Conventions Used in This Book

Throughout this book, we write fun() to refer to functions, var to refer to variables and function arguments, and path/ for paths.

Larger code blocks intermingle input and output. Output is commented so that if you have an electronic version of the book, e.g., https://r-pkgs.org, you can easily copy and paste examples into R. Output comments look like #> to distinguish them from regular comments.

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.
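As a small illustration of the output-comment convention described at the start of this section, here is how a code block with intermingled input and output might look. The values are made up for the example; only base R functions are used.

```r
# Input is plain code; printed output appears as a comment starting with #>.
x <- c(1, 2, 3)
mean(x)
#> [1] 2

# Pasting the whole block into R re-runs the input and ignores the #> lines.
toupper("var")
#> [1] "VAR"
```

Because the output lines are comments, the block can be copied into an R session as-is without editing out the results.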
Colophon

This book was authored using Quarto inside RStudio. The website is hosted with Netlify and automatically updated after every commit by GitHub Actions. The complete source is available from GitHub.

This version of the book was built with:

library(devtools)
#> Loading required package: usethis
library(roxygen2)
library(testthat)
#>
#> Attaching package: 'testthat'
#> The following object is masked from 'package:devtools':
#>
#>     test_file
devtools::session_info()
#> ─ Session info ────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Vancouver
#>  date     2023-06-06
#>  pandoc   2.19.2 @ /Applications/RStudio.app/.../bin/tools/ (via rmarkdown)
#>
#> ─ Packages ────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  brio          1.1.3   2021-11-30 [1] CRAN (R 4.2.0)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.0)
#>  callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.0)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.0)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.0)
#>  devtools    * 2.4.5   2022-10-11 [1] CRAN (R 4.2.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.0)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.2.2)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.2.0)
#>  httpuv        1.6.9   2023-02-14 [1] CRAN (R 4.2.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.2.0)
#>  later         1.3.0   2021-08-18 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#>  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.2.0)
#>  pkgload       1.3.2   2022-11-16 [1] CRAN (R 4.2.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
#>  processx      3.8.1   2023-04-18 [1] CRAN (R 4.2.0)
#>  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.2.0)
#>  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
#>  ps            1.7.5   2023-04-18 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.10  2023-01-22 [1] CRAN (R 4.2.0)
#>  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.22    2023-06-01 [1] CRAN (R 4.2.0)
#>  roxygen2    * 7.2.3   2022-12-08 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  shiny         1.7.4   2022-12-15 [1] CRAN (R 4.2.0)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.2.0)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.2.2)
#>  testthat    * 3.1.8   2023-05-04 [1] CRAN (R 4.2.0)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.0)
#>  usethis     * 2.2.0   2023-06-06 [1] CRAN (R 4.2.2)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.0)
#>  xml2          1.3.4   2023-04-27 [1] CRAN (R 4.2.0)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#>
#>  [1] /Users/jenny/Library/R/x86_64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ───────────────────────────────────────────────────────────────────
O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
    O’Reilly Media, Inc.
    1005 Gravenstein Highway North
    Sebastopol, CA 95472
    800-889-8969 (in the United States or Canada)
    707-829-7019 (international or local)
    707-829-0104 (fax)
    [email protected]
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/oreillyr-packages-2e.
For news and information about our books and courses, visit https://oreilly.com.
Find us on LinkedIn: https://linkedin.com/company/oreilly-media
Follow us on Twitter: https://twitter.com/oreillymedia
Watch us on YouTube: https://www.youtube.com/oreilly
Acknowledgments
Since the first edition of R Packages was published, the packages supporting the workflows described here have undergone extensive development. The original trio of devtools, roxygen2, and testthat has expanded to include the packages created by the “conscious uncoupling” of devtools, as described in “devtools, usethis, and You” on page 27. Most of these packages originate with Hadley Wickham (HW), because of their devtools roots. There are many other significant contributors, many of whom now serve as maintainers:
• devtools: HW, Winston Chang, Jim Hester (maintainer, >= v1.13.5), Jennifer Bryan (maintainer >= v2.4.3)
• usethis: HW, Jennifer Bryan (maintainer >= v1.5.0), Malcolm Barrett
• roxygen2: HW (maintainer), Peter Danenburg, Manuel Eugster
• testthat: HW (maintainer)
• desc: Gábor Csárdi (maintainer), Kirill Müller, Jim Hester
• pkgbuild: HW, Jim Hester, Gábor Csárdi (maintainer >= v1.2.1)
• pkgload: HW, Jim Hester, Winston Chang, Lionel Henry (maintainer >= v1.2.4)
• rcmdcheck: Gábor Csárdi (maintainer)
• remotes: HW, Jim Hester, Gábor Csárdi (maintainer), Winston Chang, Martin Morgan, Dan Tenenbaum
• revdepcheck: HW, Gábor Csárdi (maintainer)
• sessioninfo: HW, Gábor Csárdi (maintainer), Winston Chang, Robert Flight, Kirill Müller, Jim Hester
This book was written and revised in the open and it is truly a community effort: many people read drafts, fix typos, suggest improvements, and contribute content. Without those contributors, the book wouldn’t be nearly as good as it is, and we are deeply grateful for their help. We are indebted to our colleagues at Posit, especially the tidyverse team, for being perpetually game to discuss package development practices. The book has been greatly improved by the suggestions from our fantastic team of technical reviewers: Malcolm Barrett, Laura DeCicco, Zhian Kamvar, Tom Mock, and Maëlle Salmon.
Thanks go to all contributors who submitted improvements via GitHub (in alphabetical order): @aaelony, @aaronwolen (Aaron Wolen), @ablejec (Andrej Blejec), @adamcduncan (Adam Duncan), @adessy, @adrtod (Adrien Todeschini), @aghaynes (Alan Haynes), @agrueneberg (Alexander Grueneberg), @alejandrohagan (Alejandro Hagan), @alesantuz (Ale Santuz), @alexandrehsd (Alexandre Henrique), @alexholcombe (Alex O.
Holcombe), @alexpghayes (alex hayes), @alforj (Justin Alford), @almartin82 (Andrew Martin), @aluxh (Alex Ho), @AmelZulji, @andreaphsz (Andrea Cantieni), @andrewdolman (Andrew Dolman), @andrewpbray (Andrew Bray), @AndrewsOR (John Andrews), @andycraig (Andrew Craig), @angela-li (Angela Li), @anjalisilva (Anjali Silva), @apomatix (Brad Friedman), @apreshill (Alison Presmanes Hill), @arashHaratian (Arash), @arilamstein (Ari Lamstein), @arneschillert (Arne Schillert), @arni-magnusson (Arni Magnusson), @asadow (Adam Sadowski), @ateucher (Andy Teucher), @avisser (Andy Visser), @ayormark (Adam Yormark), @azzaea (Azza Ahmed), @batpigandme (Mara Averick), @bclipp (Brian L), @beevabeeva, @behrman (Bill Behrman), @benmarwick (Ben Marwick), @BernhardKonrad (Bernhard Konrad), @bgreenwell (Brandon Greenwell), @Bisaloo (Hugo Gruson), @bklamer (Brett Klamer), @bm5tev3, @bms63 (Ben Straub), @bpbond (Ben Bond-Lamberty), @bquast (Bastiaan Quast), @Br-Johnson (Brett Johnson), @brews (Brewster Malevich), @brianrice2 (Brian Rice), @brry (Berry Boessenkool), @btruel, @calligross (Calli), @carldotac (Carl Lieberman), @carloscinelli (Carlos Cinelli), @CDCookJr, @cderv (Christophe Dervieux), @chambm (Matt Chambers), @charliejhadley (Charlie Joey Hadley), @chezou (Aki Ariga), @chsafouane (Safouane Chergui), @clente (Caio Lente), @cmarmstrong, @cooknl (CAPN), @CorradoLanera (Corrado Lanera), @craigcitro (Craig Citro), @crtahlin (Crt Ahlin), @daattali (Dean Attali), @danhalligan (Dan Halligan), @daroczig (Gergely Daróczi), @datarttu (Arttu Kosonen), @davidkane9 (David Kane), @DavisVaughan (Davis Vaughan), @deanbodenham, @dfalbel (Daniel Falbel), @dgrtwo (David Robinson), @dholstius (David Holstius), @DickStartz, @dkgaraujo (Douglas K. G. Araujo), @dlukes (David Lukes), @DOH-PXC5303 (Philip Crain), @dongzhuoer (Zhuoer Dong), @DougManuel (Doug Manuel), @dpprdan (Daniel Possenriede), @dracodoc (dracodoc), @drag05 (Dragos Bandur), @drvinceknight (Vince Knight), @dryzliang, @dyavorsky (Dan Yavorsky), @e-pet, @earino (E.
Ariño de la Rubia), @echelleburns, @eeholmes (Eli Holmes), @eipi10 (Joel Schwartz), @ekbrown (Earl Brown), @EllaKaye (Ella Kaye), @EmilHvitfeldt (Emil Hvitfeldt), @eogoodwin, @erictleung (Eric Leung), @erikerhardt (Erik Erhardt), @espinielli (Enrico Spinielli), @ewan (Ewan Dunbar), @fbertran (Frederic Bertrand), @federicomarini (Federico Marini), @fenguoerbian (Chao Cheng), @fkohrt (Florian Kohrt), @florisvdh (Floris Vanderhaeghe), @floswald (Florian Oswald), @franrodalg (Francisco Rodríguez-Algarra), @franticspider (Simon Hickinbotham), @frycast (Daniel Vidali Fryer), @fsavje (Fredrik Sävje), @gajusmiknaitis, @gcpoole (Geoffrey Poole), @geanders (Brooke Anderson), @georoen (Jee Roen), @GerardTromp (Gerard Tromp), @GillesSanMartin (Gilles San Martin), @gmaubach (Georg Maubach), @gonzalezgouveia (Rafael Gonzalez Gouveia), @gregmacfarlane (Greg Macfarlane), @gregrs-uk (Greg), @grst (Gregor Sturm), @gsrohde (Scott Rohde), @guru809, @gustavdelius (Gustav W Delius), @haibin (Liu Haibin), @hanneoberman (Hanne Oberman), @harrismcgehee (Harris McGehee), @havenl (Haven Liu), @hcyvan (程一航), @hdraisma (Harmen), @hedderik (Hedderik van Rijn), @heists (()ა), @helske (Jouni Helske), @henningte (Henning Teickner), @HenrikBengtsson (Henrik Bengtsson), @heogden (Helen Ogden), @hfrick (Hannah Frick), @Holzhauer (Sascha Holzhauer), @howardbaek (Howard Baek), @howbuildingsfail (How Buildings Fail), @hq9000 (Sergey Grechin), @hrbrmstr (boB Rudis), @iangow (Ian Gow), @iargent, @idmn (Iaroslav Domin), @ijlyttle (Ian Lyttle), @imchoyoung (Choyoung Im), @InfiniteCuriosity (Russ Conte), @ionut-stefanb (Ionut Stefan-Birdea), @Ironholds (Os Keyes), @ismayc (Chester Ismay), @isomorphisms (i), @jackwasey (Jack Wasey), @jacobbien (Jacob Bien), @jadeynryan (Jadey Ryan), @jameelalsalam (Jameel Alsalam), @jameslairdsmith (James Laird-Smith), @janzzon (Stefan Jansson), @JayCeBB, @jcainey (Joe Cainey), @jdblischak (John Blischak), @jedwards24 (James Edwards), @jemus42 (Lukas Burk),
@jenniferthompson (Jennifer Thompson), @jeremycg (Jeremy Gray), @jgarthur (Joey Arthur), @jimhester (Jim Hester), @jimr1603 (James Riley), @jjesusfilho (José de Jesus Filho), @jkeirstead (James Keirstead), @jmarca (James Marca), @jmarshallnz (Jonathan Marshall), @joethorley (Joe Thorley), @johnbaums (John), @jolars (Johan Larsson), @jonthegeek (Jon Harmon), @jowalski (John Kowalski), @jpinelo (Joao Pinelo Silva), @jrdnbradford (Jordan), @jthomasmock (Tom Mock), @julianurbano (Julián Urbano), @jwpestrak, @jzadra (Jonathan Zadra), @jzhaoo (Joanna Zhao), @kaetschap (Sonja), @karthik (Karthik Ram), @KasperThystrup (Kasper Thystrup Karstensen), @KatherineCox, @katrinleinweber (Katrin Leinweber), @kbroman (Karl Broman), @kekecib (Ibrahim Kekec), @KellenBrosnahan, @kendonB (Kendon Bell), @kevinushey (Kevin Ushey), @kikapp (Kristopher Kapphahn), @KirkDSL, @KJByron (Karen J. Byron), @klmr (Konrad Rudolph), @KoderKow (Kyle Harris), @kokbent (Ben Toh), @kongdd (Dongdong Kong), @krlmlr (Kirill Müller), @kwenzig (Knut Wenzig), @kwstat (Kevin Wright), @kylelundstedt (Kyle G.
Lundstedt), @lancelote (Pavel Karateev), @lbergelson (Louis Bergelson), @LechMadeyski (Lech Madeyski), @Lenostatos (Leon), @lindbrook, @lionel- (Lionel Henry), @LluisRamon (Lluís Ramon), @lorenzwalthert (Lorenz Walthert), @lwjohnst86 (Luke W Johnston), @maelle (Maëlle Salmon), @maiermarco, @maislind (David M), @majr-red (Matthew Roberts), @malcolmbarrett (Malcolm Barrett), @malexan (Alexander Matrunich), @manuelreif (Manuel Reif), @MarceloRTonon (Marcelo Tonon), @mariacuellar (Maria Cuellar), @markdly (Mark Dulhunty), @Marlin-Na (Marlin), @martin-mfg, @matanhakim (Matan Hakim), @matdoering, @matinang (Matina Angelopoulou), @mattflor (Matthias Flor), @maurolepore (Mauro Lepore), @maxheld83 (Max Held), @mayankvanani (Mayank Vanani), @mbjones (Matt Jones), @mccarthy-m-g (Michael McCarthy), @mdequeljoe (Matthew de Queljoe), @mdsumner (Michael Sumner), @michaelboerman (Michael Boerman), @MichaelChirico (Michael Chirico), @michaelmikebuckley (Michael Buckley), @michaelweylandt (Michael Weylandt), @miguelmorin, @MikeJohnPage, @mikelnrd (Michael Leonard), @mikelove (Mike Love), @mikemc (Michael McLaren), @MilesMcBain (Miles McBain), @mjkanji (Muhammad Jarir Kanji), @mkuehn10 (Michael Kuehn), @mllg (Michel Lang), @mohamed-180 (Mohamed El-Desokey), @moodymudskipper (Antoine Fabri), @Moohan (James McMahon), @MrAE (Jesse Leigh Patsolic), @mrcaseb, @ms609 (Martin R. Smith), @mskyttner (Markus Skyttner), @MWilson92 (Matthew Wilson), @myoung3, @nachti (Gerhard Nachtmann), @nanxstats (Nan Xiao), @nareal (Nelson Areal), @nattalides, @ncarchedi (Nick Carchedi), @ndphillips (Nathaniel Phillips), @nick-youngblut (Nick Youngblut), @njtierney (Nicholas Tierney), @nsheff (Nathan Sheffield), @osorensen (Øystein Sørensen), @PabRod (Pablo Rodríguez-Sánchez), @paternogbc (Gustavo Brant Paterno), @paulrougieux (Paul Rougieux), @pdwaggoner (Philip Waggoner), @pearsonca (Carl A. B.
Pearson), @perryjer1 (Jeremiah), @petermeissner (Peter Meissner), @petersonR (Ryan Peterson), @petzi53 (Peter Baumgartner), @PhilipPallmann (Philip Pallmann), @philliplab (Phillip Labuschagne), @phonixor (Gerrit-Jan Schutten), @pkimes (Patrick Kimes), @pnovoa (Pavel Novoa), @ppanko (Pavel Panko), @pritesh-shrivastava (Pritesh Shrivastava), @PrzeChoj (PrzeChoj), @PursuitOfDataScience (Y. Yu), @pwaeckerle, @raerickson (Richard Erickson), @ramiromagno (Ramiro Magno), @ras44, @rbirkelbach (Robert Birkelbach), @rcorty (Robert W. Corty), @rdiaz02 (Ramon Diaz-Uriarte), @realAkhmed (Akhmed Umyarov), @reikookamoto (Reiko Okamoto), @renkun-ken (Kun Ren), @retowyss (Reto Wyss), @revodavid (David Smith), @rgknight (Ryan Knight), @rhgof (Richard), @rmar073, @rmflight (Robert M Flight), @rmsharp (R. Mark Sharp), @rnuske (Robert Nuske), @robertzk (Robert Krzyzanowski), @Robinlovelace (Robin Lovelace), @robiRagan (Robi Ragan), @Robsteranium (Robin Gower), @romanzenka (Roman Zenka), @royfrancis (Roy Francis), @rpruim (Randall Pruim), @rrunner, @rsangole (Rahul), @ryanatanner (Ryan), @salim-b (Salim B), @SamEdwardes (Sam Edwardes), @SangdonLim (Sangdon Lim), @sathishsrinivasank (Sathish), @sbgraves237, @schifferl (Lucas Schiffer), @scw (Shaun Walbridge), @sdarodrigues (Sabrina Rodrigues), @sebffischer (Sebastian Fischer), @serghiou (Stylianos Serghiou), @setoyama60jp, @sfirke (Sam Firke), @shannonpileggi (Shannon Pileggi), @Shelmith-Kariuki (Shel), @SheridanLGrant (Sheridan Grant), @shntnu (Shantanu Singh), @sibusiso16 (S’busiso Mkhondwane), @simdadim (Simen Buodd), @SimonPBiggs (SPB), @simonthelwall (Simon Thelwall), @SimonYansenZhao (Simon He Zhao), @singmann (Henrik Singmann), @Skenvy (Nathan Levett), @Smudgerville (Richard M.
Smith), @sn248 (Satyaprakash Nayak), @sowla (Praer (Suthira) Owlarn), @srushe (Stephen Rushe), @statnmap (Sébastien Rochette), @steenharsted (Steen Harsted), @stefaneng (Stefan Eng), @stefanherzog (Stefan Herzog), @stephen-frank (Stephen Frank), @stephenll (Stephen Lienhard), @stephenturner (Stephen Turner), @stevenprimeaux (Steven Primeaux), @stevensbr, @stewid (Stefan Widgren), @sunbeomk (Sunbeom Kwon), @superdesolator (Po Su), @syclik (Daniel Lee), @symbolrush (Adrian Stämpfli-Schmid), @taekyunk (Taekyun Kim), @talgalili (Tal Galili), @tanho63 (Tan Ho), @tbrugz (Telmo Brugnara), @thisisnic (Nic Crane), @TimHesterberg (Tim Hesterberg), @titaniumtroop (Nathan), @tjebo, @tklebel (Thomas Klebel), @tmstauss (Tanner Stauss), @tonybreyal (Tony Breyal), @tonyfischetti (Tony Fischetti), @TonyLadson (Tony Ladson), @trickytank (Rick Tankard), @TroyVan, @uribo (Shinya Uryu), @urmils, @valeonte, @vgonzenbach (Virgilio Gonzenbach), @vladpetyuk (Vlad Petyuk), @vnijs (Vincent Nijs), @vspinu (Vitalie Spinu), @wcarlsen (Willi Carlsen), @wch (Winston Chang), @wenjie2wang (Wenjie Wang), @werkstattcodes, @wiaidp, @wibeasley (Will Beasley), @wilkinson (Sean Wilkinson), @williamlief (Lief Esbenshade), @winterschlaefer (Christof Winter), @wlamnz (William Lam), @wrathematics (Drew Schmidt), @XiangyunHuang (Xiangyun Huang), @xiaochi-liu (Xiaochi), @XiaoqiLu (Xiaoqi Lu), @xiaosongz (Xiaosong Zhang), @yihui (Yihui Xie), @ynsec37, @yonicd, @ysdgroot, @yui-knk (Yuichiro Kaneko), @Zedseayou (Calum You), @zeehio (Sergio Oller), @zekiakyol (Zeki Akyol), @zenggyu (Guangyu Zeng), @zhaoy, @zhilongjia (Zhilong), @zhixunwang, @zkamvar (Zhian N. Kamvar), @zouter (Wouter Saelens)
PART I
Getting Started
CHAPTER 1
The Whole Game
Spoiler alert! This chapter runs through the development of a small toy package. It’s meant to paint the Big Picture and suggest a workflow, before we descend into the detailed treatment of the key components of an R package. To keep the pace brisk, we exploit the modern conveniences in the devtools package and the RStudio IDE. In later chapters, we are more explicit about what those helpers are doing for us.
This chapter is self-contained, in that completing the exercise is not a strict requirement to continue with the rest of the book; however, we strongly suggest you follow along and create this toy package with us.
Load devtools and Friends
You can initiate your new package from any active R session. You don’t need to worry about whether you’re in an existing or new project. The functions we use ensure that we create a new clean project for the package.
Load the devtools package, which is the public face of a set of packages that support various aspects of package development. The most obvious of these is the usethis package, which you’ll see is also being loaded:

    library(devtools)
    #> Loading required package: usethis
Do you have an old version of devtools? Compare your version against ours and upgrade if necessary:

    packageVersion("devtools")
    #> [1] '2.4.5'
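If you would rather make that comparison in code, something like the following may help. It is only a sketch: has_min_version() is a hypothetical helper, not a devtools function, and the 2.4.5 threshold is simply the version used to build this book.

```r
# Hypothetical helper (not part of devtools): TRUE if `pkg` is installed
# at version `minimum` or newer, FALSE otherwise.
has_min_version <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minimum
}

# Suggest an upgrade when devtools is missing or older than the book's copy.
if (!has_min_version("devtools", "2.4.5")) {
  message("Consider running install.packages(\"devtools\")")
}
```

The comparison works because packageVersion() returns a version object that can be compared against a version string.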
Toy Package: regexcite
To help walk you through the process, we use various functions from devtools to build a small toy package from scratch, with features commonly seen in released packages:
• Functions to address a specific need, in this case helpers for work with regular expressions
• Version control and an open development process
  - This is completely optional in your work but highly recommended. You’ll see how Git and GitHub help us expose all the intermediate stages of our toy package.
• Access to established workflows for installation, getting help, and checking quality
  - Documentation for individual functions via roxygen2.
  - Unit testing with testthat.
  - Documentation for the package as a whole via an executable README.Rmd.
We call the package regexcite, and it contains a couple of functions that make common tasks with regular expressions easier. Please note that these functions are very simple and we’re using them here only as a means to guide you through the package development process. If you’re looking for actual helpers for work with regular expressions, there are several proper R packages that address this problem space:
• stringr (which uses stringi)
• stringi
• rex
• rematch2
Again, the regexcite package itself is just a device for demonstrating a typical workflow for package development with devtools.
Preview the Finished Product
The regexcite package is tracked during its development with the Git version control system. This is purely optional, and you can certainly follow along without implementing this. A nice side benefit is that we eventually connect it to a remote repository on GitHub, which means you can see the glorious result we are working toward by visiting regexcite on GitHub. By inspecting the commit history and especially the diffs, you can see exactly what changes at each step of the process laid out below.
create_package()
Call create_package() to initialize a new package in a directory on your computer. create_package() will automatically create that directory if it doesn’t exist yet (and that is usually the case). See “Create a Package” on page 47 for more on creating packages.
Make a deliberate choice about where to create this package on your computer. It should probably be somewhere within your home directory, alongside your other R projects. It should not be nested inside another RStudio Project, R package, or Git repo. Nor should it be in an R package library, which holds packages that have already been built and installed. The conversion of the source package we create here into an installed package is part of what devtools facilitates. Don’t try to do devtools’ job for it!
Once you’ve selected where to create this package, substitute your chosen path into a create_package() call like this:

    create_package("~/path/to/regexcite")
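To make the "not nested" advice concrete, here is a rough sketch of a check you could run on a candidate location. enclosing_project() is a hypothetical helper written for illustration, not something devtools or usethis provides, and it assumes the path is absolute or starts with ~.

```r
# Hypothetical helper (not part of usethis): walk up from `path` and return
# the first ancestor directory that looks like an R package, an RStudio
# Project, or a Git repo. Returns NULL when no ancestor does.
# Assumes `path` is absolute or starts with "~".
enclosing_project <- function(path) {
  dir <- dirname(normalizePath(path.expand(path), mustWork = FALSE))
  repeat {
    markers <- c(
      file.exists(file.path(dir, "DESCRIPTION")),       # R package
      file.exists(file.path(dir, ".git")),              # Git repo
      length(list.files(dir, pattern = "\\.Rproj$")) > 0 # RStudio Project
    )
    if (any(markers)) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the filesystem root
    dir <- parent
  }
}

# NULL suggests the location is not nested inside another project.
enclosing_project("~/path/to/regexcite")
```

A non-NULL result names the enclosing directory, which is a hint to pick a different location.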
For the creation of this book we have to work in a temporary directory, because the book is built noninteractively in the cloud. Behind the scenes, we’re executing our own create_package() command, but don’t be surprised if our output differs a bit from yours:

    #> ✔ Creating '/tmp/Rtmpk6VXyE/regexcite/'
    #> ✔ Setting active project to '/private/tmp/Rtmpk6VXyE/regexcite'
    #> ✔ Creating 'R/'
    #> ✔ Writing 'DESCRIPTION'
    #> Package: regexcite
    #> Title: What the Package Does (One Line, Title Case)
    #> Version: 0.0.0.9000
    #> Authors@R (parsed):
    #>     * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
    #> Description: What the package does (one paragraph).
    #> License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
    #>     license
    #> Encoding: UTF-8
    #> Roxygen: list(markdown = TRUE)
    #> RoxygenNote: 7.2.3
    #> ✔ Writing 'NAMESPACE'
    #> ✔ Writing 'regexcite.Rproj'
    #> ✔ Adding '^regexcite\\.Rproj$' to '.Rbuildignore'
    #> ✔ Adding '.Rproj.user' to '.gitignore'
    #> ✔ Adding '^\\.Rproj\\.user$' to '.Rbuildignore'
    #> ✔ Setting active project to '<no active project>'
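The DESCRIPTION fields echoed in this output are plain "Field: value" pairs, so base R can read them back. The sketch below writes a cut-down stand-in file to a temporary location (an assumption made so the example is self-contained) and inspects it with read.dcf().

```r
# Write a stand-in DESCRIPTION to a temporary file so the example is
# self-contained; in a real package you would read "DESCRIPTION" directly.
desc_path <- tempfile()
writeLines(c(
  "Package: regexcite",
  "Title: What the Package Does (One Line, Title Case)",
  "Version: 0.0.0.9000",
  "Encoding: UTF-8"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)
desc[1, "Package"]
desc[1, "Version"]

# The trailing .9000 conventionally marks a development version, which
# sorts before a released version such as 0.1.0.
package_version(desc[1, "Version"]) < "0.1.0"
```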
If you’re working in RStudio, you should find yourself in a new instance of RStudio, opened into your new regexcite package (and RStudio Project). If you somehow need to do this manually, navigate to the directory and double-click regexcite.Rproj. RStudio has special handling for packages, and you should now see a Build tab in the same pane as Environment and History.
You probably need to call library(devtools) again, because create_package() has probably dropped you into a fresh R session, in your new package:

    library(devtools)
What’s in this new directory that is also an R package and, probably, an RStudio Project? Here’s a listing (locally, you can consult your Files pane):

    Path             Type
    .Rbuildignore    File
    .gitignore       File
    DESCRIPTION      File
    NAMESPACE        File
    R                Directory
    regexcite.Rproj  File
RStudio
In the Files pane, go to More (gear symbol) > Show Hidden Files to toggle the visibility of hidden files (a.k.a. “dotfiles”). A select few are visible all the time, but sometimes you want to see them all.
• .Rbuildignore lists files that we need to have around but that should not be included when building the R package from source. If you aren’t using RStudio, create_package() may not create this file (nor .gitignore) at first, since there’s no RStudio-related machinery that needs to be ignored. However, you will likely develop the need for .Rbuildignore at some point, regardless of what editor you are using. This is discussed in more detail in “.Rbuildignore” on page 37.
• .Rproj.user, if you have it, is a directory used internally by RStudio.
• .gitignore anticipates Git usage and tells Git to ignore some standard, behind-the-scenes files created by R and RStudio. Even if you do not plan to use Git, this is harmless.
• DESCRIPTION provides metadata about your package. We edit this shortly, and Chapter 9 covers the general topic of the DESCRIPTION file.
• NAMESPACE declares the functions your package exports for external use and the external functions your package imports from other packages. At this point it is empty, except for a comment declaring that this is a file you should not edit by hand.
• The R/ directory is the “business end” of your package. It will soon contain .R files with function definitions.
• regexcite.Rproj is the file that makes this directory an RStudio Project. Even if you don’t use RStudio, this file is harmless. Or you can suppress its creation with create_package(..., rstudio = FALSE). More in “RStudio Projects” on page 52.
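Each line of .Rbuildignore is a regular expression that is matched, case-insensitively, against file paths relative to the package root. The sketch below approximates that matching with grepl() for the two patterns added so far; is_ignored() and the list of example paths are hypothetical, and the real build tooling may differ in its details.

```r
# Patterns as they would appear in .Rbuildignore, written as R strings
# (hence the doubled backslashes).
patterns <- c("^regexcite\\.Rproj$", "^\\.Rproj\\.user$")

# Example paths, relative to the top-level package directory.
paths <- c("regexcite.Rproj", ".Rproj.user", "DESCRIPTION", "R/strsplit1.R")

# Hypothetical helper: does any pattern match this path?
is_ignored <- function(path) {
  any(vapply(
    patterns,
    function(p) grepl(p, path, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

ignored <- vapply(paths, is_ignored, logical(1))

# Paths that would survive into the built package under these patterns.
paths[!ignored]
```

The anchors ^ and $ matter here: without them, a pattern such as regexcite\.Rproj would also match any longer path that happens to contain that text.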
use_git()
The regexcite directory is an R source package and an RStudio Project. Now we make it also a Git repository, with use_git(). (By the way, use_git() works in any project, regardless of whether it’s an R package.)

    use_git()
    #> ✔ Initialising Git repo
    #> ✔ Adding '.Rhistory', '.Rdata', '.httr-oauth', '.DS_Store' to '.gitignore'
In an interactive session, you will be asked if you want to commit some files here, and you should accept the offer. Behind the scenes, we’ll also commit those same files.
So what has changed in the package? Only the creation of a .git directory, which is hidden in most contexts, including the RStudio file browser. Its existence is evidence that we have indeed initialized a Git repo here:

    Path  Type
    .git  Directory
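Because .git is a dotfile, a default directory listing will not show it. The following sketch imitates the situation in a temporary directory; the empty .git folder is only a stand-in for the real repository that use_git() creates.

```r
# Simulate a freshly initialised repository in a temporary directory.
# The empty ".git" directory is a stand-in; use_git() creates a real one.
proj <- file.path(tempdir(), "regexcite-demo")
dir.create(file.path(proj, ".git"), recursive = TRUE, showWarnings = FALSE)

list.files(proj)                                 # dotfiles are omitted
list.files(proj, all.files = TRUE, no.. = TRUE)  # dotfiles are included
dir.exists(file.path(proj, ".git"))              # direct check for the repo
```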
If you’re using RStudio, it probably requested permission to relaunch itself in this Project, which you should do. You can do so manually by quitting, then relaunching RStudio by double-clicking regexcite.Rproj. Now, in addition to package development support, you have access to a basic Git client in the Git tab of the Environment/History/Build pane.
Click History (the clock icon in the Git pane) and, if you consented, you will see an initial commit made via use_git().
RStudio
RStudio can initialize a Git repository in any Project, even if it’s not an R package, as long as you’ve set up RStudio + Git integration. Go to Tools > Version Control > Project Setup. Then choose “Version control system: Git” and “initialize a new git repository for this project.”
Write the First Function
A fairly common task when dealing with strings is the need to split a single string into many parts. The strsplit() function in base R does exactly this:

    (x <- "alfa,bravo,charlie,delta")
    #> [1] "alfa,bravo,charlie,delta"
    strsplit(x, split = ",")
    #> [[1]]
    #> [1] "alfa"    "bravo"   "charlie" "delta"

Take a close look at the return value:

    str(strsplit(x, split = ","))
    #> List of 1
    #>  $ : chr [1:4] "alfa" "bravo" "charlie" "delta"
The shape of this return value often surprises people or, at least, inconveniences them. The input is a character vector of length one and the output is a list of length one. This makes total sense in light of R’s fundamental tendency toward vectorization. But sometimes it’s still a bit of a bummer. Often you know that your input is morally a scalar, i.e., it’s just a single string, and really want the output to be the character vector of its parts.
This leads R users to employ various methods of “unlist”-ing the result:

    unlist(strsplit(x, split = ","))
    #> [1] "alfa"    "bravo"   "charlie" "delta"
    strsplit(x, split = ",")[[1]]
    #> [1] "alfa"    "bravo"   "charlie" "delta"
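The second, safer solution is the basis for the inaugural function of regexcite, strsplit1():

    strsplit1 <- function(x, split) {
      strsplit(x, split = split)[[1]]
    }

Save this definition in a .R file under the package’s R/ directory. The use_r() helper creates and/or opens such a file:

    use_r("strsplit1")
    #> • Edit 'R/strsplit1.R'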
Put the definition of strsplit1() and only the definition of strsplit1() in R/strsplit1.R and save it. The file R/strsplit1.R should not contain any of the other top-level code we have recently executed, such as the definition of our practice input x, library(devtools), or use_git(). This foreshadows an adjustment you’ll need to make as you transition from writing R scripts to R packages. Packages and scripts use different mechanisms to declare their dependency on other packages and to store example or test code. We explore this further in Chapter 6.
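One possible way to see how the two "unlist"-ing idioms differ is to feed them an input that is not a single string. The sketch below assumes strsplit1() is defined as a thin wrapper around strsplit(...)[[1]]; the input y is made up for illustration.

```r
# Assumed definition: a thin wrapper that keeps the first list element.
strsplit1 <- function(x, split) {
  strsplit(x, split = split)[[1]]
}

y <- c("alfa,bravo", "charlie,delta")   # two strings, not one

# unlist() silently pools the pieces of both strings...
unlist(strsplit(y, split = ","))
#> [1] "alfa"    "bravo"   "charlie" "delta"

# ...whereas [[1]] keeps only the pieces of the first string.
strsplit1(y, split = ",")
#> [1] "alfa"  "bravo"
```

Neither behavior is obviously right for a multi-string input, which is one reason a function like this might eventually check that its input really is a single string.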
load_all()
How do we test drive strsplit1()? If this were a regular R script, we might use RStudio to send the function definition to the R console and define strsplit1() in the global environment. Or maybe we’d call source("R/strsplit1.R"). For package development, however, devtools offers a more robust approach.
Call load_all() to make strsplit1() available for experimentation:

    load_all()
    #> ℹ Loading regexcite
Now call strsplit1(x) to see how it works:

    (x <- "alfa,bravo,charlie,delta")
    #> [1] "alfa,bravo,charlie,delta"
    strsplit1(x, split = ",")
    #> [1] "alfa"    "bravo"   "charlie" "delta"
Note that load_all() has made the strsplit1() function available, although it does not exist in the global environment:

    exists("strsplit1", where = globalenv(), inherits = FALSE)
    #> [1] FALSE
If you see TRUE instead of FALSE, that indicates you’re still using a script-oriented workflow and sourcing your functions. Here’s how to get back on track:
1. Clean out the global environment and restart R.
2. Reattach devtools with library(devtools) and reload regexcite with load_all().
3. Redefine the test input x and call strsplit1(x, split = ",") again. This should work!
4. Run exists("strsplit1", where = globalenv(), inherits = FALSE) again and you should see FALSE.
load_all() simulates the process of building, installing, and attaching the regexcite package. As your package accumulates more functions (some exported, some not, some of which call each other, some of which call functions from packages you depend on), load_all() gives you a much more accurate sense of how the package is developing than test driving functions defined in the global environment. load_all() also allows much faster iteration than actually building, installing, and attaching the package. See “Test Drive with load_all()” on page 58 for more about load_all().
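The distinction between "available" and "defined in the global environment" can be imitated in base R. The sketch below is a loose analogy only: it attaches a plain environment to the search path, which is far simpler than what load_all() actually does, and the names fake_pkg, fake_pkg_demo, and strsplit1_copy are invented for the demonstration.

```r
# Stand-in for a package environment; load_all() does considerably more.
fake_pkg <- new.env()
assign(
  "strsplit1_copy",
  function(x, split) strsplit(x, split = split)[[1]],
  envir = fake_pkg
)
attach(fake_pkg, name = "fake_pkg_demo")

# Found by ordinary lookup, because the search path is consulted...
in_search_path <- exists("strsplit1_copy")

# ...but not defined in the global environment itself.
in_global <- exists("strsplit1_copy", where = globalenv(), inherits = FALSE)

c(in_search_path = in_search_path, in_global = in_global)

# Clean up the search path.
detach("fake_pkg_demo")
```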
To review what we’ve done so far:
• We wrote our first function, strsplit1(), to split a string into a character vector (not a list containing a character vector).
• We used load_all() to quickly make this function available for interactive use, as if we’d built and installed regexcite and attached it via library(regexcite).
RStudio
RStudio exposes load_all() in the Build menu, in the Build pane via More > Load All, and in keyboard shortcuts Ctrl+Shift+L (Windows & Linux) or Cmd+Shift+L (macOS).
Commit strsplit1()
If you’re using Git, use your preferred method to commit the new R/strsplit1.R file. We do so behind the scenes here, and here’s the associated diff:

    diff --git a/R/strsplit1.R b/R/strsplit1.R
    new file mode 100644
    index 0000000..29efb88
    --- /dev/null
    +++ b/R/strsplit1.R
    @@ -0,0 +1,3 @@
    +strsplit1 <- function(x, split) {
    +  strsplit(x, split = split)[[1]]
    +}

use_mit_license()
Call use_mit_license() to license the package under the MIT license:

    use_mit_license()
    #> ✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
    #> ✔ Writing 'LICENSE'
    #> ✔ Writing 'LICENSE.md'
    #> ✔ Adding '^LICENSE\\.md$' to '.Rbuildignore'

This configures the License field correctly for the MIT license, which promises to name the copyright holders and year in a LICENSE file. Open the newly created LICENSE file and confirm it looks something like this:

    YEAR: 2023
    COPYRIGHT HOLDER: regexcite authors
Like other license helpers, use_mit_license() also puts a copy of the full license in LICENSE.md and adds this file to .Rbuildignore. It’s considered a best practice to include a full license in your package’s source, such as on GitHub, but CRAN disallows the inclusion of this file in a package tarball. You can learn more about licensing in Chapter 12.
document()
Wouldn’t it be nice to get help on strsplit1(), just like we do with other R functions? This requires that your package have a special R documentation file, man/strsplit1.Rd, written in an R-specific markup language that is sort of like LaTeX. Luckily we don’t necessarily have to author that directly.
We write a specially formatted comment right above strsplit1(), in its source file, and then let a package called roxygen2 handle the creation of man/strsplit1.Rd. The motivation and mechanics of roxygen2 are covered in Chapter 16.
If you use RStudio, open R/strsplit1.R in the source editor and put the cursor somewhere in the strsplit1() function definition. Now do Code > Insert roxygen skeleton. A very special comment should appear above your function, in which each line begins with #'. RStudio inserts only a barebones template, so you will need to edit it. If you don’t use RStudio, create the comment yourself. Regardless, you should modify it to look something like this:

    #' Split a string
    #'
    #' @param x A character vector with one element.
    #' @param split What to split on.
    #'
    #' @return A character vector.
    #' @export
    #'
    #' @examples
    #' x <- "alfa,bravo,charlie,delta"
    #' strsplit1(x, split = ",")

Later in this walkthrough the function is renamed to str_split_one(). Running document() after that rename updates NAMESPACE and the help files accordingly:

    document()
    #> ℹ Loading regexcite
    #> Warning: Objects listed as exports, but not present in namespace:
    #> • strsplit1
    #> Writing 'NAMESPACE'
    #> Writing 'str_split_one.Rd'
    #> Deleting 'strsplit1.Rd'
Try out the new str_split_one() function by simulating package installation via load_all():

    load_all()
    #> ℹ Loading regexcite
    str_split_one("a, b, c", pattern = ", ")
    #> [1] "a" "b" "c"
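If you want to reproduce that call without the package at hand, a base-R stand-in is enough. The definition below is a sketch under that assumption; it mirrors the call signature used above but is not necessarily how regexcite implements str_split_one().

```r
# Sketch only: a base-R stand-in with the same call signature as the
# example above. The package's real definition may differ.
str_split_one <- function(string, pattern) {
  # Insist on a character input holding at most one string.
  stopifnot(is.character(string), length(string) <= 1)
  if (length(string) == 1) {
    strsplit(string, split = pattern)[[1]]
  } else {
    character()
  }
}

str_split_one("a, b, c", pattern = ", ")
#> [1] "a" "b" "c"
str_split_one(character(), pattern = ", ")
#> character(0)
```

The up-front stopifnot() turns a multi-string input into an error instead of quietly discarding everything after the first element.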
use_github()

You’ve seen us making commits during the development process for regexcite. You can see an indicative history at https://github.com/jennybc/regexcite. Our use of version control and the decision to expose the development process means you can inspect the state of the regexcite source at each developmental stage. By looking at so-called diffs, you can see exactly how each devtools helper function modifies the source files that constitute the regexcite package.
How would you connect your local regexcite package and Git repository to a companion repository on GitHub? Here are three approaches:

• use_github() is a helper that we recommend for the long term. We won’t demonstrate it here because it requires some credential setup on your end. We also don’t want to tear down and rebuild the public regexcite package every time we build this book.
• Set up the GitHub repo first! It sounds counterintuitive, but the easiest way to get your work onto GitHub is to initiate there, then use RStudio to start working in a synced local copy. This approach is described in Happy Git’s workflows New project, GitHub first and Existing project, GitHub first.
• Command-line Git can always be used to add a remote repository post hoc. This is described in the Happy Git workflow Existing project, GitHub last.

Any of these approaches will connect your local regexcite project to a GitHub repo, public or private, which you can push to or pull from using the Git client built into RStudio. In Chapter 20, we elaborate on why version control (e.g., Git) and, specifically, hosted version control (e.g., GitHub) is worth incorporating into your package development process.
use_readme_rmd()

Now that your package is on GitHub, the README.md file matters. It is the package’s home page and welcome mat, at least until you decide to give it a website (see Chapter 19), add a vignette (see Chapter 17), or submit it to CRAN (see Chapter 22).

The use_readme_rmd() function initializes a basic, executable README.Rmd ready for you to edit:

use_readme_rmd()
#> ✔ Writing 'README.Rmd'
#> ✔ Adding '^README\\.Rmd$' to '.Rbuildignore'
#> • Update 'README.Rmd' to include installation instructions.
#> ✔ Writing '.git/hooks/pre-commit'
Chapter 1: The Whole Game
In addition to creating README.Rmd, this adds some lines to .Rbuildignore and creates a Git precommit hook to help you keep README.Rmd and README.md in sync.

README.Rmd already has sections that prompt you to:

• Describe the purpose of the package.
• Provide installation instructions. If a GitHub remote is detected when use_readme_rmd() is called, this section is prefilled with instructions on how to install from GitHub.
• Show a bit of usage.

How to populate this skeleton? Copy stuff liberally from DESCRIPTION and any formal and informal tests or examples you have. Anything is better than nothing. This is helpful because people probably won’t install your package and comb through individual help files to figure out how to use it.

We like to write the README in R Markdown, so it can feature actual usage. The inclusion of live code also makes it less likely that your README grows stale and out-of-sync with your actual package.

To make your own edits, if RStudio has not already done so, open README.Rmd for editing. Make sure it shows some usage of str_split_one().

Here’s what the README.Rmd file contains:

---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# somepackage
The goal of somepackage is to ...

## Installation

You can install the development version of somepackage from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("jane/somepackage")
```

## Example

This is a basic example which shows you how to solve a common problem:

```{r example}
library(somepackage)
## basic example code
```
1 If it really doesn’t make sense to include any executable code chunks, usethis::use_readme_md() is similar,
except that it gives you a basic README.md file.
What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:

```{r cars}
summary(cars)
```
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this.

You can also embed plots, for example:

```{r pressure, echo = FALSE}
plot(pressure)
```
In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.
A few things to note about this starter README.Rmd:

• It renders to GitHub Flavored Markdown.
• It includes a comment to remind you to edit README.Rmd, not README.md.
• It sets up our recommended knitr options, including saving images to man/figures/README-, which ensures that they’re included in your built package. This is important so that your README works when it’s displayed by CRAN.
• It sets up a place for future badges, such as results from automatic continuous integration checks (see “Continuous Integration” on page 298). Examples of functions that insert development badges:
  - usethis::use_cran_badge() reports the current version of your package on CRAN.
  - usethis::use_coverage() reports test coverage.
  - use_github_actions() and friends report the R CMD check status of your development package.
• It includes placeholders where you should provide code for package installation and for some basic usage.
• It reminds you of key facts about maintaining your README.
Chapter 18: Other Markdown Files
You’ll need to remember to rerender README.Rmd periodically and, most especially, before release. The best function to use for this is devtools::build_readme(), because it is guaranteed to render README.Rmd against the current source code of your package.

The devtools ecosystem tries to help you keep README.Rmd up-to-date in two ways:

• If your package is also a Git repo, use_readme_rmd() automatically adds the following precommit hook:

#!/bin/bash
if [[ README.Rmd -nt README.md ]]; then
  echo "README.md is out of date; please re-knit README.Rmd"
  exit 1
fi

This prevents a git commit if README.Rmd is more recently modified than README.md. If the hook is preventing a commit you really want to make, you can override it with git commit --no-verify. Note that Git commit hooks are not stored in the repository, so this hook needs to be added to any fresh clone. For example, you could rerun usethis::use_readme_rmd() and discard the changes to README.Rmd.

• The release checklist placed by usethis::use_release_issue() includes a reminder to call devtools::build_readme().
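The precommit hook boils down to a comparison of modification times. The following is a rough R translation of that check, offered as a sketch only: readme_is_stale() is a hypothetical helper of our own, not something usethis or devtools provides, and it uses throwaway files in a temporary directory so that it can be run anywhere.

```r
# Hypothetical helper: TRUE when README.md is missing or older than README.Rmd.
readme_is_stale <- function(rmd = "README.Rmd", md = "README.md") {
  if (!file.exists(rmd)) {
    return(FALSE)  # nothing to render, so nothing can be stale
  }
  if (!file.exists(md)) {
    return(TRUE)   # README.Rmd has never been rendered
  }
  file.mtime(rmd) > file.mtime(md)
}

# Demonstrate with throwaway files in a temporary directory.
demo_dir <- tempfile("readme-demo")
dir.create(demo_dir)
rmd <- file.path(demo_dir, "README.Rmd")
md <- file.path(demo_dir, "README.md")
writeLines("source", rmd)
writeLines("rendered", md)

now <- Sys.time()
# Pretend README.Rmd was edited half an hour after README.md was rendered.
Sys.setFileTime(md, now - 3600)
Sys.setFileTime(rmd, now - 1800)
stale_before <- readme_is_stale(rmd, md)

# Pretend README.md has since been re-rendered.
Sys.setFileTime(md, now)
stale_after <- readme_is_stale(rmd, md)
```

A check like this only knows about timestamps. It cannot tell whether README.md was rendered against the current package source, which is why devtools::build_readme() remains the recommended way to rerender.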
NEWS

The README is aimed at new users, whereas the NEWS file is aimed at existing users: it should list all the changes in each release that a user might notice or want to learn more about. As with README, it’s a well-established convention for open source software to have a NEWS file, sometimes called a changelog.

As with README, base R tooling does not require that NEWS be a markdown file, but it does allow for that and it’s our strong preference. A NEWS.md file is pleasant to read on GitHub and on your pkgdown site, and it is reachable from your package’s CRAN landing page. We demonstrate this again with dplyr:

• NEWS.md in dplyr’s GitHub repo:
  - https://github.com/tidyverse/dplyr/blob/main/NEWS.md
• On CRAN, if you release your package there:
  - https://cran.r-project.org/web/packages/dplyr/index.html
    Notice the hyperlinked “NEWS” under “Materials.”
• On your package site, available as the “Changelog” from the “News” drop-down menu in the main navbar:
  - https://dplyr.tidyverse.org/news/index.html

You can use usethis::use_news_md() to initiate the NEWS.md file; many other lifecycle- and release-related functions in the devtools ecosystem will make appropriate changes to NEWS.md as your package evolves.

Here’s a hypothetical NEWS.md file:

# foofy (development version)

* Better error message when grooving an invalid grobble (#206).

# foofy 1.0.0

## Major changes

* Can now work with all grooveable grobbles!

## Minor improvements and bug fixes

* Printing scrobbles no longer errors (@githubusername, #100).

* Wibbles are now 55% less jibbly (#200).
This example demonstrates some organizing principles for NEWS.md:

• Use a top-level heading for each version: e.g., # somepackage 1.0.0. The most recent version should go at the top. Typically the top-most entry in NEWS.md of your source package will read # somepackage (development version).2
• Each change should be part of a bulleted list. If you have a lot of changes, you might want to break them up using subheadings (## Major changes, ## Bug fixes, etc.). We usually stick with a simple list until we’re close to a release, at which point we organize into sections and refine the text. It’s hard to know in advance exactly what sections you’ll need. The release checklist placed by usethis::use_release_issue() includes a reminder to polish the NEWS.md file. In that phase, it can be helpful to remember that NEWS.md is a user-facing record of change, in contrast to, e.g., commit messages, which are developer-facing.
2 pkgdown supports a few other wording choices for these headings; see more at https://pkgdown.r-lib.org/reference/build_news.html.
• If an item is related to an issue in GitHub, include the issue number in parentheses, e.g., (#10). If an item is related to a pull request, include the pull request number and the author, e.g., (#101, @hadley). This helps an interested reader to find relevant context on GitHub and, in your pkgdown site, these issue and pull request numbers and usernames will be hyperlinks. We generally omit the username if the contributor is already recorded in DESCRIPTION.

The main challenge with NEWS.md is getting into the habit of noting any user-visible change when you make it. It’s especially easy to forget this when accepting external contributions. Before release, it can be useful to use your version control tooling to compare the source of the release candidate to the previous release. This often surfaces missing NEWS items.
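One payoff of these conventions is that NEWS.md becomes easy to inspect with a few lines of base R. The sketch below runs over the hypothetical foofy file from earlier in this section; it illustrates the conventions only and makes no claim about how pkgdown or any other tool actually parses the file.

```r
# The hypothetical NEWS.md from this section, one line per element.
news <- c(
  "# foofy (development version)",
  "",
  "* Better error message when grooving an invalid grobble (#206).",
  "",
  "# foofy 1.0.0",
  "",
  "## Major changes",
  "",
  "* Can now work with all grooveable grobbles!",
  "",
  "## Minor improvements and bug fixes",
  "",
  "* Printing scrobbles no longer errors (@githubusername, #100).",
  "",
  "* Wibbles are now 55% less jibbly (#200)."
)

# Top-level headings: exactly one "#" followed by a space.
is_version <- grepl("^# ", news)
versions <- sub("^# ", "", news[is_version])

# Issue and pull request references look like "#123".
refs <- unlist(regmatches(news, gregexpr("#[0-9]+", news)))

# Bullets filed under the topmost (development) heading.
section <- cumsum(is_version)
dev_items <- news[section == 1 & grepl("^\\* ", news)]
```

Here versions holds the two headings in file order, refs collects the three issue references, and dev_items holds the single bullet that has not been released yet.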
CHAPTER 19
Website
At this point, we’ve discussed many ways to document your package:

• Function documentation or, more generally, help topics (see Chapter 16).
• Documentation of datasets (see “Documenting Datasets” on page 104).
• Vignettes (and articles) (see Chapter 17).
• README and NEWS (see Chapter 18).

Wouldn’t it be divine if all of that somehow got bundled up together into a beautiful website for your package? The pkgdown package is meant to provide exactly this magic, and that is the topic of this chapter.
Initiate a Site

Assuming your package has a valid structure, pkgdown should be able to make a website for it. Obviously that website will be more substantial if your package has more of the documentation elements just listed. But something reasonable should happen for any valid R package.

We hear that some folks put off “learning pkgdown,” because they think it’s going to be a lot of work. But then they eventually execute the two commands we show next and have a decent website in less than five minutes!
usethis::use_pkgdown() is a function you run once and it does the initial, minimal setup necessary to start using pkgdown:

usethis::use_pkgdown()
#> ✔ Setting active project to '/private/tmp/RtmpRf8Oqf/mypackage'
#> ✔ Adding '^_pkgdown\\.yml$', '^docs$', '^pkgdown$' to '.Rbuildignore'
#> ✔ Adding 'docs' to '.gitignore'
#> ✔ Writing '_pkgdown.yml'
#> • Edit '_pkgdown.yml'
#> ✔ Setting active project to ''
Here’s what use_pkgdown() does:

• Creates _pkgdown.yml, which is the main configuration file for pkgdown. In an interactive session, _pkgdown.yml will be opened for inspection and editing. But there’s no immediate need to change or add anything here.
• Adds various patterns to .Rbuildignore, to keep pkgdown-specific files and directories from being included in your package bundle.
• Adds docs, the default destination for a rendered site, to .gitignore. This is harmless for those who don’t use Git. For those who do, this opts you in to our recommended lifestyle, where the definitive source for your pkgdown site is built and deployed elsewhere (probably via GitHub Actions and Pages; more on this soon). This means the rendered website at docs/ just serves as a local preview.

pkgdown::build_site() is a function you’ll call repeatedly to rerender your site locally. In an extremely barebones package, you’ll see something like this:

pkgdown::build_site()
#> ✔ Setting active project to '/private/tmp/RtmpRf8Oqf/mypackage'
#> -- Installing package into temporary library -------------------------------
#> == Building pkgdown site ====================================================
#> Reading from: '/private/tmp/RtmpRf8Oqf/mypackage'
#> Writing to: '/private/tmp/RtmpRf8Oqf/mypackage/docs'
#> -- Initialising site --------------------------------------------------------
#> Copying '../../../../Users/jenny/Library/R/...link.svg' to 'link.svg'
#> Copying '../../../../Users/jenny/Library/R/...pkgdown.js' to 'pkgdown.js'
#> -- Building home ------------------------------------------------------------
#> Writing 'authors.html'
#> Writing '404.html'
#> -- Building function reference ----------------------------------------------
#> Writing 'reference/index.html'
#> Writing 'sitemap.xml'
#> -- Building search index ----------------------------------------------------
#> == DONE =====================================================================
#> ✔ Setting active project to ''
In an interactive session your newly rendered site should appear in your default web browser.
RStudio

Another nice gesture to build your site is via Addins > pkgdown > Build pkgdown.
You can look in the local docs/ directory to see the files that constitute your package’s website. To manually browse the site, open docs/index.html in your preferred browser. This is almost all you truly need to know about pkgdown. It’s certainly a great start and, as your package and ambitions grow, the best place to learn more is the pkgdown-made website for the pkgdown package itself.
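If the .Rbuildignore patterns added by use_pkgdown() look cryptic, it may help to see them at work. Each line of .Rbuildignore is a Perl-like regular expression that R CMD build matches case-insensitively against file and directory names relative to the top of the package. The sketch below mimics that matching for a made-up list of top-level entries. It is a simplification: the entries are hypothetical, and it does not model how R CMD build treats the contents of an excluded directory.

```r
# Patterns as they would appear in .Rbuildignore, written as R strings
# (hence the doubled backslash).
patterns <- c("^_pkgdown\\.yml$", "^docs$", "^pkgdown$")

# Hypothetical top-level entries in a package source directory.
entries <- c("DESCRIPTION", "R", "man", "_pkgdown.yml", "docs", "pkgdown", "docs2")

# An entry is ignored if at least one pattern matches it.
matches_any <- function(entry) {
  any(vapply(
    patterns,
    function(p) grepl(p, entry, perl = TRUE, ignore.case = TRUE),
    logical(1)
  ))
}

is_ignored <- vapply(entries, matches_any, logical(1))
ignored <- entries[is_ignored]
print(ignored)
```

The anchors ^ and $ are what keep docs2 in the bundle: without them, the pattern docs would match any name that merely contains those letters.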
Deployment

Your next task is to deploy your pkgdown site somewhere on the web, so that your users can visit it. The path of least resistance looks like this:

• Use Git and host your package on GitHub. The reasons to do this go well beyond offering a package website, but this will be one of the major benefits to adopting Git and GitHub, if you’re on the fence.
• Use GitHub Actions (GHA) to build your website, i.e., to run pkgdown::build_site(). GHA is a platform where you can configure certain actions to happen automatically when some event happens. We’ll use it to rebuild your website every time you push to GitHub.
• Use GitHub Pages to serve your website, i.e., the files you see below docs/ locally. GitHub Pages is a static website hosting service that creates a site from files found in a GitHub repo.

The advice to use GitHub Actions and Pages is implemented for you in the function usethis::use_pkgdown_github_pages(). It’s not an especially difficult task, but there are several steps, and it would be easy to miss or flub one. The output of use_pkgdown_github_pages() should look something like this:

usethis::use_pkgdown_github_pages()
#> ✔ Initializing empty, orphan 'gh-pages' branch in GitHub repo
#>   'jane/mypackage'
#> ✔ GitHub Pages is publishing from:
#> • URL: 'https://jane.github.io/mypackage/'
#> • Branch: 'gh-pages'
#> • Path: '/'
#> ✔ Creating '.github/'
#> ✔ Adding '^\\.github$' to '.Rbuildignore'
#> ✔ Adding '*.html' to '.github/.gitignore'
#> ✔ Creating '.github/workflows/'
#> ✔ Saving 'r-lib/actions/examples/pkgdown.yaml@v2' to
#>   '.github/workflows/pkgdown.yaml'
#> • Learn more at .
#> ✔ Recording 'https://jane.github.io/mypackage/' as site's url in
#>   '_pkgdown.yml'
#> ✔ Adding 'https://jane.github.io/mypackage/' to URL field in DESCRIPTION
#> ✔ Setting 'https://jane.github.io/mypackage/' as homepage of GitHub repo
#>   'jane/mypackage'
Like use_pkgdown(), this is a function you basically call once, when setting up a new site. In fact, the first thing it does is to call use_pkgdown() (it’s OK if you’ve already called use_pkgdown()), so we usually skip straight to use_pkgdown_github_pages() when setting up a new site.

Let’s walk through what use_pkgdown_github_pages() actually does:

• Initializes an empty, “orphan” branch in your GitHub repo, named gh-pages (for “GitHub Pages”). The gh-pages branch will live only on GitHub (there’s no reason to fetch it to your local computer) and it represents a separate, parallel universe from your actual package source. The only files tracked in gh-pages are those that constitute your package’s website (the files that you see locally below docs/).
• Turns on GitHub Pages for your repo and tells it to serve a website from the files found in the gh-pages branch.
• Copies the configuration file for a GHA workflow that does pkgdown “build and deploy.” The file shows up in your package as .github/workflows/pkgdown.yaml. If necessary, some related additions are made to .gitignore and .Rbuildignore.
• Adds the URL for your site as the home page for your GitHub repo.
• Adds the URL for your site to DESCRIPTION and _pkgdown.yml. The autolinking behavior we’ve touted elsewhere relies on your package listing its URL in these two places, so this is a high-value piece of configuration.

After successful execution of use_pkgdown_github_pages(), you should be able to visit your new site at the URL displayed in the previous output.1 By default the URL has this general form: https://USERNAME.github.io/REPONAME/.
Now What?

For a typical package, you could stop here, after creating a basic pkgdown site and arranging for it to be rebuilt and deployed regularly, and people using (or considering using) your package would benefit greatly. Everything beyond this point is a “nice to have.”

1 Sometimes there’s a small delay, so give it up to a couple of minutes to deploy.
Overall, we recommend vignette("pkgdown", package = "pkgdown") as a good place to start, if you think you want to go beyond the basic defaults. In the following sections, we highlight a few areas that are connected to other topics in the book or customizations that are particularly rewarding.
Logo

It’s fun to have a package logo! In the R community, we have a strong tradition of hex stickers, so it can be nice to join in with a hex logo of your own. Keen R user Amelia McNamara made herself a dress out of custom hex logo fabric, and useR! 2018 featured a spectacular hex photo wall.

Here are some resources to guide your logo efforts:

• The convention is to orient the logo with a vertex at the top and bottom, with flat vertical sides, as shown in Figure 19-1.
• If you think you might print stickers, make sure to comply with the de facto standard for sticker size. hexb.in is a reliable source for the dimensions and also provides a list of potential vendors for printed stickers.
• The hexSticker package helps you make your logo from within the comfort of R.

Figure 19-1. Standard dimensions of a hex sticker
Once you have your logo, the usethis::use_logo() function places an appropriately scaled copy of the image file at man/figures/logo.png and provides a copy-paste-able markdown snippet to include your logo in your README. pkgdown will also discover a logo placed in the standard location and incorporate it into your site.
Reference Index

pkgdown creates a function reference in reference/ that includes one page for each .Rd help topic in man/. This is one of the first pages you should admire in your new site. As you look around, there are a few things to contemplate, which we review in the following sections.

Rendered Examples

pkgdown executes all your examples (see “Examples” on page 248) and inserts the rendered results. We find this is a fantastic improvement over just showing the source code. This view of your examples can be eye-opening and often you’ll notice things you want to add, omit, or change. If you’re not satisfied with how your examples appear, this is a good time to review techniques for including code that is expected to error (see “Errors” on page 251) or that can be executed only under certain conditions (see “Dependencies and Conditional Execution” on page 252).
Linking

These help topics will be linked to from many locations within and, potentially, beyond your pkgdown site. This is what we are talking about in “Key Markdown Features” on page 239 when we recommend putting functions inside square brackets when mentioning them in a roxygen comment:

#' I am a big fan of [thisfunction()] in my package. I
#' also have something to say about [otherpkg::otherfunction()]
#' in somebody else's package.
On pkgdown sites, those square-bracketed functions become hyperlinks to the relevant pages in your pkgdown site. This is automatic within your package. But inbound links from other people’s packages (and websites, etc.) require two things:2

• The URL field of your DESCRIPTION file must include the URL of your pkgdown site (preferably followed by the URL of your GitHub repo):

URL: https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr

• Your _pkgdown.yml file must include the URL for your site:

url: https://dplyr.tidyverse.org

2 Another prerequisite is that your package has been released on CRAN, because the auto-linking machinery has to look up the DESCRIPTION somewhere. It is possible to allow locally installed packages to link to each other, which is described in vignette("linking", package = "pkgdown").
devtools takes every chance it gets to do this sort of configuration for you. But if you elect to do things manually, this is something you might overlook. A general resource on auto-linking in pkgdown is vignette("linking", package = "pkgdown").
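If you do configure things by hand, a small consistency check can catch a mismatch between the two files. What follows is a sketch under stated assumptions: site_url_is_advertised() is a hypothetical helper of our own (not part of pkgdown or usethis), and it looks for a top-level url: line with a regular expression rather than truly parsing the YAML. The throwaway files reuse the dplyr values shown above.

```r
# Hypothetical helper (not part of pkgdown or usethis): TRUE when the site URL
# in _pkgdown.yml is also listed in the URL field of DESCRIPTION.
site_url_is_advertised <- function(pkg_dir) {
  desc_url <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "URL")[[1]]
  yml <- readLines(file.path(pkg_dir, "_pkgdown.yml"))
  # Simplification: look for a top-level "url:" line instead of parsing YAML.
  url_line <- grep("^url:", yml, value = TRUE)
  if (is.na(desc_url) || length(url_line) != 1) {
    return(FALSE)
  }
  # The URL field may hold several comma-separated entries.
  desc_urls <- trimws(strsplit(desc_url, ",")[[1]])
  site_url <- trimws(sub("^url:", "", url_line))
  site_url %in% desc_urls
}

# Throwaway package skeleton using the dplyr values quoted above.
pkg <- tempfile("pkg")
dir.create(pkg)
writeLines(c(
  "Package: dplyr",
  "URL: https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr"
), file.path(pkg, "DESCRIPTION"))
writeLines("url: https://dplyr.tidyverse.org", file.path(pkg, "_pkgdown.yml"))
both_present <- site_url_is_advertised(pkg)

# Drop the site from DESCRIPTION and the check fails.
writeLines(c(
  "Package: dplyr",
  "URL: https://github.com/tidyverse/dplyr"
), file.path(pkg, "DESCRIPTION"))
site_missing <- site_url_is_advertised(pkg)
```

Note that the comparison is an exact string match, so a trailing slash in one file but not the other counts as a mismatch in this sketch.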
Index Organization

By default, the reference index is just an alphabetically ordered list of functions. For packages with more than a handful of functions, it’s often worthwhile to curate the index and organize the functions into groups. For example, dplyr uses this technique.

You achieve this by providing a reference field in _pkgdown.yml. Here’s a redacted excerpt from dplyr’s _pkgdown.yml file that gives you a sense of what’s involved:

reference:
- title: Data frame verbs

- subtitle: Rows
  desc: >
    Verbs that principally operate on rows.
  contents:
  - arrange
  - distinct
  ...

- subtitle: Columns
  desc: >
    Verbs that principally operate on columns.
  contents:
  - glimpse
  - mutate
  ...

- title: Vector functions
  desc: >
    Unlike other dplyr functions, these functions work on individual vectors,
    not data frames.
  contents:
  - between
  - case_match
  ...

- title: Built in datasets
  contents:
  - band_members
  - starwars
  - storms
  ...

- title: Superseded
  desc: >
    Superseded functions have been replaced by new approaches that we believe
    to be superior, but we don't want to force you to change until you're
    ready, so the existing functions will stay around for several years.
  contents:
  - sample_frac
  - top_n
  ...
To learn more, see ?pkgdown::build_reference.
Vignettes and Articles

Chapter 17 deals with vignettes, which are long-form guides for a package. They afford various opportunities beyond what’s possible in function documentation. For example, you have much more control over the integration of prose and code and over the presentation of code itself; e.g., code can be executed but not seen, seen but not executed, and so on. It’s much easier to create the reading experience that best prepares your users for authentic usage of your package.

A package’s vignettes appear, in rendered form, in its website, in the Articles drop-down menu. “Vignette” feels like a technical term that we might not expect all R users to know, which is why pkgdown uses the term “articles” here. To be clear, the Articles menu lists your package’s official vignettes (the ones that are included in your package bundle) and, optionally, other nonvignette articles (see “Article Instead of Vignette” on page 268), which are only available on the website.
Linking

Like function documentation, vignettes also can be the target of automatic inbound links from within your package and, potentially, beyond. We’ve talked about this elsewhere in the book. In “Key Markdown Features” on page 239, we introduced the idea of referring to a vignette with an inline call like vignette("some-topic"). The rationale for this syntax is that the code can literally be copied, pasted, and executed for local vignette viewing. So it “works” in any context, even without automatic links. But, in contexts where the auto-linking machinery is available, it knows to look for this exact syntax and turn it into a hyperlink to the associated vignette, within a pkgdown site.
The need to specify the host package depends on the context:

vignette("some-topic")
Use this form in your own roxygen comments, vignettes, and articles to refer to a vignette in your package. The host package is implied.

vignette("some-topic", package = "somepackage")
Use this form to refer to a vignette in some other package. The host package must be explicit.

Note that this shorthand does not work for linking to nonvignette articles. Since the syntax leans so heavily on the vignette() function, it would be too confusing; i.e., evaluating the code in the console would fail because R won’t be able to find such a vignette. Nonvignette articles must be linked like any other URL.

When you refer to a function in your package, in your vignettes and articles, make sure to put it inside backticks and to include parentheses. Qualify functions from other packages with their namespace. Here’s an example of prose in one of your own vignettes or articles:

I am a big fan of `thisfunction()` in my package. I also have something to
say about `otherpkg::otherfunction()` in somebody else's package.
Remember that automatic inbound links from other people’s packages (and websites, etc.) require that your package advertises the URL of its website in DESCRIPTION and _pkgdown.yml, as configured by usethis::use_pkgdown_github_pages() and as described in “Linking” on page 286.
Index Organization

As with the reference index, the default listing of the articles (broadly defined) in a package is alphabetical. But if your package has several articles, it can be worthwhile to provide additional organization. For example, you might feature the articles aimed at the typical user and tuck those meant for advanced users or developers behind “More articles …”. You can learn more about this in ?pkgdown::build_articles.

Nonvignette Articles

In general, Chapter 17 is our main source of advice on how to approach vignettes and that also includes some coverage of nonvignette articles (“Article Instead of Vignette” on page 268). Here we review some reasons to use a nonvignette article and give some examples.
An article is morally like a vignette (e.g., it tells a story that involves multiple functions and is written with R Markdown), except it does not ship with the package bundle. usethis::use_article() is the easiest way to create an article. The main reason to use an article is when you want to show code that is impossible or very painful to include in a vignette or official example. Possible root causes of this pain include:

• Use of a package you don’t want to formally depend on. In vignettes and examples, it’s forbidden to show your package working with a package that you don’t list in DESCRIPTION, e.g., in Imports or Suggests. There is a detailed example of this in “Config/Needs/* Field” on page 166, featuring a readxl article that uses the tidyverse metapackage. The key idea is to list such a dependency in the Config/Needs/website field of DESCRIPTION. This keeps tidyverse out of readxl’s dependencies, but ensures it’s installed when the website is built.
• Code that requires authentication or access to specific assets, tools, or secrets that are not available on CRAN. The googledrive package has no true vignettes, only nonvignette articles, because it’s essentially impossible to demonstrate usage without authentication. It is possible to access secure environment variables on GitHub Actions, where the pkgdown site is built and deployed, but this is impossible to do on CRAN.
• Content that involves a lot of figures, which cause your package to bump up against CRAN’s size constraints. The ggplot2 package presents several FAQs as articles for this reason.
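To make the first point concrete, here is a small base R sketch that reads a made-up DESCRIPTION and shows that a package named in Config/Needs/website sits apart from the formal dependencies. The package name mypackage, the Imports entry, and the split_field() helper are all hypothetical; only the field names come from the discussion above.

```r
# A made-up DESCRIPTION; only the field names matter here.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Version: 0.0.0.9000",
  "Imports: cli",
  "Config/Needs/website: tidyverse"
), desc_path)

fields <- read.dcf(
  desc_path,
  fields = c("Imports", "Suggests", "Config/Needs/website")
)

# Hypothetical helper: split a comma-separated field, treating a missing
# field (NA) as empty.
split_field <- function(x) {
  if (is.na(x)) character() else trimws(strsplit(x, ",")[[1]])
}

website_needs <- split_field(fields[1, "Config/Needs/website"])
formal_deps <- c(
  split_field(fields[1, "Imports"]),
  split_field(fields[1, "Suggests"])
)
```

In this sketch, tidyverse shows up in website_needs but not in formal_deps, which is the separation the Config/Needs/website field is meant to achieve.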
Development Mode

Every pkgdown site has a so-called development mode, which can be specified via the development field in _pkgdown.yml. If unspecified, the default is mode: release, which results in a single pkgdown site. Despite the name, this single site reflects the state of the current source package, which could be either a released state or a development state. The diagram below shows the evolution of a hypothetical package that is on CRAN and that has a pkgdown site in “release” mode:

...
 |
 V
Tweaks before release       v0.1.9000
 |
 V
Increment version number    v0.2.0
 |
 V
#> ✔ Creating '.github/'
#> ✔ Adding '*.html' to '.github/.gitignore'
#> ✔ Creating '.github/workflows/'
#> ✔ Saving 'r-lib/actions/examples/check-standard.yaml@v2' to
#>   '.github/workflows/R-CMD-check.yaml'
#> • Learn more at .
#> ✔ Adding R-CMD-check badge to 'README.md'
The key things that happen here are:

• A new GHA workflow file is written to .github/workflows/R-CMD-check.yaml. GHA workflows are specified via YAML files. The message reveals the source of the YAML and gives a link to learn more.
• Some helpful additions may be made to various “ignore” files.
• A badge reporting the R CMD check result is added to your README, if it has been created with usethis and has an identifiable badge “parking area.” Otherwise, you’ll be given some text you can copy and paste.

Commit these file changes and push to GitHub. If you visit the “Actions” section of your repository, you should see that a GHA workflow run has been launched. In due course, its success (or failure) will be reported there, in your README badge, and in your GitHub notifications (depending on your personal settings).

Congratulations! Your package will now benefit from even more regular checks.
Other Uses for GHA

As suggested by the interactive menu, usethis::use_github_action() gives you access to premade workflows other than R CMD check. In addition to the featured choices, you can use it to configure any of the example workflows in r-lib/actions by passing the workflow’s name. For example, use_github_action("test-coverage") configures a workflow to track the test coverage of your package, as described in “Test Coverage” on page 202.
Since GHA allows you to run arbitrary code, you can use it for other things:

• Building your package’s website and deploying the rendered site to GitHub Pages, as described in “Deployment” on page 283. See also ?usethis::use_pkgdown_github_pages().
• Republishing a book website every time you make a change to the source (like we do for this book!).

If the example workflows don’t cover your exact use case, you can also develop your own workflow. Even in this case, the example workflows are often useful as inspiration. The r-lib/actions repository also contains important lower-level building blocks, such as actions to install R or to install all of the dependencies indicated in a DESCRIPTION file.
Chapter 20: Software Development Practices
CHAPTER 21
Lifecycle
This chapter is about managing the evolution of your package. The trickiest part of managing change is balancing the interests of various stakeholders:

• The maintainer(s), which includes you and possibly others, especially in the future.
• The existing users, which could be just you or a small group of colleagues or it could be tens or hundreds of thousands of people.
• The future users, which hopefully includes the existing users, but could potentially include many more people.

It’s impossible to optimize for all of these folks, all of the time, all at once. So we’ll describe how we think about various trade-offs. Even if your priorities differ from those of the tidyverse team, this chapter still should help you identify issues you want to consider.

Very few users complain when a package gains features or gets a bug fix. Instead, we’re mostly going to talk about so-called breaking changes, such as removing a function or narrowing the acceptable inputs for a function. In “Backward Compatibility and Breaking Change” on page 307, we explore how to determine whether something is a breaking change or, more realistically, to gauge where it lies on a spectrum of “breakingness.” Even though it can be painful, sometimes a breaking change is beneficial for the long-term health of a package (see “Pros and Cons of Breaking Change” on page 310).
Since change is inevitable, the kindest thing you can do for your users is to communicate clearly and help them adapt to change. Several practices work together to achieve this:

Package version number
  The main form of user-facing change is a package release. Be intentional about what sort of changes are included in, e.g., a patch release versus a major release (see “Package Version Number” on page 304, and “Major Versus Minor Versus Patch Release” on page 309).

Lifecycle stage
  Be explicit when a function or argument is regarded as experimental, superseded, or deprecated, as opposed to stable (the assumed default) (see “Lifecycle Stages and Supporting Tools” on page 312).

Deprecation process
  Enact change in a phased way, which makes it easier for users to adjust their code (see “Lifecycle Stages and Supporting Tools” on page 312).
Package Evolution

First we should establish a working definition of what it means for your package to change. Technically, you could say that the package has changed every time any file in its source changes. This level of pedantry isn’t terribly useful, though.

The smallest increment of change that’s meaningful is probably a Git commit. This represents a specific state of the source package that can be talked about, installed from, compared to, subjected to R CMD check, reverted to, and so on. This level of granularity is really of interest only to developers. But the package states accessible via the Git history are genuinely useful for the maintainer, so if you needed any encouragement to be more intentional with your commits, let this be it.

The primary signal of meaningful change is to increment the package version number and release it, for some definition of release, such as releasing on CRAN (see Chapter 22). Recall that this important piece of metadata lives in the Version field of the DESCRIPTION file:

    Package: usethis
    Title: Automate Package and Project Setup
    Version: 2.1.6
    ...
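As a small illustration of how that field can be read back programmatically, here is a minimal sketch using base R’s read.dcf(). The temporary file merely stands in for a real DESCRIPTION file; it and the variable names are assumptions made so the example is self-contained.

```r
# A throwaway file standing in for a package's DESCRIPTION (illustrative only)
desc_path <- tempfile("DESCRIPTION-")
writeLines(
  c(
    "Package: usethis",
    "Title: Automate Package and Project Setup",
    "Version: 2.1.6"
  ),
  desc_path
)

# read.dcf() returns a character matrix with one column per requested field
fields <- read.dcf(desc_path, fields = c("Package", "Version"))

# Parse the raw string into a proper version object
pkg_version <- package_version(fields[[1, "Version"]])
pkg_version
```

For an installed package you would normally not read the file yourself; the point here is only that the version is ordinary metadata stored in a plain-text field.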
If you visit the CRAN landing page for usethis, you can access its history via Downloads > Old sources > usethis archive. That links to a folder of package bundles (see “Bundled Package” on page 35), reflecting usethis’s source for each version released on CRAN, presented in Table 21-1.

Table 21-1. Releases of the usethis package

Version   Date
1.0.0     2017-10-22 17:36:29 UTC
1.1.0     2017-11-17 22:52:07 UTC
1.2.0     2018-01-19 18:23:54 UTC
1.3.0     2018-02-24 21:53:51 UTC
1.4.0     2018-08-14 12:10:02 UTC
1.5.0     2019-04-07 10:50:44 UTC
1.5.1     2019-07-04 11:00:05 UTC
1.6.0     2020-04-09 04:50:02 UTC
1.6.1     2020-04-29 05:50:02 UTC
1.6.3     2020-09-17 17:00:03 UTC
2.0.0     2020-12-10 09:00:02 UTC
2.0.1     2021-02-10 10:40:06 UTC
2.1.0     2021-10-16 23:30:02 UTC
2.1.2     2021-10-25 07:30:02 UTC
2.1.3     2021-10-27 15:00:02 UTC
2.1.5     2021-12-09 23:00:02 UTC
2.1.6     2022-05-25 20:50:02 UTC
This is the type of package evolution we’re going to address in this chapter. In “Package Version Number” on page 304, we’ll delve into the world of software version numbers, which is a richer topic than you might expect. R also has some specific rules and tools around package version numbers. Finally, we’ll explain the conventions we use for the version numbers of tidyverse packages (see “Tidyverse Package Version Conventions” on page 306). But first, this is a good time to revisit a resource we first pointed out in “Source Package” on page 34, when introducing the different states of an R package. Recall that the (unofficial) cran organization on GitHub provides a read-only history of all CRAN packages. For example, you can get a different view of usethis’s released versions at https://github.com/cran/usethis.
The archive provided by CRAN itself allows you to download older versions of usethis as .tar.gz files, which is useful if you truly want to get your hands on the source of an older version. However, if you just want to quickly check something about a version or compare two versions of usethis, the read-only GitHub mirror is much more useful. Each commit in this repo’s history represents a CRAN release, which makes it easy to see exactly what changed. Furthermore, you can browse the state of all the package’s source files at any specific version, such as usethis’s initial release at version 1.0.0.¹

This information is technically available from the repository where usethis is actually developed. But you have to work much harder to zoom out to the level of CRAN releases, amid the clutter of the small incremental steps in which development actually unfolds.

These three different views of usethis’s evolution are all useful for different purposes:

https://cran.r-project.org/src/contrib/Archive/usethis
  The official CRAN package bundles.

https://github.com/cran/usethis/commits/HEAD
  The unofficial read-only CRAN mirror, obtained by unpacking CRAN’s bundles.

https://github.com/r-lib/usethis/commits/HEAD
  The official development home for usethis.
Package Version Number

Formally, an R package version is a sequence of at least two integers separated by either . or -. For example, 1.0 and 0.9.1-10 are valid versions, but 1 and 1.0-devel are not. Base R offers the package_version()² function to parse a package version string into a proper S3 class by the same name. This class makes it easier to do things like compare versions:

    package_version(c("1.0", "0.9.1-10"))
    #> [1] '1.0' '0.9.1.10'

    class(package_version("1.0"))
    #> [1] "package_version" "numeric_version"

    # these versions are not allowed for an R package
    package_version("1")
    #> Error: invalid version specification '1'
    package_version("1.0-devel")
    #> Error: invalid version specification '1.0-devel'

    # comparing package versions
    package_version("1.9") == package_version("1.9.0")
    #> [1] TRUE
    package_version("1.9") < package_version("1.9.2")
    #> [1] TRUE
    package_version(c("1.9", "1.9.2")) < package_version("1.10")
    #> [1] TRUE TRUE

¹ It’s unusual for an initial release to be version 1.0.0, but remember that usethis was basically carved out of a very mature package (devtools).

² package_version() lives in the base package, so it can be called directly, even in package code, without listing anything in Imports.
The previous examples make it clear that R considers version 1.9 to be equal to 1.9.0 and to be less than 1.9.2. And both 1.9 and 1.9.2 are less than 1.10, which you should think of as version “one point ten,” not “one point one zero.”

If you’re skeptical that the package_version class is really necessary, check out this example:

    "2.0" > "10.0"
    #> [1] TRUE
    package_version("2.0") > package_version("10.0")
    #> [1] FALSE
The string 2.0 is considered to be greater than the string 10.0, because the character 2 comes after the character 1. By parsing version strings into proper package_version objects, we get the correct comparison, i.e., that version 2.0 is less than version 10.0.

R offers this support for working with package versions, because it’s necessary, for example, to determine whether package dependencies are satisfied (see “Minimum Versions” on page 130). Under the hood, this tooling is used to enforce minimum versions recorded like this in DESCRIPTION:

    Imports:
        dplyr (>= 1.0.0),
        tidyr (>= 1.1.0)
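The same string-versus-version pitfall shows up when sorting. Here is a minimal sketch, using made-up version strings, that orders versions by their parsed value rather than as text:

```r
versions <- c("1.10.0", "1.9.2", "2.0.0", "1.9.0")

# Sorting as plain strings compares character by character,
# so "1.10.0" lands before "1.9.0"
as_text <- sort(versions, method = "radix")

# Ordering by the parsed versions compares component by component
as_versions <- versions[order(package_version(versions))]

as_text
as_versions
```

The second result puts 1.10.0 after 1.9.2, which is the order a human reader of release numbers would expect.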
In your own code, if you need to determine which version of a package is installed, use utils::packageVersion():³

    packageVersion("usethis")
    #> [1] '2.2.0'

    str(packageVersion("usethis"))
    #> Classes 'package_version', 'numeric_version'  hidden list of 1
    #>  $ : int [1:3] 2 2 0

    packageVersion("usethis") > package_version("10.0")
    #> [1] FALSE
    packageVersion("usethis") > "10.0"
    #> [1] FALSE

³ In package code, you should use the utils::packageVersion() form and list the utils package in Imports.
The return value of packageVersion() has the package_version class and is therefore ready for comparison to other version numbers. Note the last example where we seem to be comparing a version number to a string. How can we get the correct result without explicitly converting 10.0 to a package version? It turns out this conversion is automatic as long as one of the comparators has the package_version class.
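One common use of this automatic conversion is a runtime check that an optional dependency is both installed and recent enough. The helper below is a sketch only: has_min_version() is a hypothetical name, not a function from utils or usethis.

```r
# Hypothetical helper: TRUE if `pkg` is installed and its version is at least
# `min_version` (given as a string, converted automatically in the comparison)
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

has_min_version("utils", "1.0")                  # utils ships with R
has_min_version("utils", "9999.0.0")             # no such version exists yet
has_min_version("notARealPackage12345", "1.0")   # package is not installed
```

Because && short-circuits, packageVersion() is never called for a package that is missing, which would otherwise raise an error.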
Tidyverse Package Version Conventions

R considers 0.9.1-10 to be a valid package version, but you’ll never see a version number like that for a tidyverse package. Here is our recommended framework for managing the package version number:

• Always use . as the separator, never -.
• A released version number consists of three numbers, <major>.<minor>.<patch>. For version number 1.9.2, 1 is the major number, 9 is the minor number, and 2 is the patch number. Never use versions like 1.0. Always spell out the three components, 1.0.0.
• An in-development package has a fourth component: the development version. This should start at 9000. The number 9000 is arbitrary, but provides a clear signal that there’s something different about this version number. There are two reasons for this practice: First, the presence of a fourth component makes it easy to tell if you’re dealing with a released or in-development version. Also, the use of the fourth place means that you’re not limited to what the next released version will be. 0.0.1, 0.1.0, and 1.0.0 are all greater than 0.0.0.9000.

  Increment the development version, e.g., from 9000 to 9001, if you’ve added an important feature and you (or others) need to be able to detect or require the presence of this feature. For example, this can happen when two packages are developing in tandem. This is generally the only reason that we bother to increment the development version. This makes in-development versions special and, in some sense, degenerate. Since we don’t increment the development component with each Git commit, the same package version number is associated with many different states of the package source, in between releases.

The preceding advice is inspired in part by Semantic Versioning and by the X.Org versioning schemes. Read them if you’d like to understand more about the standards of versioning used by many open source projects. But we should underscore that our practices are inspired by these schemes and are somewhat less regimented.

Finally, know that other maintainers follow different philosophies on how to manage the package version number.
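The fourth-component convention is easy to check in code. The following sketch assumes the conventions just described; is_dev_version() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: under the conventions above, a version with four
# components is an in-development version
is_dev_version <- function(version) {
  parts <- unclass(package_version(version))[[1]]  # integer components
  length(parts) == 4
}

is_dev_version("0.8.1.9000")
is_dev_version("0.8.2")

# Any plausible next release is greater than the initial development version
candidates <- package_version(c("0.0.1", "0.1.0", "1.0.0"))
all(candidates > package_version("0.0.0.9000"))
```

The last comparison shows why starting at 0.0.0.9000 does not commit you to any particular first release number.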
Backward Compatibility and Breaking Change

The version number of your package is always increasing, but it’s more than just an incrementing counter: the way the number changes with each release can convey information about the nature of the changes. The transition from 0.3.1 to 0.3.2, which is a patch release, has a very different vibe from the transition from 0.3.2 to 1.0.0, which is a major release. A package version number can also convey information about where the package is in its lifecycle. For example, the version 1.0.0 often signals that the public interface of a package is considered stable.

How do you decide which type of release to make, i.e., which component(s) of the version should you increment? A key concept is whether the associated changes are backward compatible, meaning that preexisting code will still “work” with the new version. We put “work” in quotes, because this designation is open to a certain amount of interpretation. A hardliner might take this to mean “the code works in exactly the same way, in all contexts, for all inputs.” A more pragmatic interpretation is that “the code still works but could produce a different result in some edge cases.” A change that is not backward compatible is often described as a breaking change. Here we’re going to talk about how to assess whether a change is breaking. In “Pros and Cons of Breaking Change” on page 310 we’ll talk about how to decide if a breaking change is worth it.

In practice, backward compatibility is not a clear-cut distinction. It is typical to assess the impact of a change from a few angles:

Degree of change in behavior
  The most extreme is to make something that used to be possible into an error, i.e., impossible.

How the changes fit into the design of the package
  A change to low-level infrastructure, such as a utility that gets called in all user-facing functions, is more fraught than a change that affects only one parameter of a single function.

How much existing usage is affected
  This is a combination of how many of your users will perceive the change and how many existing users there are to begin with.
Here are some concrete examples of breaking change:

• Removing a function
• Removing an argument
• Narrowing the set of valid inputs to a function

Conversely, these are usually not considered breaking:

• Adding a function. Caveat: there’s a small chance this could introduce a conflict in user code.
• Adding an argument. Caveat: this could be breaking for some usage, e.g., if a user is relying on position-based argument matching. This also requires some care in a function that accepts the ... argument.
• Increasing the set of valid inputs.
• Changing the text of a print method or error. Caveat: This can be breaking if other packages depend on yours in fragile ways, such as building logic or a test that relies on an error message from your package.
• Fixing a bug. Caveat: It really can happen that users write code that “depends” on a bug. Sometimes such code was flawed from the beginning, but the problem went undetected until you fixed your bug. Other times this surfaces code that uses your package in an unexpected way, i.e., it’s not necessarily wrong, but neither is it right.

If reasoning about code were a reliable way to assess how it will work in real life, the world wouldn’t have so much buggy software. The best way to gauge the consequences of a change in your package is to try it and see what happens. In addition to running your own tests, you can also run the tests of your reverse dependencies and see if your proposed change breaks anything. The tidyverse team has a fairly extensive set of tools for running so-called reverse dependency checks (see “Reverse Dependency Checks” on page 330), where we run R CMD check on all the packages that depend on ours. Sometimes we use this infrastructure to study the impact of a potential change, i.e., reverse dependency checks can be used to guide development, not only as a last-minute, prerelease check.

This leads to yet another, deeply pragmatic definition of a breaking change:

  A change is breaking if it causes a CRAN package that was previously passing R CMD check to now fail AND the package’s original usage and behavior is correct.

This is obviously a narrow and incomplete definition of breaking change, but at least it’s relatively easy to get solid data.
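To make the caveat about adding an argument concrete, here is a small sketch. Both functions are hypothetical and exist only to show how position-based matching can silently change a result when a new argument is inserted before an existing one.

```r
# "Version 1" of a hypothetical function
scale_by_v1 <- function(x, factor = 2) {
  x * factor
}

# "Version 2" adds `offset`, but places it before `factor`
scale_by_v2 <- function(x, offset = 0, factor = 2) {
  (x + offset) * factor
}

old_result   <- scale_by_v1(10, 3)           # 3 is matched to `factor`
new_result   <- scale_by_v2(10, 3)           # 3 is now matched to `offset`
named_result <- scale_by_v2(10, factor = 3)  # naming the argument is robust

c(old = old_result, new = new_result, named = named_result)
```

Adding new arguments at the end of the signature, or after ..., avoids this particular failure mode, and callers who name their arguments are unaffected either way.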
Hopefully we’ve made the point that backward compatibility is not always a clear-cut distinction. But hopefully we’ve also provided plenty of concrete criteria to consider when thinking about whether a change could break someone else’s code.
Major Versus Minor Versus Patch Release

Recall that a version number will have one of these forms, if you’re following the conventions described in “Tidyverse Package Version Conventions” on page 306:

    <major>.<minor>.<patch>        # released version
    <major>.<minor>.<patch>.<dev>  # in-development version

If the current package version is 0.8.1.9000, here’s our advice on how to pick the version number for the next release:

Increment patch, e.g., 0.8.2, for a patch release
  You’ve fixed bugs, but you haven’t added any significant new features and there are no breaking changes. For example, if we discover a show-stopping bug shortly after a release, we would make a quick patch release with the fix. Most releases will have a patch number of 0.

Increment minor, e.g., 0.9.0, for a minor release
  A minor release can include bug fixes, new features, and changes that are backward compatible.⁴ This is the most common type of release. It’s perfectly fine to have so many minor releases that you need to use two (or even three!) digits, e.g., 1.17.0.

Increment major, e.g., 1.0.0, for a major release
  This is the most appropriate time to make changes that are not backward compatible and that are likely to affect many users. The 1.0.0 release has special significance and typically indicates that your package is feature complete with a stable API.

The trickiest decision you are likely to face is whether a change is “breaking” enough to deserve a major release. For example, if you make an API-incompatible change to a rarely used part of your code, it may not make sense to increase the major number. But if you fix a bug that many people depend on (it happens!), it will feel like a breaking change to those folks. It’s conceivable that such a bug fix could merit a major release.

⁴ For some suitably pragmatic definition of “backward compatible.”
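The arithmetic behind these three choices can be sketched in a few lines of base R. bump_version() is a hypothetical helper written for illustration; it is not how usethis implements version increments.

```r
# Hypothetical helper: compute the next released version from the current one
bump_version <- function(current, type = c("patch", "minor", "major")) {
  type <- match.arg(type)
  parts <- unclass(package_version(current))[[1]]
  # Drop any development component and pad to <major>.<minor>.<patch>
  parts <- c(parts, 0L, 0L)[1:3]
  new <- switch(type,
    patch = c(parts[1], parts[2], parts[3] + 1L),
    minor = c(parts[1], parts[2] + 1L, 0L),
    major = c(parts[1] + 1L, 0L, 0L)
  )
  paste(new, collapse = ".")
}

bump_version("0.8.1.9000", "patch")
bump_version("0.8.1.9000", "minor")
bump_version("0.8.1.9000", "major")
```

Note that incrementing a component resets every component to its right to zero, which is why a minor release from 0.8.1.9000 is 0.9.0 rather than 0.9.1.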
We’re mostly dwelling on breaking changes, but let’s not forget that sometimes you also add exciting new features to your package. From a marketing perspective, you probably want to save these for a major release, because your users are more likely to learn about the new goodies from reading a blog post or NEWS.

Here are a few tidyverse blog posts that have accompanied different types of package releases:

• Major release: dplyr 1.0.0, purrr 1.0.0, pkgdown 2.0.0, readr 2.0.0
• Minor release: stringr 1.5.0, ggplot2 3.4.0
• Patch release: These are usually not considered worthy of a blog post.
Package Version Mechanics

Your package should start with version number 0.0.0.9000. usethis::create_package() starts with this version, by default.

From that point on, you can use usethis::use_version() to increment the package version. When called interactively, with no argument, it presents a helpful menu:

    usethis::use_version()
    #> Current version is 0.1.
    #> What should the new version be? (0 to exit)
    #>
    #> 1: major --> 1.0
    #> 2: minor --> 0.2
    #> 3: patch --> 0.1.1
    #> 4: dev --> 0.1.0.9000
    #>
    #> Selection:

In addition to incrementing Version in DESCRIPTION (see Chapter 9), use_version() also adds a new heading in NEWS.md (see “NEWS” on page 277).
Pros and Cons of Breaking Change

The big difference between major and minor releases is whether or not the code is backward compatible. In the general software world, the idea is that a major release signals to users that it may contain breaking changes and they should upgrade only when they have the capacity to deal with any issues that emerge.
Reality is a bit different in the R community, because of the way most users manage package installation. If we’re being honest, most R users don’t manage package versions in a very intentional way. Given the way update.packages() and install.packages() work, it’s quite easy to upgrade a package to a new major version without really meaning to, especially for dependencies of the target package. This, in turn, can lead to unexpected exposure to breaking changes in code that previously worked. This unpleasantness has implications both for users and for maintainers.

If it’s important to protect a data product against change in its R package dependencies, we recommend the use of a project-specific package library. In particular, we like to implement this approach using the renv package. This supports a lifestyle where a user’s default package library is managed in the usual, somewhat haphazard way. But any project that has a specific, higher requirement for reproducibility is managed with renv. This keeps package updates triggered by work in project A from breaking the code in project B and also helps with collaboration and deployment.

We suspect that project-specific libraries and tools like renv are currently underutilized in the R world. That is, lots of R users still use just one package library. Therefore, package maintainers still need to exercise considerable caution and care when they introduce breaking changes, regardless of what’s happening with the version number. In the next section, we describe how tidyverse packages approach this, supported by tools in the lifecycle package.

As with dependencies (see “When Should You Take a Dependency?” on page 136), we find that extremism isn’t a very productive stance. Extreme resistance to breaking change puts a significant drag on ongoing development and maintenance. Backward compatible code tends to be harder to work with because of the need to maintain multiple paths to support functionality from previous versions. The harder you strive to maintain backward compatibility, the harder it is to develop new features or fix old mistakes. This, in turn, can discourage adoption by new users and can make it harder to recruit new contributors. On the other hand, if you constantly make breaking changes, users will become very frustrated with your package and will decide they’re better off without it. Find a happy medium. Be concerned about backward compatibility, but don’t let it paralyze you.

The importance of backward compatibility is directly proportional to the number of people using your package: you are trading your time and pain for that of your users. There are good reasons to make backward incompatible changes. Once you’ve decided it’s necessary, your main priority is to use a humane process that is respectful of your users.
Lifecycle Stages and Supporting Tools

The tidyverse team’s approach to package evolution has become more structured and deliberate over the years. The associated tooling and documentation lives in the lifecycle package. The approach relies on two major components:

• Lifecycle stages, which can be applied at different levels, i.e., to an individual argument or function or to an entire package. These stages are depicted in Figure 21-1.
• Conventions and functions to use when transitioning a function from one lifecycle stage to another. The deprecation process is the one that demands the most care.

We won’t duplicate too much of the lifecycle documentation here. Instead, we highlight the general principles of lifecycle management and present specific examples of successful lifecycle “moves.”
Lifecycle Stages and Badges
Figure 21-1. The four primary stages of the tidyverse lifecycle: stable, deprecated, superseded, and experimental

The four lifecycle stages are:

Stable
  This is the default stage and signals that users should feel comfortable relying on a function or package. Breaking changes should be rare and should happen gradually, giving users sufficient time and guidance to adapt their usage.

Experimental
  This is appropriate when a function is first introduced and the maintainer reserves the right to change it without much of a deprecation process. This is the implied stage for any package with a major version of 0, i.e., that hasn’t had a 1.0.0 release yet.

Deprecated
  This applies to functionality that is slated for removal. Initially, it still works, but it triggers a deprecation warning with information about preferred alternatives. After a suitable amount of time and with an appropriate version change, such functions are typically removed.

Superseded
  This is a softer version of deprecated, where legacy functionality is preserved as if in a time capsule. Superseded functions receive only minimal maintenance, such as critical bug fixes.

You can get much more detail in vignette("stages", package = "lifecycle").

The lifecycle stage is often communicated through a badge. If you’d like to use lifecycle badges, call usethis::use_lifecycle() to do some one-time setup:

    usethis::use_lifecycle()
    #> ✔ Adding 'lifecycle' to Imports field in DESCRIPTION
    #> • Refer to functions with `lifecycle::fun()`
    #> ✔ Adding '@importFrom lifecycle deprecated' to 'R/somepackage-package.R'
    #> ✔ Writing 'NAMESPACE'
    #> ✔ Creating 'man/figures/'
    #> ✔ Copied SVG badges to 'man/figures/'
    #> • Add badges in documentation topics by inserting one of:
    #> #' `r lifecycle::badge('experimental')`
    #> #' `r lifecycle::badge('superseded')`
    #> #' `r lifecycle::badge('deprecated')`
This leaves you in a position to use lifecycle badges in help topics and to use lifecycle functions, as described in the remainder of this section.

For a function, include the badge in its @description block. Here’s how we indicate that dplyr::top_n() is superseded:

    #' Select top (or bottom) n rows (by value)
    #'
    #' @description
    #' `r lifecycle::badge("superseded")`
    #' `top_n()` has been superseded in favour of ...

For a function argument, include the badge in the @param tag. Here’s how the deprecation of readr::write_file(path =) is documented:

    #' @param path `r lifecycle::badge("deprecated")` Use the `file` argument
    #'   instead.
Call usethis::use_lifecycle_badge() if you want to use a badge in README to indicate the lifecycle of an entire package (see “README” on page 273).
If the lifecycle of a package is stable, it’s not really necessary to use a badge, since that is the assumed default stage. Similarly, we typically use a badge for a function only if its stage differs from that of the associated package and likewise for an argument and the associated function.
Deprecating a Function

If you’re going to remove or make significant changes to a function, it’s usually best to do so in phases. Deprecation is a general term for the situation where something is explicitly discouraged, but it has not yet been removed. Various deprecation scenarios are explored in vignette("communicate", package = "lifecycle"); we’re just going to cover the main idea here.

The lifecycle::deprecate_warn() function can be used inside a function to inform your user that they’re using a deprecated feature and, ideally, to let them know about the preferred alternative. In this example, the plus3() function is being replaced by add3():

    # new function
    add3 <- function(x, y, z) {
      x + y + z
    }

    # old function
    plus3 <- function(x, y, z) {
      lifecycle::deprecate_warn("1.0.0", "plus3()", "add3()")
      add3(x, y, z)
    }

    plus3(1, 2, 3)
    #> Warning: `plus3()` was deprecated in somepackage 1.0.0.
    #> ℹ Please use `add3()` instead.
    #> [1] 6
At this point, a user who calls plus3() sees a warning explaining that the function has a new name, but we go ahead and call add3() with their inputs. Preexisting code still “works.” In some future major release, plus3() could be removed entirely.

lifecycle::deprecate_warn() and friends have a few features that are worth highlighting:

• The warning message is built up from inputs like when, what, with, and details, which gives deprecation warnings a predictable form across different functions, packages, and time. The intent is to reduce the cognitive load for users who may already be somewhat stressed.
• By default, a specific warning is issued once every 8 hours, in an effort to cause just the right amount of aggravation. The goal is to be just annoying enough to motivate the user to update their code before the function or argument goes away, but not so annoying that they fling their computer into the sea. Near the end of the deprecation process, the always argument can be set to TRUE to warn on every call.
• If you use lifecycle::deprecate_soft(), instead of lifecycle::deprecate_warn(), the warning is issued only if the person reading it is the one who can actually do something about it, i.e., update the offending code. If a user calls a deprecated function indirectly, i.e., because they are using a package that’s using a deprecated function, by default that user doesn’t get a warning. (But the maintainer of the guilty package will see these warnings in their test results.)

Here’s a hypothetical schedule for removing a function fun():

Package version 1.5.0: fun() exists
  The lifecycle stage of the package is stable, as indicated by its post-1.0.0 version number and, perhaps, a package-level badge. The lifecycle stage of fun() is also stable, by extension, since it hasn’t been specifically marked as experimental.

Package version 1.6.0
  The deprecation process of fun() begins. We insert `r lifecycle::badge("deprecated")` in its @description to place a badge in its help topic. In the body of fun(), we add a call to lifecycle::deprecate_warn() to inform users about the situation. Otherwise, fun() still works as it always has.

Package version 1.7.0 or 2.0.0
  fun() is removed. Whether this happens in a minor or major release will depend on the context, i.e., how widely used this package and function are.

If you’re using base R only, the .Deprecated() and .Defunct() functions are the closest substitutes for lifecycle::deprecate_warn() and friends.
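For readers who want to see the base R route, here is a minimal sketch of the same plus3()/add3() pattern written with .Deprecated(). It is an illustration under the assumption that no lifecycle dependency is wanted; unlike lifecycle::deprecate_warn(), this warns on every call.

```r
# new function
add3 <- function(x, y, z) {
  x + y + z
}

# old function, kept as a thin wrapper that warns on every call
plus3 <- function(x, y, z) {
  .Deprecated("add3")
  add3(x, y, z)
}

# The call still returns its usual value; the warning is captured here
# only so that its message can be inspected
dep_message <- NULL
result <- withCallingHandlers(
  plus3(1, 2, 3),
  warning = function(w) {
    dep_message <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

result
dep_message
```

Once the function is finally removed from normal use, replacing the body with a call to .Defunct("add3") turns the warning into an error.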
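Deprecating an Argument

lifecycle::deprecate_warn() is also useful when deprecating an argument. In this case, it’s also handy to use lifecycle::deprecated() as the default value for the deprecated argument. Here we continue an example from the preceding section, i.e., the switch from path to file in readr::write_file():

    write_file <- function(x,
                           file,
                           append = FALSE,
                           path = deprecated()) {
      if (is_present(path)) {
        deprecate_warn("1.4.0", "write_file(path)", "write_file(file)")
        file <- path
      }
      ...
    }

    readr::write_file("hi", path = tempfile("lifecycle-demo-"))
    #> Warning: The `path` argument of `write_file()` is deprecated as of readr
    #> 1.4.0.
    #> ℹ Please use the `file` argument instead.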
The use of deprecated() as the default accomplishes two things. First, if the user reads the documentation, this is a strong signal that an argument is deprecated. But deprecated() also has benefits for the package maintainer. Inside the affected function, you can use lifecycle::is_present() to determine if the user has specified the deprecated argument and proceed accordingly, as shown in the preceding code.

If you’re using base R only, the missing() function has substantial overlap with lifecycle::is_present(), although it can be trickier to finesse issues around default values.
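A base R version of this argument deprecation might look like the following sketch. write_text() is a hypothetical function invented for the example, and the warning text is handwritten rather than generated by lifecycle.

```r
# Hypothetical function whose `path` argument is being replaced by `file`
write_text <- function(x, file, path) {
  if (!missing(path)) {
    warning(
      "The `path` argument of `write_text()` is deprecated; ",
      "please use the `file` argument instead.",
      call. = FALSE
    )
    file <- path
  }
  writeLines(x, file)
  invisible(file)
}

old_style <- tempfile("lifecycle-demo-")
new_style <- tempfile("lifecycle-demo-")

# Old usage still works, but warns (suppressed here to keep the output clean)
suppressWarnings(write_text("hi", path = old_style))

# New usage is silent
write_text("hi", file = new_style)

readLines(old_style)
readLines(new_style)
```

Here `path` has no default at all, which keeps missing() unambiguous; as the text notes, things get trickier when the deprecated argument needs a default value.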
Deprecation Helpers

Sometimes a deprecation affects code in multiple places and it’s clunky to inline the full logic everywhere. In this case, you might create an internal helper to centralize the deprecation logic.

This happened in googledrive, when we changed how to control the package’s verbosity. The original design let the user specify this in every single function, via the verbose = TRUE/FALSE argument. Later, we decided it made more sense to use a global option to control verbosity at the package level. This is a case of (eventually) removing an argument, but it affects practically every single function in the package. Here’s what a typical function looks like after starting the deprecation process:

    drive_publish [...]

    [...] 2.2.0
    3: patch --> 2.1.7

    Selection:
The immediate question feels quite mechanical: which component of the version number do you want to increment? But remember that we discussed the substantive differences in release types in “Major Versus Minor Versus Patch Release” on page 309. In our workflow, this planned version number is recorded in the GitHub issue that holds the release checklist, but we don’t actually increment the version in DESCRIPTION until later in the process (see “The Submission Process” on page 336). However, it’s important to declare the release type up front, because the process (and, therefore, the checklist) looks different, e.g., for a patch release versus a major release.
Initial CRAN Release: Special Considerations

Every new package receives a higher level of scrutiny from CRAN. In addition to the usual automated checks, new packages are also reviewed by a human, which inevitably introduces a certain amount of subjectivity and randomness. There are many packages on CRAN that would not be accepted in their current form, if submitted today as a completely new package. This isn’t meant to discourage you. But you should be aware: just because you see some practice in an established package (or even in base R), that doesn’t mean you can do the same in your new package.

Luckily, the community maintains lists of common “gotchas” for new packages. If your package is not yet on CRAN, the checklist begins with a special section that reflects this recent collective wisdom. Attending to these checklist items has dramatically improved our team’s success rate for initial submissions.

First release:
• usethis::use_news_md()
• usethis::use_cran_comments()
• Update (aspirational) install instructions in README
• Proofread Title: and Description:
• Check that all exported functions have @returns and @examples
• Check that Authors@R: includes a copyright holder (role “cph”)
• Check licensing of included files
• Review https://github.com/DavisVaughan/extrachecks

If you don’t already have a NEWS.md file, you are encouraged to create one now with usethis::use_news_md(). You’ll want this file eventually, and this anticipates the fact that the description of your eventual GitHub release (see “Celebrating Success” on page 338) is drawn from NEWS.md.
usethis::use_cran_comments() initiates a file to hold submission comments for your package. It’s very barebones at first, e.g.:

    ## R CMD check results

    0 errors | 0 warnings | 1 note

    * This is a new release.

In subsequent releases, this file becomes less pointless; for example, it is where we report the results of reverse dependency checks. This is not a place to wax on with long explanations about your submission. In general, you should eliminate the need for such explanations, especially for an initial submission.

We highly recommend that your package have a README file (see “README” on page 273). If it does, this is a good time to check the installation instructions provided there. You may need to switch from instructions to install it from GitHub, in favor of installing from CRAN, in anticipation of your package’s acceptance.
The Title and Description fields of DESCRIPTION are real hotspots for nitpicking during CRAN’s human review. Carefully review the advice given in “Title and Description: What Does Your Package Do?” on page 125.

Also check that Authors@R includes a copyright holder, indicated by the “cph” role. The two most common scenarios are that you add “cph” to your other roles (probably “cre” and “aut”) or that you add your employer to Authors@R with the “cph” and, perhaps, “fnd” role. (When you credit a funder via the “fnd” role, they are acknowledged in the footer of your pkgdown website.)

This is also a good time to ensure that the maintainer’s email address is appropriate. This is the only way that CRAN can correspond with you. If there are problems and they can’t get in touch with you, they will remove your package from CRAN. Make sure this email address is likely to be around for a while and that it’s not heavily filtered.

Double-check that each of your exported functions documents its return value (with the @returns tag; see “Return Value” on page 246) and has an @examples section (see “Examples” on page 248). If you have examples that cannot be run on CRAN, you absolutely must use the techniques in “Dependencies and Conditional Execution” on page 252 to express the relevant preconditions properly. Do not take shortcuts, such as having no examples, commenting out your examples, or putting all of your examples inside \dontrun{}.

If you have embedded third-party code in your package, check that you are correctly abiding by and declaring its license (see “Code You Bundle” on page 178).

Finally, take advantage of any list of ad hoc checks that other package developers have recently experienced with CRAN. At the time of writing, https://github.com/DavisVaughan/extrachecks is a good place to find such firsthand reports. Reading such a list and preemptively modifying your package can often make the difference between a smooth acceptance and a frustrating process requiring multiple attempts.
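The copyright-holder item discussed above lends itself to a quick programmatic check. The following is a minimal base R sketch, not part of devtools or usethis: the helper has_copyright_holder() and the toy DESCRIPTION contents are made up for illustration, and the approach assumes Authors@R contains ordinary person() calls. Because it evaluates the R code stored in Authors@R, it should only be run on a DESCRIPTION file you trust.

```r
# Write a toy DESCRIPTION to a temporary file (contents are made up).
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mypkg",
  "Title: What the Package Does",
  "Version: 0.1.0",
  "Authors@R: c(",
  "    person(\"Jane\", \"Doe\", email = \"jane@example.com\",",
  "           role = c(\"aut\", \"cre\")),",
  "    person(\"Example Corp\", role = c(\"cph\", \"fnd\"))",
  "  )"
), desc_path)

# Hypothetical helper: TRUE if any author carries the "cph" role.
has_copyright_holder <- function(path) {
  field <- read.dcf(path, fields = "Authors@R")[1, 1]
  if (is.na(field)) return(FALSE)
  # Authors@R holds R code that builds a utils::person object
  authors <- eval(parse(text = field))
  roles <- unlist(lapply(unclass(authors), function(p) p$role))
  "cph" %in% roles
}

has_copyright_holder(desc_path)
```

A check like this only confirms that the role is present; whether the right person or organization is named as copyright holder still needs a human read.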
CRAN Policies

We alert you to specific CRAN policies throughout this book and, especially, through this chapter. However, this is something of a moving target, so it pays to make some effort to keep yourself informed about future changes to CRAN policy.

The official home of CRAN policy is https://cran.r-project.org/web/packages/policies.html. However, it’s not very practical to read this document, e.g., once a week and simply hope that you’ll notice any changes. The GitHub repository https://github.com/eddelbuettel/crp monitors the CRAN Repository Policy by tracking the evolution of the underlying files in the source of the CRAN website. Therefore the commit history of that repository makes policy changes much easier to navigate. You also may want to follow the CRAN Policy Watch Mastodon account, which toots whenever a change is detected.

The R-package-devel mailing list is another good resource for learning more about package development. You could subscribe to it to keep tabs on what other maintainers are talking about. Even if you don’t subscribe, it can be useful to search this list when you’re researching a specific topic.
Keeping Up with Change

Now we move into the main checklist items for a minor or major release of a package that is already on CRAN. Many of these items also appear in the checklist for a patch or initial release.

• Check current CRAN check results
• Check if any deprecation processes should be advanced, as described in Gradual deprecation
• Polish NEWS
• urlchecker::url_check()
• devtools::build_readme()

These first few items confirm that your package is keeping up with its surroundings and with itself. The first item, “Check current CRAN check results,” will be a hyperlink to the CRAN check results for the version of the package that is currently on CRAN. If there are any WARNINGs or ERRORs or NOTEs there, you should investigate and determine what’s going on. Occasionally there can be an intermittent hiccup at CRAN, but generally speaking, any result other than “OK” is something you should address with the release you are preparing. You may discover your package is in a dysfunctional state due to changes in base R, CRAN policies, CRAN tooling, or packages you depend on.

If you are in the process of deprecating a function or an argument, a minor or major release is a good time to consider moving that process along as described in “Lifecycle Stages and Supporting Tools” on page 312.

This is also a good time to look at all the NEWS bullets that have accumulated since the last release (“Polish NEWS”). Even if you’ve been diligent about jotting down all the newsworthy changes, chances are these bullets will benefit from some reorganization and editing for consistency and clarity (see “NEWS” on page 277).
Another very important check is to run urlchecker::url_check(). CRAN’s URL checks are described at https://cran.r-project.org/web/packages/URL_checks.html and are implemented by code that ships with R itself. However, these checks are not exposed in a very usable way. The urlchecker package was created to address this and exposes CRAN’s URL-checking logic in the url_check() function.

The main problems that surface tend to be URLs that don’t work anymore or URLs that use redirection. Obviously, you should update or remove any URL that no longer exists. Redirection, however, is trickier. If the status code is “301 Moved Permanently,” CRAN’s view is that your package should use the redirected URL. The problem is that many folks don’t follow RFC 7231 to the letter and use this sort of redirect even when they have a different intent, i.e., their intent is to provide a stable, user-friendly URL that then redirects to something less user-friendly or more volatile. If a legitimate URL you want to use runs afoul of CRAN’s checks, you’ll have to choose between a couple of less-than-appealing options. You could try to explain the situation to CRAN, but this requires human review, and thus is not recommended. Or you can convert such URLs into nonhyperlinked, verbatim text. Note also that even though urlchecker is using the same code as CRAN, your local results may still differ from CRAN’s, due to differences in other ambient conditions, such as environment variables and system capabilities.

If you have a README.Rmd file, you will also want to rebuild the static README.md file with the current version of your package. The best function to use for this is devtools::build_readme(), because it is guaranteed to render README.Rmd against the current source code of your package.
Double R CMD Checking

Next come a couple of items related to R CMD check. Remember that this should not be the first time you’ve run R CMD check since the previous release! Hopefully, you are running R CMD check often during local development and are using a continuous integration service, like GitHub Actions. This is meant to be a last-minute, final reminder to double-check that all is still well:

devtools::check(remote = TRUE, manual = TRUE)
    This happens on your primary development machine, presumably with the current version of R, and with some extra checks that are usually turned off to make day-to-day development faster.
devtools::check_win_devel()
    This sends your package off to be checked with CRAN’s win-builder service, against the latest development version of R (a.k.a. r-devel). You should receive an email within about 30 minutes with a link to the check results. It’s a good idea to check your package with r-devel, because base R and R CMD check are constantly evolving. Checking with r-devel is required by CRAN policy, and it will be done as part of CRAN’s incoming checks. There is no point in skipping this step and hoping for the best.

Note that the brevity of this list implicitly reflects that tidyverse packages are checked after every push via GitHub Actions, across multiple operating systems and versions of R (including the development version), and that most of the tidyverse team develops primarily on macOS. CRAN expects you to “make all reasonable efforts” to get your package working across all of the major R platforms, and packages that don’t work on at least two will typically not be accepted.

The next subsection is optional reading with more details on all the platforms that CRAN cares about and how you can access them. If your ongoing checks are more limited than ours, you may want to make up for that with more extensive presubmission checks. You may also need this knowledge to troubleshoot a concrete problem that surfaces in CRAN’s checks, either for an incoming submission or for a package that’s already on CRAN.

When running R CMD check for a CRAN submission, you have to address any problems that show up:

• You must fix all ERRORs and WARNINGs. A package that contains any errors or warnings will not be accepted by CRAN.
• Eliminate as many NOTEs as possible. Each NOTE requires human oversight, which creates friction for both you and CRAN. If there are notes that you do not believe are important, it is almost always easier to fix them (even if the fix is a bit of a hack) than to persuade CRAN that they’re OK. See our online-only guide to R CMD check for details on how to fix individual problems.
• If you can’t eliminate a NOTE, list it in cran-comments.md and explain why you think it is spurious. We discuss this file further in “Update Comments for CRAN” on page 334.

Note that there will always be one NOTE when you first submit your package. This reminds CRAN that this is a new submission and that they’ll need to do some extra checks. You can’t eliminate this NOTE, so just mention in cran-comments.md that this is your first submission.
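The rules above (no ERRORs, no WARNINGs, and at most the one unavoidable NOTE on a first submission) can be expressed as a small check on the summary line that R CMD check prints. This is a minimal base R sketch under stated assumptions: the helpers parse_check_summary() and check_is_clean() are hypothetical, and the sketch deliberately treats every other NOTE as something to fix, even though a NOTE you cannot eliminate may instead be explained in cran-comments.md.

```r
# Hypothetical helper: parse a summary line such as
# "0 errors | 0 warnings | 1 note" into named counts.
parse_check_summary <- function(line) {
  counts <- as.integer(regmatches(line, gregexpr("[0-9]+", line))[[1]])
  stopifnot(length(counts) == 3)
  setNames(counts, c("errors", "warnings", "notes"))
}

# Hypothetical helper: is the result clean enough to submit as-is?
check_is_clean <- function(line, first_submission = FALSE) {
  res <- parse_check_summary(line)
  # A first submission always carries one "new submission" NOTE
  allowed_notes <- if (first_submission) 1L else 0L
  res[["errors"]] == 0 && res[["warnings"]] == 0 &&
    res[["notes"]] <= allowed_notes
}

check_is_clean("0 errors | 0 warnings | 1 note", first_submission = TRUE)
check_is_clean("0 errors | 1 warning | 0 notes")
```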
CRAN Check Flavors and Related Services

CRAN runs R CMD check on all contributed packages upon submission and on a regular basis, on multiple platforms or what they call “flavors”. You can see CRAN’s current check flavors page. There are various combinations of:

• Operating system and CPU: Windows, macOS (x86_64, arm64), Linux (various distributions)
• R version: r-devel, r-release, r-oldrel
• C, C++, FORTRAN compilers
• Locale, in the sense of the LC_CTYPE environment variable (this is about which human language is in use and character encoding)

CRAN’s check flavors almost certainly include platforms other than your preferred development environment(s), so you will eventually need to make an explicit effort to check and perhaps troubleshoot your package on these other flavors.

It would be impractical for individual package developers to personally maintain all of these testing platforms. Instead, we turn to various community- and CRAN-maintained resources for this. Here is a selection, in order of how central they are to our current practices:

• GitHub Actions (GHA) is our primary means of testing packages on multiple flavors, as covered in “GitHub Actions” on page 298.
• R-hub builder (R-hub) is a service supported by the R Consortium where package developers can submit their package for checks that replicate various CRAN check flavors.
    You can use R-hub via a web interface or, as we recommend, through the rhub R package.
    rhub::check_for_cran() is a good option for a typical CRAN package and is morally similar to the GHA workflow configured by usethis::use_github_action("check-standard"). However, unlike GHA, R-hub currently does not cover macOS, only Windows and Linux.
    rhub also helps you access some of the more exotic check flavors and offers specialized checks relevant to packages with compiled code, such as rhub::check_with_sanitizers().
• macOS builder is a service maintained by the CRAN personnel who build the macOS binaries for CRAN packages. This is a relatively new addition to the list and checks packages with “the same setup and available packages as the CRAN M1 build machine.”
    You can submit your package using the web interface or with devtools::check_mac_release().
Reverse Dependency Checks

• revdepcheck::revdep_check(num_workers = 4)

This innocuous checklist item can actually represent a considerable amount of effort. At a high level, checking your reverse dependencies (“revdeps”) breaks down into:

• Form a list of your reverse dependencies. These are CRAN packages that list your package in their Depends, Imports, Suggests, or LinkingTo fields.
• Run R CMD check on each one.
• Make sure you haven’t broken someone else’s package with the planned changes in your package.

Each of these steps can require considerable work and judgment. So, if you have no reverse dependencies, you should rejoice that you can skip this step. If you have only a couple of reverse dependencies, you can probably do this “by hand,” i.e., download each package’s source and run R CMD check.

Here we explain ways to do reverse dependency checks at scale, which is the problem we face. Some of the packages maintained by our team have thousands of reverse dependencies and even some of the lower-level packages have hundreds. We have to approach this in an automated fashion, and this section will be most useful to other maintainers in the same boat.

All of our reverse dependency tooling is concentrated in the revdepcheck package. Note that, at least at the time of writing, the revdepcheck package is not on CRAN. You can install it from GitHub via devtools::install_github("r-lib/revdepcheck") or pak::pak("r-lib/revdepcheck").

Do this when you’re ready to do revdep checks for the first time:

    usethis::use_revdep()

This does some one-time setup in your package’s .gitignore and .Rbuildignore files. Revdep checking will create some rather large folders below revdep/, so you definitely want to configure these ignore files. You will also see this reminder to actually perform revdep checks like so, as the checklist item suggests:

    revdepcheck::revdep_check(num_workers = 4)

This runs R CMD check on all of your reverse dependencies, with our recommendation to use four parallel workers to speed things along. The output looks something like this:
    > revdepcheck::revdep_check(num_workers = 4)
    ── INIT ───────────────────────────────────── Computing revdeps ──
    ── INSTALL ───────────────────────────────────────── 2 versions ──
    Installing CRAN version of cellranger
    also installing the dependencies 'cli', 'glue', 'utf8', 'fansi',
      'lifecycle', 'magrittr', 'pillar', 'pkgconfig', 'rlang', 'vctrs',
      'rematch', 'tibble'
    Installing DEV version of cellranger
    Installing 13 packages: rlang, lifecycle, glue, cli, vctrs, utf8,
      fansi, pkgconfig, pillar, magrittr, tibble, rematch2, rematch
    ── CHECK ─────────────────────────────────────────── 8 packages ──
    ✔ AOV1R 0.1.0          ── E: 0 | W: 0 | N: 0
    ✔ mschart 0.4.0        ── E: 0 | W: 0 | N: 0
    ✔ googlesheets4 1.0.1  ── E: 0 | W: 0 | N: 1
    ✔ readODS 1.8.0        ── E: 0 | W: 0 | N: 0
    ✔ readxl 1.4.2         ── E: 0 | W: 0 | N: 0
    ✔ readxlsb 0.1.6       ── E: 0 | W: 0 | N: 0
    ✔ unpivotr 0.6.3       ── E: 0 | W: 0 | N: 0
    ✔ tidyxl 1.0.8         ── E: 0 | W: 0 | N: 0
    OK: 8
    BROKEN: 0
    Total time: 6 min
    ── REPORT ────────────────────────────────────────────────────────
    Writing summary to 'revdep/README.md'
    Writing problems to 'revdep/problems.md'
    Writing failures to 'revdep/failures.md'
    Writing CRAN report to 'revdep/cran.md'
To minimize false positives, revdep_check() runs R CMD check twice per revdep: once with the released version of your package currently on CRAN and again with the local development version, i.e., with your release candidate. Why two checks? Because sometimes the revdep is already failing R CMD check and it would be incorrect to blame your planned release for the breakage. revdep_check() reports the packages that can’t be checked and, most importantly, those where there are so-called “changes to the worse,” i.e., where your release candidate is associated with new problems. Note also that revdep_check() always works with a temporary, self-contained package library, i.e., it won’t modify your default user or system library.
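The logic of the two-check comparison can be illustrated with a few lines of base R. This is a deliberately simplified sketch with made-up data, not revdepcheck’s implementation: it compares only problem counts per revdep, whereas a real comparison would look at the individual problems reported by R CMD check. The package names and counts are hypothetical.

```r
# Made-up problem counts (errors + warnings + notes) per revdep.
cran_version <- c(pkgA = 0, pkgB = 2, pkgC = 0)  # checked against the released version
dev_version  <- c(pkgA = 0, pkgB = 2, pkgC = 1)  # checked against the release candidate

# Flag a revdep only if the dev version is associated with more problems;
# pkgB was already failing, so it is not blamed on the release candidate.
newly_broken <- names(dev_version)[dev_version > cran_version[names(dev_version)]]
newly_broken
```

Here pkgB has problems under both versions, so only pkgC counts as a “change to the worse.”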
tidyverse Team

We actually use a different function for our reverse dependency checks: revdepcheck::cloud_check(). This runs the checks in the cloud, massively in parallel, making it possible to run revdep checks for packages like testthat (with >10,000 revdeps) in just a few hours!

cloud_check() has been a game changer for us, allowing us to run revdep checks more often. For example, we even do this now when assessing the impact of a potential change to a package (see “Backward Compatibility and Breaking Change” on page 307), instead of only right before a release.

At the time of writing, cloud_check() is only available for package maintainers at Posit, but we hope to offer this service for the broader R community in the future.
In addition to some interactive messages, the revdep check results are written to the revdep/ folder:

revdep/README.md
    This is a high-level summary aimed at maintainers. The filename and markdown format are very intentional, in order to create a nice landing page for the revdep folder on GitHub.
revdep/problems.md
    This lists the revdeps that appear to be broken by your release candidate.
revdep/failures.md
    This lists the revdeps that could not be checked, usually because of an installation failure, either of the revdep itself or one of its dependencies.
revdep/cran.md
    This is a high-level summary aimed at CRAN. You should copy and paste this into cran-comments.md (see “Update Comments for CRAN” on page 334).
checks.noindex, data.sqlite, library.noindex, and other files and folders
    These are for revdepcheck’s internal use and we won’t discuss them further.

The easiest way to get a feel for these different files is to look around at the latest revdep results for some tidyverse packages, such as dplyr or tidyr.

The revdep check results (local, cloud, or CRAN) are not perfect, because this is not a simple task. There are various reasons a result might be missing, incorrect, or contradictory in different runs:
False positives
    Sometimes revdepcheck reports a package has been broken, but things are actually fine (or, at least, no worse than before). This most commonly happens because of flaky tests that fail randomly (see “Skip a Test” on page 228), such as HTTP requests. This can also happen because the instance runs out of disk space or other resources, so the first check using the CRAN version succeeds and the second check using the dev version fails. Sometimes it’s obvious that the problem is not related to your package.
False negatives
    Sometimes a package has been broken, but you don’t detect that. For us, this usually happens when cloud_check() can’t check a revdep because it can’t be installed, typically because of a missing system requirement (e.g., Java). These are separately reported as “failed to test” but are still included in problems.md, because this could still be direct breakage caused by your package. For example, if you remove an exported function that’s used by another package, installation will fail.

Generally these differences are less of a worry now that CRAN’s own revdep checks are well automated, so new failures typically don’t involve a human.
Revdeps and Breaking Changes

If the revdep check reveals breakages, you need to examine each failure and determine if it’s:

• A false positive.
• A nonbreaking change, i.e., a failure caused by off-label usage of your package.
• A bug in your package that you need to fix.
• A deliberate breaking change.

If your update will break another package (regardless of why), you need to inform the maintainer, so they hear it first from you, rather than CRAN. The nicest way to do this is with a patch that updates their package to play nicely with yours, perhaps in the form of a pull request. This can be a decent amount of work and is certainly not feasible for all maintainers. But working through a few of these can be a good way to confront the pain that breaking change causes and to reconsider whether the benefits outweigh the costs. In most cases, a change that affects revdeps is likely to also break less visible code that lives outside of CRAN packages, such as scripts, reports, and Shiny apps.

If you decide to proceed, functions such as revdepcheck::revdep_maintainers() and revdepcheck::revdep_email() can help you notify revdep maintainers en masse. Make sure the email includes a link to documentation that describes the most common breaking changes and how to fix them. You should let the maintainers know when you plan to submit to CRAN (we recommend giving at least two weeks’ notice), so they can submit their updated version before that. When your release date rolls around, re-run your checks to see how many problems have been resolved. Explain any remaining failures in cran-comments.md as demonstrated in “Update Comments for CRAN” on page 334. The two most common cases are that you are unable to check a package because you can’t install it locally, or that there is a legitimate change in the API that the maintainer hasn’t addressed yet. As long as you have given sufficient advance notice, CRAN will accept your update, even if it breaks some other packages.
tidyverse Team

Lately the tidyverse team is trying to meet revdep maintainers more than halfway in terms of dealing with breaking changes. For example, in GitHub issue tidyverse/dplyr#6262, the dplyr maintainers tracked hundreds of pull requests in the build-up to the release of dplyr v1.1.0. As the PRs are created, it’s helpful to add links to those as well. As the revdep maintainers merge the PRs, they can be checked off as resolved. If some PRs are still in-flight when the announced submission date rolls around, the situation can be summarized in cran-comments.md, as was true in the case of dplyr v1.1.0.
Update Comments for CRAN

• Update cran-comments.md

We use the cran-comments.md file to record comments about a submission, mainly just the results from R CMD check and revdep checks. If you are making a specific change at CRAN’s request, possibly under a deadline, that would also make sense to mention. We like to track this file in Git, so we can see how it changes over time. It should also be listed in .Rbuildignore, since it should not appear in your package bundle. When you’re ready to submit, devtools::submit_cran() (see “The Submission Process” on page 336) incorporates the contents of cran-comments.md when it uploads your submission.

The target audience for these comments is the CRAN personnel, although there is no guarantee that they will read the comments (or when in the submission process they read them). For example, if your package breaks other packages, you will likely receive an automated email about that, even if you’ve explained it in the comments. Sometimes a human at CRAN then reads the comments, is satisfied, and accepts your package anyway, without further action from you. At other times, your package may be stuck in the queue until you copy cran-comments.md and paste it into an email exchange to move things along. In either case, it’s worth keeping these comments in their own, version-controlled file.

Here is a fairly typical cran-comments.md from a recent release of forcats. Note that the R CMD check results are clean, i.e., there is nothing that needs to be explained or justified, and there is a concise summary of the revdep process:

    ## R CMD check results

    0 errors | 0 warnings | 0 notes

    ## revdepcheck results

    We checked 231 reverse dependencies (228 from CRAN + 3 from Bioconductor),
    comparing R CMD check results across CRAN and dev versions of this package.

    We saw 2 new problems:

    * epikit
    * stevemisc

    Both maintainers were notified on Jan 12 (~2 week ago) and supplied with patches.

    We failed to check 3 packages

    * genekitr (NA)
    * OlinkAnalyze (NA)
    * SCpubr (NA)
This layout is designed to be easy to skim, and easy to match up to the R CMD check results seen by CRAN maintainers. It includes two sections:

Check results
    We always state that there were no errors or warnings (and we make sure that’s true!). Ideally we can also say there were no notes. But if not, any NOTEs are presented in a bulleted list. For each NOTE, we include the message from R CMD check and a brief description of why we think it’s OK.

    Here is how a NOTE is explained for the nycflights13 data package:

        ## R CMD check results

        0 errors | 0 warnings | 1 note

        * Checking installed package size:
          installed size is 6.9Mb
          sub-directories of 1Mb or more:
            data 6.9Mb

          This is a data package that will be rarely updated.
Reverse dependencies
    If there are revdeps, this is where we paste the contents of revdep/cran.md (see “Reverse Dependency Checks” on page 330). If there are no revdeps, we recommend that you keep this section, but say something like: “There are currently no downstream dependencies for this package.”
The Submission Process

• usethis::use_version('minor') (or 'patch' or 'major')
• devtools::submit_cran()
• Approve email

When you’re truly ready to submit, it’s time to actually bump the version number in DESCRIPTION. This checklist item will reflect the type of release declared at the start of this process (patch, minor, or major), in the initial call to use_release_issue().

We recommend that you submit your package to CRAN by calling devtools::submit_cran(). This convenience function wraps up a few steps:

• Creates the package bundle (see “Bundled Package” on page 35) with pkgbuild::build(manual = TRUE), which ultimately calls R CMD build.
• Posts the resulting *.tar.gz file to CRAN’s official submission form, populating your name and email from DESCRIPTION and your submission comments from cran-comments.md.
• Confirms that the submission was successful and reminds you to check your email for the confirmation link.
• Writes submission details to a local CRAN-SUBMISSION file, which records the package version, SHA, and time of submission. This information is used later by usethis::use_github_release() to create a GitHub release once your package has been accepted. CRAN-SUBMISSION will be added to .Rbuildignore. We generally do not gitignore this file, but neither do we commit it. It’s an ephemeral note that exists during the interval between submission and (hopefully) acceptance.

After a successful upload, you should receive an email from CRAN within a few minutes. This email notifies you, as maintainer, of the submission and provides a confirmation link. Part of what this does is confirm that the maintainer’s email address is correct. At the confirmation link, you are required to reconfirm that you’ve followed CRAN’s policies and that you want to submit the package. If you fail to complete this step, your package is not actually submitted to CRAN!
Once your package enters CRAN’s system it is automatically checked on Windows and Linux, probably against both the released and development versions of R. You will get another email with links to these check results, usually within a matter of hours. An initial submission (see “Initial CRAN Release: Special Considerations” on page 323) will receive additional scrutiny from CRAN personnel. The process is potentially fully automated when updating a package that is already on CRAN. If a package update passes its initial checks, CRAN will then run reverse dependency checks.
Failure Modes

There are at least three ways for your CRAN submission to fail:

• It does not pass R CMD check. This is an automated result.
• Human review finds the package to be in violation of CRAN policies. This applies mostly to initial submissions, but sometimes CRAN personnel decide to engage in ad hoc review of updates to existing packages that fail any automated checks.
• Reverse dependency checks suggest there are “changes to the worse.” This is an automated result.

Failures are frustrating and the feedback may be curt and may feel downright insulting. Take comfort in the fact that this is a widely shared experience across the R community. It happens to us on a regular basis. Don’t rush to respond, especially if you are feeling defensive. Wait until you are able to focus your attention on the technical issues that have been raised. Read any check results or emails carefully and investigate the findings. Unless you feel extremely strongly that discussion is merited, don’t respond to the email. Instead:

• Fix the identified problems and make recommended changes. Rerun devtools::check() on any relevant platforms to make sure you didn’t accidentally introduce any new problems.
• Increase the patch version of your package. Yes, this means that there might be gaps in your released version numbers. This is not a big deal.
• Add a “Resubmission” section at the top of cran-comments.md. This should clearly identify that the package is a resubmission, and list the changes that you made:

        ## Resubmission

        This is a resubmission. In this version I have:

        * Converted the DESCRIPTION title to title case.
        * More clearly identified the copyright holders in the DESCRIPTION
          and LICENSE files.

• If necessary, update the check results and revdep sections.
• Run devtools::submit_cran() to resubmit the package.

If your analysis indicates that the initial failure was a false positive, reply to CRAN’s email with a concise explanation. For us, this scenario mostly comes up with respect to revdep checks. It’s extremely rare for us to see failure for CRAN’s initial R CMD check runs and, when it happens, it’s often legitimate. On the other hand, for packages with a large number of revdeps, it’s inevitable that a subset of these packages have some flaky tests or brittle examples. Therefore it’s quite common to see revdep failures that have nothing to do with the proposed package update. In this case, it is appropriate to send a reply email to CRAN explaining why you think these are false positives.
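The “Resubmission” step in the list above is simple enough to script. The following base R sketch shows one way to put such a section at the top of a comments file while keeping the existing content below it. The helper add_resubmission_section() is hypothetical (it is not part of devtools or usethis), and the demonstration writes to a temporary file rather than a real cran-comments.md.

```r
# Hypothetical helper: put a "Resubmission" section at the top of a
# comments file, keeping the existing content below it.
add_resubmission_section <- function(path, changes) {
  old <- if (file.exists(path)) readLines(path) else character()
  new <- c(
    "## Resubmission",
    "",
    "This is a resubmission. In this version I have:",
    "",
    paste("*", changes),
    "",
    old
  )
  writeLines(new, path)
  invisible(new)
}

# Demonstrate on a temporary file rather than a real cran-comments.md
tmp <- tempfile(fileext = ".md")
writeLines(c("## R CMD check results", "", "0 errors | 0 warnings | 1 note"), tmp)
add_resubmission_section(tmp, c(
  "Converted the DESCRIPTION title to title case.",
  "More clearly identified the copyright holders in the DESCRIPTION and LICENSE files."
))
readLines(tmp)
```

Whether you script this or edit the file by hand, the list of changes should stay short and factual.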
Celebrating Success

Now we move into the happiest section of the checklist:

• Accepted
• git push
• usethis::use_github_release()
• usethis::use_dev_version()
• git push
• Finish blog post, share on social media, etc.
• Add link to blog post in pkgdown news menu

CRAN will notify you by email once your package is accepted. This is when we first push to GitHub with the new version number, i.e., we wait until it’s certain that this version will actually be released on CRAN. Next we create a GitHub release corresponding to this CRAN release, using usethis::use_github_release(). A GitHub release is basically a glorified Git tag. The only aspect of GitHub releases that we regularly take advantage of is the release notes. usethis::use_github_release() creates release notes from the NEWS bullets relevant to the current release. Note that usethis::use_github_release() depends crucially on the CRAN-SUBMISSION file that was written by devtools::submit_cran(): that’s how it knows which SHA to tag. After the successful creation of the GitHub release, use_github_release() deletes this temporary file.
Now we prepare for the next release by incrementing the version number yet again, this time to a development version using usethis::use_dev_version(). It makes sense to immediately push this state to GitHub so that, for example, any new branches or pull requests clearly have a development version as their base.

After the package has been accepted by CRAN, binaries are built for macOS and Windows. The package is also checked across the panel of CRAN check flavors. These processes unfold over a few days post-acceptance, and sometimes they uncover errors that weren’t detected by the less comprehensive incoming checks. It’s a good idea to visit your package’s CRAN landing page a few days after release and just make sure that all still seems to be well. Figure 22-1 highlights where these results are linked from a CRAN landing page.
Figure 22-1. Link to CRAN check results
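If you prefer to jump straight to the check results rather than clicking through from the landing page, a small helper can build the address. This is a sketch under assumptions: cran_check_url() is a hypothetical name, the URL pattern is the one CRAN uses for its per-package check pages, and usethis is used purely as an example package.

```r
# Hypothetical helper: build the URL of a package's CRAN check results page.
cran_check_url <- function(pkg) {
  stopifnot(is.character(pkg), length(pkg) == 1, nzchar(pkg))
  sprintf("https://cran.r-project.org/web/checks/check_results_%s.html", pkg)
}

url <- cran_check_url("usethis")

# Open the page in a browser only when running interactively.
if (interactive()) {
  utils::browseURL(url)
}
```

Substitute your own package name once it has been accepted; the page fills in as the individual check flavors finish.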
If there is a problem, prepare a patch release to address it and submit using the same process as before. If this means you are making a second submission less than a week after the previous one, explain the situation in cran-comments.md. Getting a package established on CRAN can take a couple of rounds, although the guidance in this chapter is intended to maximize the chance of success on the first try. Future releases, initiated from your end, should be spaced at least one or two months apart, according to CRAN policy.

Once your package’s binaries are built and it has passed checks across CRAN’s flavors, it’s time for the fun part: publicizing your package. This takes different forms, depending on the type of release. If this is your initial release (or, at least, the first release for which you really want to attract users), it’s especially important to spread the word. No one will use your helpful new package if they don’t know it exists. There are a number of places to announce your package, such as Twitter, Mastodon, LinkedIn, and Slack communities. Make sure to use any relevant tags, such as the #rstats hashtag. If you have a blog, it’s a great idea to write a post about your release.

When introducing a package, the vibe should be fairly similar to writing your README or a “Get Started” vignette. Make sure to describe what the package does, so that people who haven’t used it before can understand why they should even care. For existing packages, we tend to write blog posts for minor and major releases, but not for a patch release. In all cases, we find that these blog posts are most effective when they include lots of examples, i.e., “show, don’t tell.” For package updates, remember that the existence of a comprehensive NEWS file frees you from the need to list every last change in your blog post. Instead, you can focus on the most important changes and link to the full release notes, for those who want the gory details.
If you do blog about your package, it’s good to capture this as yet another piece of documentation in your pkgdown website. A typical pkgdown site has a “News” item in the top navbar, linking to a “Changelog,” which is built from NEWS.md. This drop-down menu is a common place to insert links to any blog posts about the package. You can accomplish this by having YAML like this in your _pkgdown.yml configuration file:

news:
  releases:
  - text: "Renaming the default branch (usethis >= 2.1.2)"
    href: https://www.tidyverse.org/blog/2021/10/renaming-default-branch/
  - text: "usethis 2.0.0"
    href: https://www.tidyverse.org/blog/2020/12/usethis-2-0-0/
  - text: "usethis 1.6.0"
    href: https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/
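Because YAML is sensitive to indentation, it can be worth confirming that the news entries parse into the structure you intended before rebuilding the site. The following is a rough sketch, not part of pkgdown's documented workflow: it assumes the yaml package is installed (pkgdown depends on it) and, to stay self-contained, parses a shortened copy of the configuration from a string.

```r
# Shortened copy of the news configuration, inlined for illustration.
config_text <- '
news:
  releases:
  - text: "usethis 2.0.0"
    href: https://www.tidyverse.org/blog/2020/12/usethis-2-0-0/
  - text: "usethis 1.6.0"
    href: https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/
'

# Parse the YAML into nested R lists.
config <- yaml::yaml.load(config_text)
releases <- config$news$releases

# Each entry should carry both a menu label (text) and a link (href).
has_fields <- vapply(
  releases,
  function(entry) all(c("text", "href") %in% names(entry)),
  logical(1)
)
stopifnot(all(has_fields))

# Labels as they would appear in the menu.
release_titles <- vapply(releases, function(entry) entry$text, character(1))
release_titles
```

For a real package you would parse the file itself, for example with yaml::read_yaml("_pkgdown.yml"), rather than an inline string.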
Congratulations! You have released your first package to CRAN and made it to the end of the book!
About the Authors

Hadley Wickham is chief scientist at Posit, winner of the 2019 COPSS award, and a member of the R Foundation. He builds computational and cognitive tools to make data science easier, faster, and more fun, working on packages like the tidyverse for data science and principled software development. He is also a writer, educator, and speaker who promotes the use of R for data science.

Jennifer Bryan is a software engineer at Posit, a member of the R Foundation, and a part of the tidyverse team that maintains more than 150 R packages. Jennifer maintains packages for importing tabular data, working with Google APIs, and simplifying development workflows.
Colophon

The animal on the cover of R Packages, Second Edition is a kaka, or nestor parrot (Nestor meridionalis), found in native forests of New Zealand. Generally heard before they are seen, kaka are very gregarious and move in large flocks.

Kaka are obligate forest birds that obtain all their food from trees. They are adept fliers, capable of weaving through trunks and branches, and can cover long distances, including over water. They consume seeds, fruit, nectar, sap, honeydew, and tree-dwelling invertebrates.

Although forest clearance has destroyed all but a fraction of the kaka’s former habitat, the biggest threat to their survival is introduced mammalian predators, particularly the stoat, but also the brush-tailed possum. Many of the animals on O’Reilly covers are endangered; all of them are important to the world.

The cover illustration is by Karen Montgomery, based on an antique line engraving from Wood’s Animate Creation. The cover fonts are Gilroy Semibold and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono.
©2023 O’Reilly Media, Inc. O’Reilly is a registered trademark of O’Reilly Media, Inc.