Table of contents : Welcome! How using a few ideas from software engineering can help data scientists, analysts and researchers write reliable code Preface 1 Introduction 1.1 Who is this book for? 1.2 What is the aim of this book? 1.3 Prerequisites 1.4 What actually is reproducibility? 1.4.1 Using open-source tools to build a RAP is a hard requirement 1.4.2 There are hidden dependencies that can hinder the reproducibility of a project 1.4.3 The requirements of a RAP 1.5 Are there different types of reproducibility? 2 Before we start 2.1 Essential knowledge 3 Project start 3.1 Housing in Luxembourg 3.2 Saving trapped data from Excel 3.3 Analysing the data 3.4 Your project is not done 3.4.1 How easy would it be for someone else to rerun the analysis? 3.4.2 How easy would it be to update the project? 3.4.3 How easy would it be to reuse this code for another project? 3.4.4 What guarantee do we have that the output is stable through time? 3.5 Conclusion 4 Version control with Git 4.1 Installing Git and opening a Github account 4.2 Git superbasics 4.3 Git and Github 4.4 Getting to know Github 4.5 Conclusion 5 Collaborating using Trunk-based development 5.1 Collaborating as a team 5.1.1 TBD basics 5.1.2 Handling conflicts 5.1.3 Make sure you blame the right person 5.1.4 Simplified trunk-based development 5.1.5 Conclusion 5.2 Contributing to public repositories 5.3 Further reading 6 Functional programming 6.1 Introduction 6.1.1 The state of your program 6.1.2 Predictable functions 6.1.3 Referentially transparent and pure functions 6.2 Writing good functions 6.2.1 Functions are first-class objects 6.2.2 Optional arguments 6.2.3 Safe functions 6.2.4 Recursive functions 6.2.5 Anonymous functions 6.2.6 The Unix philosophy applied to R 6.3 Lists: a powerful data-structure 6.3.1 Lists all the way down 6.3.2 Lists can hold many things 6.3.3 Lists as the cure to loops 6.3.4 Data frames 6.4 Functional programming in R 6.4.1 Base capabilities 6.4.2 purrr 6.4.3 withr 6.5 Conclusion 7 Literate programming 7.1 A quick history of literate programming 7.2 {knitr} basics 7.2.1 Set up 7.2.2 Markdown ultrabasics 7.3 Keeping it DRY 7.3.1 Generating R Markdown code from code 7.3.2 Tables in R Markdown documents 7.3.3 Parametrized reports 7.4 Conclusion 8 Conclusion of part 1 9 Rewriting our project 9.1 An Rmd for cleaning the data 9.2 An Rmd for analysing the data 9.3 Conclusion 10 Basic reproducibility: freezing packages 10.1 Recording packages’ version with {renv} 10.1.1 Daily {renv} usage 10.1.2 Collaborating with {renv} 10.1.3 {renv}’s shortcomings 10.2 Becoming an R-cheologist 10.3 Conclusion 11 Packaging your code 11.1 Benefits of packages 11.2 {fusen} quickstart 11.3 Turning our Rmds into a package 11.4 Including datasets 11.5 Installing and sharing the package 11.5.1 Code is hosted 11.5.2 Code cannot be hosted 11.5.3 Marketing your work 11.6 Conclusion 12 Testing your code 12.1 Unit testing 12.2 Assertive programming 12.3 Test-driven development 12.4 Code coverage 12.5 Conclusion 13 Build automation with targets 13.1 Introduction 13.2 {targets} quick-start 13.2.1 _targets.R’s anatomy 13.3 A pipeline is a composition of pure functions 13.4 Handling files 13.5 The dependency graph 13.6 Running the pipeline in parallel 13.7 {targets} and RMarkdown (or Quarto) 13.8 Rewriting our project as a pipeline and {renv} redux 13.9 Some little tips before concluding 13.9.1 Load every target at once 13.9.2 Get metadata information on your pipeline 13.9.3 Make a target (or the whole pipeline) outdated 13.9.4 Customize the network’s visualisation 13.9.5 Use targets from one pipeline in another project 13.9.6 Understanding this cryptic error message 13.10 Conclusion 14 Reproducible analytical pipelines with Docker 14.1 What is Docker? 14.2 A primer on Linux 14.3 First steps with Docker 14.4 The Rocker project 14.5 Dockerizing projects 14.6 Dockerizing development environments 14.6.1 Creating a base image for development 14.6.2 Sharing images through Docker Hub 14.6.3 Sharing a compressed archive of your image 14.7 Some issues of relying on Docker 14.7.1 The problems of relying so much on Docker 14.7.2 Is Docker enough? 14.8 Conclusion 15 Continuous integration and continuous deployment 15.1 CI/CD quickstart for R programmers (and others) 15.2 Running a RAP using Github Actions 15.3 Craft a dockerized dev env with GA 15.4 Run a RAP using a dockerized dev env on GA 15.5 Conclusion 16 Conclusion of part 2 17 The end “So what?” References