English Pages 317 Year 2022
Except where otherwise noted, this book is licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) by Richard Schneeman. Illustrations Copyright © 2022 Travis Stewart
Publisher: Richard Schneeman
Illustrator: Travis Stewart
Cover Designer: Travis Stewart
Editor: Ruby Ku
Contents
Welcome
• Intro
• Why listen to me?
• Why does this book exist?
• Why do developers want to contribute to open source?
• Why do projects need contributors?
• What comes next?
START
• Getting unstuck
• Face your contribution fears
• See yourself as a contributor
• Find your next contribution opportunity with COIL
WORK
• Building project context
  ◦ Project etiquette, norms and governance
  ◦ Prioritizing contribution opportunities
  ◦ Project research exercises
  ◦ Familiarity cheatsheet
• Issues and Bug Reports
  ◦ Reading and categorizing issues
  ◦ Reproducing bugs
  ◦ Debugging issues
  ◦ Giving feedback on feature requests
  ◦ Navigating conflict through communication (NVC)
  ◦ Issue cheatsheet
• Writing Documentation
  ◦ Understanding documentation
  ◦ Documentation formatting
  ◦ Documentation examples
  ◦ Documentation example prerequisites
  ◦ Documentation descriptions
  ◦ Documenting inputs
  ◦ Documenting outputs
  ◦ Documenting unfamiliar code
  ◦ Documentation cheatsheet
• Making pull requests
  ◦ What even is a pull request?
  ◦ How to make a pull request
  ◦ What makes a good pull request?
  ◦ Automated PR checks
  ◦ After your pull request
  ◦ Responding to comments
  ◦ Pull request cheatsheet
SUSTAIN
• Defining contribution goals
• Building a contribution practice
• Where contributions meet career
Epilogue: You are the future of open source
Definitions
Acknowledgements
Welcome

“To me programming is more than an important practical art. It is also a gigantic undertaking in the foundations of knowledge.” - Grace Hopper, Inventor of the first compiler
Intro

Who is this book for?

Note: I will use developer, coder, and programmer interchangeably. I use these terms to indicate someone who can modify a repository’s source code.
This book is for programmers who have gone beyond the “learning to program” stage and are looking for ways to expand their skills. If you’re just getting started in your coding journey, you can still gain experience and skills by reading this book, but I would recommend focusing your effort and energy on becoming proficient in a language, library, or tool until you’re ready to take the next step. If you’re already comfortable with those things, then you’re ready for this book!

Do you need to wait until you feel ready to contribute to open source? No. Do you need to have all the answers to all the questions before you get started? Certainly not. Your open source journey is about growing your skills as a developer and growing your community.

Here is what I expect from you before we get started:
• You’re comfortable enough with a programming language that you could write a feature without directly following a tutorial.
• You’re coming to the table excited to learn and hungry to contribute.

If you can pass those two checks, then congrats, this book is for you!
What programming languages do you need to know?

The tasks and examples apply to any programming language. When I need to show real-world examples, I pick from the Ruby language and ecosystem. I also provide a few examples from Rust, Python, and Java.
Don’t worry if you don’t know these languages. You’ll still be able to follow along; in fact, most of the lessons translate directly into any language.
How should you read this book?

The book is divided into three primary parts. “Start” is about building up a contributor’s mindset. “Work” is about building actionable skills and executing on specific tasks. Finally, “Sustain” covers techniques for building a long-term contribution plan.

I recommend everyone read from the beginning through the end of “Start.” That part introduces concepts that we lean on throughout the actionable “Work” part. If you must compulsively skip around, make sure you do not miss the chapter Find your next contribution opportunity with COIL.

Each section in “Work” starts with a question designed to help you gauge whether the section applies to you. You may either work from front to back or skip around to address specific topics. For example, if you know you already want to work with bug reports, you might jump to How to reproduce bugs.

In addition to the content in the “Work” section, I’ve provided a series of cheatsheets to help if you get stuck. For example, even after you’ve read all the “issue” material, you can see all the actionable steps at How to help triage issues.

If you want to get started but can’t seem to find the time, you won’t want to miss Building a contribution practice.
Definitions

Words are important. They carry meaning and intent. I’ve kept a list of the terms I define for this book. If you see a new term, or it seems I’m using a term differently than you’ve seen before, you can check out Definitions.
Why listen to me?

I’ve been contributing to open source for over a decade. I am a core contributor to the Ruby language, and I’m in the top fifty contributors¹ (by commits) to Ruby on Rails, a popular web framework. I help maintain a few open source projects, including the Puma webserver and the Ruby Buildpack for Heroku (my day job). I’ve got 1.9 billion open source library downloads to my name. I have seen firsthand how open source projects work, what needs they have, and how anyone with an interest and some guidance can make an impact.

Beyond my personal contribution experience, I wrote a service, CodeTriage², that has helped sixty-six thousand developers contribute to open source. I’ve mentored developers through their first contributions. I’ve surveyed countless aspiring contributors. I’ve interviewed top developers from 17 countries around the world. From all of these experiences, I’ve learned the secrets to contribution success.

I respect that my lived experience will be different from everyone else’s. What’s hard for me may be easy for you. To see broader patterns, I’ve invested in learning by listening to others’ experiences.

Beyond open source, I’ve taught dozens of students how to program. I lectured at the University of Texas as an Adjunct Professor³. These teaching opportunities have given me an appreciation for well-written technical documentation and respect for the power of learning a new subject. I’ve dedicated my professional life to the exploration of teaching, software, and open source. I hope to teach you what I know to empower the next generation of open source developers.
1. Rails Contributors - All time (2022). https://contributors.rubyonrails.org/
2. CodeTriage (2022). https://www.codetriage.com/
3. UT on Rails (2013). https://www.schneems.com/ut-rails
In short, this book is everything I wish I had when I was getting started.
Why does this book exist?

Lots of developers want to contribute to open source projects, and lots of projects need help. However, developers are getting stuck. It’s been my life’s work to bridge this gap. This book is the result of a decade of contribution and a half decade of interviews and research into how to help developers succeed at open source contributions.
CodeTriage came first

It was 2012 when I was at a Ruby conference, and I heard about a problem in the community. I learned that the Ruby on Rails Core team (of about seven developers) was responsible for 700+ issues. This imbalance left issues unanswered and feature work neglected because of the time spent responding to tickets. At this time, a developer named Steve Klabnik¹ went through every single issue and responded to all of them in a marathon session spanning multiple days. The effort earned him accolades and commit access to the Rails repo.

I saw the effort he made, and that’s when I realized that I could emulate his success by reading and responding to tickets. I didn’t want to burn out in a coffee-fueled issue bender, but I also wanted to help. My solution was to write a script that emailed me one issue a day. I read the issue. I built up confidence. In time, I found actionable patterns, and I was able to make meaningful comments. Eventually, the Rails Core team noticed my contributions and gave me commit access.

That could have been the end of the story, but I didn’t make the script just so I could get commit access. I made it because I wanted open source to be sustainable. I wanted more community involvement. I didn’t want to keep this “secret sauce” program all to myself. I wanted to change the (open source) world. With this in mind, I turned my script into a web app named CodeTriage. The first person I invited was Steve Klabnik, and he’s still using it ten years later!

1. Website (2022). https://steveklabnik.com/. Twitter (2022). https://twitter.com/steveklabnik/
From CodeTriage to How to Open Source

For a while, I was happy that CodeTriage user numbers were climbing, but there was a big problem brewing. Lots of developers signed up, and lots of people helped open source repositories, but many more didn’t. They signed up, and that’s it. I had maintainers who needed help. I had contributors who wanted to help. But there was still a disconnect.

I found that people quit before they got started. Some people were stuck waiting for the right opportunity, some people were afraid, and some people didn’t know where to start looking. Almost universally, people said they needed more time. If that sounds familiar, you’re not alone. I talk about these barriers and more in Overcoming contribution barriers and Building a contribution practice.

Once developers got in the right mindset, many felt ready to start but still didn’t know exactly how. They needed actionable ways to build contribution skills. I’ve worked with developers to identify the skills that help them confidently triage issues, write documentation, and make pull requests. Now, that hard-earned experience can be yours when you learn more about building actionable skills.

What good is spending all that time developing contribution skills if you only use them once? Successful contributors don’t just have the right skills and mindset: they build a contribution practice. I’ve explored the secrets to sustainable contribution habits, and this book can help you use those habits to make a plan that gets you started and keeps you contributing.
Why do developers want to contribute to open source?

Note: If you’re ready to get started, you can skip ahead to the next chapter, Why do projects need contributors? If you’re on the fence about contributing or want to hear why others might want to contribute, then keep reading.
First, why should you contribute to open source? Why should anyone contribute to open source? At the end of the day, the answer will be different for everyone. In this chapter, we look at some theories of motivation, dip into some quotes from real-life interviews, and talk about why you must find your motivation before you start.
What motivates me

In his book Drive: The Surprising Truth About What Motivates Us¹, Daniel Pink breaks down motivation into three parts:
• Autonomy: Our desire to be self-directed. This increases engagement over compliance.
• Purpose: The desire to do something that has meaning and is important.
• Mastery: The urge to keep improving our skills.

Open source work checks these boxes and then some:
1. Pink, D. H. (2011). Drive: The Surprising Truth About What Motivates Us. Riverhead Books.
• Autonomy: Open source contribution is entirely voluntary. When you pick a topic that excites you or a bug that frustrates you and then scratch your own itch, it can feel like anything is possible. Being able to work on tools that thousands or millions of developers already use is exceptionally impactful.
• Purpose: Many developers don’t get the same reach and impact from their day jobs that they get from contributing to open source.
• Mastery: Working on open source will challenge your technical abilities and force you to grow as a developer. In addition, you’ll get a crash course in working asynchronously, communication and consensus building, and distributed teams. For many, contributing to open source is playing on “hard mode.”

Beyond Pink’s definition, at the top of Maslow’s Hierarchy of Needs² comes Self-actualization:
Maslow’s Hierarchy of Needs³
2. Wikipedia contributors. Maslow’s hierarchy of needs. Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Maslow%27s_hierarchy_of_needs
• Self-actualization: Achieving one’s full potential, including creative activities. To realize in action or make real.

For me, this phrase means “the ability to turn a thought or concept in my head into something that exists in the world.” I delight in finding an opportunity to contribute, carrying it through from proposal to pull request, and eventually seeing it shipped in the real world. I love it when developers come to me at conferences and say my code made a difference in their lives.
What motivates other developers

These quotes are from interviews I conducted in 2019. I asked developers the following question: “For the people who contribute to open source, what do you think motivates them?” Below is a collection of some of the answers.

“They might be using an open source library and then get really frustrated about a bug that’s not being fixed and then one day got fed up.”

“Making your daily tools more delightful.”

An example of an early open source inspiration is someone wanting to modify printer software to notify them when there’s a jam. I find frustration a significant motivator to improve tooling and experiences. When a system doesn’t work as well as I think it should, I am empowered to change the system. This concept is also known as “reducing toil.”

“It’s about joining a community of people, and feeling like you’re working on a project with other people, rather than working on something on your own, contributing your little chunk.”
3. “Maslow pyramid with hierarchy of human needs classification outline diagram” - licensed through VectorMine.
The concept of “community” and working for the “common good” resonates with people. I strongly identify with several open source communities. I’ve made wonderful and deep friendships through the shared act of contributing to open source. Brett Cannon said about becoming a contributor to Python, “I came for the language, but I stayed for the community.”

“People want to feel like they can move the world.”

“Open source contributors feel like they are super heroes and can use their power for good.”

A library that I contribute to has over 330 million downloads. Beyond being a vanity metric, this number represents impact and reach. For many, the reach provided by open source is irresistible. I take pride in the fact that the software I’ve worked on has touched so many lives.

“My career is built on top of it [open source] effectively.”

“An element of giving back. Continue to keep those projects going.”

“Open source software has benefited me an awful lot. And that’s probably why [I want to contribute].”

“There’s a bit of giving back as well. I’ve benefited so much from [software] and people like yourself giving talks. It’s good to give back and make it easier for someone down the line, you know.”

Many developers feel indebted to the software that made their careers possible. A sense of needing to give back is a potent motivator. I will go so far as to say that developers are also giving forwards: when we contribute, it benefits the community and continues to benefit us for years.

“Some people are passionate about free open source software; it’s like a political stance for them in some ways: all software should be open source.”

Since open source software has become so commonplace, it might not seem like working in the open is a political act. But there was a time when the “best” and most dominant software was closed source, and many believed that decentralizing development and opening it up to “the commons” could not produce quality software. That debate is somewhat settled (with open source coming out very well), but tech is neither apolitical nor neutral.

“Someone they admire in their career does it. Like, oh I want to be that person and follow their footstep.”

“It definitely creates a profile or a brand for yourself, like I know if someone ever committed to [open source]; I think very highly of them; I think they are a very capable person.”

“I think it’s mostly down to need. But sometimes, it’s down to ego. Sometimes people want to do something that gets them recognition.”

“It’s kind like proof that you know your chops.”

When I started programming, I saw developers speak at conferences, and I wanted to be like them. At one of the talks, Nick Quaranto, creator of Rubygems.org, had everyone in the audience stand up. Then he asked those who had never contributed to open source to sit down. He kept getting more specific until, finally, he asked only those who “Have contributed to Ruby” to stay standing. I think very few (if any) were left. That talk left a strong impression. It made me want to be like the contributors still standing.

I worked hard, and years later a well-known developer jokingly described me as a “B-list celebrity in the Ruby community.” I would be lying if I said I didn’t get any fame or brand recognition from my open source work. I also blog, give talks, and do podcast interviews about that work. I used to actively pursue fame and recognition. Still, over time, I’ve found that star counts on projects and follower counts on social media don’t bring long-lasting joy or motivation. What does? I value the connections with the community. I value the friendships that I’ve made. I value the shared sense of purpose. When the 2020 pandemic hit, my relationships with the humans behind open source helped me weather the storm.
“Learning more code helps with getting familiar with new codebases faster. A large codebase is a lot harder to get to know. Contribution requires deep understanding of the code.”

“It feels like it will be beneficial to my future career somehow indirectly.”

“If I went to an employer or potential client and I said I have contributed, they would say that’s kinda cool.”

Open source work can seem like a way to advance your career, but the specifics of how it helps are a bit fuzzy. It may help you land a job during an interview, or it may not. It may help you land a client if you’re a consultant or freelancer, or it may not. The last time I interviewed for a job (many years ago), I certainly brought up open source contributions wherever I possibly could. Did I get a job offer from every company as a result? No. Did it help? Absolutely.

These are examples of tangible career development benefits. In my experience, the intangible benefits are more significant. Working in open source exposed me to extreme cases I might never find at my day job. It forced me to get comfortable working in codebases so large I could never fit them all into my head. I’ve become an expert at collaboration tooling and communication. Many of the best things that open source has given me have been subtle and hard to pin down.

You certainly don’t need to contribute to open source to develop and polish the same skills, but open source has the double benefit of helping you with these skills while growing your community.

There are many more reasons that people contribute, enough to fill entire books. These few examples are meant to get you thinking about what called you to open this book in the first place.
Intrinsic vs. Extrinsic Motivations

Motivations are often broken into “intrinsic” and “extrinsic.” Intrinsic motivations are based on internal rewards, for example, the good feeling of finally understanding a difficult math concept. Extrinsic motivations are based on external rewards, such as getting a high grade on a math test. Although extrinsic motivators are much easier to see and quantify, intrinsic motivators are much more potent in creating long-lasting habits⁴.

We can break down a few of the open source motivators listed above into these two categories:
• Intrinsic Motivation
  ◦ It’s fun and challenging.
  ◦ Emulate the behavior of those you look up to; that is, when your heroes contribute, you want to do it too.
  ◦ Build skills to be proud of.
  ◦ Removing frustration in daily tooling.
  ◦ Community and commons building.
  ◦ Sense of belonging and sense of community.
  ◦ A feeling of mastery over specific technical topics.
  ◦ Altruism, gratitude, and giving back (giving forwards).
  ◦ Being politically engaged.
• Extrinsic Motivation
  ◦ Become famous.
  ◦ Create a brand for yourself.
  ◦ External marker of success.
  ◦ Career development; get a better paying job.

4. Duhigg, C. (2012). The Power of Habit: Why We Do What We Do in Life and Business. Random House.

Let your motivations evolve

If you’re not clear on your own motivations, or you’re reading this because you feel like “you should contribute to open source,” then it might be a good time to look inward. Many people pitch open source contributions as an unmitigated good that will transform everyone’s life. Although it’s been good to me, it’s also been challenging. Having a mix of both intrinsic and extrinsic motivators will remind you why you wanted to contribute to open source in the first place, and it will keep you going when things get tough.

When I was getting started, I really wanted to “Get a commit into Rails” to get onto the leaderboard. I very much wanted to be one of those “people standing” at the conference talk. Your story and motivation are likely different, and that’s okay. It’s okay for that motivation to change and evolve over the various seasons of your life.

If you’ve got the motivation, then this book can help you develop the skills and mindset needed to turn your open source dreams into reality.
Why do projects need contributors?

Projects need code. They need bug reports, technical roadmaps, and new features. Projects need governance. Projects need a community. In short, projects need people. There is a proverb that states, “If you want to go fast, go alone. If you want to go far, go together.” This rings true in open source.

Many people think of open source in two categories: the users who consume the software, and the contributors who produce it. There’s a third, more precarious group that holds it all together; these people are known as the maintainers. Contributors may come and go without retaining the project’s history and context. They’re not compelled or empowered to make difficult decisions or to be held accountable for those decisions. No amount of good commit messages, good PR messages, or great CHANGELOG entries can compare to having someone who was there at the time of the decision.

Maintainers are the guardians of the context of past decisions. They retain and carry a project’s values and vision across releases and years. Maintainers are the beating heart that drives a project forward, but where do maintainers come from, and how can we get more of them?
A maintainer is not forever

Although maintainers are extremely important, we can’t rely too much on a single person. Losing maintainers from a project is inevitable. Developers will burn out, move on, and sometimes even pass away. A prominent example in the Ruby community was Jim Weirich¹, who passed away in 2014 and was responsible for maintaining a hugely popular library by the name of Rake. The value and fragility of this maintainer system are summed up well in a famous XKCD comic about dependencies.
“Dependency” by XKCD²
Open source is powerful, but it can be fragile. Knowing that all maintainers will one day move on from a project, we should consider two questions: How can we retain our maintainers longer, and how can we best prepare for their eventual departure?
1. Wikipedia contributors. (2022, September 19). Jim Weirich. Wikipedia. https://en.wikipedia.org/wiki/Jim_Weirich
2. Source: https://XKCD.com/2347 Alt-Text: “Someday ImageMagick will finally break for good and we’ll have a long period of scrambling as we try to reassemble civilization from the rubble.” Licensed under Creative Commons Attribution-NonCommercial 2.5 with modifications.
How can we retain maintainers longer?

To answer this question, we must first answer: “What do maintainers want?” When I talk to maintainers, the vast majority want more time in the day and more help. They want peers and allies to help with their projects. They want appreciation and respect for the countless hours they’ve poured into their projects.

Respect and appreciation are fairly low-hanging fruit. Although many developers appreciate the effort contributors and maintainers put into projects, they might not actually vocalize that appreciation. When I engage with a maintainer through issues, on Twitter, or in person at a conference, I always try to remember to say “Thank you” for their efforts. The more specific the “thanks,” the more impact it seems to have. Open source is powered by love, and it’s virtually free to spread that love around.

Providing more help and more time is harder. That’s what this book is all about. The core idea behind CodeTriage was that if a random developer could spend five minutes looking at an open source issue, then they could save a maintainer five minutes. Trying out beta releases and giving feedback, submitting detailed and meaningful bug reports, and sending pull requests also count as help. By engaging with a project, contributors let maintainers know they’re not alone in their efforts to make delightful software.

In short, maintainers who have regular and meaningful contributors will stick around longer.
How can we prepare for when maintainers leave?

The best outcome when a maintainer leaves is for someone new to take their place. To create a maintainer, we’ve got to define what a maintainer does.
A maintainer is someone who has the context and knows the stories of the project. It is someone who takes five or ten or thirty minutes to help others. It is someone who can make decisions and drive the software forward.

How does a developer get context and build a history with a project? They do it by helping, interacting, filing bug reports, engaging with issues, and reading pull request discussions. They do it by sticking around and paying attention. In short, the way the community makes a new maintainer is the same way it keeps existing maintainers happy: by having more contributors who help the project.

At the end of the day, a maintainer is someone who cared enough to contribute to a project actively. To grow new maintainers, we must first grow new contributors.
What comes next?

Before we get into the nitty-gritty technical details of how to contribute, I want to set you up for success. Many misconceptions are floating around out there about what precisely open source contribution is and how to do it. We’ll start with a clean slate and work to dispel some common myths in START.

Once you are ready to start your open source journey, you’ll need some explicit examples of how to contribute. In WORK, we’ll cover concrete actions you can take to build an open source contribution toolbox. Specifically, we’ll look at how you can get started by helping with issues, how to audit and write documentation, and how to demystify the inner workings of a project.

To succeed, projects need sustained and active contributors. Why do some developers fall off quickly while others are able to maintain continuous progress? The secret is that the most consistent and active developers have created a contribution practice that works with their goals and lifestyle. I’ll help you overcome your hurdles and set you up for success in SUSTAIN.

Let’s get to it!
Getting unstuck

You can’t contribute to open source if you don’t know what to do next. Many developers come to me with the aspiration and intention to contribute. They ask if I can unblock them by giving them an easy issue to start with. Developers get locked into the mindset that a contribution is an opportunity someone hands them, instead of something they find for themselves as part of the journey. They are paralyzed by a lack of clear direction.

We’ll discuss actionable ways to find those opportunities, but first it’s important to internalize why this core open source skill matters so much. The secret to successfully starting your open source career is knowing that unblocking yourself is part of the process. In this section we will look at common blockers to contribution and what you can do to overcome them.
The curse of knowledge
Wouldn’t it make sense to start with some easy issues and then gradually work up to harder things? Unfortunately, there are no easy issues. What is easy for me is different from what is easy for you. There are no beginner issues, only beginner actions. Humans are downright awful at guessing other people’s capabilities. In one famous study,¹ participants were asked to tap out the rhythm of well-known songs (such as “Happy Birthday”) on a table, without lyrics or melody. A listener was tasked with listening and guessing the song.
1. Newton, Elizabeth Louise. 1990. The rocky road from actions to intentions. PhD diss., Stanford University. https://creatorsvancouver.com/wp-content/uploads/2016/06/rocky-road-from-actions-to-intentions.pdf
When asked, the “tappers” estimated that the listeners would guess 50% of the songs correctly. In reality, the listeners got 2.5% of the songs correct. The tappers were off by a factor of 20. Wow. Why the huge difference? The people tapping on the table were playing a soundtrack in their heads. The song seemed obvious to them. But the listeners only got “click click click.” Even when the tappers tried to compensate for this disparity, the predicted and actual numbers were way off.
This phenomenon has become known as the curse of knowledge.² Even when people know that they hold more information than others, they cannot effectively imagine what life would be like without that information. What does that have to do with “beginner issues”? We all come with different strengths and weaknesses, different experiences, and different perspectives. That’s a huge asset. But it also means that it’s hard, if not impossible, to judge what is easy for someone else. I’ve seen plenty of issues tagged as “good for beginners” that I wouldn’t be able to work on. Maybe they’re in a language I don’t know or use a library I’ve never used. If someone tags an issue with “good for beginners,” but you look at it and don’t know where to start, how is that going to make you feel? It’s going to make you feel like you’re not even ready to be a beginner. It’s going to make you feel bad. Your expectations of “good for beginners” might differ dramatically from those of the maintainers who tagged the issue. They might think, “Hey, here’s an issue that will only take about four hours of work to fix and ship,” when you’re thinking of something on the time scale of thirty minutes. If you spend thirty minutes on it and don’t seem to make any progress, that’s going to feel bad. Don’t feel bad. It’s not you. It’s the curse of knowledge of the maintainer who tagged it. They don’t remember what it was like to have no context with the code or to get started. Although they can tell you one issue is more straightforward than another, picking one truly “easy” issue is outside of their ability. Some opportunities are more straightforward to act on than others, but finding and identifying opportunities is half the battle. Yes, it’s good when people can do that work for you sometimes, but in the long term, you’ll need to find contribution opportunities and put that work onto your plate yourself.
2. Wikipedia contributors. (2022, September 19). Curse of knowledge. Wikipedia. https://en.wikipedia.org/wiki/Curse_of_knowledge
Why do maintainers tag issues with labels like “good for beginners”?
Let’s put you in a maintainer’s shoes for a bit. You’re a proud contributor with commit access to left-pad-kubernetes, a fake library I just made up for adding indentation with a distributed architecture (a nonsensical thing no one needs to do). A new issue comes in. You get an email. You spend time triaging the issue. You conceptualize the fix in your head and realize it’s pretty easy. Why would you tag it as “beginner friendly”? In my experience, there are two reasons to tag an issue: you want to train and encourage new contributors to the project, or the change requires more time than you want to invest, so you look to hand it off. Why am I talking about motivation? While you’re motivated to contribute to a project, it pays to think about a maintainer’s needs. Let’s say they tag the issue and leave a comment with some vague directions. What happens next? If the maintainer is lucky, the issue stays open for a few days, maybe a few weeks. Eventually, someone with the capacity and the skills to handle the issue comes along. They submit a pull request and knock it out of the park.
What happens when a maintainer is unlucky? In the worst case, no one responds, and the issue never gets fixed. In a slightly less lousy scenario, someone attempts a fix but doesn’t quite nail it. The contributor needs some help to get it over the line. Now, remember our two motivations: “looking for contributors” and “doesn’t want to spend time on the issue.” Because you’re role-playing as a fantastic maintainer with infinite time and patience, you’ll hopefully recognize the coaching opportunity and help this person. But if you tagged the issue because you didn’t have enough time, coaching someone new might take more energy than fixing it yourself. Now, if you’re the contributor of that fix, it’s not going to feel good to see a “beginner issue” label and then not have your patch accepted. Even if the fix eventually gets merged, you might feel like it was not a success if you have to iterate on it with a maintainer. Getting your pull request closed for an issue that wasn’t labeled “good for beginners” may sting less, because it doesn’t come with the same expectations.
Ghosting and the bystander effect
Another common problem in open source is when someone shows up and says, “I’ll take this issue,” and then is never heard from again.³ This ghosting is common enough that when I see that someone has “claimed” an issue, I ignore the comment. What’s the worst that could happen? Maybe we both open a pull request, and we’ve both got detailed context on the problem and can have an in-depth conversation. Maybe the other person ghosts, and my fix gets accepted. This ghosting problem is more common in “beginner” issues because people don’t know their capabilities.
3. Wikipedia contributors. (2022c, September 19). Ghosting (behavior). Wikipedia. https://en.wikipedia.org/wiki/Ghosting_(behavior)
If you feel like you shouldn’t take action because someone else might be taking care of it, you’re experiencing a common response known as the bystander effect.⁴ This effect happens in all group situations. Suppose you’re in a real-life emergency: someone collapses in front of you while you’re waiting in line. What should you do? They probably need medical help, but if you call out “someone call a doctor,” then everyone in the crowd will assume someone else is on it. That’s the bystander effect. What’s the fix? If you’ve ever taken a first aid or CPR (cardiopulmonary resuscitation) class, you may remember that you should single out one individual in the crowd and tell that person to call emergency services (911 in the United States). By singling someone out, you break the cycle of group indecision. When contributors want to help but seem stuck with indecision, they’re experiencing the same reflex. The good news is that because the bystander effect is so well known, it has a known antidote: you can aim to be an active bystander. How do you become an active bystander? By taking an action. Any action. When you take any action, you break out of the cycle of passiveness and into the moment. This strategy works best when you assume you are the only person taking charge of the situation, and if you don’t act, nothing will get done. Now, when you begin to feel like “maybe someone else will take care of this,” notice the sensation, but choose to be an active participant. Pretend you’re the only person in the world who can help. In the worst case, two people will work on the same problem, and you’ll have another developer to work with, which means you can learn from each other.
4. Wikipedia contributors. (2022b, September 19). Bystander effect. Wikipedia. https://en.wikipedia.org/wiki/Bystander_effect
There are no beginner issues, only beginner actions
If you wait for the perfect issue to be gift wrapped and put in your lap, you’ll never get started. I believe that the vast majority of people expressing a desire for “beginner issues” either are genuinely unaware of where they can start without help, or they want the feel-good vibes of contributing without realizing that identifying a contribution opportunity is part of the task. Instead of “beginner issues,” I want you to focus on “beginner actions.” You cannot control the problems that come to you or what needs work. If you’re waiting for a “beginner issue,” then you’re looking outside for the answer when you should be looking inside yourself. What people mean by “beginner” is “I can do this.” Let’s flip the story. Figure out what you can do, and then go out and find a problem that needs your help. You have control over your actions. As your skills increase, your pool of possible actions also increases. A “beginner issue” mindset is a passive mindset. A “beginner action” mindset gives you the power to take action. It doesn’t matter how small or large the action is. The important part is that you take action.
Floss one tooth — comment on one issue
I’m a fan of the late comedian Mitch Hedberg, who said the following: “People who smoke cigarettes, say ‘You don’t know how hard it is to quit smoking.’ Yes, I do. It’s as hard as it is to start flossing.” Intentional habit-building means taking a seemingly impossible task and transforming it into something you can do automatically. You won’t even remember doing it. To that end, I’m in love with the systems thinking concept of “floss one tooth.”⁵ Instead of starting by flossing all your teeth, first commit to flossing just one.
5. Baer, D. (2013, December 4). The Secret To Changing Your Habits: Start Incredibly Small. Fast Company. https://www.fastcompany.com/3022830/the-secret-to-changing-your-habits-start-incredibly-small
Showing up takes a massive amount of work. Permit yourself to get started without the guilt that you could be doing more. Flossing one tooth is such an absurdly easy goal that it seems silly not to achieve. That’s part of the point. Use that absurdity to jump-start your activity. The goal isn’t to trick yourself into flossing all your teeth initially. The goal is to get started. If you’re consistent, you’ll one day find that you want to floss the rest of your teeth. It will feel natural and perhaps even easy. You can use this “floss one tooth” mental model while contributing to open source. Make one comment on an open issue. That’s it. You don’t have to start now, but by the end of the book, you’ll be more than ready to get started, and that’s your initial action goal. When you start commenting, it might be challenging at first. It can be intimidating and disheartening if you read an issue but don’t feel like you’ve got anything to add to the conversation. That’s okay. Keep going. It gets easier. I promise.
The goal of open source contribution is to keep playing
I’ve focused on individual actions, but I also want you to consider the larger context. Contributing to open source is an open game.⁶ An open game is one where success has not been clearly defined. An example of a closed game is a puzzle. You know that you’re done when all the pieces are in place. You’ve won. You’re finished. An open game would be a career designing puzzles. It might end one day, but it will never be done. The goal of an open game is to keep playing. “Contributing to open source” isn’t an achievable or actionable task. It’s a goal. The trick is to break down large, unbounded tasks into manageable “closed” ones. This book will help you build mental frameworks to subdivide large tasks into smaller ones. Then, we’ll look at those well-scoped actions and give you skills to keep your forward momentum.
6. Kerr, J. (2022, September 19). From Puzzles to Products by Jessica Kerr. Atomist Blog. https://blog.atomist.com/from-puzzles-to-products/
Recap
1. There is no perfect “beginner issue.” You’re better off spending your time taking actions that will build your context of a project so that you can identify contribution opportunities.
2. There is value in working on a task, even if someone else has “claimed” it. Be an active bystander!
3. Focus on actions you can take today. It doesn’t have to be perfect. Floss one tooth.
Face your contribution fears
If you’re feeling nervous, anxious, or afraid to start contributing, know that you’re not alone. Many programmers are self-conscious about the code they write, and the vast majority of them have never posted code in public where others can judge it. When I was conducting interviews on what fears contributors have, one developer immediately responded, “What if I open a pull request, and the creator of a popular framework makes fun of me?” That would be terrifying. If I thought that someone I looked up to would make fun of me for trying something new, you’d better believe I would steer clear of trying. If you’re timid about jumping in and contributing, I’m here to help walk you through the anxiety. Together, we can make a plan to maximize your chances of success and prevent your worst fears from coming true.
Imposter syndrome
The hesitation you’re feeling about working in public has a name: imposter syndrome.¹ Wikipedia describes “imposter syndrome” as:
A psychological pattern in which an individual doubts their skills, talents, or accomplishments and has a persistent internalized fear of being exposed as a ‘fraud’.
1. Wikipedia contributors. (2005, July 6). Impostor syndrome. Wikipedia. https://en.wikipedia.org/wiki/Impostor_syndrome
People fear they’ll be exposed as a fraud when everyone can see their work. They worry they’ll make mistakes, and their worst fears will be confirmed. People think everyone else knows more than they do since they secretly believe they’re faking competency. In reality, you know things that no one else does. Open source needs your unique experiences and perspectives.
[Figure: Imposter Syndrome²]
Working in the open heightens this fear of self-doubt because your work is visible for all to see. You will make mistakes where others can see them. That’s okay. Everyone else is making mistakes too. You can also combat this feeling by setting your own goals and avoiding comparing your contributions to others.
2. “Impostor syndrome mental problem and reality comparison outline diagram” - licensed through VectorMine.
What happens if I strike out? Focus on batting 300
I’m not much of a sports fan, but I do know a fact about baseball players.³ The good ones bat 300 or above. A batting average is a player’s hits divided by their at-bats, usually written as a three-digit decimal, so batting 300 (.300) means getting a hit in about three out of every ten at-bats. If players can do that, they’re doing pretty well for themselves. Even professionals can’t hit every ball. I mention this because it’s a common misconception that successful people rarely fail. Even great hitters fail most of the time.
3. Wikipedia contributors. (2022d, September 19). Batting average (baseball). Wikipedia. https://en.wikipedia.org/wiki/Batting_average_(baseball)
I enjoy speaking at conferences and have spoken at thirty-one conferences in seventeen countries, but to date, I still write talk abstracts that are rejected during the call for proposals (CFP) process. On average, I get about one in three accepted. Essentially, I’m batting 300 with my conference talks. If your first contribution attempt doesn’t go the way you imagined, that’s normal. You’ve still got a lot to learn, and every disappointment is also a learning opportunity. I’ve got a whole chapter that talks about how to approach feedback and work with a maintainer to turn a rejection into a learning opportunity. The majority of pull requests do not get merged or accepted. That’s another reason I suggest people start somewhere other than a pull request. When I close someone else’s pull request (PR), I try to explain why I’m closing it and then end by pointing out how much they learned and how much easier the next pull request will be. Finally, I invite them to keep trying to contribute. The following is an example:
Thank you for your pull request. Unfortunately, I cannot merge this right now, and I’ll be closing this PR. If you really need this feature, I suggest pulling out this functionality into your own library. If there’s lots of interest in your library, then we can revisit this feature in the future. While I’m closing this pull request, I invite you to continue contributing to this project. Even though this code is not getting merged, you’ve learned a lot about the project internals, and the next contribution will be even easier. Thanks again, and have a great day!
I’ve found that many people respond very positively to these messages of encouragement, and they are motivated to stay engaged and come back again. While not all maintainers will take the time to write such a lengthy message for every rejection, know that many likely would if they had more time in the day. All current maintainers were once in the same place you are now. They were once intimidated to contribute and a little scared of rejection. Remember, even the best only bat 300.
Will a maintainer yell at you?
The good news is that I’ve rarely heard of someone getting chewed out or publicly shamed when they were trying to help. If anything, the most common problem I see with first-time contributions is getting the maintainers’ attention. Quality maintainers know that every developer must start somewhere, and they will encourage you from start to finish. Less-than-stellar maintainers might not cheer you on through the whole process, but even then, the responses I’ve seen are curt but not shaming or hurtful. If you don’t know what kind of maintainers you’re dealing with, you can lurk (observe but not post) for a bit; I’ve got a section on how to research a project like this in a structured way. If you find a maintainer is intentionally mean, that’s a good sign that they don’t know how to run a project effectively and that their project is not a good place to start your open source journey. It might sting, but know that hate has no place in open source, and it’s not your fault.
Get in the arena
Even with my reassurance, there’s no way you can know what will happen when you hit enter in that comment field or fire off a pull request. That’s when I’ll defer to the fantastic Brené Brown’s book Daring Greatly: How the Courage to Be Vulnerable Transforms the Way We Live, Love, Parent, and Lead.⁴ The book talks at length about facing fears and countering potentially shameful moments. The title of the book is taken from this quote by Theodore Roosevelt:
“It is not the critic who counts; not the [person] who points out how the strong [person] stumbles, or where the doer of deeds could have done them better. The credit belongs to the [person] who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if [they] fail, at least fails while daring greatly, so that [their] place shall never be with those cold and timid souls who neither know victory nor defeat.”
It takes confidence to work in open source. The good news is that the more you do it, the more easily that confidence comes. But even people who’ve been in the arena for years still have to put in the work.
4. Brown, B. (2015). Daring Greatly: How the Courage to Be Vulnerable Transforms the Way We Live, Love, Parent, and Lead (Reprint ed.). Avery.
Recap
1. Working in public is scary. Know that everyone feels this way, and the only way to get over that fear is to acknowledge it and contribute anyway.
2. Contribution attempts that are rejected or don’t go as planned are common and should be used as learning opportunities. Even the best bat 300.
3. It’s better to take a risk and try something, without knowing whether it will fail or succeed, than never to try at all. Get in the arena!
See yourself as a contributor
Whether you realize it or not, who you are, where you’re from, your skin color, your sex, and your gender identity all influence your perception of what’s possible. This is often referred to as representation, or put another way, “You can’t be what you can’t see.” This section covers representation in open source and dips into the systemic and individual biases that prevent people from becoming contributors.
“They don’t do that where I’m from”
I grew up in a rural town in North Carolina where I drove by rows of crops every morning on the way to school. When I got my Eagle Scout award at the age of eighteen, they asked me what I wanted to be when I grew up so they could pair me with a professional in that area. I told them, “I want to work for NASA,” and they paired me with a cop. Not even close. When I was growing up, I didn’t even know what an open source contributor was, let alone aspire to be one. When I interviewed developers about open source, I asked them, “Who contributes to open source?” Many of them focused on who they didn’t see as contributors. One developer was quick to point out that they didn’t see anyone from their city contributing to open source:
In [my city], although there is a large software scene, there are very few companies that do actively contribute to the tools that they use.
Others focused on different attributes such as schooling or experience:
I don’t have a computer science degree. I don’t claim to be this super experienced senior engineer.
Developers from all over told me they didn’t have a single archetype of “who contributes to open source.” Still, they turned around and described contributors using attributes that didn’t apply to themselves. Beyond open source, we can see the impact of representation in other areas of our lives:
In the United States, an equal number of girls and boys, at seven years of age, want to become President, but by the time they have reached the age of fifteen, that number has plummeted by fifty percent for girls yet remained the same for boys.
• Caroline Heldman, Ph.D., “Miss Representation”¹
Simply seeing that every president over those eight years was a man led half of those girls to conclude that they couldn’t be president. Representation is powerful. I mentioned a powerful moment in my open source motivation story where I saw contributors I admired stand up at a conference. It inspired me to want to be a contributor like them. It was also easy for me to imagine myself in their shoes. They looked like me, they dressed like me, and they were in the same room as me. I could imagine being them as easily as breathing. It was easy for me to want to “be” them. It’s not so easy for everyone.
Systemic “-isms” at play in open source: Meritocracy is a lie
When I started contributing to open source, the GitHub offices had a rug in a conference room, styled after the presidential seal, that said “United meritocracy of GitHub.” The idea was that because open source was equally open to everyone, those who had the most merit to contribute would be granted the most influence. On the surface, the term and idea seemed great. Who doesn’t want to be ranked based on merit? If meritocracy were how open source actually worked, we would expect the percentage of women contributing to match the percentage of women developers in the industry, but that’s not the case. Although we do see some Black developers contribute, it’s painfully obvious that it’s not happening at representative levels. If I believed that meritocracy was true, I would have to believe that the reason for these mismatches was merit: that these developers were somehow worse at contributing. This notion is false. In talking with my peers from under-represented communities, I’ve heard so many stories about how poorly their co-workers, bosses, and random strangers on the internet have treated them. To get the same level of recognition as me, they have to do more, all while being subjected to structural and individual biases. While many in tech would like to “not see color,” if people are not able to see the systemic and persistent barriers, then they’ll never be able to dismantle them. People’s lived experiences and biases, both individual and systemic, prevent us from addressing those problems head-on. I am not saying that contributing to open source is easy for white cis men like me. It’s not easy for anyone. The systemic barriers at play make it even harder for everyone else.
1. Miss Representation (2011). (2014, April 15). IMDb. https://www.imdb.com/title/tt1784538/
Inclusion requires your active role
We know that representation matters. We know “you can’t be what you can’t see.” We know that hostile and toxic communities can drive away contributors. By reducing toxicity in the community, we can make open source a more welcoming environment and retain contributors from all backgrounds. Through these steps, we increase diversity and representation. No one has a foolproof way to fix sexism, racism, transphobia, disability discrimination, xenophobia, and more. But some people have taken steps to make things better. One way to move forward is for projects to adopt a “code of conduct” that explicitly lays out their values and standards. The most popular one, adopted by thousands upon thousands of projects, is the “Contributor Covenant”² by Coraline Ada Ehmke. Everyone should read it. The following is the opening paragraph:
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation. […]
This is a small step in the right direction. It’s only one piece of the puzzle, though. This document serves as a “law” of sorts to be enforced by project maintainers. There are other ways that contributors can shape their communities.
The book Code and Other Laws of Cyberspace³ by Lawrence Lessig lists four areas that regulate behavior online:
• Code & architecture – The physical or technical constraints on activities (e.g., locks on a shed or moderator features on an app).
• Market – Economic forces.
• Law – Explicit mandates that a governing body can enforce.
• Norms – Conventions that people feel compelled to follow.
2. Contributor Covenant: A Code of Conduct for Open Source and Other Digital Commons Communities. (2022). Coraline Ada Ehmke. https://www.contributor-covenant.org/
3. Lessig, L. (1999). Code: And Other Laws Of Cyberspace (First Edition). Basic Books.
Note: I learned about Lessig’s work from Building Successful Online Communities.⁴ If you’re interested in the community building space, I recommend it.
The Contributor Covenant only addresses one of these areas, law, head-on. As a member of the open source community, you influence the norms by how you behave. As a developer, you influence the architecture of your products and projects. You can use that influence to prioritize the most vulnerable. As a contributor, you can use the economics of how you spend your time to support projects that prioritize inclusion and safety.
4. Kraut, R. E., Resnick, P., Kiesler, S., Burke, M., Chen, Y., Kittur, N., Konstan, J., Ren, Y., & Riedl, J. (2012). Building Successful Online Communities: Evidence-Based Social Design (MIT Press). The MIT Press.
Recap
1. The more you can visualize yourself in the role of a contributor, the easier it will be for you to contribute.
2. Existing biases are baked into the system. Some in the community are actively hostile toward any change that addresses these biases. We must confront those systems and those individuals.
3. Work to be actively inclusive. Advocate for projects to adopt a code of conduct. Be mindful of your role in shaping your community’s norms.
Find your next contribution opportunity with COIL
Luck isn’t a strategy, especially when it comes to finding contributions. Developers may stumble into a commit or two. Maybe they find a rare “good for beginners” issue appropriate to their skill level. Maybe they trip over a misspelling in the docs and send a PR. Successful contributors don’t count on luck. They seek out contribution opportunities deliberately. With some training and practice, you can too. The COIL framework can help. COIL stands for:
• Context: Find problems through observation. Learn about a project and how people use it to accomplish their goals.
• Opportunity: Brainstorm ways to solve those problems. Pick the option most likely to be merged.
• Implementation: Write the code, file the issue, or make the change.
• Loop: Repeat the process until it’s done. Zoom in (on code) or zoom out (on the problem) as needed.
Together, we will dive into three real-world examples of COIL in action. Two of the case studies ended in commits to a popular open source library. Before we get to those, let’s find out how COIL can change the ground under your feet.
First example: Don’t avoid potholes. Fill them
One day, I was walking to the park with my family when I noticed a new pothole. It was in the alley behind my house, and it was massive. Wider than my shoulders.
I passed by, silently shaking my head and idly wondering when (if ever) anyone would fix it. One weekend it occurred to me: the city probably didn’t even know there was an issue. Likely everyone walking by was thinking, “someone should do something about this.” I dialed 311, the Austin area help number, and reported the problem. The very next day, three huge trucks showed up and filled the hole. I couldn’t believe they responded so quickly. I went to talk to the crew, and they said my report was responsible for their work order. I was happy they filled the hole. I was even happier that I got a chance to be part of the process. Now the alley is safer for my kids and my neighbors. Let’s fit this city contribution example into COIL:
• Context: An alley shouldn’t have a pothole deeper than a soccer ball. The city should fix it, but maybe they don’t know about the problem.
• Opportunity: If I tell the city, they should fix it. If they don’t, I’ll regroup and try something else.
• Implementation: I filed a ticket with the city, and they filled the pothole.
• Loop: None needed. It worked the first time.
Sometimes showing up and making a bug report on the issue tracker is enough to make a mark.
What potholes teach us about open source
You can use COIL to turn your lived experiences into contributions. As you go about your daily coding tasks, I want you to engage your critical thinking. Begin to see problems with libraries as potholes to be filled instead of obstacles to be avoided. You can help even if you don’t have the skills to make the fix. I didn’t know anything about road repair. I didn’t have access to asphalt, gravel, or trucks. All I needed to do was take the time to file a bug report.
Successful contributors take notes while they work. When they hit something painful, they jot it down. Later, when they have time, they comb through those notes to see what could be better. For example, I was looking for the constant representing “Euler’s number” and couldn’t find it in the math standard library docs. I wrote down the location of the docs. After I found my answer, I annotated the docs with the name I was expecting and sent the library a PR. Once you start looking, you’ll find open source contribution opportunities everywhere.
Second example: Delete the database with confidence
I was having lunch with a co-worker when they mentioned some strange support tickets. Customers were dropping (i.e., deleting) their production databases and were desperate for a restore. I was surprised to hear this happened regularly. It turns out there was a common thread between these cases: the customers were logged into their production instances and tried to run their test suites. What do the tests do before running? Delete everything in the database for clean execution. Oops. Armed with that context, I brainstormed some ways we could make the situation better:
• Add banners or warnings to the product to indicate that customers were connected to a production database.
• Inject code to take over the test task and make it a no-op on the production platform.
• Write documentation to warn developers of the problem.
• Patch the open source framework, Ruby on Rails, to somehow prevent deleting a production database when running tests.
Which one of these held the best opportunity? Warnings might mitigate but not eliminate the problem. Injecting code is messy and doesn’t solve the issue for anyone outside our platform. Documentation might go unnoticed until it is too late. Patching the framework seemed robust but complex. I wasn’t sure it was even possible. We talked about how to implement each. After some discussion, I thought that if the database knew it was in a production environment, we could use that information to guard against known destructive actions. It seemed like a good idea for a contribution. At this point, I had little experience with Ruby on Rails’ database code. As I began to look at writing a pull request (PR), I realized I didn’t have enough information, so I zoomed in on the code and started my COIL process again. The concept of “zooming in” on a problem comes from the idea of “semantic zoom” in the book Wicked Problems: Problems Worth Solving.[1] So far the COIL looked like this:
• Context: Developers accidentally delete their production database when they run their tests in a production environment.
• Opportunity: Make it impossible to delete a production database through a guard statement in the framework.
• Implementation: Store the Rails environment from RAILS_ENV inside the database when it’s created. Use this information to guard against destructive actions.
• Loop: Zoom in on the existing codebase. Build context, identify possible change locations, and write a draft pull request.
I pulled up the framework source code and checked whether the database adapter already had any kind of internal metadata store. I read the code for how databases are created and populated. With time, trial, and error, I built enough context to identify places in the code where I could hook in my implementation. This process turned into a micro COIL inside my larger contribution.
1. Kolko, J. (2012). Wicked Problems: Problems Worth Solving. AC4D. https://www.wickedproblems.com/
After opening a pull request, I iterated on it through some back and forth with the maintainers. Eventually, we found an implementation that matched my goals and satisfied everyone’s needs. The patch was finally good enough to merge. Victory! As customers slowly upgraded their Rails versions, support tickets about dropped databases fell to zero. The feature has been a massive success for my company’s team and the rest of the community.
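The guard from the Implementation step above can be sketched in plain Ruby. This is a minimal, hypothetical illustration, not the code that was merged into Rails: FakeDatabase, ProductionDropError, and drop_all! are invented names, and an in-memory Hash stands in for both the database tables and its internal metadata. The idea is to record the environment (falling back to RAILS_ENV) when the database is created, then refuse destructive actions whenever the recorded value is protected.

```ruby
# Hypothetical sketch: remember which environment created a database and
# refuse destructive actions when that environment is protected.
# A Hash stands in for the real tables and the internal metadata store.
class ProductionDropError < StandardError; end

class FakeDatabase
  PROTECTED_ENVIRONMENTS = ["production"].freeze

  attr_reader :tables

  def initialize
    @tables = {}
    @metadata = {} # stand-in for an internal metadata table
  end

  # Record the environment at creation time (defaults to RAILS_ENV).
  def create(environment = ENV.fetch("RAILS_ENV", "development"))
    @metadata["environment"] = environment
    @tables["users"] = ["alice", "bob"] # sample data
    self
  end

  # The guard: check the stored environment before wiping anything.
  def drop_all!
    stored = @metadata["environment"]
    if PROTECTED_ENVIRONMENTS.include?(stored)
      raise ProductionDropError, "refusing to drop a #{stored} database"
    end
    @tables.clear
  end
end

test_db = FakeDatabase.new.create("test")
test_db.drop_all! # allowed: the test database is wiped

prod_db = FakeDatabase.new.create("production")
begin
  prod_db.drop_all!
rescue ProductionDropError => e
  puts e.message # => refusing to drop a production database
end
```

One plausible reason to store the environment inside the database, rather than only checking the current process’s environment, is that the stored value still says “production” even when the process running the tests believes it is in a test environment.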
What database deletion tells us about open source
When looking for improvements, you can go beyond your own pain. Here, I asked a co-worker what problems they were seeing. Once you’ve found something painful, try to understand the root cause. To find problems, you can recruit friends and co-workers. You can look on question-and-answer forums. Read comments on old issues. There are entire fields of study dedicated to researching how humans interact with computers.
Note: If you want to know more about doing intentional research, I recommend the Human-Computer Interaction (HCI) course from Georgia Tech’s Online Master of Science in Computer Science (OMSCS). The lectures are also available for free online on the Udacity website.
When trying to understand someone else’s pain, consider its impact and urgency. While dropping a production database by accident didn’t happen often, losing production data is devastating. Maintainers will prioritize changes that help the most common cases or prevent the most catastrophic ones. Treat the process of writing code in an unfamiliar codebase as a learning experience. Start slow, gather context, and iteratively work towards your goal.
Third example: The slow race to view routes
I was an adjunct professor teaching a course using Ruby on Rails at the University of Texas. My students were frustrated when trying to wire together controllers (their code) with routes (URLs). One tool, a CLI command to output the mappings, helped, but it was painfully slow. How painful was it? In a large codebase, it could take up to 10 seconds for that command to run, basically an eternity. After enough time and frustration, I realized the command had to load up a fresh copy of the app first. Listing out routes information was fast, but booting the server was slow. If I could figure out how to get that same information from a web request, we could bypass the slow CLI command. While it seemed like a good enough idea, how do you start implementing such a large feature? I had to zoom into the Rails codebase and find out where the routes information for the CLI lived. I had to brainstorm edge cases. For instance, you likely don’t want to expose all your routing information to the world in production. What problems or objections might people have to such a feature, and could I pre-emptively find solutions? I spent a few timeboxed hours looking around in the Ruby on Rails codebase until I identified where the logic lived. I decided it would be easier to make a “gem” (the name for libraries in the Ruby ecosystem) that hooked into Rails instead of starting with a PR to the framework. I looped and iterated on these ideas, doing micro and macro COILs along the way, until I had a project I could deliver. Here’s what it ended up looking like on screen:
Now developers can get their route information from a URL instantaneously. How does that fit into COIL?
• Context: The routes of a Rails app tie a URL to code. To understand routes, developers run a CLI command that is painfully slow. Running the CLI command to inspect routes boots the app, which takes a long time.
• Opportunity: I can use the existing app server to provide the same information without incurring a startup cost.
• Implementation: Develop a library to render the routes directly from the server without requiring a call to the CLI.
• Loop: Hunt for interfaces to hook into for my implementation until the library is done.
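The Opportunity above, answering from the server that is already running instead of booting a fresh copy of the app, can be sketched as a Rack-style endpoint. By the Rack convention, an app is any object that responds to call(env) and returns a status, a headers hash, and a body that responds to each; following that convention in plain Ruby lets the sketch run without any gems. RoutesEndpoint, the Route struct, and the sample routes are hypothetical stand-ins, not the code from the original gem or from Rails. The sketch also hides the table in production, echoing the concern about exposing routing information to the world.

```ruby
# Hypothetical sketch: serve route information from an already-booted app
# instead of booting a new copy through the CLI. Follows the Rack convention:
# call(env) returns [status, headers, body].
Route = Struct.new(:verb, :path, :endpoint)

class RoutesEndpoint
  def initialize(routes, environment)
    @routes = routes           # already loaded in memory by the running app
    @environment = environment # e.g. "development" or "production"
  end

  def call(_env)
    # Don't expose the routing table to the world in production.
    if @environment == "production"
      return [404, { "content-type" => "text/plain" }, ["Not Found"]]
    end

    lines = @routes.map do |route|
      format("%-6s %-20s %s", route.verb, route.path, route.endpoint)
    end
    [200, { "content-type" => "text/plain" }, [lines.join("\n")]]
  end
end

routes = [
  Route.new("GET", "/users", "users#index"),
  Route.new("POST", "/users", "users#create")
]

status, _headers, body = RoutesEndpoint.new(routes, "development").call({})
puts status # => 200
puts body.join
```

The expensive part, loading the app and its routes, has already happened by the time a request arrives, so rendering the table is just formatting data that is sitting in memory.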
One last loop with routes
My library for viewing routes in the browser quickly became popular. That’s where the story could end, but it didn’t. People began to wonder why the feature wasn’t baked into Rails directly.
It was challenging for me to port my library code into Rails. I hacked together a pull request and then had to go through several iterations of feedback and learning. I discovered new coding practices and codebase conventions along the way. For each hurdle I encountered, I gathered context, found opportunities, and then worked on implementing them. After releasing a library to solve my problem, I looped by porting my solution to the upstream framework. In the end, my PR was merged, and to this day, you can get your routes by visiting http://localhost:3000/rails/info/routes in any Rails 4+ app.
• Context: My library was popular enough that people wanted the code directly in the framework.
• Opportunity: I can port my code over to Rails and send a pull request.
• Implementation: I made a rough attempt and asked for feedback along the way.
• Loop: Iterate with the maintainers while learning about framework internals.
What routes in the browser can tell us about open source
When you first encounter a problem, you might not have the skills or knowledge to find a viable fix. That is okay. When my students first encountered this issue, I felt the pain but didn’t know what to do. Keep an open mind while brainstorming and reduce your problem statement to its core. If I had framed the problem as “we need to make the CLI faster,” I might never have realized the web server could deliver the information without the CLI. When brainstorming, ask, “what if there was a different way we could solve this?” and “is there a way to avoid having this problem in the first place?” Don’t let someone else’s inaction stop you from moving forward. One of the most significant complaints I hear is that contributors cannot get a maintainer’s attention. When that happens, PRs sit and rot.
In this story, I went from a standalone library to a commit in the framework, but you can go the other way too. If you find a maintainer can’t or won’t accept your code, consider making it into a library instead. Read more about what to do when you’re blocked by external feedback. If you keep a growth mindset and keep working, you’ll be unstoppable.
Recap
• Context: Start actively looking for problems and be curious about what is painful for others. By paying attention to your surroundings and asking what could change for the better, you can contribute to your community.
• Opportunity: Once you’ve found problems, search for their root causes and brainstorm as many solutions as possible. Separate idea generation from down-selection.
• Implementation: Look for a fix proportional to the pain you’re solving. A pothole is a minor inconvenience, and making a phone call is an easy fix.
• Loop: If you find yourself at a dead end, go back to the drawing board and restart the process. Build context on why the first contribution didn’t work, identify new opportunities, and implement them. Zoom in to code or zoom out for perspective.
The first section of this book focused on your mindset and common myths that get in the way of people contributing to open source. This section of the book is more hands-on. The goal of the following work chapters is to present you with several possible activities you can do to gain context or find opportunities. Many people think of learning a new skill such as open source contribution as a linear path or vertical ladder, but I think of it as more like hiking through the woods. While sometimes there is a straightforward path, there are often challenges that block our way forward, and it’s our job to figure out how to get around them. I don’t want you to get stuck, so we will build a toolbox of concrete actions you can take with you to regain forward momentum. In the end, you’ll have the tools and experience you need to overcome any obstacles in your way.
Combination square [1]
1. Patternmakers Tools: Catalogue B-6. (n.d.). The Wellman Pattern Supply Company. Internet Archive. Retrieved September 19, 2022, from https://archive.org/details/WellmanPatternSupplyCoCatalogueB6/page/n11/mode/2up (Public domain)
Building your toolbox
We already touched on why contributing to open source is so much more than making commits and pull requests (PRs), so you might not be surprised to find we’ll mostly be looking beyond pull requests for our getting-started journey.
Project familiarity: If you want to do the work and have the impact of a maintainer, it helps to get to know some maintainers. People power open source projects, and we can learn more about how a project works by learning about the people behind it. When we talked about context building, we noted that it’s essential to realize that there is a human context. Why are some pull requests accepted and others not? How do maintainers like to do work? How can you help effectively? These are all questions that can only be answered in the context of a specific project. Although I can’t tell you what the maintainers on your projects will be like, I do have a set of exercises to help you intentionally build this human context. Once you’ve demystified this process, you’ll be ready to learn open source contribution from the people who know it best: the existing maintainers.
Issues: The beating heart of every established open source project lives in its issues. From bug reports to feature requests to expressions of gratitude to maintainers, the issues are where communication happens and work gets done. The issues are also where maintainers spend a disproportionate amount of their time. Because it’s your job as a future maintainer to help offload some of that work, it’s where we’ll be spending time as well. By the end of the issues section, you’ll be able to jump into just about any issue, assess the situation, and drive the issue closer to completion. For more on this, read the issues section.
Documentation: The introduction of the printing press accelerated civilization into the industrial revolution.
The ability to share collective knowledge is a superpower, and for open source projects, that superpower lives in the documentation.
Not only is well-written documentation impactful, it’s usually easier to work on than code, at least at first. Don’t know anything about how to write documentation? Great! Most people don’t. We will cover how to write docs and how to audit and update existing documentation. By the end of the docs section, you’ll be able to write documentation for your code and identify opportunities to update existing documentation. For more on this, read the documentation section.
Pull Requests: The mechanism that moves open source projects forward is pull requests. They’re how documentation is updated, how bugs are fixed, and how features are implemented. Together, we will look at the stages of creating a pull request and highlight common contributor struggles. After reading this section, you’ll know what to expect, how to polish your contributions, and how to maximize your chances of success. For more on this, read the pull request section.
Building project context
"Be curious. Read widely. Try new things. I think a lot of what people call intelligence boils down to curiosity." - Aaron Swartz, Programmer and internet hacktivist
Project etiquette, norms and governance
Before you can find opportunities and implement ideas, you’ll have to gather some context and identify problems. Every project has social etiquette, norms, and a governance structure. Some of these are written down, while others are not. The better you understand and internalize a project’s etiquette, the more confident you’ll be in finding and implementing opportunities. With any collaboration, there are two modes of communication: implicit and explicit. Think of explicit communication as the words someone says, while implicit communication is the body language and tone they use. Open source projects will always differ in how they communicate and get work done. What does implicit communication mean for a project, though? Typically, implicit communication in an open source project is about a shared set of values and goals. These values and goals are likely ever-changing and being remixed, so pinning them down by writing them out isn’t the best use of time. If you’re talking to someone in person, how can you find their implicit values and communication styles? The easiest way is to watch them communicate.
Explicit learning: Contribution guides
Although not every project will have documentation on contributing, the most active ones will. This documentation ranges from how they want you to use the issue tracker (some projects only accept bugs, while others expect feature proposals), to the logistics of working on the project (how do you run the library’s test suite?), to any other details the maintainers see fit to document.
Typically, this documentation can be found at the root of a project, usually in files with all-caps names such as CONTRIBUTING.md. The following is a small sample of common ones you might find:
• README.md
• ISSUES.md
• CONTRIBUTING.md
• CODE_OF_CONDUCT.md
• GOVERNANCE.md
• etc.
Although you don’t need to cozy up in a chair and read all of these files back to back, maintainers and other contributors put in work to write down things that are important. It’s probably a good idea to read them. At the time of writing, GitHub helpfully integrates with some of these files. For example, when you open an issue, it will render a default issue template that might also have instructions from the maintainers. If you know that you’re working on issues, seek out the issue template and read it first. If you know you’re going to be submitting a PR, check out the CONTRIBUTING.md to see if the project has any special requests or requirements. I’ve gone so far as to spend an hour fixing a bug and making a patch only to find that the project does not accept pull requests without an associated issue open for discussion first. Had I read the docs, I could have streamlined my process and opened the issue first. In addition to the files listed here, look for other files in all caps. For instance, the Ruby on Rails project is split into smaller libraries. If you want to contribute to the Active Record library, there’s a file named RUNNING_UNIT_TESTS.rdoc that will save you minutes, if not hours, of head-scratching.
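A quick way to find these files is to list everything in a directory whose name, minus its extension, is written in capital letters. A minimal Ruby sketch, assuming you run it from the root of a checkout; all_caps_docs is a hypothetical helper, and the pattern is only a heuristic, so it will also pick up files such as LICENSE and will miss guides that don’t follow the convention. Because it only looks at one directory, pass a sub-library’s folder to check projects that, like Rails, are split into smaller libraries.

```ruby
# List files directly inside dir whose base name (extension removed) is ALL
# CAPS, e.g. README.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md.
def all_caps_docs(dir = ".")
  Dir.children(dir).sort.select do |name|
    File.file?(File.join(dir, name)) &&
      File.basename(name, ".*").match?(/\A[A-Z][A-Z0-9_]*\z/)
  end
end

# Run from the root of a project checkout, or pass a sub-library's directory.
puts all_caps_docs
```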
Types of contribution guides to look for
What kinds of guidelines should you look for? Here are some sample questions. Do all contributions have specific requirements? For example, do they need to update the CHANGELOG.md, and do they need to use a specific format? Does every bug fix need a test?
• Support policy: Massive projects might support several releases simultaneously, for example, a 3.x branch and a 2.x branch. If the policy says that 1.x is no longer receiving any maintenance, then it’s not going to be worth your time to write and submit a patch to a version no longer supported by the maintainers.
• Style guidelines or defaults: Although many languages are trending towards automated linters and formatters, some standards are enforced manually. It’s a pain to spend a ton of time on a patch and wait days for a review, only to be told that you’ve got to make a minor style change before the maintainers can merge it. You don’t need to memorize all the style guidelines, but if they exist, you should know about them and where to find them later.
• Discussion location: Where does discussion happen? Is there an official chat? Are there mailing lists? Is there an external bug tracker that everyone uses? Just because you get comfortable on one project doesn’t mean the same defaults will apply to other projects. The older and more successful a project is, the more likely it will have its own quirks.
• Specific contribution requests: Some maintainers have put out a call for help with triaging issues in their CONTRIBUTING.md, while others have asked for help specifically with reproducing bugs on a particular platform (like Windows). If you can help fill a contribution gap explicitly called out in a README or contribution doc, that’s a huge win.
There’s a limitless variety of needs and processes that projects might explicitly document. You don’t need to memorize all of them for all of the projects you’re helping.
These docs are there to help guide you. They can also give you clues about what kinds of tasks are commonly needed.
For example, in an issue template I maintain, I explicitly say that bug reports without a reproduction will likely not be fixed. If you want to help out on that project, there is work to be done coaching and coaxing people who report issues but might not know how to make a quality reproduction.
Implicit learning: Unspoken rules through observing
Explicit contributor documentation is straightforward to grasp. But not all rules and norms governing behavior are written down. This doesn’t mean that you cannot approach and learn the implicit preferences systematically. If you’re still a little unsure of what I mean by “implicit”, the following are a few examples:
• What words do maintainers choose when they talk to contributors?
• Do maintainers frequently ask similar questions?
• What preferences do maintainers hold about contributions?
• Are there informal rules driving standard communication?
These aspects of an open source project are often not intentional but emergent. There’s also a mix of history and experience here. One way to think of this “implicit” arm of open source communication is the relationship between the law and a judge. Explicit project docs are like the written law. They state how things should operate and behave. However, it’s up to individual judges to apply this law to specific cases. Sure, the docs might say that all pull request messages must include a minimum of fifteen pieces of emoji “flair” (a joke reference to the movie Office Space), but a particular reviewer might request more or be okay with fewer. The only way to know is to review previous decisions and understand precedents.
Implicit learning example: Snip snip
Not all revelations need to be earth-shattering to save a maintainer time. I was following a repo and noticed that before any PRs were merged, any double empty lines would receive a comment telling the contributor to remove the second extra empty line. The maintainer did this so often that they didn’t even bother to write out the request. They just commented on the line with the “scissors” emoji (✂).
After seeing this behavior on several PRs, I realized “no double empty lines” was an implicit rule. From that point on, I started intentionally looking for double empty lines as I reviewed PRs. I eventually caught some and left comments. The contributors were able to update the code and have it ready to go before a maintainer ever got to the PR.
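Once you spot an implicit rule like this, you can check for it mechanically while reviewing. A minimal Ruby sketch that reports where a second empty (or whitespace-only) line in a row appears; double_blank_lines is a hypothetical helper, and if the project already uses a linter, adding the rule there is the better long-term home for it.

```ruby
# Return the 1-based line numbers where a second consecutive empty line
# appears, i.e. the lines a reviewer would mark with the scissors emoji.
def double_blank_lines(text)
  hits = []
  previous_blank = false
  text.each_line.with_index(1) do |line, number|
    blank = line.strip.empty?
    hits << number if blank && previous_blank
    previous_blank = blank
  end
  hits
end

source = "def a\nend\n\n\ndef b\nend\n"
p double_blank_lines(source) # => [4]
```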
Implicit learning example: The case of the metaprogramming meltdown
One library in Ruby on Rails is known for modifying existing core classes to add new behavior. You might know this behavior as “monkeypatching”.[1]
Note: Originally, the term for “monkeypatching” was “guerrilla patch,” which referred to changing code sneakily.
These “core extensions,” as they’re also known, can be quite useful but can cause chaos because the changes can be so sneaky that no one realizes they’re there. Even worse, competing projects can overwrite each other’s patches without realizing it. If this all sounds scary, it’s because it is.
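For anyone who hasn’t seen one, here is a minimal core extension in Ruby: reopening the built-in String class to add a method. The method name shout is made up for illustration. The second definition shows the hazard described above: if another library later defines a method with the same name, the one loaded last silently wins, and Ruby raises no error.

```ruby
# A "core extension": reopen a built-in class and add behavior to it.
class String
  def shout
    upcase + "!"
  end
end

puts "hello".shout # => HELLO!

# Later, some other library reopens String and defines the same method.
# No error is raised; this definition silently replaces the first one.
class String
  def shout
    "#{self}!!!"
  end
end

puts "hello".shout # => hello!!!
```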
Maintainers are cautious about modifying this invasive code. New features to the monkeypatching library are, by default, categorically rejected. I know this because I saw a few people open pull requests to the library, and they were shot down without much consideration. After a while, though, I became aware that every so often, a patch or feature was permitted. Although the state of the “law” might suggest that all patches to this library that aren’t bug fixes be closed right away, the precedent set by the maintainers indicated that there were exceptions. By studying the exceptions, I could learn from them. The primary litmus test for accepting one of these PRs was whether or not it could be applied in a meaningful way to the existing Ruby on Rails codebase (as opposed to a feature added only for people using Ruby on Rails in their own codebases). Granted, there’s still a lot of subjective wiggle room here, but this is a much more complete picture than “We don’t accept patches that do ”. You might be thinking, “But couldn’t Rails have made an explicit policy and spelled this out? Wouldn’t it be better if developers wrote this down?” Yes, it would be better to make it explicit, but you have to be consistent before you can be explicit. Before you can be consistent across maintainers and triagers, you have to understand the context and history. Before you can understand the context and history, you have to listen and learn. This is what I mean by implicit learning.
1. Wikipedia contributors. (2022a, September 19). Monkey patch. Wikipedia. https://en.wikipedia.org/wiki/Monkey_patch
Recap
1. The better you understand how a project functions and the etiquette around interaction, the more confident and comfortable you’ll be working in that project.
2. Norms, etiquette, and governance structure might not be written down anywhere, but you can still increase your understanding through intentional exercises.
3. Being able to translate maintainer patterns and norms into actionable contributions is a valuable and worthwhile skill.
Prioritizing contribution opportunities
As you start to look around for project context, you might encounter different scenarios that feel like contribution opportunities. To help you prioritize which opportunities are worth exploring further, it can help to look at commonly accepted contributions. Although I can’t guarantee any change will be accepted, I can help you determine which of your opportunity/implementation combos might have the best shot. If you can answer “yes” to one or more of the following questions, then the change has a higher-than-average chance of success:
• Is the change small?
• Is the change provably correct? (Fixing a regression, updating contradictory docs)
• Is the change performing a common cleanup task? (Housekeeping)
• Is the change similar to something previously merged? (Precedent)
• Is there public and visible support for a change?
• Are you avoiding additional unnecessary changes? (Say no to mixing features and refactoring)
Is the change small?
Projects want small pull requests. They’re easier to read and easier to understand, and fewer changed lines means fewer chances to introduce bugs. One takeaway from this rule is that you don’t want to cram multiple different changes into one PR if you can avoid it. It’s usually better to break code changes into their smallest reasonable chunks before submitting a pull request. You might think that giving someone code is essentially the same as giving them a free item, like a promotional product. In reality, giving someone code is like giving them a free puppy. Though the code’s initial contribution might be free, it will require care, maintenance, and “feeding” to sustain. Code is always a liability. Always. What does a maintainer want? They probably want their job to be easier. They probably want fewer bugs and a better experience with their codebase. What they don’t want is to have to clean up after someone else’s mess.
Case study: Big versus small
When a change is compact, well documented, and easy to follow, it’s much easier for a maintainer to merge it in. I recently got a very small pull request with a rather lengthy message. The code change was three lines long. They added these lines into my CLI class:
def self.exit_on_failure?
  true
end
They followed it up with a great description, including links explaining why the change is needed:
Title: Fix Thor’s deprecation error #195
Thor 1.0.0 (CHANGELOG) has deprecated the behavior of relying on default exit_on_failure?. Hence, making the devs define this method. Thor #621, #625
They also gave an example of the output of the CLI without the change, noting that it had a warning:
Deprecation warning: Thor exit with status 0 on errors. To keep this behavior, you must define `exit_on_failure?` in `DerailedBenchmarkCLI`. You can silence deprecations warning by setting the environment variable THOR_SILENCE_DEPRECATION.
The small change made this a very easy PR to review. As a bonus, they linked a changelog from my CLI dependency (thor), as well as two related commits. It was clear they did their homework and had plenty of context. Although not every change can be compact, if you’re heading down a path of modifying dozens of files and hundreds of lines of code for your first code contribution, it might be a sign that it’s a tough road ahead.
Is the change provably correct? (Fixing a regression, updating contradictory docs)
What is a “provably correct” change? If you find some documentation that diverges from the code, it means that there’s a guaranteed contribution opportunity. Either the docs are wrong, or the code is wrong. The following is an example:
# Returns the number two
def two
  return 3
end
The docs say it returns 2, while the code returns 3. They cannot both be correct. Figure out which is the “winner”, update the other one, and you’ll win a commit to the project after you send them a PR. What else are “provable” changes? One of the most common is fixing a regression bug. The code previously worked, and now, it doesn’t work. There was a change, and something broke. If the break is not intentional, then fixing it is a straightforward contribution opportunity.
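Once you’ve decided which side is the “winner,” pinning the behavior with a check keeps the two from drifting apart again. This is also the usual shape of a regression fix, where you fix the code and then add a test that would have caught the break. A minimal sketch in plain Ruby, assuming the documentation is right and the code is the bug; in a real project the check would live in the project’s own test suite rather than in a bare raise.

```ruby
# Returns the number two
def two
  2 # fixed: previously returned 3, contradicting the comment above
end

# Pin the documented behavior so a future change can't silently regress it.
raise "two returned #{two.inspect}, expected 2" unless two == 2
puts "two is #{two}" # => two is 2
```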
Another “provable” change is a spelling fix. Fixing typos is typically both a small change and independently verifiable as being “correct”. It’s worth noting that the rules around grammar can be somewhat contentious. If you don’t believe me, try having an opinion on “the Oxford comma” on the internet. For that reason, spelling changes are more likely to be “provable” as correct than grammar changes. At the end of the day, all pull requests are making the argument that the project will be better or “more correct” if their change is accepted. The easier it is to show that a change is “correct,” the more easily your pull request will be accepted.
Is the change performing a common cleanup task? (Housekeeping)
When I was a kid, I hated making my bed. I still do. But I love the feeling of lying down in a well-made bed. That’s a chore: I hate doing it, but love it when it’s done well. In open source, many tasks need to be done but aren’t glamorous or high profile. If you can do one of these things, then maintainers will appreciate you more than the cool side of the pillow on a hot night. What kinds of housekeeping tasks exactly? The following are some questions to guide you to some helpful contributions:
• Is the test suite passing?
• Do dependency versions need to be updated?
• Are there “flappy” tests that need to be hardened?
• Is the test suite free of deprecation outputs?
• Are the primary interfaces documented with examples?
• Are the primary code paths tested?
• Are linters or other auxiliary tools running without warning?
Is the test suite passing? A test suite needs regular care and feeding. There are times when the suite breaks or when tests are introduced that have sporadic and unpredictable failures. Fixing a broken suite can be a big task, but it is one that maintainers will love.
Do dependency versions need to be updated? Beyond failures, there are other test-related housekeeping tasks, such as adding new language runtime versions. For instance, Ruby releases a new version every year, and many libraries don’t yet test against the newest one. If you notice a recent version is missing from a project’s CI config, you can add it.
Are there “flappy” tests that need to be hardened? Most large-enough projects will have “flappy” tests that fail intermittently and pass if you rerun them. Tracking down, investigating, and fixing those tests is a huge time sink, but it is one that can improve the productivity of all contributors.
Is the test suite free of deprecation outputs? As languages and libraries progress over time, they need to change their APIs. When this happens, there’s often a “deprecation cycle” to let people know they need to take action or that the next version won’t work as expected. If you find deprecations in a library while using it, you can clean those deprecations up. You can also look through the test suite output for any deprecations after a recent version addition. (We saw this in the example about the CLI PR above, where the contributor was handling a deprecation.)
Are the primary interfaces documented with examples? You can add docs (which we cover in depth in the documentation section).
Are the primary code paths tested? You can add tests for things that aren’t yet tested. Your language might have code coverage tools to help you identify areas that need coverage. Popular code coverage tools in Ruby include rcov and SimpleCov.
Are linters or other auxiliary tools running without warning? Those are all everyday tasks, but there can also be more subjective tasks, for instance, adding a code linter. If you are building context by reading issues and notice that there are frequent requests for style changes to PRs, it might make sense to enforce those changes with an automated check.
The more likely your change is to reduce the work a maintainer would otherwise need to do, the more likely it is that it will be accepted and merged.
Is the change similar to something previously merged? (Precedent)
When a project accepts a pull request that is a typo fix, it’s setting a precedent. It’s sending an implicit message that says, “We accept typo fixes as pull requests.” That precedent is one of the reasons I focus so much on context gathering through issues and pull requests. If you’re about to make a change and you can link to another, similar change, it gives your idea more credibility. As a project maintainer, rereading a deliberation I made last month or last year about why a change should be accepted helps me do the work faster.
It’s a similar concept to how “precedent” works in a court of law. In this case, maintainers act as the judge who determines what is merged and what isn’t. Individual developers “make their case” for why their change should be merged. While submitting “evidence,” if you can remind the judge of how they acted in the past, they’re likely to be consistent and merge your change, too.
Just like in a court of law, no two cases are ever truly the same. Just because a prior PR was merged doesn’t mean you don’t have to do the work. People change their minds all the time. But if I’m looking for the change with the best chance of being merged, picking one with some positive “prior art” always helps.
Is there public and visible support for a change?
Carrying on with the legal analogy, if you can bring a “witness” to a pull request, it will make your case stronger. For example, if you find a blog post describing a problem, with lots of comments commiserating over a painful feature, linking to that blog post can help you. A Stack Overflow question with lots of upvotes is another great resource to point to. Beyond wondering whether something is a problem, project maintainers are trying to answer, “Is this a problem worth my time to solve?” By bringing in external anecdata, you can help convince them that this particular “code puppy” is worth taking care of. We’ll discuss this concept more later.
Are you avoiding additional unnecessary changes? (Say no to mixing features and refactoring)
One type of change that most projects do not like: a surprise refactor. A refactor is when you change the code’s structure but not its behavior. Often, refactors are done to make code “simpler” or to make a later behavior change easier. Kent Beck describes the process as follows: “For each desired change, make the change easy (warning: this may be hard), then make the easy change.”¹ This approach works great for your own projects but makes for messy PRs that are hard to reason about. Let’s find out why.
It might be very tempting while you’re in a codebase to “simplify” some things and implement a new pattern you just learned. Don’t. Or at least, don’t without more context. We’ve already covered how the larger your change is, the less likely it is to be accepted. Maintainers will have a harder time understanding what behavior is different if unrelated changes are also present in the diff. Imagine how you might feel if someone made a PR to your codebase with a typo fix that you appreciated, but they also changed all your space indentation to tabs. It would feel excessive and perhaps a bit overbearing. If you want to fix a bug (a behavior change) and refactor (a structural change), split them into two different PRs. That way, if the refactoring is rejected, the bug fix might still be mergeable.
1. https://twitter.com/kentbeck/status/250733358307500032
That isn’t to say formatting and refactoring PRs aren’t accepted, but they’re some of the most challenging changes to get merged in. If you’ve got to start somewhere, start somewhere else.
What do projects need? Sell me this code
As you’re hunting for project context in issues, pull requests, and the code, don’t forget about the human context. Bad salespeople try to sell something to make a buck. Good salespeople try to understand a client’s needs so that they can help them. You’re selling your code to a project. Instead of focusing on the code, focus on the outcome. Why will the maintainer who merges your code have a better day as a result of your change? What do projects and maintainers want? What problems keep them up at night? What do they hate doing? What would they love to see in their codebase? If you can answer these questions accurately, you’ll drastically increase your ability to make meaningful and lasting contributions to an open source project. Once you’ve decided which code change has the best opportunity to be merged, you’ll need to go through the mechanics of making the pull request.
Recap
1. Any code contribution is more like a puppy that requires care and feeding than a free gift. Accept that your code contribution has long-term consequences for the maintainers.
2. Focusing on what maintainers want and need will help increase your chances of success. Focus on contributions that are “provably correct”, “housekeeping”, or that make a large impact. Avoid the impulse to refactor.
3. Always be selling your changes. Maintainers are busy and might not understand the value of what you’re bringing them unless you spell it out. Build your communication skills.
Project research exercises
Here, we’re going to focus on reading and understanding a project. Although there are ways you can contribute with minimal context, the more knowledge and context you have about a project, the more likely you’ll be to make an impactful contribution. Below are some example tasks that you can do at any time to build implicit knowledge and grow your project-specific context.
Merged or closed pull requests
If you want to understand the types of contributions a project wants, it pays to look at successful examples of other contributions. One way to find “good” contributions is to look at merged pull requests. A merged pull request is a signal that it provided something of value. Why was it merged? Was it attached to an issue? You can also look at pull requests that were closed without being merged, which is an indicator that something was wrong. What blocked those PRs from being merged? Can you spot any patterns?
Merged or closed pull requests in action
I peeked at the closed issues on Rails and saw a request to add a style key to the content_tag method. The following is the example they gave:
content_tag( 'div', 'Hello World', style: { color: '#48BC13' } )
The idea is that it would generate this markup:
<div style="color: #48BC13">Hello World</div>
The PR was eventually closed after some conversation. Why? If we can trace the maintainer’s logic, we can learn something about how they decide what code to include.
One of the commenters pointed out that using inline stylesheets¹ had some negative consequences; they labeled it a “bad practice” and gave some more details on their opinions. That commenter also pointed out that this feature could be implemented fairly easily at the app level by writing what’s known as a “helper” function. Essentially, they were arguing, “We shouldn’t take on complexity to perform an action that can be achieved without this PR and that we don’t recommend in the first place.” The maintainer agreed and closed the PR.
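The commenter’s point, that this can live in an application-level “helper,” is easy to picture. The following is a minimal sketch in plain Ruby so that it runs outside of Rails; `style_attribute` and `styled_div` are hypothetical names, and converting underscores to dashes (font_weight to font-weight) is an assumption about how such a helper might behave.

```ruby
require "erb"

# Hypothetical app-level helper: turn a style hash such as
# { color: "#48BC13", font_weight: "bold" } into an inline style string.
def style_attribute(styles)
  styles.map { |property, value| "#{property.to_s.tr('_', '-')}: #{value}" }.join("; ")
end

# Hypothetical stand-in for a tag helper. It escapes both the attribute value
# and the content so user-supplied text cannot inject markup.
def styled_div(content, styles)
  style = ERB::Util.html_escape(style_attribute(styles))
  body = ERB::Util.html_escape(content)
  %(<div style="#{style}">#{body}</div>)
end

puts styled_div("Hello World", color: "#48BC13")
```

In a Rails app, a helper like `style_attribute` could instead pass its string through `content_tag`’s existing attribute options, as in `content_tag(:div, "Hello World", style: style_attribute(color: "#48BC13"))`, which is the kind of app-level workaround the commenter described.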
1. An “inline stylesheet” is when an HTML element uses the style attribute directly, rather than receiving styles from an external CSS stylesheet linked via a <link> tag.
Besides the immediate takeaway of “A PR adding an inline style option to an HTML helper probably won’t be merged into Rails”, what else can we learn from this example? I learned that subjective judgments about whether developers should use a feature affect its chances of getting merged. Although there is some consensus that using inline stylesheets is “bad practice”, it’s ultimately not an objectively provable fact. This example also shows me that features that are fairly easy to implement outside of the project might not be prioritized. Don’t take my word for it, though; go find out for yourself. Find some closed and merged pull requests and see if you can pick out a few takeaways.
Age and comment count exercise
There are other heuristics to look at beyond “opened and closed.” Viewing issues of different ages can be insightful. For this exercise, separate issues into “young” and “old” buckets. What constitutes a “young” issue exactly? It varies by project, but an issue open for one day or one week likely falls in the young category. By browsing closed issues, you can get a sense of how long an issue usually stays open before it receives a response. An “old” issue, in comparison, is one that’s been open longer than that typical period. Anything over a month has a good chance of being “old.”
Take stock of the age of each issue and how many comments it has gotten. If it’s a young issue with no comments, it might be that no one has had the chance to read it. You might be the first! If it’s an old issue with no comments, ask why. Is the issue describing a challenging problem, or is there something about the issue itself? Is there something unclear in the issue, or something you didn’t understand that others might not understand either?
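If you want to apply this exercise to a list of issues you’ve jotted down, the bucketing can be sketched in a few lines. The `Issue` struct, the sample data, and the 30-day cutoff (from the “anything over a month” rule of thumb above) are all assumptions; adjust the cutoff to the project’s own response times.

```ruby
require "date"

# Hypothetical issue record; in practice you would fill these in by hand
# (or from the project's issue tracker).
Issue = Struct.new(:title, :opened_on, :comment_count)

OLD_AFTER_DAYS = 30 # assumed "about a month" cutoff; tune per project

# Return a note pairing the issue's bucket with the question worth asking.
def bucket(issue, today:)
  age_in_days = (today - issue.opened_on).to_i
  age = age_in_days > OLD_AFTER_DAYS ? :old : :young
  quiet = issue.comment_count.zero?

  case [age, quiet]
  when [:young, true]  then "young, no comments: maybe nobody has read it yet"
  when [:old, true]    then "old, no comments: is it unclear, or just hard?"
  when [:old, false]   then "old with discussion: why did it stall?"
  else                      "young with discussion: follow the conversation"
  end
end

today = Date.new(2022, 6, 1)
issues = [
  Issue.new("Crash on startup", Date.new(2022, 5, 31), 0),
  Issue.new("Docs typo", Date.new(2022, 1, 10), 0),
  Issue.new("Slow query", Date.new(2021, 3, 2), 14)
]
issues.each { |issue| puts "#{issue.title}: #{bucket(issue, today: today)}" }
```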
Just because an issue is old and has lots of comments doesn’t mean that you can’t move it forward, either toward getting it fixed or closed. Can you determine why the issue stalled? Is it because a decision couldn’t get made? Is it delayed because someone stopped responding? Did it stop because contributors fixed the underlying behavior, but the issue was never closed?
Age and comment count in action
I browsed the oldest open issues on the Rails repo to see what I would find. I found an issue that had a healthy conversation, but the last comment was from five years ago. I added, “The last comment was in 2016. Can we close this issue?” By that evening, it was marked as closed!
Digging into several other old issues, I found that many of them share a common tag of ActiveRecord (Rails’ object-relational mapper, or ORM), suggesting that ActiveRecord might be a common source of stale issues. Switching gears to the most recent issues: the only issue opened in the last twenty-four hours with no comments is also tagged ActiveRecord. These two data points could tell me that maintainers would love some extra expertise in ActiveRecord. Or they could tell me that ActiveRecord issues are difficult and might not be the best place for a beginner to start. If you’ve worked long enough on the Rails project, these observations about ActiveRecord should ring true. The key here is that focused observation can build the same insights, but much faster.
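Noticing that many stale issues share a tag is ultimately a counting exercise. A minimal sketch, assuming you’ve noted the labels on each old issue you browsed; the sample labels below are made up for illustration and are not real Rails data.

```ruby
# Hypothetical notes: one array of labels per stale issue you browsed.
stale_issue_labels = [
  ["activerecord"],
  ["activerecord", "needs feedback"],
  ["actionview"],
  ["activerecord"],
  ["railties", "docs"]
]

# Count how often each label appears, most common first
# (ties broken alphabetically so the order is stable).
label_counts = stale_issue_labels.flatten.tally.sort_by { |label, count| [-count, label] }

label_counts.each { |label, count| puts "#{label}: #{count}" }
```

A label that dominates the old issues is a hint worth following up on, not a conclusion; as with the ActiveRecord example, it could mean either “experts wanted” or “hard to start here.”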
Look at who is commenting and what they are saying
Before you read the comments, notice that GitHub currently tags comments with little badges that might indicate whether someone has a relationship with the project, such as whether they’ve previously contributed code. Other people commenting on the issue might also be triaging and might not be a source of authority. Just because others have commented doesn’t mean you can’t add value.
If a person commenting seems to be a maintainer, can you find a few other comments they’ve left? How do you feel about the comments? What do you think the maintainer felt when they wrote them? Did they sound stressed, happy, or something else? A very overworked maintainer might leave comments that seem sarcastic or sharp. I do not recommend you imitate this communication style. Have you ever had a doctor barely look at you during a visit? It doesn’t feel great. Part of the value you bring to the table is being able to show up as a whole contributor and invest some extra time, even if the maintainer cannot.
Once you’ve pieced together their emotional state, what is the core of their message? They either want to tell the reporter who opened the issue something or ask them to do something. Which is it? Is it both? If it’s a question, why did they need that information? If it’s a message, why did they need to tell the reporter that? Again, the people who respond to issues might not even be that good at it, so you don’t have to mimic them or adopt their styles or mannerisms. However, these people have experience communicating through issues, experience you don’t have yet, so there is something to be learned. By asking a few of these questions, you can decide for yourself how to have the impact you want.
Is there a standout maintainer you admire?
If you find a maintainer or contributor you admire (maybe they ask thoughtful questions, or they seem to have an endless supply of history and context), make a note of it. As I mentioned before, you can seek out their comments, understand their motivations, and use this person as a positive role model for the type of contributor you want to become.
As a bit of a tangent, I’m a parent, and I care deeply about how my kids are raised. A growing body of work shows that the most important thing I can do to raise my kids the way I want is to be the kind of person I want them to be. Although I can give you a handful of contribution anecdotes, if you can find someone in a project’s issues whom you admire and strive to be like, that’s worth a thousand of my anecdotes. Watch and learn from the people you want to be more like. It might take years to develop the context they have now. In time, you’ll build your own style, a different project specialty, and a different focus. Together, these techniques won’t explicitly tell you what you should do next, but if you ask all of these questions, you should either begin to get a sense of what to do next or start to ask other questions.
Recap
1. Looking at old issues and pull requests can help you learn what maintainers find valuable.
2. The best way to discover what kinds of work need to be done in the future is to look at what kind of work was helpful in the past.
3. By taking active steps to observe the implicit social etiquette and norms of a project, you’ll gain valuable project context and build contribution confidence.
Familiarity cheatsheet
• What are ways to get more comfortable contributing to a project?
  ◦ Look for written contribution guides.
  ◦ Observe maintainer actions to absorb unspoken rules.
• What kinds of contributions should you look for?
  ◦ Small changes.
  ◦ Provably correct changes.
  ◦ Housekeeping tasks.
  ◦ Changes with high public demand.
  ◦ Changes that aren’t refactorings.
• Exercises to build familiarity:
  ◦ Review merged pull requests to understand successful bright spots.
  ◦ Review closed (unmerged) pull requests for patterns to avoid.
  ◦ Review “old” issues for patterns that might prevent successfully closing an issue.
  ◦ Review “new” issues to see which types of changes get the most time and attention from maintainers.
  ◦ Seek out maintainers you would like to emulate as positive role models.
• COIL framework for contributions:
  ◦ Context: Find problems through observation.
  ◦ Opportunity: Brainstorm ways to solve those problems.
  ◦ Implementation: Write the code, file the issue, or make the change.
  ◦ Loop: Repeat the process until it’s done. Zoom in or zoom out as needed.
Issues and Bug Reports
"The number one skill required for learning any complex system is patience." - Kelsey Hightower, Author of “Kubernetes The Hard Way”
The beating heart of all established open source projects lives in the issues. From bug reports to feature requests to expressions of gratitude to maintainers, the issues are where communication happens and work gets done. The issues are also where maintainers spend a disproportionate amount of their time. Because it’s your job as a future maintainer to help offload some of that work, it’s where we’ll be spending time as well. By the end of this section on issues, you’ll be able to jump into just about any issue, assess the situation, and drive it closer to completion.
Reading and categorizing issues
In open source, issues are 100% critical to any successful project’s life. Issues and pull requests are the primary channels of communication between open source maintainers and the people who use their software. Issues provide valuable context to maintainers and a centralized place for collaboration.
Imagine being a maintainer of a popular project. Imagine waking up, grabbing a cup of coffee (or other breakfast beverage), sitting down, and checking your email. There’s a new issue in the inbox! What’s in it? What needs to be done? Can any of those needs be offloaded to an aspiring new contributor? Now, let’s get our hands dirty and walk through what to do with a bug report, step by step.
Click an issue, especially if the title is confusing
An issue title gives us a hint of what the issue is about, but it doesn’t tell the whole story. Many developers who want to contribute mistakenly think that if they don’t understand the title, they won’t understand the issue. Because we are practicing context building, click the issue and read it, even if you didn’t understand the title.
Build understanding
The quality of issues varies wildly. Some reporters vaguely say “stuff broke,” while others overshare, down to the details of the breakfast they ate before they encountered the bug. For some issues, you’ll need to find more information and dig deeper. For others, you might need to help separate critical from non-critical components. One useful practice is to take notes on key details.
Many developers expect that they’ll be able to look at a bug report and know the answer immediately. I’ve been working with open source issues for years, and I’ve been professionally answering support tickets at my day job for over a decade. In all that time, I’ve seldom looked at an issue and immediately known the cause. If we stumble on an issue where the first responder seemed to be psychic and called out the root cause on the first try, it’s likely because it wasn’t the first time they’d seen that specific problem.
Tip: If you get to the point where you can recognize and fix an issue on the first pass, then maybe there’s a contribution opportunity to automate or improve parts of the debugging process.
Because issue contents can vary so greatly, it can be helpful to group them to guide your next steps.
Know the issue types
The two high-level categories of issues are “bugs” and “feature requests”, and within those are subcategories:
• Bugs
  ◦ Regressions
  ◦ Unexpected behavior
• Feature requests
  ◦ Modification
  ◦ Addition
Types of bugs
A “regression” is code that was working but then stopped working. Regressions are the easiest type of bug to work with because the desired behavior is usually clear: people want the project to work like it used to.
If the code never worked, then the bug falls into the “unexpected behavior” category. The developer tried using the code, but it didn’t work as expected. Unexpected behavior bug reports are the most common.
Types of feature requests
If an issue is not a bug report, then it’s likely a feature request. A “modification feature request” asks to change an existing feature, for example, modifying a CLI to take a new option flag. A larger feature request is usually an “addition feature request”: the issue creator wants to add a brand-new feature. For example, my work to show Rails routes in the browser was an entirely new feature.
Figure out what category your issue falls into
When people open issues (and pull requests), they’re not thinking in terms of categories. They’re focused on accomplishing some goal. As a result, it is often ambiguous exactly where an issue falls, and there’s some subjective wiggle room. An issue might straddle two or more categories. In general, a “regression” is more manageable than “unexpected behavior,” and both of these are easier to deal with than feature requests. As a result, I try to reframe issues as bugs when possible. For example, a feature request could be viewed as a “user experience bug” because the project doesn’t yet support a particular use case.
Take action
Once you’ve figured out what kind of issue you’re dealing with, you may be ready to take action. If you can plausibly classify the issue as a bug, focus on reproducing it. The section on reproductions is very “hands-on” and has lots of actionable advice.
If you’ve got a feature request on your hands, the best thing to do is most likely to mimic an existing maintainer, because every repo deals with feature requests differently. We covered more ideas on how to understand maintainer expectations earlier.
Recap
1. Issues should be broken into types: bugs are either regressions or unexpected behavior; feature requests are either modifications or additions.
2. Classifying an issue into a category can help guide your next steps.
3. Any issue that can reasonably be classified as a bug can be reproduced and debugged.
Reproducing bugs
The most valuable thing you can do while triaging issues is to verify that a bug report can be reproduced. Reproducing a bug is a time-consuming but necessary step toward fixing it. The quality of reproduction (or “repro” for short) instructions varies tremendously. The following is the spectrum of reproduction instructions I find on issue reports, from least to most reproducible:
1) Says they have an issue but doesn’t say expected versus actual results.
2) Explains the bug in vague details. No code is mentioned.
3) Explains a specific method or piece of code. No instructions or tutorial provided.
4) Gives a tutorial on how to introduce the bug but is missing some steps.
5) Provides a link to a repository that they made for the sole purpose of reproducing the bug.
6) Gives a link to a repository and also includes instructions on how to execute the reproduction.
7) Gives a link to a repository with detailed instructions that include the use of Docker or a containerization tool.
Further information needed
Depending on the reproduction instructions provided, here’s how I would respond:
1) Says they have an issue, doesn’t say expected versus actual results.
This is barely a bug report yet. Explain to the poster the importance of reproducing an issue and that the behavior they are describing is unclear.
2) Explains the bug in vague details. No code is mentioned.
3) Explains a specific method or piece of code. No instructions or tutorial provided.
Comment back on the issue that you appreciate the report, but you have no way to see the behavior for yourself, and that you need a reproduction. Many people don’t fully understand what a “reproduction” entails and will need additional help. If you’re new to the world of reproducing bugs, the following is a hypothetical set of steps someone might take to make a reproduction of a Rails bug:
1. Start with a clean repository with no other features.
2. Add whatever code is necessary to demonstrate the issue.
3. Add instructions to the README describing how to execute the code and reproduce the problem. Include the output from running the example and a description of why it’s unexpected.
4. Push the example app somewhere public, such as GitHub.
5. Bonus: Give the app to a friend or co-worker. See if they can reproduce the problem without any instructions other than what’s in the README.
6. Finally: Give a link to the reproduction code to a maintainer (usually through an issue).
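For a Ruby library, the “clean repository” in step 1 can sometimes be as small as a single script that prints the environment, the expected result, and the actual result, which also gives you the output step 3 asks for. The following is a minimal sketch; `normalize_phone` is a hypothetical stand-in for the library method being reported (with a deliberate bug), and `reproduction_report` is a hypothetical helper, not part of Rails or any gem.

```ruby
# Hypothetical stand-in for the library method the bug report is about.
# The "bug": it strips every non-digit, including the leading plus sign.
def normalize_phone(input)
  input.gsub(/\D/, "")
end

# Collect what a maintainer needs in one place: the runtime version,
# the expected result, the actual result, and whether they differ.
def reproduction_report(expected:, actual:)
  <<~REPORT
    Ruby version: #{RUBY_VERSION} (#{RUBY_PLATFORM})
    Expected: #{expected.inspect}
    Actual:   #{actual.inspect}
    Reproduced: #{expected == actual ? 'no' : 'yes'}
  REPORT
end

puts reproduction_report(
  expected: "+15551234567",
  actual: normalize_phone("+1 (555) 123-4567")
)
```

Pasting this script’s output into the README, along with the command used to run it, gives a maintainer both the steps and the evidence in one place.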
Tip: I’ve written instructions for application-based bugs at https://www.codetriage.com/example_app, which you can send to reporters to help them understand what you’re asking for.
Minimum viable reproduction
4) Gives a tutorial for introducing the bug but is likely missing some crucial steps that the reporter didn’t realize.
When I used to get bug reports like this, I would try to follow the tutorial. If I could not reproduce the issue, I would post back with that information. Then, the reporter would inevitably comment that I had missed a step or give me something else to try. These exchanges could go on for days or weeks. Almost every time, the reporter would stop replying, the issue would become stale, and we never found the fix, which makes sense because we were never able to reproduce the behavior.
If you’re getting started and you follow these types of tutorial instructions, you might get lucky and reproduce the bug. However, in the likely event that you don’t, I wouldn’t suggest asking for clarification. I would instead push back. Say you weren’t able to get it working and that they should make a dedicated reproduction case. If they say, “Oh, just try this one more thing”, then you’re welcome to play that game, but only if you want.
The tricky thing with this type of issue is that there is time asymmetry. It might take them a minute to write out “one more thing” to try, but it could take you ten minutes to run through the steps again. Why so long? Maybe you realize there’s more than one way to interpret their instructions and want to try them all. Maybe you want to play around with your own “what if this triggers the bug” experiments. It’s frustrating to have a bug report but not be able to reproduce the issue. Because of time asymmetry, I tend to be gentle but firm in requesting that people provide a reproduction in code form. Most people will say something like, “You ONLY need to …” or “You JUST have to…”.
And to those people, I would say that if it is straightforward to reproduce the bug by following their instructions, it shouldn’t take much longer to commit that reproduction to git and push it to a public repo.
Asking for reproduction code is one way to keep communication clear, maximize the value of your issue time, and minimize your frustration. Although you might want to uncover the hidden mysteries of the bug, ultimately, it’s not your bug until you can reproduce it. It’s the original reporter’s responsibility to provide adequate reproduction instructions.
Reproduction with code
5) Provides a link to a repository that they made for the sole purpose of reproducing the bug.
This is the start of what I would consider a pretty good bug report. Even if there could be more information, the poster put in the time to commit some code to git. When I get a reproduction example like this, I’ll usually time box my attempt to trigger the bug to five or ten minutes. I’ll set an alarm on my phone and get to work poking at the example. Why set a limit? Depending on the day, I might have ten issues or a hundred in my inbox. There’s only so much time in the day. I also find that I can obsess over wording, and setting a timer reminds me to find a “good enough” and move on. As I’m poking at their reproduction example, I keep a scratchpad open with commands that have worked, for example:
• Clone the repo locally and cd into the directory
• Run $ bundle install
• Start the server with $ rails s
• Visit page localhost:3000/users
• etc.
Sometimes, I trigger the bug and see the issue. When that happens, I comment back on the issue with the steps to reproduce the problem from my notes. This comment validates the bug report as having a reproduction. It also helps future contributors who tried the reproduction and failed to see that they missed one or more steps that weren’t previously explicit.
A successful reproduction does a few things. First, it lets anyone interested in solving that bug know that the reproduction is ready to go. Up to this point, we’ve been talking about conversing with the original bug reporter. In this case, we’re shouting from the mountaintop that anyone could boot up the code and start working on it right away. Second, we’re sharing our notes for reproduction. It might feel like you didn’t do anything, but your words can save minutes or hours of someone else’s time. The more explicit you can be in your instructions, the better; even if steps are “obvious” to you, they might not be to everyone. Note that above, I explicitly said, “clone and cd into the directory.”
Commenting on the issue has the added benefit of auto-subscribing you to future notifications. If you do nothing else, you can follow along with the original reporter and anyone else who works on fixing the bug to see what other questions they needed answered or what path they took to resolve it.
If I hit the end of my time box and I wasn’t able to reproduce the bug, I’ll post back with my partial list of instructions and say I wasn’t able to trigger the same behavior. I then ask the reporter to look at my notes and output and see if they can provide additional detail or instructions. Hopefully, it only takes another round or two of comments to reproduce the core issue. It doesn’t feel satisfying to put debugging down before you’ve reproduced the problem. But as I tell my six-year-old with the iPad, “Putting it down and walking away is a skill.” Don’t forget to celebrate your progress.
Even if you didn’t reproduce the issue, you put in work to grow your triage and debugging skills.
Awesome reproduction
6) Gives a link to a repository and includes instructions on executing the reproduction.
or
7) Gives a link to a repository with detailed instructions that include the use of Docker or a containerization tool.
Give this person a big ole’ virtual high-five because this is a pretty rocking bug report. I generally don’t worry about time boxing these, because if I hit the end of the instructions and haven’t seen the problem, I’ll know I can stop. Report back whether or not you’re able to get the issue to reproduce. Also, take a moment to give yourself a high-five. Not only is this a success for the person who reported the issue, but it’s also a success for you, too!
Although isolation techniques like containers (such as Docker) aren’t required, they can be an option for issues you’ve tried in vain to reproduce.
Common ways reproduction instructions can fail
Just about every food recipe online has hidden context that the author likely isn’t aware of. When they set their oven to 350 degrees, does it swing plus or minus twenty degrees, or plus or minus forty? When they used a cup of sugar, was it a “heaping” cup or a “level” cup? If you’ve ever found a “bad” recipe online that accidentally left out steps, you’ll know that “just” following instructions can be extremely challenging. To that end, here are some common places to check if you can’t get that “bug recipe” to turn out as you want:
• Software versions are the same: Check the language version, library versions, and any dependencies required to reproduce the problem. I prefer reports that use a dependency manager with a lock file, such as Cargo.lock for Rust and Gemfile.lock for Ruby.
• Service versions: Does the reproduction need to hit any external services such as Postgres or Redis? Maybe Memcached? Different versions of these services will behave differently.
• Operating system (and version): Not all bugs show up the same way on all operating systems. This is another reason why providing a reproduction that includes container instructions can be beneficial.
• Everything is checked into git: Sometimes, a necessary file or change is present on the reporter’s computer, but they forgot to commit it to source control. Maybe they forgot to push after they committed. Maybe they forgot that the file was in the .gitignore. One way to check for this type of problem is to encourage the reporter to reproduce their own issue in a different directory by following their instructions. For example, for a Mac- or Linux-based issue, I would ask that they cd /tmp and then clone their reproduction.
• Disk layout: Like problems with git, there can be cases where code references a file at an absolute path that is expected to be on the system, or at a path relative to the git repository. The way to test this is the same as in the “Everything is checked into git” item above: have the reporter try to run their reproduction in a new, fresh directory.
• Containers: There are a billion reasons why your system might be different from someone else’s. When you start from an image (like a Docker image), you get a fresh environment and a fresh disk every time you execute your instructions. These reproductions are more work to provide, but they’re the most foolproof.
If you’re struggling to get a reproduction to work on your machine, one alternative is to stop.
Instead, try to get the reproduction to fail for the original reporter the same way it fails on your machine. That usually gives them enough information to debug the difference. 94
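When two environments disagree, it helps if both sides can paste the same facts into the issue. The following is a minimal Ruby sketch of that idea. The method name `environment_report` and the choice of fields are assumptions for illustration, not a standard tool, and it only reports gems that the running program has already loaded.

```ruby
require "rbconfig"

# Hypothetical helper: gather the environment details that most often differ
# between a reporter's machine and yours, ready to paste into an issue.
def environment_report(gem_names = [])
  lines = []
  lines << "Ruby: #{RUBY_VERSION} (#{RUBY_PLATFORM})"
  lines << "OS: #{RbConfig::CONFIG['host_os']}"
  gem_names.each do |name|
    # Gem.loaded_specs only knows about gems this process has activated.
    spec = Gem.loaded_specs[name]
    lines << "#{name}: #{spec ? spec.version : 'not loaded'}"
  end
  lines.join("\n")
end

puts environment_report(["rake"])
```

Running it on both machines and comparing the output is a cheap first check. It does not cover external services such as Postgres or Redis, whose versions you would still need to ask about.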
Is a reproduction that valuable?
Does it sound like it takes a lot of time to reproduce a reported bug? Is it worth all that effort? Every five- to ten-minute chunk you spend trying to reproduce an issue is a five- to ten-minute chunk that you just saved someone else from spending. The time it takes to go back and forth with someone to get clarification on an issue can add up. Without actually attempting to execute the reproduction, it’s impossible to know whether it is “good to go” or will require another thirty minutes spread over the next few days. There’s also a third possibility that I haven’t mentioned yet: even though you were able to reproduce the problem, someone else following the same instructions might not see the same behavior. When this happens, you have two groups who can compare their environments to look for differences instead of just one. Reproducing reported bugs is a hugely valuable task and can take a surprising amount of skill and time.
Your first response
Whether you’re asking a bug reporter for clarification or reporting back your reproduction findings, your first response to a bug report can be nerve-racking. It helps to type up a response first and then have someone else read through it and give you feedback on your tone. Think of your comments as the gentle taps of an archeologist’s hand-held hammer, chipping away at the rubble that hides the goal you two share (uncovering the reproduction and solving the bug), rather than a massive sledgehammer you swing to force the reporter into doing your bidding. Remember when I said you were going to learn communication skills? Well, this is it! Learning how to provide helpful and accurate feedback to someone who does not want to do what you want them to do is a moderately tricky skill, but one you can get better at.
Once you’re ready with your first reply and you smash that comment button, take a moment to appreciate how far you’ve come. You’re doing the work!
Recap
1. Writing a good issue and working with someone to improve their issue report are difficult but rewarding skills.
2. When it comes to debugging, a reproduction case is worth its weight in gold. Working with an issue reporter to get a verified reproduction is one of the most helpful activities in open source.
3. Communication is a skill that developers can learn. By helping to lead issue reporters down the path of a viable reproduction, you’ll also be leveling up your communication.
Debugging issues
After you have an issue reproduction, you’ll need to debug the problem. This section has a smorgasbord of tasks that can help drive an issue to completion.
Confirm it’s still broken
At work, I’m on an escalation support chain. This means a ticket usually goes through at least one or two other engineers before it lands in my inbox. Even with all that time between when a customer reports a problem and when I get it, you would be surprised by the sheer number of tickets I close just by essentially asking, “Is it still broken?” The open source equivalent is checking whether an issue has already been patched. Bugs get reported, reproduced, fixed, and then released. If the reporter is using an older version of the software, their issue might already be fixed! If you see someone reporting a bug against a slightly older release, asking them to make sure it’s not already fixed can save everyone time. Some projects, especially the larger ones, have a slow release cycle, so there’s a delay between the “fix” and the “release” parts of the cycle. In those cases, it can be helpful to ask, “Did you try using the main branch?” to see if it’s already fixed. If you’ve already got a reproduction in hand, you can check whether it is fixed by running it against the repo’s latest commit. If the problem is already fixed, comment on the issue. That usually means the issue can be closed.
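Deciding whether a report predates a fix comes down to comparing two version numbers, and comparing them as plain strings gets it wrong. A minimal Ruby sketch using RubyGems’ `Gem::Version`, where the method name `predates_fix?` and the version numbers are hypothetical inputs you would look up in the issue and the changelog yourself:

```ruby
require "rubygems" # provides Gem::Version (already loaded in most Ruby setups)

# Hypothetical helper: true when the version in a bug report is older than
# the first release that contains a fix, so the reporter should upgrade first.
def predates_fix?(reported_version, fixed_in_version)
  Gem::Version.new(reported_version) < Gem::Version.new(fixed_in_version)
end

predates_fix?("2.3.9", "2.4.0")  # true: ask them to upgrade and retry
predates_fix?("2.10.0", "2.4.0") # false, although "2.10.0" < "2.4.0" as strings
```

`Gem::Version` compares each numeric segment, which is why 2.10.0 correctly sorts after 2.4.0 here.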
Group and link related issues
If you’re a doctor and a third of your patients have food poisoning, you might start to ask if there’s any commonality between those patients. Maybe they all ate at the same restaurant or worked for the same company. When you’re working on issues, it can help to check whether the same problem is being reported multiple times. If you notice similar issues, it is helpful to link them to each other. For example, you may see a comment like the following: Hey, this looks similar to #1337 because you see the error message
Linking together issues via comments has a few benefits:
• There is extra context for anyone who ultimately ends up working on the issue.
• Developers across different issues can share information such as debugging steps.
• Anyone who lands on one of the bugs via a search will have access to the whole conversation.
• Truly duplicate issues can be closed so that developers can focus effort in one location.
• Once a fix has been committed and confirmed, it is easy to go through and close all related issues.
If you spend a lot of time on a project, you’ll detect related issues without much prompting. If you’re an infrequent contributor, is there another way? One option is to do a quick search of the existing issues with a few keywords. Searching for related issues should be the first step for anyone submitting an issue, but sometimes, they forget.
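To make the keyword-search habit concrete, here is a rough Ruby sketch that ranks existing issue titles by how many words they share with a new report. It is only an illustration: the `related_issues` method, the `min_shared` threshold, and the issue numbers and titles are all made up, and a real tracker’s search box does this job better.

```ruby
# Hypothetical helper: rank existing issues by how many words their titles
# share with a new report. Common words like "the" would inflate the score,
# so treat the result as a starting point, not an answer.
def related_issues(new_title, existing, min_shared: 2)
  words = ->(text) { text.downcase.scan(/[a-z0-9_]+/).uniq }
  wanted = words.call(new_title)
  existing
    .map { |number, title| [number, (words.call(title) & wanted).size] }
    .select { |_number, shared| shared >= min_shared }
    .sort_by { |_number, shared| -shared }
end

# Made-up issue numbers and titles.
issues = {
  1337 => "NoMethodError when loading config from YAML",
  1402 => "Docs typo in README",
  1511 => "YAML config loading raises NoMethodError on Ruby 3"
}

related_issues("NoMethodError loading YAML config", issues).each do |number, shared|
  puts "##{number} shares #{shared} keywords"
end
```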
Isolating failing behavior
If you can accurately answer when something started failing, then you’re very close to being able to isolate the problem.
A powerful technique for finding the commit that introduced an issue is “bisection”. What is bisection? Imagine that the last release of the software where the bug didn’t exist was a hundred commits ago. You don’t know which of those hundred commits introduced the issue. You could start by checking out commit number two and working your way to the end until you found where it broke, but what if it didn’t break until commit number 99? That’s 98 manual tests! Yikes! A better strategy is to cut the problem area in half. Start at commit fifty. If the bug already exists there, we know it must have been introduced at or before that commit, and we’ve ruled out half of our search space. Next, check commit 25. Repeat this process until you’ve only got one commit left. With this strategy, in the best case, you find the issue on the first try. In the worst case, it takes about six or seven tries. You might recognize that this process sounds very similar to an existing tool called git bisect. With git bisect, you tell it a “good” commit and a “bad” commit and (with git bisect run) provide it with a script that passes or fails. It then performs this search automatically. Neato! Why not just use that? Why did I explicitly recommend doing it manually? Although git bisect is handy to learn, it can be tricky to use. There are a few common scenarios where it doesn’t work well:
• The error state is hard to reproduce in a script. For example, imagine a bug report where someone said, “When I click the button, I expect to be taken to the next page”. To reduce this to a script, you would need a way to spin up a web server, navigate to the page, click the button, and observe that the change was correct. These are all trivially easy things for a human to do, but if you try to bake them into a repeatable script, you’ll find that the “repeatable” part is challenging.
• Additional errors raise false positives. Have you ever been working on a bug and ended up making multiple commits: “Works”, “actually works”, “fixing syntax error”, “really seriously, please pass this time”? Imagine these commits are in the middle of the hundred that you are trying to bisect. If the “actually works” commit introduced a syntax error, it will fail, even though it’s not the same failure mode. Unrelated failures are hard to detect in an automated way but reasonably easy for humans to spot.
When is git bisect the right candidate? If part of the bug-fixing work has been writing a test or script that passes in an old version and breaks in the current version, you can reuse that automated test as part of git bisect. Even if your bisect gets tripped up by false positives, you can still use it to narrow down your search space (but you have to rerun the command with different inputs).
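Both ideas above fit in a short Ruby sketch. Part 1 is the halving strategy, with a block standing in for “check out this commit and run the reproduction”. Part 2 maps a reproduction’s outcome to the exit codes that git bisect run understands (0 = good, 1 = bad, 125 = skip this commit), so an unrelated failure such as a syntax error gets skipped instead of counted as a false positive. The method names, the fake commit list, and the choice of which error counts as “the bug” are all assumptions for illustration.

```ruby
# Part 1: manual bisection. `commits` is ordered oldest to newest, the last
# one is assumed to be bad, and the block returns true when a commit is broken.
def first_bad_commit(commits)
  low = 0
  high = commits.size - 1
  checks = 0
  while low < high
    mid = (low + high) / 2
    checks += 1
    if yield(commits[mid])
      high = mid    # already broken: the first bad commit is here or earlier
    else
      low = mid + 1 # still works: the first bad commit is later
    end
  end
  [commits[low], checks]
end

# Part 2: exit statuses for `git bisect run`.
# 0 = good, 1 = bad, 125 = skip (this commit cannot be tested).
def bisect_status(expected_error)
  yield
  0   # the reproduction ran cleanly: good commit
rescue expected_error
  1   # the failure we are hunting: bad commit
rescue ScriptError, StandardError
  125 # an unrelated failure (syntax error, missing file): skip it
end

commits = (1..100).to_a # stand-ins for 100 commit SHAs
found, checks = first_bad_commit(commits) { |c| c >= 99 } # pretend the bug arrived in 99
puts "first bad commit: #{found}, found in #{checks} checks"
```

To hand part 2 to git, a script could end with something like `exit bisect_status(NoMethodError) { load "repro.rb" }`, where `repro.rb` and `NoMethodError` are placeholders for your reproduction and the error it shows, and be run with `git bisect run ruby that_script.rb`. This assumes the bug shows up as an exception; a wrong-output bug would need its own check. Ruby’s `Array#bsearch` in find-minimum mode performs the same search as part 1: `commits.bsearch { |c| c >= 99 }`.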
Report your findings and keep digging
If you found the commit that caused the issue, you should do a little celebration. Tracking down a specific breaking commit is no easy task. After that, report the findings so that everyone else is on the same page. If you want to keep digging, find the pull request that the commit came from. Read the PR description and the commit message. Hopefully, there’s more information than “performance improvements and bug fixes”. Look for context around why developers made the changes. Were tests modified or added in the PR? What was the intent of the change? If the commit or PR references other issues, read them as well. If you’re lucky, the commit will be small, and the logic that causes the bad behavior will be apparent. That’s not always the case. Even when a commit is small, it might be tricky to understand how it produced the bug. Once, I tracked down an issue in the asset generation library Sprockets to a specific commit in which only a single line changed. I would usually have been excited to find such a small commit as the culprit, except that the single-line change bumped the library’s version number, say from 1.0.0 to 1.1.0. The failure made no sense. If the failure had come from a logic change, it shouldn’t have been affected by a version bump. It took hours more digging to find that a cache key in the project was calculated from the released version number, and the bug was triggered when a change invalidated the key. It’s common for a bug to be triggered by a seemingly unrelated change. The more of this work you do on a project, the more context you’ll build, and the more context you have, the easier it is to trace cause and effect. Once you’ve got a reproduction and you’ve tracked down a failing commit, you’re officially in the debugging zone. You’ve got a list of files from the commit that you can start looking at, and you’ve got a list of changes to investigate.
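The Sprockets story hinges on a cache key that includes a version string. A hypothetical Ruby sketch of that pattern (not Sprockets’ actual code; `cache_key` and its arguments are invented for illustration) shows why a release that changes no logic can still change behavior:

```ruby
require "digest"

# Hypothetical sketch, not Sprockets' real implementation: a cache key that
# mixes in the library's version. Any release, even one with no logic
# changes, produces new keys and so invalidates everything cached earlier.
def cache_key(library_version, asset_path, contents)
  Digest::SHA256.hexdigest([library_version, asset_path, contents].join("\0"))
end

old_key = cache_key("1.0.0", "app.js", "console.log(1)")
new_key = cache_key("1.1.0", "app.js", "console.log(1)")
puts old_key == new_key # the version bump alone changed the key
```

The design is often intentional (a new release may change how assets are compiled), which is exactly why the resulting bug can look like it came from nowhere.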
Writing a failing test
Another useful exercise is to turn the reproduction into a failing test, even if you don’t have a fix yet. Why is this useful if there’s already a reproduction? If a regression caused the behavior, then the codebase will need a test once a fix has been found; otherwise, it might regress again. An automated test usually has fewer moving parts than a full-blown reproduction, so having one will focus efforts. Finally, automated tests are easier to execute and reduce development cycle time compared to manual reproduction. Although it might take a few minutes to test a reproduction manually, an automated test might drop your cycle time to a few seconds. Once you’ve got a test written, you can share it on the issue, or, depending on the project, you can open a “failing test” PR to give anyone looking to work on the issue a significant head start.
How do you turn a reproduction into a failing test, though? The easiest way is to find an existing test that exercises a feature close to the one you want to test. If you’ve got access to the commit that broke things, then you’re ahead of the game. Did the commit or the PR it is linked to include tests? If so, check those test files. If not, one trick that I love is to add an exception to the code and then run the existing tests. Any tests that fail are likely candidates for holding testing logic that you can reuse. If the commit only touched a single file, then you can use the blame view on GitHub to explore other changes in the file. The blame output shows you the commit that introduced each line in the file. From there, you can click on a few commits and see if any of them introduced new tests. The idea is that if you can find tests for logic elsewhere in the file, you can use that extra context to write new tests.
To recap, to find a similar test, you can do any of the following:
• Check the commit for other tests.
• Add an exception near the change and re-run the test suites.
• Explore the history of changes in the file to find other commits that touch test files.
When you have a test file, you can look at it, copy one of the tests, and modify it to reproduce the same regression you saw in the reproduction example. It’s a big task and will require a combination of debugging and problem-solving.
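As a sketch of what the end result can look like, here is a reproduction captured as a Minitest test. Everything in it is hypothetical: `Slug.call` stands in for library code with a reported bug (trailing punctuation leaves a dangling hyphen), and the test encodes the behavior the reporter expected. It is meant to fail until someone fixes the code.

```ruby
require "minitest" # loads Minitest::Test without installing the autorun hook

# Hypothetical library code containing the reported bug:
# "Hello World!" becomes "hello-world-" instead of "hello-world".
module Slug
  def self.call(text)
    text.downcase.gsub(/[^a-z0-9]+/, "-")
  end
end

# The reproduction from the issue, turned into an automated test.
class SlugRegressionTest < Minitest::Test
  def test_trailing_punctuation_does_not_leave_a_hyphen
    assert_equal "hello-world", Slug.call("Hello World!")
  end
end
```

In a real project, the test would live in the existing test directory, use the project’s own test helper (or `require "minitest/autorun"`), and run with the rest of the suite. Once a fix lands, the same test guards against the regression coming back.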
Debugging a hanging program
Often, bugs cause crashes or visible breaks, but what about when the program refuses to exit and just hangs? When a program is stuck, you’ll need to know where it is stuck before you can fix it. The best trick I’ve found is implementing a quick and dirty sampling profiler: a background thread that, on a timer, spits out the backtrace of every thread in the program. The following is an example in Ruby:

```ruby
# Background thread: every ten seconds, print the backtrace of every
# thread in the process so you can see where a hung program is stuck.
Thread.new do
  loop do
    sleep 10 # seconds
    puts "=" * 80
    Thread.list.each.with_index do |t, i|
      puts "== Thread #{i}"
      puts t.backtrace
    end
  end
end
```
If you put this Ruby code into a program, then every ten seconds, it will print what the program is doing. When the program hangs, you’ll see the same backtrace location printed repeatedly. Start looking there with your other tools. In addition to what’s listed here, I also share tips and techniques such as using “reflection”. I use this technique all the time when debugging slow or stuck builds on Heroku. In one case, a customer added the above lines of code to an initializer in their codebase that runs on startup. It didn’t fully hang but took a long time. After capturing several backtrace outputs, we saw that three out of five of them pointed at Ruby’s require (similar to import in Python or Node), which is used for loading code. We used that information to remove some dependencies that were no longer being used and sped up the boot time.
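The “three out of five samples” reasoning can be automated by tallying the top frame across samples instead of reading raw dumps. A minimal Ruby sketch under stated assumptions: `sample_hotspots` and `slow_boot` are hypothetical names, it watches a single thread, and the sample count and interval are arbitrary.

```ruby
# Sample one thread's backtrace several times and count how often each
# top-of-stack frame appears. The most frequent frame is a good first guess
# for where a slow or hung program spends its time.
def sample_hotspots(thread, samples: 5, interval: 0.1)
  tally = Hash.new(0)
  samples.times do
    sleep interval
    frame = thread.backtrace&.first # nil once the thread has finished
    tally[frame] += 1 if frame
  end
  tally.sort_by { |_frame, count| -count }
end

# Hypothetical stand-in for slow startup code (e.g. an expensive require).
def slow_boot
  sleep 1
end

worker = Thread.new { slow_boot }
sample_hotspots(worker, samples: 3).each do |frame, count|
  puts "#{count}x #{frame}"
end
worker.kill
```

Looking only at the top frame keeps the output short; when that frame is something generic like `sleep` or `require`, the next few frames of the full backtrace usually say which caller is responsible.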
Note: Not every language makes it this easy to grab a reference to every running thread’s backtrace from inside the program. In other languages, you can still sample the execution location of your program, but it may require specialized tools.
Recap
1. Debugging is a skill that developers can learn. I use many of these same tools and techniques for debugging customer applications at work.
2. Bugs that are already fixed don’t need investigation (does it work on main?); related issues might have related fixes; and asking, “When did this last work?” and “When did this break?” can bring magical insights.
3. Isolating the breaking behavior or change is the bulk of the work. Once you’ve gone that far, a fix or automated test that highlights the issue might not be far behind.
Giving feedback on feature requests
Commenting on feature proposals and pull requests is issue triaging in hard mode. Handling bugs and reproductions follows a logical progression. That process is reductive, meaning you’re trying to eliminate possibilities. With feature requests, it’s generative, and there are more tradeoffs to consider. The right answer to a feature proposal depends on values and judgments much more than “Is the broken code fixed?” This section presents a framework for reasoning about feature requests. If you’re just getting started contributing, you might want to skim it and come back later when you’re ready for a challenge. Because a successful feature request needs a successful implementation to be closed, we will also talk about reviewing pull requests. Here are helpful questions to ask when you find yourself reviewing a proposal:
• Can you explain the problem being solved in your own words?
• Did you validate the opportunity?
• Did you express any hesitation, doubt, or dislikes around the opportunity?
• If it’s a pull request, did you validate the implementation?
• Did you discuss alternatives?
• Did you ask for explicit changes?
• Did you thank the issue creator?
Can you explain the problem being solved in your own words?
Even though a feature request or pull request is framed as “I want to add something to this library”, it’s usually because the author has hit a problem and wants to solve it via a change to the library. A pull request is a specific implementation of a feature request. A pull request may reference an existing “feature request” issue, or it may serve as its own issue with code attached:
• Feature request: Issue with no proposed implementation (Context and Opportunity)
• Pull request: Issue with attached code (Context, Opportunity, and Implementation)
For all feature requests, you’ll need to understand the context and opportunity the reporter is addressing. If you don’t understand what problem is being solved, then it will be impossible to determine whether it’s worth solving. For example, earlier, we looked at a pull request that proposed adding a style key to the content_tag method of Rails:

```ruby
content_tag(
  'div',
  'Hello World',
  style: { color: '#48BC13' }
)
```
In this case, the opportunity is that they want to be able to style an HTML element via the content_tag method directly. If you saw that pull request and were unsure exactly why they wanted that code, then maybe the person who opened the issue forgot to add context. If that’s the case, it’s okay to ask for extra information. You could ask the following:
I see you’re wanting to add a style key to content_tag. I’m not fully understanding the problem this is solving. Can you give me some more information about why you want this? What problem were you having that led you to identify this as a possible change? If you don’t feel confident explaining the problem to someone else in your own words, then that’s a sign you don’t have enough context. Beyond asking for clarification, you can search for similar issues on the issue tracker and on forums such as Stack Overflow. If you find that it took a significant amount of time to understand the issue, it might be valuable to echo back to the issue what you found and ask if your understanding is correct: I wasn’t completely sure why someone might want an inline style element in a div, but I researched and found . It looks like one possible reason someone might want to quickly add a style tag is to have more control over a custom div in a template. Is that right? If an issue is confusing to you, it might be confusing to others, including the maintainers. Clarifying the problem a feature request addresses can be valuable to anyone new who is coming to the issue.
Did you validate the opportunity?
Once you understand the problem being addressed, you’ll need to form an opinion on whether or not it’s worth addressing. Adding code to a project adds liability. For a feature request to be successful, the value added needs to outweigh the cost of implementation. The following is an example feature request that I made to Ruby on Rails: I found a problem where if a route was generated using this code:
It would raise an error like this:
No route matches {:action=>"show", :controller=>"users"}
This error was confusing because when I checked my routes, I had a route that matched the “show” action on the “users” controller. Upon investigation, I found that the issue was that this route requires an id parameter so that it knows which user to show. To solve the problem, I proposed adding extra information when this happens:
No route matches {:action=>"show", :controller=>"users"}, :id key is missing
As a bug, this is in the “unexpected behavior” category because it violates the user’s expectations. As a feature request, it’s a modification because I’m trying to extend an existing error to add additional context. As you’re reading over my problem description, do you agree or disagree that this is a problem worth solving? If you lean one way or the other, how strongly do you feel? How would you go about asking me for more information or giving me feedback about that opportunity? It can be useful to the reporter and maintainers to know how others feel about the feature request. For example, you could respond with the following: You’ve described a very valid pain point, and I feel strongly that we should solve this somehow. We should make it more visible that this route requires an id parameter to be valid. You can validate the opportunity independently of the proposed solution (if there is one). Feature requests can get messy with talk about alternative implementations, additional workarounds, and general context-building questions. Stating your explicit validation of the opportunity within a feature request can help prioritize and focus a request.
Do you have any hesitation, doubt, or dislikes you wish to express?
It’s rare that you’ll have strong feelings about a feature request the first time you read through it. When you don’t feel strongly about the validity of a contribution opportunity, you’ll need to search or ask for additional information. Part of validating or rejecting an opportunity is gauging its impact on users and its maintenance cost to the project:

| | Low impact | High impact |
|---|---|---|
| Low maintenance | Maybe | Yes |
| High maintenance | No | Maybe |
Ideally, maintainers want high-impact, low-maintenance solutions. Because the maintenance costs are likely tied to the implementation, you can focus on better understanding the impacts: I understand the problem you’re solving. However, I’m unsure of the impact or severity of this problem. How often do you experience this issue? How painful is it when you hit this problem? What are alternative ways you can work around this problem? When expressing that you’re leaning against an opportunity, you’re in a state of conflict with the issue creator. Beyond how you’re feeling (for/against), it’s helpful to state how strongly you hold that viewpoint (a little/a lot), along with any additional information that leads you to that conclusion. On the content_tag feature request, one commenter linked to a Stack Overflow question titled “What’s so bad about in-line CSS” and expressed that they didn’t like that practice. Beyond saying, “I don’t like it,” they linked to a resource that talked about some downsides to the practice. In addition to expressing concern, they also came to the table with curiosity and asked the following: Could you share your use case where this feature would be helpful?
One saying for such discussions is “strong opinions, weakly held”. This means that you should still invite curiosity and look to learn more about an issue, even if you feel strongly against it.
If it’s a pull request, did you validate the implementation?
Let’s go back to my Rails “No route matches” example. The first maintainer who reviewed and commented said the following: […] I’d prefer that a url generation error in a template be treated as a template error so that you’d get some context on which route has failed to generate. They didn’t explicitly state whether they agreed with the opportunity, but they weren’t asking questions about it either. Reading between the lines, it seems like they agreed that it was worth solving but disagreed with my proposed implementation. With feedback from the maintainer, I began looking for alternative implementations. My original proposed fix was to catch the error at the last possible minute and check for this one specific missing key. With the guidance of the maintainers, I was able to find a way to write more generic logic that was applicable to any missing key. Mentally separating the validation of “opportunity” from “implementation” can help guide your feedback to the contributor and prevent you from getting hung up on the details. It can be tough to have an opinion on implementation if you’re not already deeply familiar with the codebase. That’s okay. You can still check for smaller implementation details. Is the feature tested? Is there a part of the implementation that’s difficult to understand or confusing?
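To illustrate the difference between checking one specific key and generic logic for any missing key, here is a hypothetical Ruby sketch. It is not Rails’ implementation: `missing_route_keys`, `url_generation_error`, and the simple `":name"` path parsing are assumptions made for the example.

```ruby
# Hypothetical sketch, not Rails' implementation: list every dynamic segment
# (":name") of a route path that the caller did not supply, instead of
# special-casing :id.
def missing_route_keys(path, params)
  required = path.scan(/:(\w+)/).flatten.map(&:to_sym)
  required - params.keys
end

def url_generation_error(path, params)
  message = "No route matches #{params.inspect}"
  missing = missing_route_keys(path, params)
  message += ", missing required keys: #{missing.inspect}" unless missing.empty?
  message
end

puts url_generation_error("/users/:id", { action: "show", controller: "users" })
puts url_generation_error("/posts/:post_id/comments/:id", { post_id: 1 })
```

Because the check is driven by the route’s own segments, the same code explains a missing `:post_id` just as well as a missing `:id`.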
Did you discuss alternatives?
When discussing implementation, it’s helpful to also discuss alternatives. This could be a different way to achieve the same outcome, or it might be a different outcome that solves the same problem. There are two categories of fixes for any given issue: a local fix, or “workaround”, and a fix to the “upstream” open source project. A workaround might be a helper method or extra code that a developer can add to get unblocked on their personal problem, but it doesn’t solve the systemic issue. An “upstream” fix is one that fixes the issue for everyone experiencing it. A pull request to a project is an “upstream fix”. If a problem is low impact and the workaround is easy to implement, maintainers might opt to declare the issue out of scope, especially if the proposed implementation is cumbersome. As a reviewer, it’s helpful to brainstorm alternatives. The content_tag pull request was ultimately closed by a maintainer who declared that it was behavior they didn’t want to encourage and that the local fix was “easy enough” for developers who wanted that feature. When looking for alternatives, look for ones that will decrease the maintenance burden for maintainers.
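For a sense of what an “easy enough” local fix could look like, here is a hypothetical Ruby sketch. The `inline_style` helper is invented for this example and is not part of Rails; it turns a hash into an inline CSS string so the existing string-valued `style:` HTML attribute can be used without changing content_tag itself.

```ruby
# Hypothetical local workaround (not part of Rails): turn a hash into an
# inline CSS string. Underscores in keys become dashes (font_weight -> font-weight).
def inline_style(declarations)
  declarations
    .map { |property, value| "#{property.to_s.tr('_', '-')}: #{value}" }
    .join("; ")
end

inline_style(color: "#48BC13", font_weight: "bold")
# In a Rails view, this might then be used as:
#   content_tag(:div, "Hello World", style: inline_style(color: "#48BC13"))
```

A helper like this lives in the application, so the maintainers carry no new API surface, which is the tradeoff described above.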
Did you ask for explicit changes?
As you’re writing up your reply, it’s helpful to be explicit about any requests or changes. Issue creators can have a difficult time telling whether a comment is an observation or a suggestion. Instead of:
I see you don’t have a new line at the end of your file.
Explicitly state:
Please add a new line at the end of your file.
Keep in mind that open source work happens asynchronously. If it’s unclear whether you’re asking for a change, then the person who opened the feature request (or pull request) will either have to guess your intent or waste a feedback cycle asking for clarification. If you’re unsure about a suggestion, you can also state your uncertainty:
I think this looks good, but I believe there needs to be a new line at the end of this file. I’m unsure, though. Maybe someone else can verify whether that’s correct.
The goal with triaging feature requests should be to help move them forward, not to add blockers to the process.
Did you thank the issue creator?
When someone opens an issue, it’s usually because they want the project to be better and stronger. We know that they took extra effort to write up their experiences, so we should validate that effort by saying a quick, “Thanks for opening this issue”. The majority of feature requests do not get merged or accepted. If it’s someone’s first issue, that can be hugely demoralizing. When I close someone’s feature request, I explain why I’m closing it and tell them that I appreciate their efforts. If it was a pull request, I point out how much they learned in the process. I tell them how much easier the next PR will be. I invite them to keep contributing. I’ve found that many people respond very positively to this encouragement and are motivated to come back again. Projects need contributors, and contributors need validation. Maintainers can help validate these efforts, and you can, too. Thanking contributors is part of the invisible work that maintainers spend their time on, and most of us don’t realize it. By making the invisible visible and making some of these responsibilities our own, we help to shape our communities, our code, and our futures.
Recap
1. All feature requests are solving a problem. If that problem is unclear to you, it may be unclear to others. Ask questions or research the issue until you feel you could explain the core problem to someone else.
2. Separate validating a feature’s opportunity from its implementation. Asking if it is worth solving is a different question from how to solve it.
3. Be explicit when asking for changes. Thank issue creators and be nice.
Navigating conflict through communication (NVC)
Clear communication isn’t just nice to have; it’s pretty much a technical requirement. When conflict comes up, it is critical that developers can express themselves without resorting to verbal attacks or biting sarcasm. In this section, I’m going to introduce you to a framework that I’ve found helpful in these situations. Non-violent communication (NVC) is an evidence-based communication model that began in the midst of the Civil Rights Movement in the 1960s. It was then formalized as part of a master’s thesis in 2008. Since then, more than fourteen master’s and doctoral papers have tested the model and validated its effectiveness. You can read more about its history and details on Wikipedia¹. At a high level, NVC contains four components:
• Observe: State what you saw or observed.
• Emote: State how that observation made you feel.
• Need: State what you, as a human, need (independent of a specific request).
• Request: State what specific action you are requesting from the recipient (note that a request is not a demand).
Programmers are communicators. We need to learn to develop societal relationships based on partnership and respect rather than a “retributive, fear-based domination.”
1. Wikipedia contributors. (2022b, September 19). Nonviolent Communication. Wikipedia. https://en.wikipedia.org/wiki/Nonviolent_Communication
Example: The Cop library
To see NVC in action, I want to share a specific real-world example from the open source world. In the wake of police brutality protests in the United States, a contributor asked a popular Ruby library to remove the word “cop” from the library name. The request sparked what can only be described as a flame war. The following are a few examples of less-than-stellar communication techniques used on the post:
• “From the people that actually use [library name].” This is a logical fallacy called “No true Scotsman”. It tries to claim moral purity and superiority for a position by asserting that people who take the opposite position are not “true users” of [library name].
• Claiming that attempts to change the name are “policing me to do as they say”, that is, basically saying, “No, you’re the real police if you want to police the name of this gem”. The people asking for the name change are not authority figures and are not above the law. This is a false characterization of their action and ask.
• “This is just so dumb” and more name-calling.
• Claiming that “we need to keep our politics out of X/Y/Z”. Unfortunately, everything we do is political. Existence is political.
• Claiming this is not going to accomplish anything, so we shouldn’t do it. It’s OK that you might think this change is not impactful, but that’s a personal lens, not an objective fact. State your opinions as opinions.
• “Get over yourself”, or the notion that your caring about a thing is, in fact, the problem; this is, well … a problem.
Rather than trying to “win” by dominating the conversation, NVC has a very different feel. The following is an example of how a pro and con comment could have been written:
• Pro-removal
  ◦ Observe: I see that people are protesting police brutality all over the world.
  ◦ Emote: It makes me uneasy that this project glorifies the work of “cops” and presents them as always right.
  ◦ Need: I need to express this outrage in every venue possible.
  ◦ Request: Will you consider removing it?
• Anti-removal
  ◦ Observe: I’m watching people call for the change of the gem name.
  ◦ Emote: I’m confused, as the movie “[gem name]” is supposed to be a satire of 1980s policing, and the actions of the linter are effectively described as an “enforcement”.
  ◦ Need: I need people to consider that the origins of this name have an impact on its usage.
  ◦ Request: I would like to discuss other ways we can protest police brutality instead.
These are wordy, but I believe they get their points across well without dipping into yelling or insults. My post on the thread was eventually picked up and shared as a positive example with other Reddit moderators.
NVC on issues
Often, conflict isn’t quite so dramatic or as clear-cut. The most common source of conflict I’ve come across has been rude comments such as “Why don’t you just”, “You can simply”, or “This is taking forever”. Maintainers and contributors don’t come to open source to be punching bags. If someone is being rude to me, I want to distinguish between their intentions and their actions. If they’re intending to be rude, I shut it down. If the injury was an accident, they will usually apologize. The following is an example from a real issue:
Issue reporter: When my schema is generated in ‘[encoding format]’, [library] fails with an error.

That’s roughly the full issue description. There’s not much info to go on. No reproduction, no details. The lack of care and attention to the issue is a bit of a red flag. On the surface, it seems they expect the maintainer to do all the work of not only fixing the issue, but of trying to figure out the problem description. But there’s no hostile or aggressive language. The maintainer of that library took the time to respond:

Maintainer: Could you share an example schema with this error (and perhaps the error itself)? Thanks!

I think that’s reasonable. The maintainer wants to protect their time. There’s a chance the reporter might not realize their issue was unclear. In this case, they’re giving the reporter “homework”. To which the reporter responded with the following:

Issue reporter: It is simple, you just […], and it works.

This comment comes across as rude to me. They are ignoring the maintainer’s request to provide a reproduction case. The wording is dismissive of the time and effort required to understand the issue. If I wanted to write up a response in NVC, it might look something like the following:

Possible NVC response: I see you’ve classified this topic as being “It is simple,” and “You just…” I feel worried because I don’t fully understand the problem, and annoyed at having my attempts to understand it seemingly trivialized. I need issue reporters to help me understand the problem without being dismissive. Can you please help me understand how to move this issue forward without this extra commentary on how difficult or easy it is to understand?
How would the issue reporter respond to this? I have no idea. But I do know that my position and my “ask” are clearly laid out. Ideally, they can validate my emotion (“I see you’re frustrated”), whether or not they are able to comply with my request. In my experience, trolls and jerks will double down, saying “How can you not see something so OBVIOUS […]”. Developers who are genuine in their desire to work on a bug together are often apologetic and quickly respond to my request: “I am sorry; that was not my intent”.
Before working on open source, I always thought that maintainers provided free code. Now, I see that they provide free emotional labor as well. It’s natural for humans to face conflict. Humans make open source, so it’s natural for there to be conflict during contribution. To sum up NVC, using NVC:
“I see people posting comments in my communities and accidentally missing one another. It makes me sad that we can’t communicate better as a community. I need a strong open source community. Because I’m a remote worker, my ‘online’ communities are also my ‘real’ communities. I ask that you consider adopting tools and frameworks for better communication, such as NVC.”
Recap
1. Using the NVC framework can help you communicate clearly when you’re revved up.
2. NVC has four stages: Observation, Emotion, Need, Request.
3. Working through conflict is a valuable skill that contributors can improve with practice.
Issue cheatsheet
Cheatsheet:
• How to read and categorize issues
  ◦ Is it a bug? If so, is it a regression or unexpected behavior?
  ◦ Is it a feature request? If so, is it a “modification” or an “addition”?
• How to reproduce bugs
  ◦ If it worked: validate the example.
  ◦ If it failed: timebox and ask for extra steps.
  ◦ If no reproduction is provided: ask for one.
  ◦ Can you guide the issue to a higher-quality reproduction?
• How to debug issues beyond reproductions
  ◦ Did you check to see if the bug is already fixed on the main branch?
  ◦ Can you write an automated test to reproduce the bug?
  ◦ Did you attempt to debug the failure to understand the root cause?
  ◦ For regressions, did you bisect commits?
  ◦ Slow execution and hangs can be debugged by sampling the process or via reflection.
• How to give feedback on a feature request
  ◦ Can you explain the problem being solved in your own words?
  ◦ Did you validate the opportunity?
  ◦ If it’s a pull request, did you validate the implementation?
  ◦ Do you have any hesitation, doubt, or dislikes you wish to express?
  ◦ Are you asking for explicit changes?
  ◦ Do you have alternatives you want to ask about?
  ◦ Did you thank, connect with, or validate the poster?
  ◦ The non-violent communication (NVC) framework is observation, emotion, need, request.
Writing Documentation
"Documentation is a love letter that you write to your future self." - Damian Conway, computer scientist
Understanding documentation
Documentation (or “docs”) is the bedrock of open source. Documentation allows developers to self-serve: when they get stuck, the docs can lead them in the right direction. Documentation is a very important piece of the self-directing, self-governing ecosystem that is open source. But what makes for good docs?
Many “learn to program” courses don’t touch on technical writing, or if they do, they don’t get into the details of writing documentation. Writing documentation is one of those skills that you’re expected to pick up over the years. I’m here to break down the building blocks of what makes good documentation. In this section, we’ll start from the basics of what counts as documentation. Then, we’ll go into detail on how you can start improving existing documentation and, eventually, write documentation from scratch.
Types of Documentation
There are two distinct and common types of documentation in a project:
• User-level docs: Think tutorials, walk-throughs, and guides.
• Reference-level docs: Think class and method documentation (program API documentation).
Depending on your language ecosystem and the library you’re working with, projects might prioritize and provide these two documentation types in different amounts. Many libraries provide no docs at all. We should change that.
If anyone says, “It doesn’t need documentation; someone can read the source code to see what it does,” then you can explain that the point of documentation is to provide context: to explain why the code exists and to show how to use it. This is distinctly different from “how” the code accomplishes its task. Reading source code can provide the “how”, but it usually does a bad job of explaining the “why”.
User-level documentation (project guides)
To understand this type of documentation, let’s look at some examples. Ruby on Rails maintains a suite of guides[1] side by side with its source code that gets published to a website. These docs are best for showing common patterns and for grouping similar content:
1. Ruby on Rails Guides (2022) https://guides.rubyonrails.org/
In this example, the Rails guides describe various interfaces and link to other, more specific guides. For instance, the Active Record guide introduces methods such as where and joins and explains how they work together to build SQL queries.
Another example of a “user-level doc” is the book “The Rust Programming Language”[2]. It’s an entire open source book that covers getting started in the language.
2. Klabnik, S., & Nichols, C. (2021). The Rust Programming Language. https://doc.rust-lang.org/stable/book/
Other smaller projects might not have a dedicated “guide”, but most will at a minimum put some high-level use cases in a README. Below is a screenshot of the README of a library I maintain[3] that shows you how to plot a histogram.
3. https://github.com/zombocom/mini_histogram
One way that I think of project-level docs is as “evergreen” tutorials. Without these types of documentation sources, this type of content usually ends up in blog posts that quickly get out of date, or in proprietary books and videos that may be difficult to access. Although user-level docs don’t replace blogs or third-party books, there’s value in centralized, open documentation that goes beyond a pure reference.
What kinds of things end up in project-level docs? Sometimes, people don’t need to know the details of specific methods but want a broad overview. They may want to see how different components are expected to be used together. Sometimes, people don’t want to hunt through multiple locations to get the information they want, and many people want to see the same kinds of information.
What if a project doesn’t have user-level documentation?
If a project you’re working on doesn’t have this kind of “project” documentation, consider how you would group its content, and start by writing a few “cheat sheets.” If you frequently have to look up several different documentation pages to accomplish the same type of activity, that may be a sign the content could be promoted to a project-level doc.
Another way to think of these higher-level docs is that they’re more focused on tasks than on low-level plumbing. Because of this task-based focus, you can also have different guides that speak to diverse communities.
For example, when I first took over maintenance of the asset processing library Sprockets, all of the “how-to” guides were right in the README. Though it was nice to be able to CMD+F (⌘F) to find what you need in one place, the file was long and unwieldy. Some users reading the docs wanted to know how to integrate it with their library. Some wanted to know how to use it to generate JavaScript. And some wanted to know how to extend the library with plugins. These are the main “personas” of users coming to the documentation. Because each user who came to those docs might have a different goal based on their persona, having all the docs on one page was confusing and overwhelming.
As a fix, I made the README focus on the most common use case. Then, I split the other use cases into their own docs. I also added another persona: “developer working on the codebase internals”. This focus on getting a specific task accomplished lets your documentation be more detailed and accurate.
The biggest downside of higher-level documentation is that it might not live where the behavior does. For Sprockets, I made a guides/ folder and put markdown files in it. Because of this, this documentation is more likely to get out of date than “reference-level docs”, which are closer to the code and less likely to change. Even given this downside, I still really like having task-focused guides. They can be a huge time saver and an excellent place for people to document “gotchas” and best practices for using a library that might otherwise be hidden in a blog or forum post.
If you find a guide that’s incorrect or out of date, you can get a doc commit by fixing it yourself, or you can open a documentation bug report to let the maintainers know it needs some attention.
Reference-level Documentation
Note: I’m going to start showing code examples. I’ve picked Ruby for my examples, but the high-level concepts still hold for most languages. I’ll explain syntax as needed, so don’t worry if you don’t know Ruby. In Ruby, you start a comment with a pound symbol # (if you didn’t grow up with a landline, you might know it as a hashtag).
Usually, reference documentation (also known as program API documentation) lives inline with the code, typically above the method or class it is documenting. This type of documentation is called “reference” because it’s intended to be useful in isolation. When developers need to know how a specific method works, they will “reference” the API docs for that language, which is also where they can often find specific examples.
In the example below, the comments are above the method definition of FileUtils.cd[4]:

```ruby
# Changes the current directory to the
# directory +dir+.
#
# If this method is called with block,
# resumes to the previous working directory
# after the block execution has finished.
#
#   FileUtils.cd('/')  # change directory
#
#   FileUtils.cd('/', verbose: true)
#
#   FileUtils.cd('/') do  # change directory
#     # ...               # do something
#   end                   # return to original directory
#
def cd(dir, verbose: nil, &block)
  fu_output_message "cd #{dir}" if verbose
  result = Dir.chdir(dir, &block)
  fu_output_message 'cd -' if verbose \
    and block
  result
end
```
These code comments are the reference docs for FileUtils.cd, making this a “method” doc. Special-purpose tools such as Ruby’s RDOC or YARD scan a project for comments in specific places (above methods or above classes) and then generate a webpage that can be published on the internet. Then, these docs can be searched and referenced. The figure below shows how that looks in the browser.
4. https://ruby-doc.org/stdlib-3.0.0/libdoc/fileutils/rdoc/FileUtils.html#method-c-cd
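To see roughly how a documentation tool turns comment text into a webpage, the sketch below feeds a snippet of the FileUtils.cd comment (with the comment characters already stripped) through RDoc’s Markup::ToHtml formatter. It assumes the rdoc gem that ships with most Ruby installations is available, and the exact HTML differs between RDoc versions, so treat the output as illustrative rather than exact.

```ruby
require "rdoc"

# Comment text in RDoc markup, with the leading "#" characters removed.
# The indented line is an example, so RDoc treats it as preformatted code.
comment = <<~RDOC
  Changes the current directory to the directory +dir+.

    FileUtils.cd('/')  # change directory
RDOC

# RDoc::Markup::ToHtml converts RDoc markup into an HTML fragment.
formatter = RDoc::Markup::ToHtml.new(RDoc::Options.new)
html = formatter.convert(comment)

puts html
```

In the printed HTML, +dir+ should come out wrapped in code tags and the indented example inside a pre block, which is the same markup-to-HTML step that produces the published reference pages.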
Because reference documentation is so focused, it can include details about all the possible inputs and options, while a higher-level “user-level” tutorial might only show one or two. The other benefit of writing very focused documentation is that it is easier to update: when the code changes, the reference doc usually lives right above it. By comparison, “user-level” guides rely on dozens of interfaces and can get out of date quickly.
Documentation layout
User-level documentation might be organized by task, whereas reference-level documentation is usually organized by context. For example, you’ll find all of the FileUtils methods on the same page.
In addition to reading code comments, documentation-generating tools also use implicit information. For example, if you visit the doc webpage, it also lists the method’s argument signature, even though that’s part of the code and not explicitly added as a piece of documentation.
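Ruby can report the same kind of implicit signature information at runtime through reflection. The sketch below asks FileUtils.cd for its parameters using Method#parameters. This is only an analogy: RDoc reads signatures by parsing source files rather than by calling methods, and the exact parameter list depends on the fileutils version installed.

```ruby
require "fileutils"

# Method#parameters returns [kind, name] pairs describing the signature:
#   :req   - a required positional argument
#   :key   - an optional keyword argument
#   :block - an explicit &block parameter
params = FileUtils.method(:cd).parameters

params.each do |kind, name|
  puts "#{kind}: #{name}"
end
```

For the def cd(dir, verbose: nil, &block) signature shown earlier, this should list dir as a required argument, verbose as an optional keyword, and block as the block parameter.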
I used Ruby as an example here. Still, most languages support a similar reference-level documentation mechanism, and some even have more exciting and exotic options. You also don’t need to generate and publish docs on the internet for them to be useful.
Documentation Flow
In general, most documentation and “technical writing” follows this pattern:
• Title or intro (what code are you documenting).
• Description of what the code is supposed to do (why does it deserve to exist).
• An example or multiple examples (how do you use it).
• Common caveats or edge cases (watch out for those sharp corners).
Although not all docs are in that order, most complete docs have all of these elements. Depending on the code being documented, the title and description might blend. Looking at the last example:

```ruby
# Changes the current directory to the
# directory +dir+.
#
# If this method is called with block,
# resumes to the previous working directory
# after the block execution has finished.
#
#   FileUtils.cd('/')  # change directory
#
#   FileUtils.cd('/', verbose: true)
#
#   FileUtils.cd('/') do  # change directory
#     # ...               # do something
#   end                   # return to original directory
```
• Title or intro (what code are you documenting)
The title is right at the top: “Changes the current directory to the directory +dir+.”
• Description of what the code is supposed to do (why does it deserve to exist)
The description continues where the title left off: “If this method is called with block, resumes to the previous working directory after the block execution has finished.”
• An example or multiple examples (how do you use it):
  ◦ FileUtils.cd('/')
  ◦ FileUtils.cd('/', verbose: true)
  ◦ FileUtils.cd('/') do ... end
• Common caveats or edge cases (watch out for those sharp corners!)
As you’ve seen here, this piece of documentation doesn’t have any caveats. Adding such a caveat might be a good contribution opportunity. If I were to add a caveat, I might mention that calling FileUtils.cd without a block from inside another FileUtils.cd block results in a warning:
```ruby
FileUtils.cd("/tmp") { FileUtils.cd("/tmp") }
# warning: conflicting chdir during another chdir block
```
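The description’s promise, that the block form “resumes to the previous working directory”, is easy to check for yourself. The sketch below is illustrative; it uses Dir.tmpdir only as a convenient directory that should exist on any platform.

```ruby
require "fileutils"
require "tmpdir"

original = Dir.pwd

# Block form: the working directory changes only while the block runs.
# FileUtils.cd returns the block's value, here the directory seen inside the block.
inside = FileUtils.cd(Dir.tmpdir) { Dir.pwd }

# Once the block finishes, the previous working directory is restored.
after = Dir.pwd

puts "inside block: #{inside}"
puts "after block:  #{after}"
```

On some systems (macOS, for example) Dir.tmpdir goes through a symlink, so the path printed for inside may look different from Dir.tmpdir while pointing at the same place.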
Good documentation says “why”, not “how”
If you ever find documented code that seems redundant, then it’s likely the documentation was describing “how” the code works, rather than “why” it was needed. The following is an example of docs saying “what” the code is doing:

```ruby
# Adds two elements together
def add_two(a, b)
  return Integer(a) + Integer(b)
end
```
You don’t need the comment “Adds two elements together” because that information is right in the source code. It is redundant. But you might ask, “Why does this code exist? Couldn’t the developers add two things together on their own?” Good docs should answer these questions. The following is a hypothetical rewrite that explains “why” the above code exists:

```ruby
# Convert to integers and add
#
# The code commonly takes in integers as
# strings and needs to add them. If
# the strings have invalid data we want
# to raise an error.
#
#   add_two("2", "2")
#   # => 4
#
#   add_two("2", "lol")
#   # raises ArgumentError: invalid value for Integer(): "lol"
#
def add_two(a, b)
  return Integer(a) + Integer(b)
end
```
Reading this new documentation, it’s clearer that an exception should be raised on invalid values. It’s the same code, but now we know “why” someone wrote it. Looking at it another way, the source code is the implementation; the docs encode the context and opportunity of why it was implemented.
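Running the documented examples is a quick way to confirm that the “why” in the docs matches what the code actually does. The sketch below repeats add_two from above and exercises both documented cases, rescuing the error so the consequence of invalid input is visible.

```ruby
# Convert to integers and add (same method as in the docs above).
def add_two(a, b)
  return Integer(a) + Integer(b)
end

puts add_two("2", "2")

begin
  add_two("2", "lol")
rescue ArgumentError => e
  # The documented consequence: invalid input raises instead of returning junk.
  error_message = e.message
  puts "#{e.class}: #{error_message}"
end
```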
Good documentation should help us answer the question, “If I deleted this code, what would the consequences be?” In this example, the consequence isn’t “developers can no longer add two things together”. It would be “developers would need another way to verify their input string values are valid integers”. Good documentation answers, “Why does this code deserve to exist?” Lots of people think that the primary skill of a talented programmer is writing code. Great programmers must also excel in communication. Writing code ultimately boils down to being able to communicate your desires and intentions to a computer. That’s why it’s called a programming language. The more time you spend writing documentation, the better you’ll be as a communicator and programmer. If maintaining and reading code is the hardest part of a programmer’s job, then writing documentation is one way to ease those burdens. It’s a skill that many of us know we should be exercising more, but we don’t know how to get better. Like the other contribution tasks that I’ve introduced you to, I don’t expect you to be an expert on the first attempt. I want you to spend some time reading and thinking about documentation, noticing what makes some documentation better than others. We’ll cover some features of what proper documentation has in common and some checklists of the things you can add or improve yourself. By the end of this section, you should be ready to start documenting your code and code you’ve never seen before.
Recap
1. Documentation is commonly produced as user documentation (such as a guide) and reference documentation (such as program API docs).
2. User docs (guides and tutorials) bring more context and are more approachable but are harder to update and maintain. Reference docs are easier to update but may carry less context.
3. Good documentation answers the “why” of code, not the “how”. It should answer, “Why does this code deserve to exist?”
Documentation formatting
One of the critical pieces that makes reference documentation more than “just code comments” is how it is parsed, rendered, and enhanced by documentation software. We’ll look at some more Ruby code examples to show the ideas behind various formatting and enhancement options. Even though your language might have a different syntax, you can see how documentation formatting works in general by following along with this example.
In this section, we’ll focus on how to answer the following questions about documentation:
• Is it free of grammar or spelling mistakes?
• Are documentation syntax and formatting features properly used, such as example code highlighting and method linking?
• Are common conventions followed?
Note: We will use Ruby and RDOC for example purposes. Your programming language might have more than one set of reference documentation tools. There are some similar features among all documentation tools. We’re focusing on what they can format rather than the specific syntax for formatting. The example documentation has been reformatted for print and ebooks. Specifically, the comment characters (#) in front have been removed in some examples to allow for word wrapping.
Previously, we looked at the docs for FileUtils.cd, and now we’re going to take a peek at Ruby’s standard lib file base64.rb:
```
The Base64 module provides for the encoding (#encode64, #strict_encode64, #urlsafe_encode64) and decoding (#decode64, #strict_decode64, #urlsafe_decode64) of binary data using a Base64 representation.

== Example

A simple encoding and decoding.

    require "base64"

    enc = Base64.encode64('Send reinforcements') # -> "U2VuZCByZWluZm9yY2VtZW50cw==\n"
    plain = Base64.decode64(enc) # -> "Send reinforcements"

The purpose of using base64 to encode data is that it translates any binary data into purely printable characters.
```
It is helpful to view generated docs to understand how code comments are mapped to a webpage. The image below is what the Base64 doc[1] looks like.
1. https://ruby-doc.org/stdlib-2.6.5/libdoc/base64/rdoc/Base64.html#method-i-urlsafe_decode64
Let’s take a look at a few of the components of this doc. At the top is a sentence explaining the purpose of the code. Also notice that some of the elements, such as strict_encode64, are turned into links in the generated HTML.
Is it free of grammar or spelling mistakes?
Yes. When docs have typos, sending in a pull request to fix them is a quick win for everyone. See the pull request guide for detailed instructions on how.
Are documentation syntax and formatting features properly used?
Previously, we noticed that some elements were converted into links.
The next thing to notice is that the example code got turned into a black section with syntax highlighting:

```ruby
require "base64"

enc = Base64.encode64('Send reinforcements') # => "U2VuZCByZWluZm9yY2VtZW50cw==\n"
plain = Base64.decode64(enc) # => "Send reinforcements"
```
In Ruby docs, examples are indicated by their indentation. The extra spaces cause this part to be rendered differently. These examples are formatted correctly. If you’ve got a sharp eye, you might notice that #encode64 and #decode64 were not turned into links. We will explore this problem in the next section.
Are common conventions followed?
Yes. All the code here is valid, and the outputs are shown. One thing to notice is that return value examples are written with a comment character (#), then an arrow, and then the value:
# => "U2VuZCByZWluZm9yY2VtZW50cw==\n"
and # => "Send reinforcements"
This hash-and-arrow convention is useful because the example can be copied from the website and pasted directly into a Ruby REPL (such as irb). That’s because all the non-executable bits come after the comment character (#). Although these formatting options (links and examples) are shown here using a class doc, they work the same way with method docs.
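To see the copy-and-paste property in action, the Base64 example can be saved as a plain Ruby script (or pasted into irb) without edits, because the expected values live in comments. The p calls at the end are added here only to print the actual results. In recent Ruby releases base64 ships as a bundled gem rather than a default one, so a project using Bundler may need to list it in its Gemfile.

```ruby
require "base64"

enc = Base64.encode64('Send reinforcements') # => "U2VuZCByZWluZm9yY2VtZW50cw==\n"
plain = Base64.decode64(enc) # => "Send reinforcements"

# Print the actual values so they can be compared against the "# =>" comments.
p enc
p plain
```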
Example: Opportunity finding through documentation exploration
Originally, when I saw that #encode64 and #decode64 weren’t rendering as links, I thought that maybe the formatting of the docs was off, but after checking, it looked like a bug in the way RDOC handled Ruby’s documentation. When something in the documentation is confusing or looks off, it’s worth writing it down. If you research it and it’s correct, then you’ll learn more about the topic. If it’s wrong, then you’ll have a contribution opportunity on your hands.
All we were doing was reading the documentation, and without even trying, we’ve already identified a potential open source contribution. When an opportunity like this presents itself, I write the idea down in my backlog notebook. While you’re working on building your “context”, you never know when you’ll stumble on a great contribution “opportunity.”
In this real-world example, I identified the issue, and another committer fixed the problem in ruby/rdoc in 2019. How long had that problem been around? When I checked, that documentation had been in Ruby since 2008; the bug may have existed for eleven years with either no one noticing or no one caring enough to report it. I guarantee there are plenty of opportunities for improvement out there if someone like you takes the time to find them and report or fix them.
Documentation conventions: Class versus instance method syntax (for Ruby)
Each language and ecosystem will develop conventions for various concepts that might only apply to that language. These conventions may or may not map back to actual code.
In some of the documentation below, Ruby uses a convention to denote the difference between methods on a class and methods on an instance. This can be confusing, so it’s worth explaining the syntax.
When documenting a class or module method, the “dot” syntax (.) is used. For example, HelloWorld.call might refer to a class method like the following:

```ruby
class HelloWorld
  def self.call
    puts "hello world"
  end
end

HelloWorld.call # prints "hello world"
```
In comparison, instance methods (methods called on an instance of a class) are documented using a hash symbol (#), as in GoodNightMoon#read:

```ruby
class GoodNightMoon
  def read
    puts "goodnight red balloon"
  end
end

moon_instance = GoodNightMoon.new
moon_instance.read # prints "goodnight red balloon"
```
In this case, the read method is not available directly on the class (the GoodNightMoon constant) but only on an instance (GoodNightMoon.new; note how we had to call new on the class first to get an instance).
To recap, HelloWorld.call represents a class method, while GoodNightMoon#read represents an instance method. You will see this “hash” syntax for instance methods in some examples, and it’s a common source of confusion.
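One way to convince yourself of the difference is to ask Ruby where each method lives. The sketch below redefines the HelloWorld and GoodNightMoon classes from above and uses standard reflection methods (singleton_methods, instance_methods, respond_to?) to show that call belongs to the class object while read belongs to its instances.

```ruby
class HelloWorld
  def self.call
    puts "hello world"
  end
end

class GoodNightMoon
  def read
    puts "goodnight red balloon"
  end
end

# HelloWorld.call: defined on the class object itself (a class method).
p HelloWorld.singleton_methods(false)   # => [:call]
p HelloWorld.respond_to?(:call)         # => true

# GoodNightMoon#read: defined for instances (an instance method).
p GoodNightMoon.instance_methods(false) # => [:read]
p GoodNightMoon.respond_to?(:read)      # => false
p GoodNightMoon.new.respond_to?(:read)  # => true
```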
The following is another example of a class method:

```ruby
User.where(name: "schneems").first
```
In this case, the where method is called directly on the User constant, so it would be documented as User.where. To compare, calling first on the result of User.where returns a user instance:

```ruby
user = User.where(name: "schneems").first
user.name #=> "schneems"
```
In this case, the name method on the last line is called on an instance, so we would document it using the syntax: User#name.
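Because User.where normally comes from a Rails (ActiveRecord) model and needs a database, here is a hypothetical, dependency-free stand-in with the same shape: a class method where that filters an in-memory list, and an instance method name. It is not how ActiveRecord works (the real where returns a lazy relation, not an Array); it only illustrates which method gets the dot and which gets the hash.

```ruby
# Hypothetical stand-in for a User model, backed by an in-memory Array
# instead of a database. This is not ActiveRecord.
class User
  RECORDS = []

  attr_reader :name # User#name: an instance method

  def initialize(name)
    @name = name
  end

  # User.where: a class method that filters stored records by name.
  def self.where(name:)
    RECORDS.select { |user| user.name == name }
  end
end

User::RECORDS << User.new("schneems") << User.new("matz")

user = User.where(name: "schneems").first # class method, then Array#first
puts user.name                            # instance method
```

One wrinkle: User.name also works, but it is Ruby’s built-in Module#name (it returns "User"), which is a different method from User#name. The dot-versus-hash notation is exactly what keeps the two apart.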
Recap
1. How the project’s documentation tool formats the output determines what syntax and features are available in your documentation toolchain.
2. You can find contribution opportunities by reading documentation critically. Common problems include grammar mistakes, syntax or formatting problems, and coding conventions being used incorrectly.
3. In the coming examples, if you see Dog#bark, it means the bark method on a dog instance.
Documentation examples
Of all the common parts of documentation (title, description, example, etc.), the example is the most important. Developers of all skill levels lean on examples to help them build mental models of how the described code behaves.
I created these checklist questions to help guide your sense of how an example section is impacting the reader. To see where these questions came from, we’ll review real-world open source documentation and score it against this checklist.
• Does it contain an example?
• Does it contain multiple examples?
• Is the code executable? Can it be copied/pasted?
• Does the example code accomplish a real task? Does it avoid foo/bar variables?
• Are the examples demonstrating best practices?
Dive into examples: FileUtils.mkdir_p
Look at Ruby’s FileUtils.mkdir_p documentation[1]. It has executable example code front and center:

```ruby
# Creates a directory and all its parent
# directories.
#
# For example,
#
#   FileUtils.mkdir_p '/usr/local/ruby'
#
# causes to make following directories,
# if they do not exist.
#
# * /usr
# * /usr/local
# * /usr/local/ruby
#
# You can pass several directories at a
# time in a list.
#
def mkdir_p(list, mode: nil, noop: nil, verbose: nil)
  # ...
end
```
Does it contain an example?
Yes.
Does it contain multiple examples?
No. If you can find another good use case, there might be a good documentation opportunity. Are there other everyday use cases that could be demonstrated?
1. https://ruby-doc.org/stdlib-3.0.0/libdoc/fileutils/rdoc/FileUtils.html#method-c-mkdir_p
Looking at the source code, the following is the method signature:

```ruby
def mkdir_p(list, mode: nil, noop: nil, verbose: nil)
```
Our example only covers one of these arguments, list, and hints at another way of calling the method: list can be either a single string or an array of strings.
How would I write an additional example? I would look for real-world uses of the method, first by searching through the local code where it’s defined, and then by searching externally on Stack Overflow or GitHub. When I search for “stack overflow FileUtils.mkdir_p”, the first result I get is a question about how the mode option works, so that would be a good candidate.
Is the code executable? Can it be copied/pasted?
In our example, developers can execute the code directly. You can copy it into a Ruby program, and it will execute and have the described result (creating directories), provided your user has permission to write there. A caveat that isn’t mentioned is that the path the example uses, /usr/local/ruby, assumes a *nix-like operating system (such as macOS or Linux). On Windows, you’ll likely want a path that starts with a drive letter (such as C:\), and native Windows paths use backslashes. Depending on the common conventions of the library you are documenting, it might be worth mentioning that difference.
Does the example code accomplish a real task?
Yes, this is representative of a real task. When an example is overly abstract, especially if it uses foo and bar, it’s difficult for the reader to imagine themselves writing or needing that code. The more the docs reflect real tasks, the easier it will be for a user to mentally “try it on” and see that it fits and makes sense.
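As a sketch of the kind of second example that paragraph suggests, the following passes an array of paths and the mode: option together. It runs inside a Dir.mktmpdir scratch directory so it doesn’t touch real system paths; the directory names (app/log, app/tmp/cache) are made up for illustration.

```ruby
require "fileutils"
require "tmpdir"

created = Dir.mktmpdir do |root|
  # Several directory trees at once: pass an array instead of a single string.
  paths = [
    File.join(root, "app", "log"),
    File.join(root, "app", "tmp", "cache")
  ]

  # mode: sets the permission bits on each directory that mkdir_p creates.
  # The process umask still applies, and permissions are ignored on Windows.
  FileUtils.mkdir_p(paths, mode: 0o700)

  paths.map { |path| File.directory?(path) }
end

p created # => [true, true]
```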
Are the examples demonstrating best practices?
The example is fine as is, but this criterion doesn’t really apply to this particular example. See the next example below for a better demonstration.
Dive into examples: Dir.mktmpdir
Below is the documentation for another standard library method: Dir.mktmpdir. Let’s compare how its example sections stack up against our checklist.
Note: These docs are a bit longer. Feel free to skim them and come back as needed.
```
Dir.mktmpdir creates a temporary directory.

The directory is created with 0700 permission. Application should not change the permission to make the temporary directory accessible from other users.

The prefix and suffix of the name of the directory is specified by the optional first argument, prefix_suffix.
- If it is not specified or nil, "d" is used as the prefix and no suffix is used.
- If it is a string, it is used as the prefix and no suffix is used.
- If it is an array, the first element is used as the prefix, and the second element is used as a suffix.

  Dir.mktmpdir {|dir| dir is ".../d..." }
  Dir.mktmpdir("foo") {|dir| dir is ".../foo..." }
  Dir.mktmpdir(["foo", "bar"]) { |dir| dir is ".../foo...bar" }

The directory is created under Dir.tmpdir or the optional second argument tmpdir if non-nil value is given.

  Dir.mktmpdir {|dir| dir is "#{Dir.tmpdir}/d..." }
  Dir.mktmpdir(nil, "/var/tmp") { |dir| dir is "/var/tmp/d..." }

If a block is given, it is yielded with the path of the directory. The directory and its contents are removed using FileUtils.remove_entry before Dir.mktmpdir returns. The value of the block is returned.

  Dir.mktmpdir {|dir|
    # use the directory...
    open("#{dir}/foo", "w") { ... }
  }

If a block is not given, The path of the directory is returned. In this case, Dir.mktmpdir doesn't remove the directory.

  dir = Dir.mktmpdir
  begin
    # use the directory...
    open("#{dir}/foo", "w") { ... }
  ensure
    # remove the directory.
    FileUtils.remove_entry dir
  end
```
Does it contain an example? Yes.

Does it contain multiple examples? Yes.

Is the code executable? Can it be copied/pasted? No. You can't run all of the code out of the box, because some of it isn't valid Ruby:

    Dir.mktmpdir { |dir| dir is ".../d..." }
Specifically, the part inside the block, dir is ".../d...", is not valid Ruby. We could make this example executable by changing it to something like the following:

    Dir.mktmpdir { |dir| puts dir } # prints a path like ".../d..."
Now there's no syntax error. Another option would be to comment out the invalid description, but whenever possible, my preference is to “show” example code performing a task instead of describing it.

Does the example code accomplish a real task? That example code doesn't do a great job of this. Why would I want a temporary directory? Based only on those docs, I've got no idea.
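As a rough illustration of a more representative direction (my own assumption, not something from the Ruby docs): temporary directories are handy as throwaway scratch space, for example when testing code that writes files. The write_report helper and the report.csv file below are hypothetical; Dir.mktmpdir, File.write, and File.readlines are the only library calls involved.

```ruby
require "tmpdir"

# Hypothetical helper: writes rows as a simple CSV file into dir
# and returns the file's path.
def write_report(dir, rows)
  path = File.join(dir, "report.csv")
  File.write(path, rows.map { |row| row.join(",") }.join("\n") + "\n")
  path
end

scratch_dir = nil

# Block mode: the report is written to a throwaway directory, read back,
# and the directory is deleted as soon as the block returns.
lines = Dir.mktmpdir("report") do |dir|
  scratch_dir = dir
  path = write_report(dir, [["name", "count"], ["widgets", 3]])
  File.readlines(path, chomp: true) # the block's value is what mktmpdir returns
end

p lines                   # prints ["name,count", "widgets,3"]
p Dir.exist?(scratch_dir) # prints false
```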
That being said, it's not always possible to make an example that focuses only on the code we want to document. If I were to add a real-world case, I would start with something more representative at the top and keep the existing examples that show the various options at the bottom. I might also check whether Dir.mktmpdir is used anywhere inside the ruby/ruby codebase and use that as the basis for an example.

Are the examples demonstrating best practices? In the FileUtils.mkdir_p example, I gave this question a rubber stamp of approval without going into detail. Here, the method is less “safe” than mkdir_p: every time you make a temp directory, you also need to make sure it gets cleaned up afterward. Dir.mktmpdir is commonly used in one of two ways: “block mode” or “reference mode.” Block mode automatically cleans up the resource for you when the block exits, so it's “safer.” The following is block mode:

    Dir.mktmpdir { |dir| dir is ".../d..." }
The following is reference mode:

    dir = Dir.mktmpdir
    begin
      # use the directory...
      open("#{dir}/foo", "w") { ... }
    ensure
      # remove the directory.
      FileUtils.remove_entry dir
    end
The community best practice is to use block mode, and it is the most prominent option in the docs; reference mode is documented as an alternative. One important thing to notice is that the reference-mode example wraps its work in begin/ensure, so the directory is removed even if an error is raised, and it hands you the tool you need to clean up manually right in the docs (FileUtils.remove_entry dir). This is a great example of documentation being used to share with the reader the “right way” to use the code. With all this context, I would say this method doc does a great job demonstrating best practices.
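As a sketch of how the doc's placeholder examples might be made executable, the version below replaces each dir is ... description with a real check and the open(...) { ... } placeholder with a small file write. The notes.txt file and the specific checks are my own choices, not part of the official docs.

```ruby
require "tmpdir"
require "fileutils"

# Prefix/suffix: nil uses "d", a String sets the prefix,
# and an Array sets [prefix, suffix].
default_name = Dir.mktmpdir { |dir| File.basename(dir) }
prefix_name  = Dir.mktmpdir("foo") { |dir| File.basename(dir) }
both_name    = Dir.mktmpdir(["foo", "bar"]) { |dir| File.basename(dir) }
p default_name.start_with?("d")                              # prints true
p prefix_name.start_with?("foo")                             # prints true
p(both_name.start_with?("foo") && both_name.end_with?("bar")) # prints true

# Second argument: create the directory under a parent we choose
# (here another temporary directory, so nothing is left behind).
nested_ok = Dir.mktmpdir do |parent|
  Dir.mktmpdir(nil, parent) { |dir| File.dirname(dir) == parent }
end
p nested_ok # prints true

# Block mode: the directory is removed when the block exits,
# and the block's value is what Dir.mktmpdir returns.
block_dir = nil
contents = Dir.mktmpdir do |dir|
  block_dir = dir
  File.write(File.join(dir, "notes.txt"), "hello")
  File.read(File.join(dir, "notes.txt"))
end
p contents              # prints "hello"
p Dir.exist?(block_dir) # prints false

# Reference mode: cleanup is our job; ensure runs even if an error is raised.
ref_dir = Dir.mktmpdir
begin
  File.write(File.join(ref_dir, "notes.txt"), "hello")
ensure
  FileUtils.remove_entry(ref_dir)
end
p Dir.exist?(ref_dir) # prints false
```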
How do you keep the example code up to date?
Example code in documentation is great, but it has to stay in sync with the code to be valuable. If a method's signature or behavior changes, the documentation needs to be updated as well. An example that is invalid or no longer representative of the code is worse than no example at all.

If you're reviewing a PR that changes documented code, you have the opportunity to check that the docs still accurately reflect the change. Often, tests will also need to be updated in addition to docs. Tests are easier to remember, though, because if behavior changes significantly, the test suite will often fail and alert the developer.

If you encounter docs that differ from how the code works in the wild, there's a high likelihood you've found a contribution opportunity. Either the code needs to be fixed to match the docs, or the docs need to be fixed to match the code. Before doing the work, make sure you're using the latest release and that the docs you're looking at target that release.

One relatively recent trend among newer languages is the ability to “test” documentation examples. Rust has a feature called “documentation tests”² that lets examples in documentation be compiled and run as tests.
2. https://doc.rust-lang.org/rustdoc/documentation-tests.html
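To see the idea in Ruby, here is a toy sketch of what a documentation-test runner does at its core: scan documentation text for lines shaped like expression # => expected, evaluate both sides, and report mismatches. The add method, the DOC text, and check_doc_examples are hypothetical, and this is not how rustdoc or any particular Ruby tool works. Because it relies on eval, it should only ever run on trusted text.

```ruby
# Hypothetical method and doc comment, made up for this sketch.
DOC = <<~'TEXT'
  Returns the sum of two numbers.

    add(2, 3)  # => 5
    add(-1, 1) # => 0
TEXT

def add(a, b)
  a + b
end

# Toy doc-example checker: every line shaped like "expression # => expected"
# is treated as an assertion. Both sides are evaluated with eval, so only
# run this on trusted text.
def check_doc_examples(doc, context = TOPLEVEL_BINDING)
  failures = []
  doc.each_line do |line|
    code, expected = line.split("# =>", 2)
    next if expected.nil? # not an example line

    actual = eval(code.strip, context)
    wanted = eval(expected.strip, context)
    unless actual == wanted
      failures << "#{code.strip}: expected #{wanted.inspect}, got #{actual.inspect}"
    end
  end
  failures
end

failures = check_doc_examples(DOC)
puts failures.empty? ? "all doc examples pass" : failures
```

If someone changed add without updating DOC, check_doc_examples would return a message describing the stale example, giving the docs the same kind of early warning a failing test suite gives the code.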
Python has a similarly named “doctest” module³, which finds examples written as interactive Python sessions in docstrings and runs them, as shown below.

    """
    This is the "example" module.

    The example module supplies one function, factorial(). For example,

    >>> factorial(5)
    120
    """

    def factorial(n):
        import math
        result = 1
        factor = 2
        while factor