Drawbacks of Pull Request Reviews
There is ample quality material on how to effectively review a pull request. This post is not about how to do a code review but what the purpose of a code review is, how most people do it, what the drawbacks of the most popular approaches are, what other techniques we can use to supplement or replace a traditional review process.
The main purpose of a code review is to provide feedback to your peers so that the quality of the codebase, and in turn the product being built, is increased, or at the very least, not reduced. When most people say “code review”, what they mean is that Developer A has created a pull request and Developer B (often more senior) is asking questions and providing feedback on the pull request. This Google Engineering article is one of many great resources on how to effectively do such reviews and if you want to dive into the mechanics, go right ahead. There’s no need to read any further.
I will offer some critiques of pull request code reviews and then provide some alternate approaches, and end off with one tip which you can use if you insist on traditional PR reviews.
Asynchronism: One drawback of the pull request code review is that it is an asynchronous process with feedback between the two parties not happening in real time. The process of creating a PR, reviewing it, providing feedback, implementing feedback, reviewing it again and so on, is inherently wasteful due to the delays naturally built into the process. What could have been a simple 10 minute conversation is spread out across hours or even days in an “offline” manner. Developers end up communicating via Github rather than face-to-face, reducing the amount and quality of what can be communicated.
Browser View: From a reviewer’s perspective, it is tempting to look at the Github PR and review it through the browser interface. This almost always hides the context of the change with Github showing you diffs of changes rather than a holistic view of the repo. Yes, the reviewer can git checkout
the branch and run it locally, but that is often a bridge too far, especially if the reviewer themselves have a stack of tasks to work on. This missing context hampers the effectiveness of code reviews resulting in blind approvals and not the right questions being asked. The design is obfuscated and line-level detail gets too much attention.
Too Big: I have often seen developers nit pick on a pull request containing three files, but blindly approve ones containing 25 files. PR fatigue is a real thing and as the size of the pull request increases, the quality of the review decreases. Kanban enthusiasts would be quick to point out that there is a direct connection here between limiting Work in Process (in this case, limiting the size of PRs) and quality (of the review). But that is a whole another story. Let me leave it at this: PRs have a great probability of being ignored if they get too big, and way too often developers working on feature branches for days submit PRs that the reviewer can barely understand, let alone have the energy to intelligently critique.
Missing Business Context: When the reviewer is reviewing code in a PR, they don’t have the context of the requirement which the code addresses. The developer developing the feature has full context but his or her work is being reviewed by a person with partial context. The reviewer is left wanting to know more about what requirement the developer was trying to satisfy, so that they understand the design decisions better. We can go back-and-forth in PR comments, review the user story to understand things better, but these are all slow approaches to gain the business context needed to effectively provide feedback about how well the solution addresses the requirement. Our focus in a PR review is mostly about the solution, not the problem.
Context Switching: There is a tendency to fire and forget PRs. A developer can create a PR, fire it off to their engineering lead, and start working on the next requirement. A few hours go by and there are review comments to handle, and at this point the developer parks what they were doing and starts addressing round one of the comments. Let’s see what’s happened here: now we’re working on two requirements at the same time and we’re context switching between multiple things. This is a productivity drain if I ever saw one.
Lack of Collective Ownership: If you work in an environment where “his code” and “my code” are part of the lexicon for describing ownership, then this point doesn’t apply to you. I have observed that this is an anti-pattern for most cross-functional teams where it is desirable that one person can pick up where others left off. Just by the nature of PRs, the author of the pull request is the one writing the code and the reviewer reviewing the author’s code. Although they may communicate via a PR and even get on a screen-share to learn more, the implicit ownership of the code remains with the reviewer and it becomes “their code” which is being reviewed. If something goes wrong, we look at the person who wrote it, not who merged it in.
Sunk Cost Reduces Refactoring: Continuous refactoring and test-first development are table stakes for producing high quality code, but many developers don’t follow either. When the lack of test-first development is combined with a traditional approach to PR reviews, the desire to refactor code diminishes as we have to “go back” and change the code based on any feedback. As time has passed since the change was made and the PR was reviewed, the motivation for both the reviewer and the author to change notable parts of the design is low. It’s a classic sunk cost bias where what’s in front of you is accepted when a better solution would be available, only because you’ve already invested time and effort.
Waiting: This is an obvious one. You create a PR and then wait for the reviewer. Whenever I’ve done the Wastes of Software Development retro with any team, this item is one of the first to pop up.
So if a Github PR review process is filled with so many issues, what’s an approach that addresses this? Before I get to that, I want to acknowledge that everyone learns differently and what may be an effective learning method for some may be terrible for others. However, I’ve at this point in my life concluded that pairing and mobbing are far better techniques for providing feedback on code, which also have highly desirable side effects of knowledge sharing, reduced waste, and more importantly, shared code ownership.
I will not go into pairing as much, much smarter people than me have written about pairing and are a Google search away. Sam Fare has written about his experience using mobbing for a prolonged period which I think describes my experience with it well enough.
I will say this about both pairing and mobbing: it can be uncomfortable to get started because both parties (or more than two in the case of mobbing) receive feedback and share their ideas as they are developing the feature, not post factum. The volume and quality of communication is significantly higher allowing design decisions to be critiqued in a synchronous manner, and any feedback is implemented right then and there instead of on a delay.
The amount of experimentation grows exponentially as pairs can diverge and converge on an idea quickly. They can try different things, compare results and pick the best one. Or they can lay groundwork for what’s ahead. For example, it is normal for one developer to be working on a function while another is researching an approach to something they’ll face in a few minutes.
The rotational nature of both allows everyone to experience the code rather than be slotted into a reviewer/author role, which tends to puts artificial bounds on the process. Once you bake in short 5-minute retrospectives into the process (maybe every 45 minutes), you get a continuous improvement loop which is hard to beat.
Lot of people when they try mobbing and pairing will wonder why not do two things at the same time to make things go “faster”? Wouldn’t it be quicker to have two streams of work rather than multiple people working on one thing? That is a fallacy and an example of local optimization which constrains the larger system. It also leads high WIP, low throughput, an increase in cycle time and delayed integration, which means higher risk.
I did promise one tip when doing code reviews post factum and here it is: review the tests that the author has written first. If the tests don’t follow a straightforward setup/act/verify structure and are hard to understand, there is a high likelihood that the design/code is questionable. It isn’t the code that goes to production which tells you whether it’s well-designed, it’s the tests that cover that code which do.