Examples of the hidden wastes of software development
The Toyota Production System’s take on manufacturing processes changed the global automobile market and has been the subject of thorough analysis by researchers and academics. The application of those principles to software development included a focus on waste elimination which has always fascinated me.
The thought of doing less to achieve more is beautiful on many fronts, and as I’ve gone from engineering to architecture to management to whatever-I’m-doing-now, it has stayed with me. The idea of essentialism, i.e., doing less but better, is a related one and goes hand-in-hand with the idea that to achieve more and higher-quality results, we don’t need to work more; we need to eliminate activities that don’t bring us value.
The seven wastes mapped from lean manufacturing to software development are shown below, and there are several articles describing the relationship. I felt that a deeper look at these hidden wastes was needed because there are many instances of waste beyond the obvious ones that need to be called out. These wastes are hard to spot mostly because they’re ingrained in our thinking and we see them as normal. This article examines the key types of waste and provides non-traditional examples so that we may reflect on them and examine our working environment with a slightly improved lens for eliminating waste.
The eighth waste, which was not explicitly mentioned in the Toyota system but is specific to software development, is unused talent, i.e., not utilizing the creativity and talent of the people we have.
The below examples are not organized by category, as I found myself spending too much time figuring out which category each belongs to, which was itself wasteful. The categories are less important than developing a sense for identifying wasteful behaviours and processes so you may work to eliminate them. The examples are:
In progress stories
Unreleased code
Unimplemented designs
@ignored and slow tests
Stale Definition of Done
Definition of Ready as a gate
Large backlogs
Redundant application architectures
Over-tasking
Lack of release retrospectives
Coupling Peer Review to Pull Requests
Interrupting a programmer
Waiting for remote builds to tell the real story
“I did my part”
Defects
1. In progress stories
Inventory minimization is at the heart of lean thinking. The simplest explanation is that firms should not purchase raw materials unless they plan to use them to produce products in the short term. Even if the cost of raw materials is lower when purchasing in bulk, the materials lying around on the factory floor have a hidden cost, notably higher cycle time, i.e., the time it takes to go from start to finish. That is the idea behind limiting work in progress (WIP).
There is a reason that on Kanban boards the In Progress column is smaller than the rest. It is supposed to limit the amount of work people are spread across. The idea is that we’d rather finish two out of five things than complete 80% of all five. The illusion of progress when working on five things simultaneously is just that - an illusion. Combined with the Pareto principle, that for many things roughly 80% of the effects come from 20% of the causes, it implies that even though we think we’re 80% through, we’re far from done.
Let’s walk through an example I think is fairly common. Suppose there is a feature to be developed that requires some analysis, design, development, testing, and some refactoring (assume we’re not using TDD). If many such activities are started concurrently, there is a high likelihood that we’ll approach the final days of our sprint with all activities in the testing phase. This testing phase might require a shared resource (e.g., an environment, data set, or accessibility expert) that can only serve one item at a time. At this point we encounter gridlock, and the sprint ends while the first of the five items is still in the testing phase, resulting in high work in progress but zero items completed.
Contrast this with an approach where available people are reasonably allocated to a single task through pairing or mobbing. The task gets to the testing phase sooner, alleviating the strain on the shared resource, i.e., there is only one item in the shared resource’s queue. Once that item is processed, all other things being equal, the resource can service the next task, assuming people started the analysis of the next task at a logical time. This thinking is grounded in the Theory of Constraints and smaller batch sizes. We have to fight the illusion that many things being in progress means lots of work is getting done. In reality, it is the opposite.
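Little’s Law from queueing theory is not part of the original lean mapping above, but it makes the cost of high WIP concrete: average cycle time equals average WIP divided by average throughput. A minimal sketch, with the throughput and WIP numbers invented purely for illustration:

// LittlesLaw.java - illustration only; the numbers are hypothetical.
public class LittlesLaw {
    public static void main(String[] args) {
        double throughputPerWeek = 2.0; // stories the team finishes per week, assumed constant
        for (int wip : new int[] {2, 5, 10}) {
            double cycleTimeWeeks = wip / throughputPerWeek;
            System.out.printf("WIP=%d -> average cycle time = %.1f weeks%n", wip, cycleTimeWeeks);
        }
        // With throughput fixed, every additional in-progress item stretches the
        // average time each item spends between start and finish: 5 weeks per
        // story at WIP=10, versus 1 week at WIP=2.
    }
}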
2. Unreleased code
Code developed but not released is arguably the biggest illusion of getting things done there is, and a form of yucky inventory. Software needs feedback to be validated, and as more code and features pile up in “test” environments, it becomes harder and more expensive to develop new capabilities. I will illustrate this with two examples which I think are common enough to resonate.
First, the management of source code becomes difficult as code queues up in non-production environments. As new features are developed, they are created in feature branches spawned off the main line, which require a merge at some point. The more unreleased code we have, the more feature branches are spawned off older HEADs. The cost of the merge increases with every feature branch as the probability of conflicts increases. Integration is also delayed.
Production fixes, which are done off older tagged releases, also require eventual merges, which means increased dependencies and a strict linear sequence of working with code. Slowly but surely you find yourself in a situation where you’re begging for a release because the sheer maintenance of code is a nightmare and you just want to move up the commit chain. The amount of time spent on this maintenance is waste, and the root cause is code not being released fast enough (one way you might accelerate this is feature flags and trunk-based development, sketched below).
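As a rough sketch of what that can look like (the flag name and the environment-variable lookup are invented for illustration; real projects often use a flag library or config service), unfinished code merges to the main line but stays dark in production until the flag flips:

// FeatureFlags.java - a minimal hand-rolled flag check.
public class FeatureFlags {
    public static boolean isEnabled(String flag) {
        // Hypothetical convention: an environment variable such as FEATURE_NEW_CHECKOUT=true.
        return Boolean.parseBoolean(System.getenv("FEATURE_" + flag));
    }
}

// CheckoutController.java - hypothetical consumer of the flag.
public class CheckoutController {
    public String checkout() {
        if (FeatureFlags.isEnabled("NEW_CHECKOUT")) {
            return "new checkout flow";      // merged to trunk, dark until enabled
        }
        return "existing checkout flow";     // current behaviour remains the default
    }
}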
Second, as the amount of unreleased code increases, feedback is delayed. Assumptions about future features pile up and the probability of developing the wrong feature increases. The chance of a team developing the wrong thing is inversely proportional to the frequency of releases. It is tempting to hold off until you think you have critical mass to release a product feature, but the hidden waste in this approach is the cost resulting from the delayed feedback from the market and production infrastructure. This applies to technical practices as well, because you simply don’t know how your stack will behave in the wild. We can try to recreate production simulations in our pipeline, but production data sets, user behaviour, and usage patterns are inherently unpredictable.
The inherent cost of delay in keeping this inventory is often difficult to measure because it is hidden from plain sight. Applying rigour through economic concepts can help surface this cost of delay.
3. Unimplemented designs
I have written about the idea that testing designs provides lower-quality data than testing working software. Often, seemingly sensible activities, if not filtered through the hard lens of waste, will drain a team. The tendency to iterate over software proxies (like designs) instead of investing the same energy in developing the software can quickly turn wasteful. Prototypes that undergo several iterations without being implemented not only provide diminishing returns but take away from the product delivery capacity. We may feel like we’re doing valuable work by moving things around in a wireframe and testing designs, but I posit that we have to be very careful with this activity because it is not yielding software, which is the primary measure of progress.
Visual designs are also a creepy cousin of wireframes, and on the surface they seem like a valuable activity and a good communication tool to convey exactly how the product should look to the engineering team. However, the same could very well be achieved by activities such as pairing and co-development. The “inventory” in these situations is usually sketches, wireframes, and visual designs, often a couple of iterations of each. We have come to accept these as normal steps in the product development workflow, but upon closer examination one can find many areas where we can eliminate waste by either stopping some of these activities, changing their frequency, or changing the way they are executed. It helps to view supporting artifacts like these as partially done work: they sit in a queue which, when fully cleared, ships the product. The smaller the items in the queue, the faster we ship.
4. @ignored and slow tests
Commented-out or ignored tests are sad. They remind me of an abandoned, dilapidated house which you’re forced to look at every day on your way to work. You can see the hopes and dreams of the test laid out, but you can also see, nay, sense the reasons it was commented out. Perhaps it was a lack of test data or an environment requirement that wasn’t met, or maybe the person who wrote it simply didn’t know what they were supposed to assert. In the end they didn’t want to delete it “just in case”, so they preserved it like a mummy. Glancing at this test every time and wondering what it is, is waste.
Slow, end-to-end tests that fire up Selenium and make assertions by operating at the top of the automated testing pyramid, when the same result could be achieved lower in the pyramid (unit, integration), are wasteful. They’re slow because they fire up web browsers (even in headless mode), and brittle because they require the entire application to be running and functional before even simple assertions can be made.
End-to-end tests have a feeling of completeness to them, but all they give you is a false sense of security, and as soon as tests fail and the commenting/@ignoring begins, you start trusting your tests less and less, defeating the entire purpose of testing and resulting in, you guessed it, waste. The inventory here is test cases, running time of tests, ignored tests, and failed/false-positive tests that you have to keep examining to determine if they’re still needed.
My advice for testing: use end-to-end tests to check resilience and environment availability. For everything else, use unit and integration tests.
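To illustrate pushing a check down the pyramid, here is a minimal JUnit 5 sketch; EmailValidator is a made-up class standing in for whatever owns the rule in your codebase. The same behaviour a Selenium test would exercise through a signup form is asserted directly at the unit level:

// EmailValidatorTest.java - JUnit 5; EmailValidator is hypothetical.
import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

class EmailValidatorTest {
    @Test
    void rejectsAddressWithoutAtSign() {
        assertFalse(EmailValidator.isValid("not-an-email"));
    }

    @Test
    void acceptsWellFormedAddress() {
        assertTrue(EmailValidator.isValid("jane@example.com"));
    }
    // Runs in milliseconds with no browser and no environment, and a failure
    // points at the exact rule that broke. Reserve the end-to-end test for
    // checking that the pages wire together at all.
}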
5. Stale Definition of Done
I could’ve picked many an option to make a process point, but a stale Definition of Done is fresh in my memory so I’ll go with this. I saw a team review their Definition of Done at the start of sprint planning, and they stared at it like it was an ancient scroll they were trying to decipher. It was written months ago and they kept “reviewing” it every sprint, wondering if they needed to update it. Nobody quite knew what they were reviewing or if the words even meant anything anymore; they just knew that they had to review their definition.
I didn’t interject until about five minutes into this exercise, and asked whether, if they had to create a new DoD today based on what they had learned in the last six months, it would be different from the one in front of them. They nodded vigorously, trashed the flipchart, and decided to make a new one.
This had been a habit of theirs every sprint for the last 10 sprints, and each time they spent at least 10 minutes on it: 10 sprints x 10 minutes x 9 people, for a total of 900 minutes, or a full 15 man-hours wasted. It may not sound like it, but this stale Definition of Done is inventory which had to be handled every sprint and resulted in waste. Refreshing this inventory is a good example of waste reduction.
6. Definition of Ready as a gate
I’ve come across a lot of teams that use a DoR effectively by viewing it as a sanity check on whether something is ready to be developed. I recently drew this triangle to remind a team of the things they may want to think about when interrogating stories and uncovering hidden assumptions.
The wastefulness happens when we treat the DoR as a gate which must be passed with the same rigour as a Definition of Done. The reality is that as you start working on something, you will uncover details that inform the work and change your perspective on how you might attack it. The learning delay from not starting the work can compound the schedule delay when it does eventually get started.
The point here is to treat the Definition of Ready with a grain of salt lest it become a gate a team must clear before it can add value. Keeping it as informal and conversational as possible has been shown to work well.
7. Large backlogs
Building more cars than you can sell is the classic manufacturing example of overproduction. Applied to software, the “extra features” waste maps easily to building features that the customer doesn’t need. Having large backlogs with hidden assumptions is a recipe for delivering unwanted features.
Developing hypotheses about what you think your customers want, creating an experiment out of each hypothesis, and testing it is what keeps us lean and avoids delivering things nobody uses. As soon as we adopt the mindset of assuming we know what the customer wants and think it’s just a matter of building it, extra features creep in.
In many industries the business plan should be to not have a business plan and instead have a backlog of testable hypotheses which guide future direction. In the absence of this we are relying on gut feel (which BTW is very valuable) and our biases to guide product decision-making, and this can lead to nobody using whatever you’re building.
Depending on the uncertainty and competition in the industry, having a long backlog is a highly risky approach which ties you into building unvalidated features. Melissa Perri’s Escaping the Build Trap is a must-read to avoid this.
8. Redundant application architectures
It’s not common to see an architectural approach as inventory, and I’m hoping this example, even if it doesn’t apply to the reader’s particular context, makes them keep an eye out for such patterns. N-tier architectures have transformed how we structure applications, but one often sees layering like this:
// Controller.java
public List<Employee> getEmployees() {
    return service.getEmployees();
}

// Service.java
public List<Employee> getEmployees() {
    return dao.getEmployees();
}

// Dao.java
public List<Employee> getEmployees() {
    return entityManager.createNamedQuery(GET_EMPLOYEES).getResultList();
}
The service layer (or business layer) is not doing any processing of the data being returned by the Dao layer, and one has to question this design at least a little. If the middle layer is not applying any transformations or doing any business processing, what is its value? Every time we have to return information, we create an “empty” layer just out of habit. This can be argued to be waste even if you subscribe to the indirection principle. I would suggest that this is closer to YAGNI.
Not for a second am I suggesting eliminating the service layer everywhere, but I am encouraging engineers to question the suitability of architectural patterns to the problem being solved. These classical patterns are ingrained in engineers, and it is almost muscle memory that writes this code. Frameworks like Angular provide reasonable (and opinionated) approaches to application architecture, but an engineer has to have their own opinion on the problem they are solving or else waste will follow.
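As one concrete (and stack-specific) illustration: if you happen to be on Spring, Spring Data’s repository interfaces can collapse the pass-through layers entirely. A sketch, assuming a Spring Data JPA setup and an existing Employee entity:

// EmployeeRepository.java - Spring generates the implementation at runtime.
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;

public interface EmployeeRepository extends JpaRepository<Employee, Long> {
    // findAll() is inherited; derived queries like the one below are
    // generated from the method name, so no hand-written Dao is needed.
    List<Employee> findByDepartment(String department);
}

The Dao boilerplate disappears, and a service layer is added only when there is actual business logic to put in it.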
9. Over-tasking
I’m split on whether tasking is a valuable activity or Lucifer manifested on a Kanban board. It ultimately depends on how valuable the tasks are to the team, but there have been too many instances of tasks that look like this: develop UI, implement CSS, write unit test, execute tests, etc. There’s also the associated confusion on an engineer’s face as she scratches her head and wonders if she’s actually “developed the UI”, because let’s be serious, nobody just develops the UI without touching the CSS, then comes back later to develop some more markup, writes a test, edits the CSS, and so on. It’s an iterative and cyclical process that, when reflected in linear tasks, will result in wasteful stand-up chatter, a board that doesn’t reflect the work, and most of all, confusion.
These tasks are inventory which needs to be minimized. One of the reasons for creating smaller stories is so you don’t have to task and can just get the work done without having to worry about the minutiae. Any time you see an engineer analyze the task board and think even for a split second about what that task is supposed to mean exactly, you are witnessing waste due to high inventory. Eliminate it.
10. Lack of release retrospectives
Doing things when they don’t need to be done is the essence here. Nobody will disagree with that idea, yet many will involuntarily continue to do exactly the opposite. The manufacturing example goes back to the production of motorcycles. A technician used to tighten the bolts on motorcycles, and later another technician came to inspect and tightened them again just in case. Waste. The simple solution was to put a small dot on the bolt after tightening it so the next person wouldn’t have to question quality. In software, extra processing is everywhere.
Say you implemented a supervised ML algorithm which works against a data set of telephone calls into a call center. During feature engineering you worked with call center employees, business managers, and customers. After a few weeks of work you delivered something, and a few months down the line a new call-center-related problem came up. If as a team or organization we are not effectively leveraging the knowledge from the previous work, we have a broken system with inherent waste. The relearning in this example can take many forms: re-thinking source selection, contemplating feature engineering from scratch, algorithm selection, weighting of inputs, tuning, or validation.
This is probably the most difficult type of waste to consistently address because we have a habit of moving on to the next thing and not “refactoring” or learning from our experiences. Most teams will have a retrospective at the end of some feature delivery, but the mechanism of translating the learnings into a reduction of future work is missing. This topic ties closely to slack in our work: whatever we do, we should inject slack into the schedule so that we can clean up, reflect, improve re-use, and make maintainable whatever we just did. The mentality of moving on to the next thing without properly closing out the previous one is dangerous.
11. Coupling Peer Review to Pull Requests
Pull requests are a valuable construct in a trustless, asynchronous environment such as open source development. People work on different things at different times, and face-to-face communication is hard to facilitate. Reviewing code in isolation and providing feedback asynchronously is desirable because schedules are hard to coordinate.
Using pull requests as a code review mechanism in a team working the same hours is overhead. Pull requests can be used as a vehicle to merge code into develop or master branches, but the actual peer review of the code can be done through pairing or mobbing. If you’re not following those XP practices, simply calling over your colleague so she can review your code is far more effective than submitting a pull request, tagging reviewers, and then working through a tool. The high bandwidth of face-to-face communication or screen sharing trumps the pull request/review model every time and avoids the delay associated with asynchronous pull requests.
12. Interrupting a programmer
“Hey, can I steal you for a second?” When a programmer is focused on something, they do not want to be interrupted. It doesn’t matter how “small” the question is. Open office spaces have mistaken the ability to distract people for collaboration. One of the many downsides of such workspaces is that it is socially acceptable to tap someone who is in ‘flow’ on the shoulder, taking them away from their work to focus on something that’s important to you instead. That break in concentration is costly because the context switch from whatever they were doing, to what you want them to do, and back again is not cheap, especially in engineering.
Half the time programmers are struggling to maintain focus in a visually distracting setting, and when they finally achieve it, it’s open season for anyone to interrupt them. This may seem like an “anti-agile” complaint, as we’re all about collaboration and face-to-face communication. However, that does not excuse poor team norms. One solution: as a team, agree to use ‘do not disturb’ flags properly. It can be a Slack status, a sticky note, dedicated focus time, or anything else the team agrees on. This is as much about respecting people’s time as it is about wastefulness.
13. Waiting for remote builds to tell the real story
Teams should strive for green tests on their local machine to always imply green tests on their Continuous Integration server. Put another way, if your build works locally, it should work remotely. If this is not true for some reason, it will result in delays in shipping, as programmers will get delayed feedback from the “real” environment.
This topic is closely related to infrastructure and configuration management because it’s usually those two areas that prevent this from being true. Differences between local and remote environments give programmers a false sense of success because a push doesn’t mean “done” anymore. Instead it means that they’ve done their part and it’s time to cross your fingers, because who knows how those end-to-end tests will work out in the wild! If they fail, see the earlier waste about brittle tests.
A closely related topic is pipeline ownership. Often a central shared team owns the pipeline and configures things for the development teams. This model worked well when there was homogeneity across development teams. However, in a multi-team environment where people are working with different stacks, it makes far more sense for teams to own the pipeline and configure it how they see fit. The central team’s responsibility can be to empower teams by providing tools and techniques that help them deliver what they wrote to production.
14. “I did my part”
Four words that mean nothing in software development. I am a big proponent of pairing and mobbing to instill a sense of team ownership (and quality), but there are plenty of teams out there that operate in a manner where Programmer 1 does Task A and tells Programmer 2 that they’re done. Programmer 2 starts Task B, and then they figure out how to make A and B work together.
The waste on such teams happens through handoffs and delays. The presence of a handoff between P1 and P2 is a sign that the programmers may require upskilling or that the work is too big. Only in special cases should we be relying on someone else to do something before we call something complete (I’m not going to get into the back-end vs front-end false dichotomy, but that’s where I would go with this). If there is a separate person who does QA, then we have another handoff with another associated delay. Practices like pair programming, TDD, and mocked backends can reduce the need for handoffs and thus reduce delays; a sketch of the mocked-backend idea follows.
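Here all names (PriceService, Cart) are invented, and Mockito stands in for whatever mocking approach you prefer; the point is that the consumer is built and verified against an agreed contract rather than the finished backend.

// CartTest.java - JUnit 5 + Mockito; the names are hypothetical.
import static org.junit.jupiter.api.Assertions.*;
import static org.mockito.Mockito.*;
import org.junit.jupiter.api.Test;

class CartTest {
    interface PriceService {                  // the agreed contract between the two tasks
        double priceFor(String sku);
    }

    static class Cart {                       // consumer of the not-yet-written backend
        private final PriceService prices;
        Cart(PriceService prices) { this.prices = prices; }
        double total(String sku, int quantity) {
            return prices.priceFor(sku) * quantity;
        }
    }

    @Test
    void totalsAgainstTheContractNotTheImplementation() {
        PriceService prices = mock(PriceService.class);
        when(prices.priceFor("BOOK-1")).thenReturn(12.50);

        assertEquals(25.00, new Cart(prices).total("BOOK-1", 2), 0.001);
        // Cart is finished and verified today; the real PriceService can land
        // later without anyone declaring "I did my part" and waiting.
    }
}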
In cases where this is not possible and a handoff is necessitated, the delay should be minimized and in sync with the larger system’s flow. Examples include someone outside the team being responsible for deployment, an external SME handling exploratory testing, a designer shared across teams, or a peer review norm which depends on another team. The list can go on for pages, and one thing will be common across all items: there is a handoff and an associated delay. The preference is to eliminate the handoff; if that is not possible, minimize the delay.
15. Defects
Defects result in context switching. You’re working on something and now you have to context switch to something you were working on a while ago to figure out why the software is behaving the way it is. It’s often time-consuming to set up your environment to emulate the production environment where the defect happened, get the necessary test data to reproduce it, and then figure out the implications of code changes. There’s no shortcut here; the antidote is to prevent defects from happening in the first place.
Practices like TDD can result in near-zero-defect code, but they require programmer discipline and professionalism. If you’re not practicing TDD, at the very least treat a defect as a learning opportunity by first writing a test that reproduces it. Then fix the production code and watch the test go green. This way the defect stays dead, and at least you’ll know through a failing test if it ever returns to its zombie state.
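A sketch of that workflow, with the discount rule and all names invented for illustration:

// DiscountCalculatorTest.java - JUnit 5; DiscountCalculator is hypothetical.
import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

class DiscountCalculatorTest {
    @Test
    void zeroQuantityOrderMustNotReceiveBulkDiscount() {
        // Step 1: reproduce the reported defect - this fails against the buggy code.
        DiscountCalculator calculator = new DiscountCalculator();
        assertEquals(0.0, calculator.bulkDiscount(0), 0.001);
        // Step 2: fix the production code and watch this go green.
        // From then on, the test is the tripwire for the defect's return.
    }
}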
A defect should also result in some sort of retrospective to figure out how something like this can be prevented in the future. It does not have to be a formal one-hour meeting, but there should be some sort of discussion which identifies a practice or process gap. Agile is ultimately about inspecting and adapting, and a defect is a cue to do both.