A Kaizen Approach For DevOps: How to Help Teams Find and Fix Their Own Problems

Technology companies put a lot of stock in building the kind of talent they need. They rarely complain about talent shortages; they hire and develop good talent in-house that they can trust. In a world where funding is vital, it makes sense to invest in teams early and hone them to suit organisational needs. But what happens when work is not visible, people are working out of context, and inertia is pulling the organisation out of alignment?

There is a concept going around the tech industry at the moment, a way to teach organisations to find and fix what is getting in the way, a way of programming us to get good at getting better. Its name? Kaizen.

Kaizen is a Japanese word that closely translates to “change for better,” the idea of continuous improvement—large or small—involving all employees and crossing organisational boundaries. By improving standardized programmes and processes, kaizen aims to eliminate waste and make organisations leaner.

The approach was made famous by Toyota after World War II, taking elements of Japanese business practice and Western business influences, and has since spread from manufacturing to healthcare, banking, and government. It was introduced by Americans as part of the Marshall Plan to help rebuild Japanese industry quickly and effectively.

So, brief history lesson aside, what does it mean in a modern DevOps context? In a DevOps setting, kaizen means using systematic, scientific-method-based learning to explore improvement across the full value stream and improve customer and client outcomes.

Traditional ways of making work visible involve wrestling with complex systems (people interacting with organisational processes), with plenty of unhelpful handoffs between departments: from planning to development, from testing to release, and finally to operations. Organisational charts and documented processes are all dominated by meetings. Meetings, meetings, meetings. We can solve this problem if we just have more meetings! Wrong. Looking at this process, you have to ask: is the organisation actually in charge, or is it all an illusion of control?

The best way to fix such a complex system is to create the conditions in which the system can start to fix itself. We know this, but what we tend to find happening is that people from their own silos offer advice on how to fix the problem from their POV, but this is not a holistic view and can completely overlook the bigger picture (and really piss people from other silos off!).

So, how can we teach an organization to fix itself? If you think of achieving goals—in this case, organizational improvement—in an ideal dream world, with zero barriers, there would be a consistently upward trajectory over time from start to finish. In the real world, what happens is, after a promising start, fear and panic set in and people revert to their ‘legacy behaviours’ rather than sticking to task because the level of change is too intimidating. This results in the fresh process being aborted and work being delivered that is declared “done,” but in fact falls way short of the high-quality work the team should be producing.

The Solution

The best solution is to move away from big, wholesale improvement changes—which can be daunting and off-putting—and instead take a kaizen approach where continuous reflection and improvement are cyclical and embedded into the process throughout, not just at the end of product delivery.

Figure 1: Continuous reflection and improvement are cyclical

As Figure 1 shows, you want to move away from "Big J" changes and introduce lots of "little j" changes throughout, more in line with the upward trajectory we strive for. Professor David Kolb developed an experiential learning cycle in the 1970s; it is still used today and appears in a number of different formats, but it essentially boils down to 'Plan, Do, Review.' Plan your tasks, processes, and actions: basically, your entire workflow. Do your work, and make a note of any stumbling blocks or places where work slows down. Review your work together, as peers. It is important to stress that you are not looking for a blame culture to develop; this task is about identifying the obstacles and issues that are getting in the way of your collective job, which is to get the product ready for customers. You probably know this process as PDCA, PDSA, or something similar.

The best wisdom I received to help facilitate this review discussion is 'www.ebi,' which stands for 'What worked well? Even better if….' Use the learning from your www.ebi review process, revisit your plans, procedures, tasks, and so forth, and make changes. If it works, great! If it doesn't, you can review it later and change course. To amalgamate a couple of clichés, you are not going to nail it the first time. This is a marathon, not a sprint.
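To make the cadence concrete, each Plan-Do-Review cycle's 'www.ebi' output can be captured as plain data that feeds the next plan. This is only a minimal sketch; the field names and the sample entries are invented for illustration:

```python
# One 'www.ebi' review per Plan-Do-Review cycle: what worked well,
# and what would make the next cycle even better.
reviews = [
    {"cycle": 1,
     "worked_well": ["pairing on the release script"],
     "even_better_if": ["test data were refreshed automatically"]},
    {"cycle": 2,
     "worked_well": ["automated test-data refresh"],
     "even_better_if": ["deploys did not need a manual sign-off"]},
]

def next_plan(reviews):
    """The newest 'even better if' items become candidate changes to plan."""
    return reviews[-1]["even_better_if"]

print(next_plan(reviews))  # ['deploys did not need a manual sign-off']
```

The point of keeping the record is the loop itself: each review's "even better if" list seeds the next cycle's plan, so improvement stays cyclical rather than a one-off exercise.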

DevOps kaizen has three general chunks of activity that people work on, and they’re all interrelated: Planning and retrospectives, service delivery metrics, and program oversight of kaizen.

Figure 2: Kaizen’s three chunks of activity

Delivery Metrics

Let us look at service delivery metrics first. Organisations are awash with metrics. They have data points all over the place, but they often fall short when they have to interpret that data across a complex end-to-end process. Lead time is perhaps the most obvious example, and people talk a lot about it in DevOps. As you will know, lead time has so many strings attached that giving an accurate number can be difficult. One common approach is to measure the true time it takes a single line of code to travel from idea conception all the way through to the production environment. This includes lead time (duration and predictability), mean time to detect, mean time to repair, and quality at the source (scrap/rework).

Mean time to detect and mean time to repair might seem like odd inclusions, but how quickly a failure can be detected (or anticipated before it happens) and how quickly you can recover from it are crucial. This notion of quality at the source is not just about where and how many problems you are having; it is also about where you are catching those problems. Getting your metrics accurate helps to inform what you look for in the retrospectives as well as in the program oversight.
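As a concrete illustration of how these four numbers fall out of timestamped change records, here is a minimal sketch in Python. The record layout (`committed`, `deployed`, `failed_at`, and so on) and the sample data are hypothetical, not a prescribed tooling choice:

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical change records: when an idea was committed, when it reached
# production, and (for failed changes) when the failure occurred, was
# detected, and was repaired.
changes = [
    {"committed": datetime(2024, 5, 1, 9, 0),
     "deployed":  datetime(2024, 5, 3, 9, 0),
     "failed_at": None, "detected_at": None, "repaired_at": None,
     "reworked": False},
    {"committed": datetime(2024, 5, 2, 9, 0),
     "deployed":  datetime(2024, 5, 6, 9, 0),
     "failed_at": datetime(2024, 5, 6, 10, 0),
     "detected_at": datetime(2024, 5, 6, 11, 0),
     "repaired_at": datetime(2024, 5, 6, 13, 0),
     "reworked": True},
]

def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600

# Lead time: commit to production, per change.
lead_times = [hours(c["deployed"] - c["committed"]) for c in changes]

failures = [c for c in changes if c["failed_at"] is not None]
# MTTD: failure occurring to failure being noticed.
mttd = mean(hours(c["detected_at"] - c["failed_at"]) for c in failures)
# MTTR: failure being noticed to service restored.
mttr = mean(hours(c["repaired_at"] - c["detected_at"]) for c in failures)
# Quality at the source: share of changes that needed rework.
rework_rate = sum(c["reworked"] for c in changes) / len(changes)

print(mean(lead_times), mttd, mttr, rework_rate)  # 72.0 1.0 2.0 0.5
```

Note that lead-time *predictability* matters as much as the average: a spread of per-change lead times (48 and 96 hours here) tells you more than the mean alone.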

So, how do you make the work visible? That process begins in the planning and retrospectives stage. You want to shift your organisational thinking from vertical to horizontal value streams and look at what impacts the end-to-end life cycle. What's a value stream? Every transaction with a customer is a value stream: everything (people, processes, and tools) that has to line up to make that point of transaction happen.

Value Streams

Think about these value streams, get everyone involved together, and map out the end-to-end process for each one. Paper or a whiteboard works best, in my experience, because it is easier for everybody to edit. It doesn't matter whether you're good artists; what matters is having a visual representation that shows everyone is working towards the same goal and highlights how things actually work.

Once you have an idea of how information flows through your organisation, go back, look at what gets in the way of delivery, and mark it in a different colour. Not gripes, not personal vendettas, but the real things that are stopping the organisation from actually delivering: excessively long lead times, too much scrap and rework (poor specs), long MTTD/MTTR, and so on.
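One lightweight way to turn a marked-up map into something you can sort by pain is to record, per stage, active work time versus waiting time, and flag the stages where waiting dominates. The stage names, hours, and threshold below are purely illustrative:

```python
# Each stage of a hypothetical value stream: active work time vs. waiting
# time (both in hours). Waiting is where lead time usually hides.
value_stream = [
    ("plan",    {"work": 4,  "wait": 16}),
    ("develop", {"work": 24, "wait": 8}),
    ("test",    {"work": 8,  "wait": 40}),
    ("release", {"work": 2,  "wait": 72}),
]

def hotspots(stream, threshold=2.0):
    """Stages whose wait time exceeds `threshold` times the work time."""
    return [name for name, times in stream if times["wait"] > threshold * times["work"]]

print(hotspots(value_stream))  # ['plan', 'test', 'release']
```

Raising the threshold narrows the list to the worst offender, which is a useful way to pick the one hotspot to tackle first.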

Once you have these hotspots, look at what countermeasures you could put in place. What are the backlog-ready, short-term and, most of all, actionable things you can do to make your work better? Focus on the small j’s, not the Big J’s. Encourage cross-department thinking. If something is improved in department A, what impact will that have on departments B and C? Is there a compromise that could work across all departments instead?

The next step is to take these countermeasures and get agreement on how they can be done. Use Toyota Kata-style templates to make sure everyone is singing from the same hymn sheet, to get buy-in from your bosses, and to look at solving the problem together.
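A Toyota Kata improvement board is essentially a structured form. Here is a minimal sketch of one as data: the headings follow the Improvement Kata's usual fields (challenge, current condition, target condition, obstacles, next step), while the content is invented for illustration:

```python
# A hypothetical Improvement Kata board for one value stream.
kata_board = {
    "challenge": "Release any change to production within one day",
    "current_condition": "Lead time averages 9 days; most of it is waiting",
    "target_condition": "Test-environment wait under 4 hours by end of quarter",
    "obstacles": [
        "test environments are shared and booked days in advance",
        "database changes need a separate manual review",
    ],
    # The next small, backlog-ready experiment, and what it should teach us.
    "next_step": "self-service environment provisioning for one team",
    "expected_learning": "whether provisioning, not capacity, is the bottleneck",
}

# A coach's oversight questions map directly onto these fields.
for field in ("target_condition", "current_condition", "obstacles", "next_step"):
    print(field, "->", kata_board[field])
```

Keeping the template this small is deliberate: it forces each countermeasure to be one experiment with an expected learning, not a wholesale Big J programme.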

What you will end up with is an idea of how you work and, most importantly, where things break and why: "this is why things take so long" and "this is why things fail." Best of all, you now have agreed steps for the small changes you can make next, which, with everyone on board and having had a say in how things should run, should result in an improved organisation: an organisation that can fix itself.

Kaizen programme oversight is just that: oversight. It is relatively hands-off, using a coaching approach rather than a management approach. The role of the person or people involved is to ask questions rather than to try to provide answers. There are three key points to this process:

  1. A willingness to make change happen.
  2. Resources to make change happen.
  3. Accountability: Is progress being made?

This approach can save money too! At a conference I attended last month, I heard about a company that was having persistent performance problems that would occur at 2pm every day. Customers would call up the contact centre, explain the problem, and in the afternoon the problem would subside. But, the problem kept cropping up every day. So, eventually, they got the war room together to try and fix it.

Working in the War Room

They had a chaotic war room process and brought in pretty much everyone who touched the system, but, because of the culture, everyone was eager to prove that it wasn't their fault and to point the finger elsewhere. This went on for over two weeks: the problem would occur, the war room would get together, the problem would go away briefly, and then it would come back the next day.

Eventually, they realised there was a problem with the database, so they called in the vendor's consultant. The vendor hadn't seen the problem before and created a workaround rather than a fix, which ultimately didn't work.

When they sat down and worked out exactly how much this incident had cost them, it came to over $1 million, including war room meeting time and idling call centre agents who couldn't work because of the impact on their customers, and that doesn't take into account potential brand damage! Over a year, that could escalate to well over $12 million; that is money they may as well take outside and burn, because it is wasted.

In time, they sat everyone down and came up with ideas together about how they could fix this process: streamlining the call centre, improving monitoring, fixing bridge issues, getting the right test environments, and overseeing the automation and SDLC for code reviews and database changes. This put them on the right track to creating an organisation that could identify and fix its own problems.


So, to recap using kaizen in a DevOps environment:

  1. Be intentional about establishing these program elements: planning and retrospectives, service delivery metrics, and program oversight.
  2. Bake it into your operating model. It’s lightweight enough to do as a team in stages, but you need to be dedicated to it. Improvement is not an offensive play; improvement is a mindset.
  3. Put a lot of emphasis on making the work visible. The reality is that, if you get people in the same room, you'll be shocked at how much the organisation doesn't know and how little people know about what their colleagues are doing. So, making the work visible is extremely important.
  4. I cannot stress it enough: focus on continuous improvement. Get people out of the mindset of Big J's and into the mindset of little j's, harness the power of all the brains you have in your company, and get everybody improving and driving in the right direction.

About the Author

Jamie Mercer is a global niche technology recruitment specialist for Pearson Frank, where he focuses on placing Java, PHP, Web, and mobile technology programmers and professionals in roles across the development cycle.
