Root Cause Analysis: Turning Failures into Future DLC Success
Successful software development operations are those that continuously learn from the past to improve the future.
Agile DLC teams are typically familiar with the retrospective: a round-table review after each iteration that asks “What went well?”, “What went wrong?” and “How can we do better?”
Retrospectives are a sound way to build effective collaboration and can surface valuable inputs for improvement. That said, what you do with those inputs is what really creates the mature, high-performing teams that get to work on the best innovation projects. Enter Root Cause Analysis (RCA): time spent dissecting failures is time well spent, because it lets you prevent them in the future.
You may have seen our other articles on Lean Thinking. Before we get to RCA, we should mention another Lean Six Sigma (L6S) tool that plays a critical part in failure prevention: Process Failure Mode Effects Analysis.
Process Failure Mode Effects Analysis (PFMEA)
Assuming you’ve already mapped your current-state Value Stream (if not, see the link above for a refresher), the next step in transforming your Development Life Cycle (DLC) from a break/fix model to a more effective ‘shift left’ prevention model is to dive into your failure modes.
Failure modes are effectively ‘what can go wrong’ within each step of your process. Depending on the complexity of your product architecture and the maturity of your team, each process step can have one or many failure modes.
Baselining your PFMEA to capture what your team ‘knows can go wrong’ starts the flow of information that drives risk prevention and feeds Root Cause Analysis when a failure occurs. You’ll need to capture the mitigation steps for each failure mode as well.
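Pivotal to effective PFMEA is the Risk Priority Number (RPN) for each failure mode, which helps you prioritize mitigation based on impact. The RPN combines three ratings: Severity (how bad the effect is), Occurrence (how likely the failure is) and Detection (how likely the failure is to escape notice before it reaches the customer). In conventional FMEA practice the three ratings are multiplied together. In busy meetings, ranking failure modes by RPN gives you the order you need to ensure top priorities get the right focus.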
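As a minimal sketch of that prioritization, assume the conventional FMEA scoring in which Severity, Occurrence and Detection are each rated from 1 to 10 and the RPN is their product. The class, field names, failure modes and ratings below are hypothetical and only illustrate the ordering; your own rating scales may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One 'what can go wrong' entry for a process step (illustrative fields)."""
    step: str
    description: str
    severity: int    # 1 (negligible) to 10 (critical)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # Conventional RPN: product of the three ratings (range 1 to 1000).
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical examples, not taken from any real PFMEA.
modes = [
    FailureMode("Build", "Dependency version drift", 6, 5, 4),
    FailureMode("Deploy", "Missing database migration", 9, 3, 7),
    FailureMode("Code review", "Unreviewed hotfix merged", 7, 2, 3),
]

for mode in prioritize(modes):
    print(f"{mode.rpn:4d}  {mode.step}: {mode.description}")
```

With these made-up ratings, the deployment failure mode ranks first (RPN 189) even though it occurs less often than the build one (RPN 120), because it is both more severe and harder to detect.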
Once your PFMEA is captured, it’s time to think about data and workflow. Consider tooling similar to how you govern user stories and bugs (scrum example below).
Enabling a workflow like this can provide a real-time risk burn-down at every level of the organization (Enterprise, Product, Release, Iteration, Function, Resource). It helps you get to value-add deployments while avoiding the untimely disruptions that siloed operations impose on busy development teams (‘Just let me code’ vs. ‘Please mitigate risk 10.3.4 that states….’).
An integrated model minimizes the burden on delivery teams so that they can focus on delivering feature value. When visualized effectively, it also celebrates successes and increases team morale.
As the orchestration improves, it becomes easier to de-risk the business, drive customer satisfaction and keep the auditors happy (pull vs. push), while delivery teams focus on what they do best.
Of course, this is software development and, as with any innovation, new risks will inevitably appear that you won’t catch the first time. That’s OK, as long as you take the time to understand what caused those failures and work them into your prevention backlog. To do that effectively, you will need to adopt Root Cause Analysis.
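One way to picture a risk burn-down is as the total RPN still open at the end of each iteration, rolled up to whichever level you are reporting on. The sketch below assumes that measure, along with a hypothetical export format (the `product`, `release`, `rpn` and `closed_in` fields are invented for illustration); a real tracking tool would have its own schema.

```python
from collections import defaultdict

# Hypothetical risk items as they might be exported from a tracking tool.
# Each item records where it sits in the hierarchy, its RPN, and the
# iteration in which it was mitigated (None while still open).
risks = [
    {"product": "Billing", "release": "R1", "rpn": 189, "closed_in": 2},
    {"product": "Billing", "release": "R1", "rpn": 120, "closed_in": None},
    {"product": "Portal", "release": "R1", "rpn": 42, "closed_in": 1},
    {"product": "Portal", "release": "R2", "rpn": 96, "closed_in": 3},
]


def open_rpn(items, iteration, level):
    """Total RPN still open at the end of `iteration`, grouped by `level`."""
    totals = defaultdict(int)
    for item in items:
        closed = item["closed_in"]
        still_open = closed is None or closed > iteration
        # Add 0 for closed items so every group still appears in the result.
        totals[item[level]] += item["rpn"] if still_open else 0
    return dict(totals)


def burn_down(items, iterations, level):
    """Open RPN per group for each iteration, i.e. the burn-down series."""
    return {i: open_rpn(items, i, level) for i in iterations}


for iteration, totals in burn_down(risks, range(0, 4), "product").items():
    print(iteration, totals)
```

Passing `"release"` instead of `"product"` as the level gives the same series rolled up by release, which is the kind of multi-level view the workflow is meant to support.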
Root Cause Analysis (RCA)
How you conduct RCA depends on the size of your enterprise and the complexity of your business line or product. It also depends on which impacts you can measure: customer, financial, business continuity and employee impacts matter most to leaders.
Once you’ve streamlined PFMEA into your DLC, you know what can go wrong and, when it goes wrong, how it can impact the business. Inevitably, something will go wrong and trigger the need for RCA. We suggest including the following as part of your operational model:
Experienced Lead for RCA: someone from within your Quality Management Office or your Portfolio Team who can facilitate RCA initiatives and coach others to lead and participate in them. Goal: keep RCA objective and meaningful.
Robust System: captures incidents, assigns and tracks preventive and corrective actions, and records updates to your failure modes as new issues are uncovered. Goal: holistic risk prevention.
Service Level Agreements (SLAs): tied to functional effectiveness or regulatory compliance, depending on your business. Goal: drive timeliness.
Governance Cadence: review open Corrective and Preventive Action (CAPA) tasks with your DLC stakeholders, driving ownership, discipline and accountability. Goal: provide a view of health across your enterprise at product and functional levels.
Capture and Automate: a means to capture newly identified ‘failure modes’ from RCA activities and automatically add them to the List of Values (LOVs) maintained in your Application Lifecycle Management (ALM) tool. Goal: continuous improvement.
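The tracking, SLA and capture items above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular ALM tool or of our RCA Engine: the `CapaAction` class, its fields, the calendar-day SLA and the sample incidents are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CapaAction:
    """A corrective or preventive action raised by an RCA (illustrative)."""
    incident_id: str
    owner: str
    description: str
    opened: date
    sla_days: int                    # hypothetical SLA, in calendar days
    closed: Optional[date] = None

    @property
    def due(self) -> date:
        return self.opened + timedelta(days=self.sla_days)

    def is_overdue(self, today: date) -> bool:
        # Open actions are overdue once today passes the due date.
        return self.closed is None and today > self.due


def overdue_actions(actions, today):
    """Actions to raise at the governance review, oldest due date first."""
    late = [a for a in actions if a.is_overdue(today)]
    return sorted(late, key=lambda a: a.due)


def register_failure_modes(known, discovered):
    """Add newly identified failure modes to the list of values.

    Comparison ignores case and surrounding whitespace so the same failure
    mode is not recorded twice. Returns the entries that were added.
    """
    seen = {entry.strip().lower() for entry in known}
    added = []
    for entry in discovered:
        key = entry.strip().lower()
        if key and key not in seen:
            seen.add(key)
            known.append(entry.strip())
            added.append(entry.strip())
    return added


# Hypothetical data for illustration only.
actions = [
    CapaAction("INC-1", "QA lead", "Add migration check to pipeline", date(2024, 1, 8), 14),
    CapaAction("INC-2", "Dev lead", "Pin dependency versions", date(2024, 1, 15), 30),
    CapaAction("INC-3", "Ops lead", "Alert on failed backups", date(2024, 1, 2), 10,
               closed=date(2024, 1, 9)),
]
failure_modes = ["Missing database migration", "Dependency version drift"]

print([a.incident_id for a in overdue_actions(actions, date(2024, 2, 1))])
print(register_failure_modes(failure_modes,
                             ["dependency version drift ", "Backup job silently fails"]))
```

The overdue list is what a governance review would walk through, while `register_failure_modes` shows the deduplication you would want before pushing new entries into an ALM list of values.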
SDLC Partners’ Root Cause Analysis (RCA) engine enables clients to take data through effective RCA and ensure the right initiatives are identified, prioritized, tracked and measured. It delivers improved bottom-line and customer outcomes while still holding the necessary functional leads accountable for implementation and timely success. Our goal is to help you meet or, better, exceed executive management expectations.
Our Engine integrates easily with most ALMs through our expansive Application Programming Interface (API), ensuring a clean data set and consistent source of truth across your product portfolio.
We have found that effective RCA enables healthy, customer-driven innovation, energizes employees, enhances profit and delights customers.
If you would like help improving the maturity of your DLC or implementing PFMEA or RCA, or would simply like to learn more about our RCA Engine, contact us by email or call 412.373.1950.