This is the first in a series of articles on conducting a root cause investigation. The model applies to a Corrective Action/Preventive Action (CAPA) investigation, as well as to any other type of investigation. In this first article, we describe a model for conducting a science-based, systematic investigation. Future articles will delve into more detail as the individual steps are explored, specific tools are highlighted, and example investigations are reviewed.
To us, an investigation is directly related to measuring performance. We may measure the performance of a:
- Product being used by a customer;
- Machine performing some work;
- Test method used to evaluate work or material;
- Process (such as design and development, production, distribution, creating a marketing plan, etc.); and
- Transaction (for example, processing a customer’s order, fulfilling a maintenance request, or responding to a requisition to recruit a sales rep).
When we measure performance we hope to see a fairly steady, predictable result (Figure A). Occasionally, however, that performance drops (Figure B).
A drop in performance doesn’t happen by itself; something changed in the product, machine, test, process or transaction, and this change caused the performance to drop. To find and eliminate that change we conduct an investigation.
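One way to make "a drop in performance" concrete is to compare new measurements against a limit derived from a stable baseline period. The sketch below is only an illustration under our own assumptions: the function name `find_first_drop`, the three-standard-deviation limit, and the yield readings are all hypothetical, and the limit is a simplification of a formal control chart.

```python
from statistics import mean, stdev

def find_first_drop(baseline, recent, sigmas=3.0):
    """Return the index of the first recent reading below the lower limit, or None.

    The lower limit is the baseline mean minus `sigmas` baseline standard
    deviations (an assumed, simplified rule, not a full control chart).
    """
    lower_limit = mean(baseline) - sigmas * stdev(baseline)
    for index, value in enumerate(recent):
        if value < lower_limit:
            return index
    return None

# Made-up yield readings (percent), for illustration only.
baseline_yield = [98.1, 97.9, 98.3, 98.0, 98.2, 97.8, 98.1]
recent_yield = [98.0, 98.2, 95.4, 95.1]

first_drop = find_first_drop(baseline_yield, recent_yield)
print(first_drop)  # index of the first reading that fell below the limit
```

Knowing when the first low reading occurred helps bound the time window in which the change must have happened.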
Figure C is our model for the investigation, which we call the Investigation Roadmap. We’ll introduce it today and explore it in greater detail in future articles. The investigation methodology consists of Seven Steps, represented by numbered boxes. Within each box, icons symbolize tools that can be used with that step. The icons are not intended to be limiting; there are hundreds, if not thousands, of tools that can potentially be used during an investigation. The Seven Steps alone present a very theoretical approach. It’s the tools that make the methodology extremely practical. We’ve organized the Seven Steps into the five phases of the Define, Measure, Analyze, Improve, Control (DMAIC) methodology used by Lean Six Sigma. In fact, we have integrated concepts from many improvement strategies into our roadmap.
Step 1: Define the Performance Problem
We begin the investigation with an investment many organizations fail to make: defining the performance problem. Without a fundamental understanding of the issue, investigators are doomed to waste a tremendous amount of time and heighten the risk of failure. To gain an understanding of the problem itself, we advocate describing it in eight dimensions using the IS / IS NOT diagram, which immediately places boundaries on the investigation, providing focus and narrowing the search for the change. Additionally, for each process being investigated, we need to understand its steps and their inputs. After all, we’re looking for a change that caused our drop in performance, and that change may be in the process itself or in one of the inputs to a process step. We strongly recommend describing each process through a process flow diagram modified to identify the key inputs to each process step.
Step 2: Collect Data
A second investment often overlooked is collecting data, which is vital to:
- Assure we are working with the facts (if not, our investigation will head in the wrong direction and waste very valuable time);
- Obtain new, more detailed information regarding the performance problem (we will never be handed all the information we will need); and
- Identify patterns in the data.
Identifying patterns is critical. We’re trying to solve a mystery. To find the change that caused performance to drop, we look for clues, or patterns, in the data. As an example, in the IS / IS NOT diagram we may identify that the performance problem is occurring only on Production Line 1 and not on Line 2, even though the exact same product is produced on both lines. The strong pattern in this situation is that the performance drop is on Line 1 and not on Line 2. With this clue we narrow our search for a change to something in or around Line 1. It may be a change to Line 1 itself, to material only Line 1 uses, or to some other factor that affects only Line 1.
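A pattern like the Line 1 versus Line 2 example often surfaces simply by grouping the data by a candidate dimension. The following is a minimal sketch, assuming defect records are available as (line, lot) pairs; the records, the production counts, and the function name `defect_rate_by_group` are hypothetical.

```python
from collections import Counter

# Hypothetical defect records: (production_line, lot_id).
defect_records = [
    ("Line 1", "lot-101"),
    ("Line 1", "lot-102"),
    ("Line 1", "lot-104"),
    ("Line 1", "lot-107"),
]

# Hypothetical units produced per line over the same period.
units_produced = {"Line 1": 1000, "Line 2": 1000}

def defect_rate_by_group(records, produced):
    """Return defects per unit produced for every group in `produced`."""
    counts = Counter(group for group, _ in records)
    return {group: counts.get(group, 0) / total for group, total in produced.items()}

rates = defect_rate_by_group(defect_records, units_produced)
for line, rate in rates.items():
    print(f"{line}: {rate:.3%}")
```

The same grouping can be repeated for other dimensions, such as shift, material lot, or time period, to see where else the problem IS and IS NOT.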
Step 3: Identify Possible Causes
A common error with this step is relying solely on the fishbone diagram. The fishbone is a good tool for developing a list of possible causes; however, it is only a form of brainstorming. There are other very good brainstorming tools, as well as additional tools much more powerful than a fishbone. We’ll explore some of these in future articles. We stress using multiple tools to develop a fairly extensive list of possible causes. If, when we finish Step 3, the real root cause (the change) is on our list, we will be able to find it. If it’s not, we’re on our way to failure.
Step 4: Test Possible Causes
Now we reap the rewards of our Step 1 and Step 2 investments. We take one possible cause at a time and test it against the facts in our IS / IS NOT diagram. Roughly 85 percent of the possible causes will be quickly ruled out, leaving only a few probable causes for further investigation.
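The logic of testing a possible cause against the facts can be sketched in a few lines. This is an illustration under our own assumptions, not a prescribed implementation: the `Fact` and `PossibleCause` structures, the `is_consistent` rule, and the example causes are all hypothetical. A cause is ruled out if it fails to explain where the problem IS, or if it would also produce the problem where it IS NOT.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    dimension: str      # e.g. "where"
    is_value: str       # where the problem IS observed
    is_not_value: str   # where it could be, but IS NOT, observed

@dataclass
class PossibleCause:
    name: str
    affects: dict       # dimension -> set of values this cause would affect

def is_consistent(cause, facts):
    """Return True if the cause explains every IS and contradicts no IS NOT."""
    for fact in facts:
        affected = cause.affects.get(fact.dimension)
        if affected is None:
            continue  # the cause makes no claim about this dimension
        if fact.is_value not in affected:
            return False  # cannot explain where the problem IS
        if fact.is_not_value in affected:
            return False  # would also cause the problem where it IS NOT
    return True

facts = [Fact("where", "Line 1", "Line 2")]
possible_causes = [
    PossibleCause("Worn tooling on Line 1", {"where": {"Line 1"}}),
    PossibleCause("New material lot used on both lines", {"where": {"Line 1", "Line 2"}}),
    PossibleCause("Operator training gap on Line 2", {"where": {"Line 2"}}),
]

probable_causes = [c.name for c in possible_causes if is_consistent(c, facts)]
print(probable_causes)
```

In practice the judgment about what a cause "would affect" often rests on assumptions, which is why any assumptions made here need to be verified before a probable cause is accepted.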
Step 5: Identify Technical and Systemic Root Causes
The technical root cause is the change we have been searching for: the technical reason for our drop in performance. To identify the technical root cause from the few remaining probable causes, we need to do more.
- If, during testing, we made any assumptions about how the facts supported a possible cause, we now need to verify those assumptions. Some assumptions won’t be true, which eliminates additional probable causes.
- Finally, we may need to conduct experiments on the very few probable causes remaining. We’ve already eliminated the possible causes that don’t make sense so we won’t be wasting time and other valuable resources by experimenting on them.
Once the technical root cause is known, we can use the 5 Whys to identify the systemic root causes: the system failures that allowed the change to occur or failed to detect it.
Step 6: Determine Corrective / Preventive Actions
We now determine the corrective or preventive actions. All root causes will fall into one of two categories:
- Someone has made a mistake, error, or omission; or
- There is too much variation in the performance of the product or process.
For the first, mistake proofing is applied to eliminate or reduce the probability of the mistake, find the defect, or mitigate the drop in performance. For the second, variation reduction and optimization techniques are used.
Recognizing that the corrective / preventive actions are themselves changes, risk mitigation is applied to reduce the probability that new problems may occur when they are implemented. Risk mitigation tools include the use of:
- Failure Mode and Effects Analysis (FMEA)
- Fault Tree Analysis (FTA)
- Design verification
- Process validation
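As an illustration of how an FMEA can rank the risks introduced by a proposed action, the sketch below uses the widely used Risk Priority Number convention (severity × occurrence × detection, each rated 1 to 10). That scoring convention is an assumption on our part rather than something prescribed by the roadmap, and the failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (very unlikely) to 10 (almost certain)
    detection: int   # 1 (almost certain to detect) to 10 (cannot detect)

    def rpn(self):
        """Risk Priority Number: severity * occurrence * detection."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings must be between 1 and 10")
        return self.severity * self.occurrence * self.detection

# Hypothetical failure modes for a proposed corrective action.
failure_modes = [
    FailureMode("New fixture misaligns part", severity=7, occurrence=3, detection=4),
    FailureMode("Revised work instruction misread", severity=5, occurrence=2, detection=2),
    FailureMode("Sensor fails to flag out-of-range value", severity=8, occurrence=4, detection=6),
]

# Highest-risk items first, so mitigation effort goes where it matters most.
ranked = sorted(failure_modes, key=lambda mode: mode.rpn(), reverse=True)
for mode in ranked:
    print(mode.rpn(), mode.description)
```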
Next, a control plan is developed to assure the performance problem does not return.
Step 7: Verify Corrective / Preventive Actions
Finally, we need to assure that the corrective / preventive actions are actually implemented and then measure the performance to assure it returns to the level it was at before performance dropped. There is also the opportunity to capture the knowledge gained through the investigation and share it with other parts of the organization and key partners so that they may take additional preventive actions.
We have summarized a systematic, science-based methodology for conducting an investigation. Future articles will explore the steps, tools, and example investigations in greater depth.