Read Part I: Probability in Human Factors
None of this is to say that the current paradigm should be overthrown, or that a qualitative HFE/UE testing paradigm is insufficient. To say so would be to misunderstand the HFE/UE paradigm itself. Additionally, we ought not dispose of the great experimental value gained through qualitative methods. To do so would be to underestimate the value of the HFE/UE paradigm. No one would suggest that we observe thousands of patients to get quantitative data in lieu of deeper qualitative data (setting aside the impracticality and cost-prohibitive nature of doing so). Further, I would not suggest that we revisit quantitative acceptance criteria during usability testing.
Moreover, I do not suggest that we re-insert probability into the calculus for deciding which tasks to evaluate during usability testing. Usability risk assessment should help determine the testing approach through definition of critical tasks based on severity of harm. Then, the testing should feed back into the usability risk assessment to determine acceptability based on a combination of likelihood (however defined) and severity. After all, our HFE/UE goal is to reduce or eliminate use errors or problems on critical tasks.
Furthermore, we would not want to replace empirical testing with strictly analytical assessments. We are right to suggest that not all use errors can be anticipated analytically. On the other hand, the main issue with the empirical program is that not all analytically uncovered use errors are observable in small sample sizes.
So, should we simply resign ourselves to the shortcomings of our empirical paradigm? Ought we complete the risk management circle, using the human factors validation data as the empirical basis to explicitly demonstrate the acceptability of a user interface based on residual risk likelihood? Can we extract likelihood more explicitly (in a quantitative, semi-quantitative or at least an explicitly qualitative fashion)? Can we re-link a qualitative empirical paradigm with risk analysis? Can we make better decisions and better understand the safety of a device by talking openly of likelihood and quantifying our uncertainty about that likelihood?
Structured Expert Judgment (SEJ)
Accepting the status quo, the answer to these questions is an obvious “no.” It seems that given the significant hurdles associated with statistically powered simulated use studies (e.g., practicality, applicability, cost), the industry is resolved to estimating use error probabilities through subjective expert opinion. Of course, such a strategy is not without foundation in medical device risk management. ISO 14971:2012 considers expert judgment as one of the seven available techniques for estimating probabilities.1 Additionally, Tavanti and Wood list expert opinion as a potential source of quantitative probabilities.2 But several authors warn us that such estimates could vary widely among experts, casting doubt on the ability of expert judgment to be effective. Summing up the current industry approach, Strochlic and Wiklund describe use error probability estimates as “‘shots in the dark’ that could have varied by several orders of magnitude among the individuals doing the ‘shooting.’”3 Thus, once resolved to expert judgment, the perceived inaccuracy of such judgment and the inability to resolve discrepancies between experts are considered insurmountable. But what if the estimates of use error likelihood were not “wildly incorrect”? Or what if, from differences of potentially multiple orders of magnitude among experts, some of whom are indeed likely to be “wildly incorrect,” an accurate assessment of use error probabilities could emerge? What if “shots in the dark” could, through an appropriate methodology, be turned into “shots on goal”? In this sense, appeal to expert judgment demands appeal to structured methodologies for eliciting and combining expert judgments.
Appeal to expert judgment is appropriate “to supplement existing data when these are sparse, questionable or only indirectly applicable”.4 Attempting to elicit expert advice through informal means is not new. However, informal elicitation of subjective expert opinions is not the goal. The thesis is that structured expert judgment (SEJ) elicitation of human factors experts solves the problem of lacking quantitative data in usability risk management. “Structured expert judgment refers to an attempt to subject this process to transparent methodological rules, with the goal of treating expert judgements as scientific data in a formal decision process,” according to Cooke and Goossens.5
Given the value of SEJ in situations in which direct data are lacking, it is not surprising that such a technique is applied in a variety of industries. In fact, several industries have appealed to SEJ to quantify unknown probabilities of interest in the absence of other data collection methodologies.6 Such widespread use engenders confidence in the acceptability and value of the technique in situations that demand it. However, there are several limitations associated with SEJ in general, as well as several competing SEJ methodologies. Techniques for combining multiple expert judgments are divided into two categories: behavioral aggregation (e.g., Delphi) and mathematical aggregation. Mathematical techniques can be divided further into axiomatic and Bayesian techniques.7
For use error probabilities in the medical device industry, I explored Cooke’s Classical model, an axiomatic mathematical model for expert judgment aggregation capable of mitigating some of the practical and psychological biases associated with behavioral and Bayesian aggregation techniques.8
Choice of Cooke’s Classical model is based on a desire for what Cooke terms “rational consensus.”9 Rational consensus is achieved when a method complies with conditions consistent with a “scientific” method. According to Cooke and Goossens, the following four criteria should be met to achieve rational consensus:5
- Scrutability/accountability: The expert judgment process is open to peer review and must be reproducible.
- Empirical control: The quality of expert assessments can be assessed based on possible observations. That is, expert opinions are falsifiable in principle, in the same way scientific statements and theories are.9
- Neutrality: The methodology for combining expert assessments should not result in bias and should encourage experts to make estimates in line with their true beliefs. For this, the scoring of experts should follow a strictly proper scoring rule, in which the expert receives his or her highest score only by making estimates in line with his or her true beliefs.10
- Fairness: Experts are not judged prior to aggregating the results. Experts are assessed based only on performance.
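The neutrality criterion can be illustrated with a small numerical sketch. The quadratic (Brier-type) reward below is a stand-in chosen for simplicity, not the scoring used in Cooke’s Classical model, and the function names and the 2% belief are hypothetical. It shows the defining property of a strictly proper scoring rule: an expert who believes a use error occurs with probability 0.02 maximizes the expected score only by reporting 0.02.

```python
def brier_reward(report: float, outcome: int) -> float:
    """Quadratic (Brier-type) reward: higher is better, maximum 1.0."""
    return 1.0 - (outcome - report) ** 2


def expected_reward(report: float, belief: float) -> float:
    """Expected reward when the expert truly believes the event
    (e.g., a use error) happens with probability `belief`."""
    return (belief * brier_reward(report, 1)
            + (1.0 - belief) * brier_reward(report, 0))


def best_report(belief: float, steps: int = 1000) -> float:
    """Grid-search the reported probability that maximizes expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(q, belief))


# Hypothetical example: the expert's true belief is a 2% use error probability.
belief = 0.02
print(best_report(belief))
# Inflating the report tenfold lowers the expected reward.
print(expected_reward(0.02, belief) > expected_reward(0.20, belief))
```

Under such a rule, an expert gains nothing in expectation by hedging or exaggerating, which is what the neutrality criterion asks of a scoring method.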
Rational consensus is ideally suited for addressing the status quo for estimating use error probabilities. As mentioned previously, human factors professionals assume that expert judgments of use error probabilities demonstrate high expert-to-expert variability. Such variability is believed to prohibit accurate estimation of use error probabilities. However, a properly structured expert judgment methodology that meets the criteria for rational consensus may resolve the order-of-magnitude differences between different experts’ judgments and transform disparate expert judgments into a “good assessor.”
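How performance on known quantities can separate experts may be sketched as follows. This is a minimal sketch of the calibration score from Cooke’s Classical model, assuming each expert states 5%, 50% and 95% quantiles for seed variables whose true values are known to the analyst. All data and names are hypothetical, and weighting by calibration alone is a simplification: the full model also multiplies calibration by an information score and may apply a cutoff. Nothing here reproduces the case study’s data.

```python
import math

# Theoretical probability mass in the four bins defined by the
# 5%, 50% and 95% quantiles.
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def chi2_sf_3df(x: float) -> float:
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def bin_index(quantiles, realization):
    """Return the interquantile bin (0..3) that contains the realized value."""
    q05, q50, q95 = quantiles
    if realization <= q05:
        return 0
    if realization <= q50:
        return 1
    if realization <= q95:
        return 2
    return 3


def calibration_score(assessments, realizations):
    """p-value of the hypothesis that realizations fall into the expert's
    bins with the theoretical frequencies BIN_PROBS."""
    n = len(realizations)
    counts = [0, 0, 0, 0]
    for quantiles, value in zip(assessments, realizations):
        counts[bin_index(quantiles, value)] += 1
    # Relative information of the empirical bin frequencies w.r.t. BIN_PROBS.
    rel_info = sum((c / n) * math.log((c / n) / p)
                   for c, p in zip(counts, BIN_PROBS) if c > 0)
    return chi2_sf_3df(2.0 * n * rel_info)


# Hypothetical seed data: observed use error rates (%) on ten known tasks.
realizations = [0.5, 2, 3, 4, 4.5, 6, 8, 10, 15, 25]
# Each expert repeats one interval for every seed only to keep the example short.
expert_a = [(1.0, 5.0, 20.0)] * 10   # wide, honest uncertainty
expert_b = [(3.5, 4.0, 4.4)] * 10    # narrow, overconfident

cal_a = calibration_score(expert_a, realizations)
cal_b = calibration_score(expert_b, realizations)

# Simplified performance weights: calibration only, normalized to sum to 1.
total = cal_a + cal_b
weights = [cal_a / total, cal_b / total]
print(cal_a, cal_b, weights)
```

In this toy setting the overconfident expert receives almost no weight, so a weighted combination of the two experts’ distributions would be driven by the expert whose intervals matched the observed values.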
In a case study applying this technique to human factors experts assessing use error probabilities for a drug delivery device, I demonstrate that SEJ elicitation can provide statistically accurate and reliable estimates of use error probabilities with uncertainty.8 As only a single case study, the results do not guarantee that SEJ elicitation can, in all cases, provide reliable estimates of use error probabilities with associated uncertainty. However, the case study shows that there is no reason to believe that use error probabilities for medical devices are beyond the reach of SEJ. That is, though industry has abandoned attempts to estimate use error probabilities and account for expert uncertainty due to apparent high expert-to-expert variance and presumed inaccuracies, I demonstrate that such variance does not preclude the creation of an accurate probability estimate so long as structured elicitation techniques are employed.8 In fact, the case study demonstrates that Cooke’s Classical model can be employed virtually, in a simple online setting, with promising results. If the appetite for mathematical SEJ models is not strong, additional research into the viability of behavioral aggregation models (e.g., Delphi technique) is encouraged. We would be remiss not to explore potentially valuable structured means of any type to estimate probabilities of use errors on critical tasks.
In the end, given that the de-emphasis of use error probabilities in usability risk management for medical devices is the result of methodological difficulties, the success of an SEJ elicitation via Cooke’s Classical model encourages further examination. Explicit probability estimates (even if semi-quantitative) and quantification of associated uncertainty may result in more complete and potentially more impactful usability risk assessments for medical devices. Use errors account for over one-third of all medical device failures reported to the FDA, and account for over one-half of all medical device recalls.11 Using a virtual SEJ elicitation model to improve the depth and quality of usability risk management could pre-emptively reduce the harm and cost associated with drug delivery/medical device user interface issues.
This work is derived in part from the author’s doctorate of engineering dissertation at George Washington University.
1. International Organization for Standardization. Medical devices – Application of risk management to medical devices (ISO 14971:2007, Corrected version 2007-10-01). BS EN ISO 14971:2012.
2. Tavanti, M. and Wood, L. (2017). “A method for quantitative estimate of risk probability in use risk assessment.” In Proceedings of the Human Factors and Ergonomics Society European Chapter 2016 Annual Conference, edited by D. de Waard, A. Toffetti, R. Wiczorek, A. Sonderegger, S. Rottger, P. Bouchner, T. Franke, S. Fairclough, M. Noordzij, and K. Brookhuis, 229-240. ISSN 2333-4959 (online). Available from http://hfes-europe.org.
3. Strochlic, A. and Wiklund, M. (November 10, 2016). “Medical Device Use Error: Focus on the Severity of Harm.” MedTech Intelligence. Retrieved from https://www.medtechintelligence.com/feature_article/medical-device-use-error-focus-severity-harm/
4. Meyer, M.A. and Booker, J.M. (1991). Eliciting and Analyzing Expert Judgment – A Practical Guide. London: Academic Press, Ltd.
5. Cooke, R.M. and Goossens, L.L.H.J. (2008). “TU Delft expert judgment data base.” Reliability Engineering & System Safety 93 (5): 657-674. https://doi.org/10.1016/j.ress.2007.03.005.
6. O’Hagan, A., et al. (2006). Uncertain Judgments – Eliciting Experts’ Probabilities. West Sussex: John Wiley and Sons, Ltd.
7. Eggstaff, J.W., Mazzuchi, T.A. and Sarkani, S. (2014). “The effect of the number of seed variables on the performance of Cooke’s classical model.” Reliability Engineering & System Safety 121: 72-82. https://doi.org/10.1016/j.ress.2013.07.015.
8. Zampa, N.J. (2018). “Structured Expert Judgment Elicitation of Use Error Probabilities for Drug Delivery Device Risk Assessment.” D. Eng diss., George Washington University. 10841440.
9. Cooke, R.M. (1991). Experts in Uncertainty: Opinion and Subjective Probability in Science. Environmental Ethics and Science Policy. New York: Oxford University Press, Inc.
10. Cooke, R.M. and Goossens, L.L.H.J. (2000). “Procedures Guide for Structured Expert Judgment.” Project Report EUR 18820EN Nuclear science and technology, specific programme Nuclear Fission safety 1994-98, Report to: European Commission. Luxembourg, Euratom.
11. Story, M.F. (2007). “Emerging Human Factors and Ergonomics Issues for Health Care Professionals.” In Medical Instrumentation: Accessibility and Usability Considerations, edited by Jack M. Winters and Molly Follette Story, 29-40. Boca Raton: CRC Press.