Clinical Infectious Diseases INVITED ARTICLE HEALTHCARE EPIDEMIOLOGY: Robert A. Weinstein, Section Editor

Benefit-risk Evaluation for Diagnostics: A Framework (BED-FRAME) Scott R. Evans,1,2 Gene Pennello,3 Norberto Pantoja-Galicia,3 Hongyu Jiang,2 Andrea M. Hujer,4 Kristine M. Hujer,4 Claudia Manca,5 Carol Hill,6 Michael R. Jacobs,4 Liang Chen,5 Robin Patel,7 Barry N. Kreiswirth,5 and Robert A. Bonomo4; for the Antibacterial Resistance Leadership Group 1 Department of Biostatistics, and 2Center for Biostatistics in AIDS Research, Harvard T. H. Chan School of Public Health, Boston, Massachusetts; 3Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland; 4Louis Stokes Cleveland Veterans Affairs Medical Center, Case Western Reserve University School of Medicine, Ohio; 5Public Health Research Institute, New Jersey Medical School, Rutgers University, Newark; 6Duke Clinical Research Institute, Duke University, Durham, North Carolina; and 7Mayo Clinic, Rochester, Minnesota

The medical community needs systematic and pragmatic approaches for evaluating the benefit-risk trade-offs of diagnostics that assist in medical decision making. Benefit-Risk Evaluation of Diagnostics: A Framework (BED-FRAME) is a strategy for pragmatic evaluation of diagnostics designed to supplement traditional approaches. BED-FRAME evaluates diagnostic yield and addresses 2 key issues: (1) that diagnostic yield depends on prevalence, and (2) that different diagnostic errors carry different clinical consequences. As such, evaluating and comparing diagnostics depends on prevalence and the relative importance of potential errors. BED-FRAME provides a tool for communicating the expected clinical impact of diagnostic application and the expected trade-offs of diagnostic alternatives. BED-FRAME is a useful fundamental supplement to the standard analysis of diagnostic studies that will aid in clinical decision making.

Keywords. benefit-risk; diagnostics; diagnostic yield; pragmatism.

Pragmatic and systematic benefit-risk evaluations are needed to inform medical decision making. Progress has been made in the treatment setting [1–7], but less attention has been paid to diagnostic tests, with contributions limited to a few reports [8–14]. We have developed a new approach that incorporates graphical tools for evaluating the benefit-risk balance of diagnostics. Standard evaluation of diagnostics consists of estimating sensitivity, specificity, positive/negative predictive values, likelihood ratios, and accuracy (the overall percentage that is correctly classified). Antimicrobial susceptibility testing (AST) studies [14] may also calculate very major error rates (the rate at which the diagnostic test indicates susceptibility when the reference test indicates resistance) and major error rates (the rate at which the diagnostic test indicates resistance when the reference test indicates susceptibility) [15] or, using terminology defined by Evans et al [6], susceptibility sensitivity (the probability that the test result is susceptible when the reference standard is susceptible) and resistance sensitivity (the probability that the test result is resistant when the reference standard is resistant).

Received 15 February 2016; accepted 12 May 2016; published online 18 May 2016. Correspondence: S. R. Evans, FXB-625A, Center for Biostatistics in AIDS Research and the Department of Biostatistics, Harvard T. H. Chan School of Public Health, 651 Huntington Ave, Boston, MA 02115 ([email protected]). Clinical Infectious Diseases® 2016;63(6):812–7 © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail [email protected]. DOI: 10.1093/cid/ciw329

CID 2016:63 (15 September)

Although useful, these statistics have limited utility for guiding clinical decision making and do not convey the resulting impact of clinical application. For example, suppose there is a choice between 2 diagnostics: one with a higher sensitivity and one with a higher specificity. Or suppose 2 methodologies for AST are being compared: one with a higher resistance sensitivity and the other with a higher susceptibility sensitivity. Which diagnostic should be selected to optimize clinical outcomes? The answer depends not only upon the differences in the magnitudes of the sensitivities and specificities of the tests (or resistance and susceptibility sensitivities, in the case of the AST), but also upon the prevalence of disease (resistance/susceptibility for AST) in a given region and time frame, and the relative importance of “false positive (FP)” vs “false negative (FN)” errors. Approaches that incorporate these factors and translate laboratory results into measures of the expected clinical impact of possible diagnostic applications are needed. Herein, we propose methods that will supplement standard diagnostic evaluations to help clinicians effectively evaluate and compare diagnostic alternatives.

METHODS

Benefit-Risk Evaluation of Diagnostics: A Framework (BEDFRAME) is a strategy for pragmatic evaluation of diagnostics designed to supplement traditional approaches. BED-FRAME evaluates diagnostic yield and addresses 2 key issues: (1) that diagnostic yield depends on prevalence, and (2) that different diagnostic errors (FPs and FNs) carry different clinical consequences.

BED-FRAME consists of 5 steps. We describe each step and its rationale. We then illustrate BED-FRAME using an AST dataset of Acinetobacter species and imipenem [8].

Step 1: Displaying the Expected Diagnostic Yield as a Function of the Prevalence

The purpose of Step 1 is to visually display the expected clinical impact of diagnostic application in medical practice, which is lacking with traditional diagnostic evaluations. The clinical impact is summarized by the expected diagnostic yield. Diagnostic yield refers to the distribution of true positive (TP), true negative (TN), FP, and FN results, or in the case of AST, the distribution of true susceptible (TS), true resistance (TR), false susceptible (FS), and false resistance (FR) results when a diagnostic is utilized in practice. Calculations of the components of expected diagnostic yield are as follows:

TP = (sensitivity) × (prevalence of disease) × (population size)
FN = (1 − sensitivity) × (prevalence of disease) × (population size)
TN = (specificity) × (1 − prevalence of disease) × (population size)
FP = (1 − specificity) × (1 − prevalence of disease) × (population size)

For AST:

TS = (susceptibility sensitivity) × (prevalence of susceptibility) × (population size)
FR = (1 − susceptibility sensitivity) × (prevalence of susceptibility) × (population size)
TR = (resistance sensitivity) × (1 − prevalence of susceptibility) × (population size)
FS = (1 − resistance sensitivity) × (1 − prevalence of susceptibility) × (population size)

Clearly, diagnostic yield is a function of prevalence (of disease or susceptibility/resistance in AST). Prevalence can vary temporally and geographically. Thus, it is optimal to display expected diagnostic yield as a function of the prevalence for each diagnostic being evaluated.

Step 2: For Comparative Diagnostic Studies, Plot the Expected Between-Diagnostic Difference in FN and TN (or FS and TS in AST Studies) as a Function of the Prevalence

As seen in Step 1, once sensitivity and specificity (or susceptibility and resistance sensitivities for ASTs) for a diagnostic have been estimated, one can calculate the expected diagnostic yield (and specifically FN and TN, or FS and TS for ASTs) as a function of prevalence for a fixed population size (eg, 1000 patients). When comparing diagnostics, it is important to quantify the trade-offs between them; for example, when one diagnostic is more effective at identifying disease (higher sensitivity), while the other is better at identifying nondisease (higher specificity); or in the case of AST, when one diagnostic is more effective at identifying resistance while the other is better at identifying susceptibility. To illustrate these trade-offs between diagnostic alternatives, the expected between-diagnostic differences in FN and TN (or FS and TS, for AST studies) results are plotted as a function of the prevalence.

Step 3: Calculate the Number Needed to Test

It is helpful to understand the number of patients upon which the diagnostics need to be applied in order to observe a between-diagnostic difference in resulting diagnoses. In general, the number needed to test (NNT) for positivity (NNTP) is the expected number of patients who are positive for the condition (eg, diseased) and who must be evaluated with one test vs another to result in one additional TP result (assuming that sensitivity is greater for the first vs the second test). For AST, the NNT for resistance (NNTR) is the expected number of resistant patients who must be evaluated with one test vs another to result in one additional TR result. Likewise, the NNT for susceptibility (NNTS) is the expected number of susceptible patients who must be evaluated with one test vs another to result in one additional TS result. The calculation is simple: NNTR is the reciprocal of the difference in resistance sensitivity rates; NNTS is the reciprocal of the difference in susceptibility sensitivity rates.

Step 4: Plot Weighted Accuracy as a Function of the Relative Importance

Accuracy

Accuracy (ie, the overall percentage correctly classified) is often reported in diagnostic studies. It ranges from 0% to 100%, with higher scores indicating better accuracy. Let p = prevalence of nondisease; then:

Accuracy = (p)(specificity) + (1 − p)(sensitivity)

Or analogously for AST, using terminology defined in Evans et al [6] and assuming p = prevalence of susceptibility:

Accuracy = (p)(susceptibility sensitivity) + (1 − p)(resistance sensitivity)

There are 2 important challenges to the interpretation of accuracy: (1) accuracy treats all errors as if they are equally important; and (2) because it depends on prevalence, accuracy is not generally comparable from study to study, as prevalence rates may differ between studies. To address the first challenge, the accuracy measure can be adjusted to incorporate differential error consequences (weighted accuracy). Then, to address the second challenge, weighted accuracy can be reported as a function of prevalence.

Weighted Accuracy

Suppose that clinicians/microbiologists believe that failing to identify positivity/resistance (ie, a very major error in AST) is a more important error than failing to identify negativity/susceptibility (ie, a relatively less important error). The relative importance of a false-positive/resistant test result relative to a false-negative/susceptible test result may be quantified by a weight r (note that 0 ≤ r ≤ 1 if it is assumed that a false-negative/susceptible result is worse, but r could be >1 otherwise). Then:

Weighted accuracy = [rp(specificity) + (1 − p)(sensitivity)] × (1/(rp + 1 − p))

or, for AST:

Weighted accuracy = [rp(susceptibility sensitivity) + (1 − p)(resistance sensitivity)] × (1/(rp + 1 − p))

Weighted accuracy can be interpreted as the accuracy adjusted for the relative importance of diagnostic error costs. It ranges from 0% to 100%, with higher percentages indicating better accuracy. Weighted accuracy can be compared under varying prevalence and relative importance.

Step 5: Display the Difference in Weighted Accuracy as a Function of Relative Importance and Prevalence

Diagnostic comparison depends on the prevalence and the relative importance of classification errors. Prevalence of, for example, susceptibility/resistance can vary geographically and temporally. Relative importance can depend on whether a test is used for screening vs confirmatory diagnosis [13] and the difference in clinical consequences of correct vs incorrect diagnoses. Relative importance may also be viewed as clinician/patient-specific preference, and thus may also vary. Therefore, analyses should evaluate how a diagnostic comparison changes as these factors vary. The goal of Step 5 is to provide a visual display that compares diagnostics as prevalence and relative importance vary. Once estimates for sensitivity and specificity (or susceptibility and resistance sensitivities for AST) for a diagnostic are available, weighted accuracy can be easily calculated for a specific prevalence and relative importance using the formula above. The difference in weighted accuracy between 2 diagnostics can then be calculated. By varying the prevalence and relative importance, one can obtain the between-diagnostic differences in weighted accuracies for various combinations of prevalence and relative importance. Diagnostic comparisons are summarized by plotting difference contours of weighted accuracy as a function of the relative importance and prevalence.

EXAMPLE

We illustrate BED-FRAME using data from an AST study of Acinetobacter species and imipenem. Acinetobacter species is a gram-negative pathogen that can cause pneumonia as well as skin, soft tissue, and bloodstream infections among critically ill patients. In the United States, approximately 12 000 cases of multidrug-resistant (MDR) Acinetobacter species infections occur and are associated with approximately 500–750 deaths per year. Approximately 63% of infections are MDR [16]. Treatment options include carbapenems (preferred) and colistin, a polymyxin, or tigecycline (used as a last resort).

Platforms for Rapid Identification of MDR-gram negative bacteria and Evaluation of Resistance Studies was a blinded evaluation of 2 rapid molecular diagnostic platforms (polymerase chain reaction/electrospray ionization mass spectrometry [PCR/ESI-MS] and molecular beacons [MB]) for discriminating resistance/susceptibility to imipenem against Acinetobacter species. Altogether, 200 isolates were evaluated, of which 97 (48.5%) were imipenem-resistant according to the reference standard of minimum inhibitory concentration determination. Few nonsusceptible/resistant (eg, intermediate) results were observed; these were considered “resistant,” consistent with a clinical decision to avoid using imipenem when this result occurs. Susceptibility/resistance determinations were based on the absence/presence of 7 genes: blaOXA-23, blaOXA-40, blaOXA-58, blaNDM, blaKPC, blaVIM, and blaIMP. Resistance sensitivity for PCR/ESI-MS was 96% (93/97) with a 95% confidence interval (CI) of 91%–99%; resistance sensitivity for MB was 89% (86/97) with a 95% CI of 81%–94%. Susceptibility sensitivity for PCR/ESI-MS was 83% (86/103) with a 95% CI of 74%–89%; and susceptibility sensitivity for MB was 92% (95/103) with a 95% CI of 85%–97% [6–8].

Step 1: Display the Expected Diagnostic Yields as a Function of the Susceptibility Rate

The expected diagnostic yield (raw yield based on 12 000 Acinetobacter infections annually in the United States) for each platform is displayed using an expected diagnostic yield profile plot with slide rule (Figure 1). When the imipenem susceptibility rate (ie, the prevalence) is 40%, the expected yield is as follows: TR = 6408, FR = 384, TS = 4416, and FS = 792 for MB; and TR = 6912, FR = 816, TS = 3984, and FS = 288 for PCR/ESI-MS. Thus, the between-platform difference in expected yield is 504 more TR for PCR/ESI-MS (also, 504 more FS for MB) and 432 more TS for MB. When the susceptibility rate is 60%, the difference in expected yield is 336 more TR for PCR/ESI-MS and 648 more TS for MB.

Step 2: Plot the Between-Platform Difference in TS and FS as a Function of the Susceptibility Rate

The expected between-diagnostic differences in the raw numbers of (1) TS diagnoses and (2) FS diagnoses are plotted as a function of the susceptibility rate to illustrate platform trade-offs (Figure 2). PCR/ESI-MS has higher resistance sensitivity and thus will have fewer FS results. MB has higher susceptibility sensitivity and thus always has higher TS.

Step 3: Calculate the NNTS and NNTR

The NNTS is 1/(0.92 − 0.83) = 11.1 (MB vs PCR/ESI-MS), implying that if 11 truly susceptible patients were tested, we would expect MB to correctly classify one more than PCR/ESI-MS. The NNTR is 1/(0.96 − 0.89) = 14.3 (PCR/ESI-MS vs MB), implying that if 14 truly resistant patients were tested, we would expect PCR/ESI-MS to correctly classify one more than MB.
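The Step 1 yields and these NNT values can be reproduced with a short script. This is an illustrative sketch: the function names (`expected_yield`, `number_needed_to_test`) and the dictionary layout are our own choices, not part of the published framework.

```python
def expected_yield(susc_sens, res_sens, susc_rate, n):
    """Expected AST diagnostic yield when testing n isolates.

    susc_sens/res_sens: susceptibility/resistance sensitivity of the platform;
    susc_rate: prevalence of susceptibility (0-1); n: population size.
    """
    return {
        "TS": susc_sens * susc_rate * n,             # true susceptible
        "FR": (1 - susc_sens) * susc_rate * n,       # false resistant
        "TR": res_sens * (1 - susc_rate) * n,        # true resistant
        "FS": (1 - res_sens) * (1 - susc_rate) * n,  # false susceptible
    }

def number_needed_to_test(higher_sens, lower_sens):
    # Reciprocal of the difference in sensitivities between two tests.
    return 1.0 / (higher_sens - lower_sens)

# MB: susceptibility sensitivity 92%, resistance sensitivity 89%;
# PCR/ESI-MS: susceptibility sensitivity 83%, resistance sensitivity 96%.
mb = expected_yield(0.92, 0.89, 0.40, 12000)  # Step 1 at a 40% susceptibility rate
nnts = number_needed_to_test(0.92, 0.83)      # MB vs PCR/ESI-MS
nntr = number_needed_to_test(0.96, 0.89)      # PCR/ESI-MS vs MB
```

Running this reproduces the figures quoted above: `mb` comes out to TR = 6408, FR = 384, TS = 4416, and FS = 792, and the NNTS and NNTR round to 11.1 and 14.3.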

Figure 1. Assuming that the molecular beacon (MB) and polymerase chain reaction/electrospray ionization mass spectrometry (PCR/ESI-MS) platforms were utilized on 12 000 annual Acinetobacter infections, the slide-rule profile plots indicate the expected diagnostic yield (the expected distribution of true susceptible [TS], true resistance [TR], false susceptible [FS], and false resistance [FR] results) as a function of the imipenem susceptibility rate.

Figure 2. If the molecular beacon (MB) and polymerase chain reaction/electrospray ionization mass spectrometry (PCR/ESI-MS) platforms were utilized on 12 000 annual Acinetobacter infections, the figure displays the expected between-platform differences in the raw numbers of (1) true susceptible (TS) diagnoses and (2) false susceptible (FS) diagnoses as a function of the susceptibility rate to illustrate between-platform trade-offs.

Step 4: Plot the Weighted Accuracies as a Function of the Relative Importance

Suppose that failing to identify resistance is considered a more important error than failing to identify susceptibility. Failing to identify susceptibility implies unnecessary exposure to a polymyxin when imipenem (less toxic and perhaps more effective) could have been used. Failing to identify resistance implies treating with imipenem, an ineffective antibiotic, instead of a polymyxin, which, although more toxic, may be more effective. (Note: The relative importance would ultimately depend on the relative effectiveness and toxicity of one agent vis-a-vis another.) Figure 3 displays the weighted accuracy vs relative importance by platform when the susceptibility rate = 40%. For comparison, the weighted accuracies of a “random” test (ie, a flip of a fair coin) having 50% resistance sensitivity and 50% susceptibility sensitivity (black horizontal line at 50%) and of a test whose result always indicates resistance are also displayed.

Step 5: Display the Difference in Weighted Accuracy as a Function of Relative Importance and Susceptibility Rate

Figure 4 displays a comparison (PCR/ESI-MS vs MB) of weighted accuracy as a function of the relative importance and the susceptibility rate. The green area indicates combinations of relative importance and susceptibility rates where PCR/ESI-MS is favorable; the red area indicates combinations where MB is favorable. The solid black line indicates where PCR/ESI-MS and MB are equivalent. Contours indicate the magnitude of weighted accuracy differences. Consider the 3 points labeled A, B, and C. Point A (relative importance = 100% [ie, errors are equally important], susceptibility rate = 60%) results in approximately 3% higher weighted accuracy for MB. Point B (relative importance = 50% [ie, FR is half as important as FS], susceptibility rate = 80%) results in approximately 4% higher weighted accuracy for MB. Point C (relative importance = 50%, susceptibility rate = 30% [ie, a resistance outbreak]) results in approximately 4% higher weighted accuracy for PCR/ESI-MS vs MB. In summary, MB provides better performance when the susceptibility rate and relative importance are high. PCR/ESI-MS provides a better alternative as the susceptibility rate decreases.
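The three labeled points can be checked numerically from the weighted-accuracy formula given in Step 4. In this sketch the helper names are our own; a negative difference favors MB and a positive difference favors PCR/ESI-MS.

```python
def weighted_accuracy(susc_sens, res_sens, p, r):
    # p: susceptibility rate; r: relative importance of a false-resistant
    # result relative to a false-susceptible result.
    return (r * p * susc_sens + (1 - p) * res_sens) / (r * p + 1 - p)

def platform_difference(p, r):
    # PCR/ESI-MS (susceptibility sensitivity 0.83, resistance sensitivity 0.96)
    # minus MB (0.92, 0.89): positive favors PCR/ESI-MS, negative favors MB.
    return weighted_accuracy(0.83, 0.96, p, r) - weighted_accuracy(0.92, 0.89, p, r)

point_a = platform_difference(0.60, 1.0)  # ~ -0.026: ~3% in favor of MB
point_b = platform_difference(0.80, 0.5)  # ~ -0.037: ~4% in favor of MB
point_c = platform_difference(0.30, 0.5)  # ~ +0.042: ~4% in favor of PCR/ESI-MS
```

The signs and magnitudes agree with points A, B, and C in Figure 4.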

Figure 3. Assuming that the imipenem susceptibility rate is 40%, the figure displays the weighted accuracy (overall percentage correctly classified, adjusted for the relative importance of a false-resistant test result relative to a false-susceptible test result) as a function of the relative importance. Results are presented for 4 tests: (1) polymerase chain reaction/electrospray ionization mass spectrometry (PCR/ESI-MS); (2) molecular beacons (MB); (3) a test that always indicates resistance; and (4) a random test (equivalent to flipping a coin).
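The curves in Figure 3 can be regenerated from the same formula. The four test profiles below follow the figure legend, including the two degenerate comparators; the function name and the sampled values of r are our own illustrative choices.

```python
def weighted_accuracy(susc_sens, res_sens, p, r):
    # p: susceptibility rate (0.40 in Figure 3); r: relative importance of a
    # false-resistant result relative to a false-susceptible result.
    return (r * p * susc_sens + (1 - p) * res_sens) / (r * p + 1 - p)

P = 0.40  # imipenem susceptibility rate assumed in Figure 3
for r in (0.25, 0.5, 1.0):
    pcr = weighted_accuracy(0.83, 0.96, P, r)     # PCR/ESI-MS
    mb = weighted_accuracy(0.92, 0.89, P, r)      # MB
    always_r = weighted_accuracy(0.0, 1.0, P, r)  # always indicates resistance
    coin = weighted_accuracy(0.5, 0.5, P, r)      # random test: always 50%
    print(r, round(pcr, 3), round(mb, 3), round(always_r, 3), round(coin, 3))
```

The random test stays at 50% for every r (the black horizontal line in the figure), while the always-resistant test degrades as r grows and susceptible isolates count for more.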

Figure 4. The figure displays the between-platform difference (molecular beacons [MB] vs polymerase chain reaction/electrospray ionization mass spectrometry [PCR/ESI-MS]) in weighted accuracy (overall percentage correctly classified, adjusted for the relative importance of a false-resistant test result relative to a false-susceptible test result) for various combinations of relative importance and imipenem susceptibility rates.

DISCUSSION

AST is frequently ordered in medical practice to help inform treatment decisions. An important challenge when evaluating diagnostic alternatives is weighing the risks and benefits associated with false susceptibility vs FR results. Unfortunately, clinicians often interpret test results without systematic and structured consideration of the risks and benefits. Instead, clinicians often rely on their intuition when weighing the benefits and risks of diagnostic alternatives. Until this void is filled with a more systematic evaluation framework, enumerations will remain academic.

If rapid AST diagnostics are to be an effective part of antimicrobial stewardship, then clinicians need methods that systematically evaluate diagnostic alternatives by weighing how well they identify resistance and susceptibility and the resulting clinical impact of diagnostic application. The clinical impact of an AST diagnostic depends not only on its ability to correctly discriminate susceptibility from resistance, but also on the prevalence of susceptibility/resistance and the relative importance of a false susceptibility vs FR result.

BED-FRAME provides a tool for communicating the expected clinical impact of diagnostic application and the expected trade-offs between diagnostic alternatives. It systematically evaluates and compares diagnostics as the prevalence and the relative importance of potential errors vary. The prevalence of, for example, susceptibility/resistance can vary geographically (even within the same city or across different wards of the same hospital) and temporally. Thus, access to local surveillance data is important.

The evaluation and selection of relative importance (of a false susceptibility vs a FR result) is challenging. Surveys can inform population perspectives regarding the relative importance. However, a more structured evaluation includes comparing clinical consequences (ie, expected efficacy, toxicity, and quality of life) resulting from treatment decisions based on correct vs incorrect diagnostic results.
For example, suppose that one was evaluating the importance of a false susceptibility result to a specific antibiotic in a setting of resistance to this antibiotic. One could compare the clinical consequences of actions taken based on a false susceptibility result (efficacy, toxicity, resistance implications) vs the clinical consequences of actions that would be taken if the diagnosis were correct. If there were a treatment alternative that provided considerably greater efficacy in the resistant setting, then the importance of the false susceptibility would be large. However, if no such treatment alternative existed, such that efficacy is unlikely to be improved with a correct diagnosis, then the importance of a correct diagnosis is less.

In the example AST study of Acinetobacter species and imipenem, most clinicians may err on the side of FR and overtreat rather than miss resistance and undertreat (ie, false susceptibility is a more important error than false resistance) given the seriousness of disease. However, this decision depends upon the effectiveness of the treatment alternatives resulting from the diagnostic evaluation. Consider the importance of a false susceptibility to imipenem. For example, if the colistin or polymyxin alternative is not much better than imipenem against imipenem-resistant organisms, then the importance of a false susceptibility is diminished, as there is not an effective alternative. But if colistin or polymyxin is much more effective than imipenem against imipenem-resistant organisms, then false susceptibility is a very important error. A contrast in the consequences on resistance may also be informative.

BED-FRAME is a strategy for pragmatic evaluation of diagnostics designed to aid in clinical decision making, providing a tool for communicating the expected clinical impact of diagnostic application and the expected trade-offs of alternatives. BED-FRAME is a useful fundamental supplement to the standard analysis of diagnostic studies.

Notes

Disclaimer. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (NIH).

Financial support. This work was supported by the National Institute of Allergy and Infectious Diseases of the NIH (award number UM1AI104681).

Potential conflicts of interest. R. P. reports grants from nanoMR, BioFire, Check-Points, Curetis, 3M, Merck, Hutchison Biofilm Medical Solutions, Accelerate Diagnostics, Allergan (Pfizer), and The Medicines Company; is a consultant to Curetis, Roche, St Jude, Thermo Fisher Scientific, and Diaxonhit; has a patent on Bordetella pertussis/parapertussis polymerase chain reaction with royalties paid by TIB MOLBIOL, a patent on a device/method for sonication with royalties paid by Samsung to Mayo Clinic, and a patent on an antibiofilm substance issued; receives travel reimbursement and an editor’s stipend from the American Society for Microbiology; and receives honoraria from the United States Medical Licensing Examination®, UpToDate, and the Infectious Diseases Board Review Course. All other authors report no potential conflicts. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

References

1. Gelber RD, Gelman RS, Goldhirsch A. A quality-of-life oriented endpoint for comparing treatments. Biometrics 1989; 45:781–95.
2. Chuang-Stein C, Mohberg NR, Sinkula MS. Three measures for simultaneously evaluating benefits and risks using categorical data from clinical trials. Stat Med 1991; 10:1349–59.
3. Follmann D. Regression analysis based on pairwise ordering of patients’ clinical histories. Stat Med 2002; 21:3353–67.
4. Pocock SJ, Ariti CA, Collier TJ, Wang D. The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. Eur Heart J 2012; 33:176–82.
5. Egan JP. Signal detection theory and ROC analysis. New York: Academic Press, 1975.
6. Evans SR, Rubin D, Follmann D, et al. Desirability of outcome ranking (DOOR) and response adjusted for duration of antibiotic risk (RADAR). Clin Infect Dis 2015; 61:800–6.
7. Evans SR, Hujer AM, Jiang H, et al. Rapid molecular diagnostics, antibiotic treatment decisions, and developing approaches to inform empiric therapy: PRIMERS I and II. Clin Infect Dis 2016; 62:181–9.
8. Evans SR, Jiang H, Hujer AM, et al. Informing antibiotic treatment decisions: evaluating rapid molecular diagnostics (RMDs) to identify susceptibility and resistance to carbapenems against Acinetobacter spp. PRIMERS III. In: ID Week 2015, San Diego, CA. Poster 1604, Abstract 52116, Session 227.
9. McIntosh MW, Pepe MS. Combining several screening tests: optimality of the risk score. Biometrics 2002; 58:657–64.
10. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics 2005; 6:227–39.
11. Green DM, Swets JA. Signal detection theory and psychophysics. New York: Wiley, 1966.
12. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006; 26:565–74.
13. Baker SG. Putting risk into perspective: relative utility curves. J Natl Cancer Inst 2009; 101:1538–42.
14. Clark RB, Lewinski MA, Loeffelholz MJ, Tibbetts RJ. Cumitech 31A: verification and validation of procedures in the clinical microbiology laboratory. Sharp SE, ed. Washington, DC: ASM Press, 2009.
15. US Department of Health and Human Services, Food and Drug Administration, Center for Devices and Radiological Health. Guidance on review criteria for assessment of antimicrobial susceptibility devices, 1991. Available at: http://www.fda.gov/ohrms/dockets/98fr/000109gd.pdf. Accessed 9 April 2016.
16. US Department of Health and Human Services, Centers for Disease Control and Prevention. Antibiotic resistance threats in the United States, 2013. Available at: http://www.cdc.gov/drugresistance/pdf/ar-threats-2013-508.pdf. Accessed 11 February 2016.

