CME JOURNAL OF MAGNETIC RESONANCE IMAGING 41:1163–1164 (2015)

Commentary

On Confirmation Bias in Imaging Research

Biases are important confounders in medical research. As investigators, we seek to control for common sources of bias and report them clearly in publications. For example, patient selection bias is present in nearly all observational research, and its effect on test and treatment performance can be difficult to quantify (1). Evaluating research questions in carefully selected populations can limit the generalizability of the results if a study population is not representative of the general clinical population to whom the research question is relevant.

While patient selection bias is, in general, commonly acknowledged in the research community, confirmation bias can be more insidious (2–4). Confirmation bias is a type of selection bias that occurs when a particular reference standard is required for study inclusion. This bias arises because in an observational study, the reference standard is available only in a subset of the overall population, and the subset may not be representative of the whole.

EXAMPLES IN ABDOMINAL MAGNETIC RESONANCE IMAGING (MRI)

Research on patients with cirrhosis and potential hepatocellular carcinoma (HCC) is a scenario in which confirmation bias is a common challenge. Patients with liver nodules suspicious for HCC may be managed by a variety of techniques, including liver transplantation, resection, percutaneous ablation, transarterial embolization, external radiation, chemotherapy, or expectant management. Imaging plays a key role in optimizing therapy, and it is generally considered unethical to biopsy or resect nodules that are probably (although not definitely) benign. Thus, requiring a histopathologic reference standard biases the study population toward patients with nodules with features suspicious for HCC. This increases the prevalence of disease in the study population, while discarding large numbers of benign lesions, along with premalignant or atypical malignant lesions.
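The enrichment described above can be illustrated with a small worked example. The sketch below uses an entirely hypothetical cohort of 1,000 nodules and assumed verification rates (90% of imaging-suspicious nodules and 5% of non-suspicious nodules reach histopathology). None of these numbers come from a study, and the names `cohort`, `verified_percent`, and `summarize` are illustrative only.

```python
# Hypothetical cohort of 1,000 liver nodules. All counts are illustrative
# assumptions, not data from any study.
# Key: (suspicious_at_imaging, truly_malignant) -> number of nodules
cohort = {
    (True, True): 160,    # suspicious at imaging, malignant
    (True, False): 40,    # suspicious at imaging, benign
    (False, True): 40,    # not suspicious at imaging, malignant
    (False, False): 760,  # not suspicious at imaging, benign
}

# Assumed percentage of each imaging group that undergoes histopathology.
verified_percent = {True: 90, False: 5}


def summarize(counts):
    """Return prevalence, sensitivity, and specificity for a 2x2 table."""
    tp = counts[(True, True)]
    fp = counts[(True, False)]
    fn = counts[(False, True)]
    tn = counts[(False, False)]
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Keep only the nodules that would satisfy a histopathology-only
# inclusion criterion (integer arithmetic keeps the counts exact).
verified = {
    key: n * verified_percent[key[0]] // 100 for key, n in cohort.items()
}

full = summarize(cohort)
confirmed = summarize(verified)

for name in ("prevalence", "sensitivity", "specificity"):
    print(f"{name}: full cohort {full[name]:.2f}, "
          f"confirmed subset {confirmed[name]:.2f}")
```

Under these assumed numbers, prevalence rises from 20% in the full cohort to about 66% in the histopathology-confirmed subset. Apparent sensitivity rises from 80% to about 99%, while apparent specificity falls from 95% to about 51%, because most imaging-negative nodules never reach the reference standard.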
DOI 10.1002/jmri.24720. © 2014 Wiley Periodicals, Inc.

In addition, some patients undergo liver transplantation based on hepatic failure rather than for potential malignancy. As a result, some benign or atypical nodules may be included in the study for reasons independent of the study question. However, the interaction between these two factors (severity of hepatic failure and likelihood of malignancy) remains poorly understood. Thus, studies performed on all liver nodules found at liver explant are not generalizable to all patients at risk for HCC.

There may be further bias if liver sectioning is done in correlation with antecedent imaging. The resected specimen may be scrutinized for small nodules detected at antecedent imaging but not for preoperatively undetected nodules. Compared with blinded sectioning, this may lead to overestimation of actual sensitivity. Furthermore, if the hypothesis concerns the imaging appearance of the liver nodules, then because imaging appearance also contributes to the likelihood of liver transplantation, imaging features enter both the hypothesis and the reference standard, which can make the results difficult to interpret.

In some scenarios, the use of a histopathologic reference standard may be highly appropriate. For example, testing the hypothesis “What is the frequency of microvascular invasion in liver nodules with imaging features X?” against histopathology may be entirely appropriate, so long as the majority of liver nodules with features X are resected in the course of routine care. Some nodules with imaging features X may not have been confirmed for reasons such as comorbidities that prevent surgery or the patient’s desire for nonsurgical management. However, as long as the reasons for lack of confirmation are independent of the study hypothesis, the study population may still be representative of the clinical population as a whole.

Many other examples of such biases can be found in imaging research. For example, visualization of meniscal tears at knee MRI typically guides the need for arthroscopy with resection or repair of the meniscus.
However, requiring arthroscopic confirmation as the reference in a study of the MRI appearance of meniscal tears may bias the study population toward patients with large or obvious meniscal tears or those patients with classic symptoms warranting arthroscopy. Subtler tears that were not visualized at MRI and that cause few symptoms may not be included in such a study, artificially increasing the sensitivity and altering the negative predictive value of a tested finding for predicting the presence of a tear.

As another example, in peripheral vascular imaging, magnetic resonance angiography (MRA) and CT angiography (CTA) have been shown to be highly accurate for the detection and characterization of lower extremity peripheral vascular disease. As a result, at many centers, digital subtraction angiography (DSA) is performed only in patients who require therapy or in whom MRA/CTA was insufficient. Thus, requiring DSA as a reference standard may bias a study population toward patients with positive MRA/CTA or with symptoms severe enough to prompt DSA despite a negative MRA/CTA. The effect of such biases on the study population may be difficult to determine, and the conclusions reached in the study must be appropriately guarded. If the goal of the study is to make a determination regarding all patients who are evaluated by imaging for peripheral vascular disease, the presence of these biases limits the generalizability of the study results.

COMPOSITE REFERENCE STANDARD

As the above examples show, requiring confirmatory testing as a reference standard is not inherently wrong. However, it must be made clear that the study results apply only to populations similar to the study population. The core issue underlying confirmation bias is whether the study population is representative of the larger clinical patient population. For this reason, some authors choose a heterogeneous, “composite” reference standard to maximize the inclusiveness of their studies’ inclusion/exclusion criteria, in an attempt to make the study results as generalizable as possible. For example, in HCC research a composite reference standard may include histopathology for those nodules in which it is available, and follow-up imaging for others. Follow-up imaging is used to determine whether a nodule is benign or malignant based on interval changes in size and/or appearance. While this reference standard is more inclusive than histopathology alone, challenges arise in categorizing nodules that have undergone therapy without biopsy or those that did not have sufficient follow-up for categorization, and thus have no reference.
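One possible way to make such a composite reference standard explicit is sketched below. The `Nodule` record, its field names, and the precedence rule (histopathology first, then follow-up imaging, with treated-but-unbiopsied nodules left without a reference) are assumptions chosen for illustration, not a published scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Nodule:
    """Hypothetical per-nodule record; all field names are illustrative."""
    nodule_id: str
    histopathology: Optional[str] = None  # "malignant" or "benign"; None if no tissue
    followup: Optional[str] = None        # follow-up imaging category; None if insufficient
    treated_without_biopsy: bool = False


def composite_reference(nodule: Nodule) -> Tuple[Optional[str], str]:
    """Return (reference label, source of that label) for one nodule.

    A label of None means the nodule has no usable reference.
    """
    if nodule.histopathology is not None:
        return nodule.histopathology, "histopathology"
    if nodule.treated_without_biopsy:
        # Assumption: imaging after therapy is not used to categorize the nodule.
        return None, "no reference (treated without biopsy)"
    if nodule.followup is not None:
        return nodule.followup, "follow-up imaging"
    return None, "no reference (insufficient follow-up)"


# Made-up nodules covering each branch of the rule.
nodules = [
    Nodule("N1", histopathology="malignant"),
    Nodule("N2", followup="benign"),
    Nodule("N3", treated_without_biopsy=True),
    Nodule("N4"),
]

# Tally where each reference label came from.
sources = Counter(composite_reference(n)[1] for n in nodules)
for source, count in sources.items():
    print(f"{source}: {count}")
```

Recording the source of each label alongside the label itself makes it straightforward to report how many nodules were confirmed by each method and how many were left without a reference.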
If the number of such nodules is relatively small, they may be excluded with the caveat that the test performance determined in the study may be inaccurate. Some authors choose to consider nodules undergoing therapy without biopsy as uniformly malignant as a “best guess” strategy. In such scenarios, it must be acknowledged that the assumed prevalence of disease in the study population may be higher than in the general population, with associated statistical uncertainties. However, some statistical conclusions can still be drawn; for example, “The negative predictive value of test Y is no less than . . .” Although not always elegant, composite reference standards sacrifice some degree of diagnostic certainty to increase the inclusiveness of the study, reduce systematic recruitment biases, and make the study results as generalizable as possible.

Alternatively, test statistics can be calculated twice, once assuming that the reference standard is positive for all unconfirmed cases (for example, ablated liver lesions), and again assuming that it is negative. This method is rarely used, but can provide a range of test performance statistics, with the true test performance likely lying within those boundaries.

CONCLUSION

Ideal reference standards can be challenging to achieve outside the context of (and sometimes even within) randomized controlled trials. Thus, investigators must carefully balance the desire for an accurate reference standard against the confirmation biases that the use of a strict reference standard may introduce. Using composite reference standards may introduce some uncertainty into study results, but may be worthwhile to improve the generalizability of those results, and for some research questions may be more appropriate than stricter histology-based reference standards. Regardless of the reference standard selected, sources of potential bias must be clearly reported in articles and conclusions tempered appropriately.

Mustafa R. Bashir, MD
Abdominal Imaging Division
Department of Radiology
Duke University Medical Center
Durham, NC, USA

Claude B. Sirlin, MD
University of California, San Diego, USA

Scott B. Reeder, MD, PhD
University of Wisconsin, Madison, USA

REFERENCES

1. Carlin CS, Solid CA. An approach to addressing selection bias in survival analysis. Stat Med 2014 [Epub ahead of print].
2. Guimond ME. Confronting confirmation bias about breast cancer screening with the four Cs. Nurs Womens Health 2014;18:28–37.
3. Doll BB, Waltz JA, Cockburn J, Brown JK, Frank MJ, Gold JM. Reduced susceptibility to confirmation bias in schizophrenia. Cogn Affect Behav Neurosci 2014;14:715–728.
4. Kukucka J, Kassin SM. Do confessions taint perceptions of handwriting evidence? An empirical test of the forensic confirmation bias. Law Hum Behav 2014;38:256–270.
