The Power and the Peril of Large Administrative Databases

Earlier this year, the National Cancer Institute (NCI) announced that it had removed all prostate specific antigen (PSA) data from the SEER (Surveillance, Epidemiology, and End Results) and SEER-Medicare programs. The SEER program was initiated by the NCI in 1973 and collects cancer incidence and survival data from 17 population based cancer registries across the United States, representing roughly 28% of the U.S. population.1 The SEER-Medicare dataset links the cancer information in SEER to administrative claims data for patients in SEER who are covered under the Medicare program.2

The NCI removed the PSA data from these datasets after routine quality control activities revealed that a “substantial number of registry-reported PSA values were incorrect.”3 While the extent of the problem is not clear, one must assume it is significant given the NCI’s drastic action. The NCI is now reviewing the entire dataset and implementing protocols to ensure the quality of the PSA data in the future. The agency is hopeful that, at some point, the PSA data will be validated and added back into SEER.

In the interim, the revelation that the current PSA data in SEER are flawed and potentially invalid will have 2 major impacts on the field of prostate cancer research. First, ongoing analyses using SEER and SEER-Medicare that include PSA data will have to be redesigned in light of the problems with these data. Simply put, journals will not be able to accept SEER studies that rely on the PSA data as a primary variable of interest, including those that use PSA in risk stratification systems to adjust for confounding or in cohort identification. This effect is relatively straightforward and should not cause great problems in the field going forward. The greater problem, however, is the impact of the flawed PSA data on the existing urological literature.
0022-5347/15/1941-0010/0 THE JOURNAL OF UROLOGY® © 2015 by AMERICAN UROLOGICAL ASSOCIATION EDUCATION AND RESEARCH, INC. | www.jurology.com

To date, SEER and SEER-Medicare data have been used to address a wide variety of clinical questions in prostate cancer, ranging from issues around screening4 to changes in treatment patterns5 to the effectiveness of treatment for localized and advanced disease,6,7 just to name a few. Given the problems with the PSA data in SEER, how are we to decide if the results presented in these reports and others are valid? If that question isn’t troubling enough, consider this: if the data from SEER (one of the oldest and best known population based cancer registries in the world, administered by the United States’ leading cancer research agency) have been called into question, isn’t there a danger that data from other large administrative datasets (such as Medicare or Medicaid, the Agency for Healthcare Research and Quality’s Healthcare Cost and Utilization Project, or the American College of Surgeons’ National Cancer Database and National Surgical Quality Improvement Program) could also be problematic and possibly flawed?

Both of these impacts should force us to reconsider how we use these datasets in the future. There is no doubt that these datasets have incredible value when answering difficult clinical and health care policy questions. They benefit from having real-world longitudinal data from large numbers of patients, which are highly generalizable, and they often allow us to examine questions that we could never answer in longitudinal prospective studies of patients with prostate cancer. However, when researchers use these datasets to answer questions that the datasets are not well designed to address, simply because the data are publicly available and large, we run into real problems.

Consider the unique strengths of the SEER-Medicare dataset. The basic information concerning cancer incidence and treatment is likely highly reliable because its collection is legislatively mandated. In addition, resource utilization information is also likely to be valid and reliable because it is linked to Medicare reimbursement. Studies that focus on cancer incidence and resource utilization leverage these strengths and likely advance the field.
A good example of this is the recent report by Schroeck et al, which explored the diffusion of new technologies, such as robotic surgery and intensity-modulated radiotherapy, and its effect on PSA testing.8 It is critical to note that the authors did not use the absolute PSA values in this study, but rather simply whether or not the test was performed. Because Medicare data reflect payment for the test, the data are highly likely to be correct and, in turn, the findings of the study are probably valid.

The problems arise when we ask these datasets to address questions they were not designed to answer. For example, there have been a number of studies using SEER and SEER-Medicare data that explore the comparative effectiveness of various treatments for localized prostate cancer.7,9 The problem with these studies is that they rely on clinical characteristics (such as PSA values and comorbidity indices) that these datasets are not truly designed to collect. After all, these data points are not necessary for payment and the law does not mandate their collection. Other researchers have used billing information in Medicare as a proxy for patient reported outcomes, an approach that is fraught with problems.10 In fact, I would be remiss if I didn’t mention that I am also guilty of asking these data to do more than they are capable of, having used the clinical PSA data in SEER-Medicare to identify low risk patients with prostate cancer in an effort to assess changes in rates of active surveillance.11

Clearly, we have to rethink how we use these data. These large administrative datasets have tremendous value for our field if we use them properly, and that is the key. We have to stop publishing secondary data analyses from these large administrative datasets just because the data are relatively easy to obtain and analyze. We need to reserve these datasets for research questions that they can answer in a valid and reliable manner. We cannot ask these datasets to answer questions that they are not capable of answering. In those situations we have to do the really hard work and collect primary data. It’s time for us to stop doing big data fishing expeditions and taking the easy way out.

David F. Penson
Department of Urologic Surgery
Vanderbilt University
VA Tennessee Valley Geriatric Research, Education and Clinical Center
Nashville, Tennessee

http://dx.doi.org/10.1016/j.juro.2015.05.002 | Vol. 194, 10-11, July 2015 | Printed in U.S.A.

REFERENCES

1. National Cancer Institute: Surveillance, Epidemiology, and End Results Program: Overview. Available at http://seer.cancer.gov/about/overview.html. Accessed May 4, 2015.
2. National Cancer Institute: Healthcare Delivery Research: SEER-Medicare Linked Database. Available at http://healthcaredelivery.cancer.gov/seermedicare/. Accessed May 4, 2015.
3. National Cancer Institute: Surveillance, Epidemiology, and End Results Program: PSA Values and SEER Data. Available at http://seer.cancer.gov/data/psa-values.html. Accessed May 4, 2015.
4. Etzioni R, Tsodikov A, Mariotto A et al: Quantifying the role of PSA screening in the US prostate cancer mortality decline. Cancer Causes Control 2008; 19: 175.
5. Filson CP, Schroeck FR, Ye Z et al: Variation in use of active surveillance among men undergoing expectant treatment for early stage prostate cancer. J Urol 2014; 192: 75.
6. Satkunasivam R, Kim AE, Desai M et al: Radical prostatectomy or external beam radiation therapy versus no local therapy for survival benefit in metastatic prostate cancer: a SEER-Medicare analysis. J Urol 2015; Epub ahead of print.
7. Gandaglia G, Sammon JD, Chang SL et al: Comparative effectiveness of robot-assisted and open radical prostatectomy in the postdissemination era. J Clin Oncol 2014; 32: 1419.
8. Schroeck FR, Kaufman SR, Jacobs BL et al: Technology diffusion and diagnostic testing for prostate cancer. J Urol 2013; 190: 1715.
9. Wong YN, Mitra N, Hudes G et al: Survival associated with treatment vs observation of localized prostate cancer in elderly men. JAMA 2006; 296: 2683.
10. Hu JC, Gu X, Lipsitz SR et al: Comparative effectiveness of minimally invasive vs open radical prostatectomy. JAMA 2009; 302: 1557.
11. Ritch CR, Graves AJ, Keegan KA et al: Increasing use of observation among men at low risk for prostate cancer mortality. J Urol 2015; 193: 801.
