Developing Evidence-Based Screening Recommendations, with Consideration for Rheumatology

Ned Calonge, MD, MPH a,b,c,*

KEYWORDS

• U.S. Preventive Services Task Force
• Rheumatoid arthritis
• Screening
• Evidence-based recommendations

KEY POINTS

• Screening for preclinical rheumatic disease may improve health by enabling treatment to start before clinical symptoms occur.
• Potential health harms associated with screening programs, such as the harms associated with false-positive tests and overdiagnosis, must be weighed against health benefits.
• The U.S. Preventive Services Task Force (USPSTF) uses an explicit process to create evidence-based screening recommendations that assess net benefit, or benefits minus harms.
• Screening tests recommended by the USPSTF are provided with first dollar coverage under the Affordable Care Act.

INTRODUCTION

In the clinical prevention world, screening for the early detection of disease is categorized as secondary prevention, involving interventions that are implemented after the asymptomatic onset of biologic disease, but before the progression to symptomatic disease that would be diagnosed through the usual health care approach. Conceptually, intervening early leads to better health outcomes than does waiting until the disease manifests clinically. Good examples of effective screening modalities exist in the chronic disease arena: screening for cervical cancer with cytology can almost eliminate deaths from this cancer,1 and screening for and treating hypertension significantly reduces the occurrence of atherosclerotic disease in the coronary and cerebral arteries in nearly all adult age groups.2

Disclosure: None.
a The Colorado Trust, 1600 Sherman Street, Denver, CO 80111, USA
b Department of Family Medicine, University of Colorado School of Medicine, Mail Stop F-496, Academic Office 1, 12631 East 17th Avenue, Aurora, CO 80045, USA
c Department of Epidemiology, Colorado School of Public Health, Campus Box B119, 13001 E. 17th Place, Aurora, CO 80045, USA
* The Colorado Trust, 1600 Sherman Street, Denver, CO 80111. E-mail address: [email protected]
Rheum Dis Clin N Am 40 (2014) 787–795
http://dx.doi.org/10.1016/j.rdc.2014.07.016
rheumatic.theclinics.com
0889-857X/14/$ – see front matter © 2014 Elsevier Inc. All rights reserved.

A broader definition of screening, brought to the forefront of medicine by the current interest in genetic testing, involves screening for the risk of disease and intervening even before the biologic onset of a condition. This new technology will continue to bring new challenges for decision making in screening programs.

Screening for asymptomatic or preclinical rheumatic conditions has the potential for providing improved disease outcomes, with the early treatment of disease using safe and effective therapeutics started before symptomatic tissue damage. However, even though screening is conceptually appealing, as it is in other conditions, it has downsides in terms of the potential for harm, which is associated with almost all medical interventions. Screening benefits must be weighed against harms when deciding to offer screening.

The benefits of screening are straightforward: early intervention leads to earlier treatment and better health outcomes. On the other hand, 4 or 5 categories of potential harms exist. A test can be falsely negative, creating false reassurance and the potential to delay the diagnosis of a treatable condition when patients ignore symptoms. The test may be falsely positive, arguably the most important negative outcome, leading to unnecessary and potentially harmful diagnostic tests, treatment, and labeling. The test could overdiagnose disease that does not require treatment. In this scenario, the test result is a true-positive, but detects disease that would not progress, meaning that any treatment is unnecessary.
Overdiagnosis is a critical harm associated with screening for prostate cancer, in which most cancers are indolent and would never impact the patient’s health,3 and screening for breast cancer, in which most ductal carcinoma in situ does not benefit from treatment.4 Finally, the test can be correct, but early detection may have no real benefit, and therefore screening consumes resources, increasing the cost of medical care without health benefit. In addition, harms may be associated with the test itself, such as additional radiation exposure with x-ray screening, which could, over a lifetime of screening, increase the risk of disease, or, in the case of a somewhat invasive test such as colonoscopy, the test may carry the risk of adverse outcomes, including hemorrhage and perforation.

The evidence bar tends to be set high for screening, more so than for other medical interventions, because the population targeted is asymptomatic. If clinicians are going to medically intervene with people who are otherwise well and clinically manifesting no illness, they should only do so based on strong evidence showing that the benefits well outweigh the harms.

FRAMEWORK FOR DISEASE SCREENING

In creating a framework for deciding whether to implement a disease screening program, Wilson and Jungner5 created a list of critical criteria to assess. Paraphrasing this seminal article, these criteria include the following queries:

• Is the disease an important health problem (in terms of severity and incidence)?
• Does the disease have a recognizable presymptomatic stage that lasts long enough to allow for screening, diagnosis, and treatment?
• Are acceptable and reliable screening tests available for the presymptomatic stage?
• Does treatment of the disease during the presymptomatic stage result in improved outcomes?
• Do sufficient resources exist for diagnosing and treating the population with positive screening results?


A group of evidence-based medicine researchers reframed these criteria for today’s evidence-based approach in 2011,6 stating that the fundamental question to answer when determining whether to screen should be whether the program, if implemented under present conditions, would result in sufficient net benefit (benefits minus harms) to justify starting (or continuing) the program given the level of resources required. This determination should include an evidence-based consideration of several factors, including the probability of an adverse health outcome without screening, the degree to which screening identifies all those who would experience the adverse health outcome, and the magnitude of incremental health benefit conferred by earlier treatment initiated as a result of screening. Also important in the assessment of a screening test are the frequency of false-positive tests; the experiences of people with false-positive results, including the frequency and severity of workup and treatment; and the frequency and severity of outcomes associated with overdiagnosis.

PROCESS OF THE UNITED STATES PREVENTIVE SERVICES TASK FORCE

These considerations guide the process of the U.S. Preventive Services Task Force (USPSTF), an independent panel of nationally recognized, nonfederal experts experienced in primary care, disease prevention, evidence-based medicine, and research methods. The USPSTF, which is hosted by the Agency for Healthcare Research and Quality, was created in 1985 and charged by Congress to review the scientific evidence for clinical preventive services and develop evidence-based recommendations for the health care community. Under section 2713(a)(1) of the Affordable Care Act,7 the recommendations from the USPSTF have taken on greater importance; recommendations that are graded A or B (described in detail later) will involve services covered by all insurers, with no additional out-of-pocket costs to the patient (ie, first dollar coverage).

The USPSTF uses an explicit process8,9 that can be categorized into specific steps:

1. Define the question about the provision of a preventive service within an analytical framework and using a set of key questions.
2. Define, retrieve, and review the relevant evidence; judge the quality of individual studies; and summarize the evidence for each key question.
3. Synthesize and judge the strength or adequacy of the body of evidence for each, and across all, key questions.
4. Determine the magnitude of net benefit10 (the balance of benefits and harms).
5. Judge the certainty of net benefit.10
6. Link magnitude and certainty of net benefit to a recommendation statement/letter grade.

The steps are implemented in 2 integrated activities. One is a systematic evidence review (SER) that addresses the literature search, study quality rating, and summary of the evidence on the magnitude of benefits and harms. The other, the recommendation process, takes the SER results, assesses the strength of evidence, and determines the magnitude and certainty of net benefit and the corresponding letter grade/recommendation statement.

Step 1: Define the Analytical Framework and Key Questions

The analytical framework defines the clinical scenario for the topic (who are you screening, how, for what condition, in what setting, and for what purpose) and creates a set of key questions that guide the SER and decision making. A generic analytical framework for evaluating a screening test is presented in Fig. 1, in which the numbers refer to the list of key questions:

1. Does direct evidence show that providing the service improves health outcomes if implemented in a general primary care population? Outside of treatment scenarios, sufficient direct evidence is rarely available. If insufficient evidence is available to answer this question, the remaining framework allows the creation of a chain of evidence to support a recommendation.
2. Can a population at risk and/or at increased risk be identified? Higher-risk populations have a higher rate of disease, and this translates to better screening test performance (specifically, fewer false-positives).
3. Does screening reliably lead to preclinical/earlier detection? This question deals with the screening test utilities of sensitivity, specificity, and predictive values of positive and negative tests.
4. Does treatment of screening-detected disease lead to improvement in important health outcomes, including mortality or morbidity? This critical question was often not satisfactorily addressed before the widespread adoption of some screening tests, as evident with prostate-specific antigen testing for prostate cancer.
5. Are harms associated with specific parts of the screening process, including risk identification, screening itself, confirmatory diagnosis for people with positive tests, and treatment?

Step 2: Find, Review, Judge the Quality of, and Summarize the Available Evidence

Rating the quality of the studies included in the SER requires assessment and judgment based on explicit research design criteria. The USPSTF rates studies as good, fair, or poor, wherein good quality means high internal validity with outcomes confidently assigned to the factors under study; fair quality indicates some minor threats to internal validity; and poor quality indicates major threats to internal validity so that competing hypotheses regarding study outcomes cannot be confidently excluded.

Step 3: Determine the Strength of Evidence

Grading the strength of evidence, which the USPSTF categorizes as inadequate, adequate, or convincing, considers more than just the quality of studies, and requires assessment and judgment based on explicit epidemiologic research criteria captured by 6 critical appraisal questions:

1. Do the studies have the appropriate research design to answer the key question?
2. To what extent are the studies of high quality (ie, have good internal validity)?
3. To what extent are the studies generalizable to the US population (ie, have good external validity)?
4. How many studies, and of what size, have been performed to answer the key question (ie, how precise is the evidence)?
5. How consistent are the studies (ie, do they support the same conclusion)?
6. Do additional factors support the conclusions?

If several high-quality studies existed with consistent, logical results generalizable to the United States, the evidence would be convincing. If most but not all criteria are addressed satisfactorily, the evidence is adequate. If the studies are conflicting or are of poor quality, individually or in aggregate, the evidence is inadequate. Note that adequate evidence is not possible with poor-quality studies, but inadequate evidence is possible with fair- or good-quality studies if they have conflicting results that cannot otherwise be explained.

Fig. 1. Generic analytical framework for screening for disease (numbers refer to key questions described in text). Diagram elements: general population; risk assessment; increased-risk individuals; screening test; test positive; diagnosis and treatment; improvement in mortality, morbidity, and/or other outcomes; screening program; harms of risk assessment; harms of screening; harms of diagnosis and treatment.

Step 4: Determine the Magnitude of Net Benefit

Determining net benefit involves an assessment of the available evidence on benefits and harms. The magnitude of benefit is based on the effect sizes (results) from studies in the SER. If more than one study is available, the SER process summarizes the effect sizes with meta-analysis. The magnitude of harm is also taken from the SER, although traditionally researchers have focused less on measuring and reporting harms than on reporting benefits. The magnitude of net benefit is graded as substantial, moderate, small, or zero/net harm by the USPSTF. This grading requires a judgment of the comparison between benefits and harms. For many topics this can be challenging, because the “currency” of benefits and harms is likely to be different, and how different benefits and harms are valued can vary widely.

For example, within her lifetime, a woman choosing to begin screening mammography at age 40 years will reduce her risk of dying of breast cancer from approximately 30 in 1000 to 22 in 1000 (the benefit), while accepting a 50% risk of having a false-positive mammogram that will require follow-up testing11 and around a 20% risk of overdiagnosis and unnecessary treatment (the harms).4 Depending on personal values and preferences, this tradeoff may be worth it for some women.

Step 5: Determine the Certainty of Net Benefit

Based on the magnitude of net benefit and the strength of evidence assessed, the USPSTF makes a judgment about the certainty of net benefit, determined again using the critical appraisal questions, synthesized across all of the key questions and the entire body of evidence. Certainty can be considered the opposite of the risk of making the wrong recommendation: either recommending for a screening test that will not improve overall individual or population health, or recommending against a test that may have important net benefit.

A rating of high certainty, which has a low risk of being wrong, must be supported by consistent results from good-quality studies assessing the impact on important health outcomes. The conclusion is unlikely to be affected by future studies. A rating of moderate certainty, based on sufficient evidence, has a somewhat higher risk of being wrong, because the assessment is constrained by issues raised in the critical appraisal; here, additional information from future studies could change the conclusion. In the case of low certainty, the evidence is insufficient to assess the net effects, positive or negative, on health outcomes, and future studies will be needed for further assessment.

Step 6: Assign a Letter Grade/Recommendation
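The linkage this step describes, from certainty and magnitude of net benefit to a letter grade, can be written down as a simple lookup. The following is a minimal sketch that encodes the recommendation matrix in Table 1; the names `GRADE_MATRIX` and `letter_grade` are hypothetical and chosen only for illustration, and the sketch does not represent any software used by the USPSTF.

```python
# Hypothetical encoding of the recommendation matrix in Table 1.
# Keys are (certainty of net benefit, magnitude of net benefit).
GRADE_MATRIX = {
    ("high", "substantial"): "A",
    ("high", "moderate"): "B",
    ("high", "small"): "C",
    ("high", "zero/negative"): "D",
    ("moderate", "substantial"): "B",
    ("moderate", "moderate"): "B",
    ("moderate", "small"): "C",
    ("moderate", "zero/negative"): "D",
}


def letter_grade(certainty: str, magnitude: str) -> str:
    """Return the letter grade for a certainty/magnitude pair.

    Low certainty yields an I statement (insufficient evidence)
    regardless of the estimated magnitude of net benefit.
    """
    certainty = certainty.lower()
    magnitude = magnitude.lower()
    if certainty == "low":
        return "I"
    return GRADE_MATRIX[(certainty, magnitude)]


print(letter_grade("high", "substantial"))  # A
print(letter_grade("moderate", "small"))    # C
print(letter_grade("low", "moderate"))      # I
```

Treating low certainty as a separate branch mirrors the point that grade I is a conclusion about the evidence rather than a recommendation tied to a particular magnitude.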

Once the certainty and magnitude of net benefit are determined, the USPSTF links this assessment to a letter grade and a specific recommendation using a matrix presented in Table 1. A and B recommendations should be implemented in practice, and clinicians and patients can have confidence that these will improve health. D recommendations should not be implemented, because at least moderate certainty exists that they will have no net health benefit, or that the harms outweigh the benefits. C recommendations require a bit more effort. The net benefit, although positive and real, is small, and the tradeoffs between benefits and harms should be discussed with the patient, taking into account unique patient characteristics, values, and preferences.

The inconclusive evidence letter grade I deserves additional comment. Compared with the other letter grades, it is a conclusion, not a recommendation. It means that evidence of effectiveness is insufficient, not that sufficient evidence shows no effectiveness. Common reasons for a grade of I include a lack of evidence on clinical outcomes, poor quality of existing studies, and good-quality studies with conflicting results. With this conclusion, a possibility of clinically important health benefit remains, but more research is needed.

EVIDENCE-BASED SCREENING RECOMMENDATIONS FOR RHEUMATIC DISEASE

In considering the USPSTF process, rheumatology must address some questions when evaluating potential screening and early treatment for rheumatologic conditions:

• Is the disease sufficiently described, including its natural history (including treatment), to implement effective prevention strategies?
• What is the prevalence of the disease? Can a population at increased risk be determined? For example, the prevalence of rheumatoid arthritis (RA) is known to be higher among American Indians and Alaska Natives12 and their first-degree relatives.13 Prevalence drives the population performance of any screening test; therefore, a disease with a low prevalence will generate a large number of false-positive tests, even if the test is highly specific. Conversely, in terms of the predictive value of a positive test, screening performance improves with the increased prevalence in higher-risk populations.
• What important health outcomes could screening improve, and how can these be measured? For example, one could measure joint destruction through imaging and arthritis-related quality of life. Evidence that a screening strategy improves an intermediate clinical outcome, such as a blood test result, may be insufficient to support a positive recommendation, unless evidence of a link to an important health outcome (morbidity or mortality) is sufficient.
• Is or could there be supportive evidence from randomized controlled trials (RCTs)? Although a convincing chain of evidence can be created without an RCT, it is still the gold standard of study design and can provide the best support for a recommendation. Regardless of the design, any supportive study must be of good quality; even an RCT must account for harms, lead-time and length-time bias, and all other sources of bias.

Table 1
Recommendation matrix for the U.S. Preventive Services Task Force

                            Magnitude of Net Benefit (Benefit Minus Harms)
Certainty of Net Benefit    Substantial    Moderate    Small    Zero/Negative
High                        A              B           C        D
Moderate                    B              B           C        D
Low                         I              I           I        I

A & B: recommend use. C: recommend against routine use. D: recommend against use. I: No recommendation; insufficient evidence.

OUTCOMES TABLES

A useful approach to evaluating a potential screening strategy, and one that the USPSTF often uses in estimating net benefit, is the construction of an outcomes table.4 For example, if 10,000 people are screened for RA, how many will benefit, how many will be harmed, what is the resource consumption, and how does all this compare with not screening? Fig. 2 presents a simplified approach to creating an outcomes table, using estimates of population prevalence and screening test utilities. The positive predictive value in this example, 34%, indicates that just over one-third of individuals with positive test results will have RA, whereas nearly two-thirds will have positive results but not have disease. With these estimates in hand, the next step is to determine the potential benefit and harm for each result group (Table 2):

• True-positives (n = 252): what are the expected benefits from early intervention (prevention, joint sparing, symptom improvement, quality of life) and how does this compare with waiting for clinical diagnosis? Are there harms associated with treatment that might be increased from a longer duration of treatment?
• False-negatives (n = 13): are there any harms for these individuals? Might these individuals delay seeking diagnosis and treatment because they think they are disease-free?
• False-positives (n = 487): are there any harms for these individuals? What additional testing will be required to determine that their results are in fact negative? What is the risk of unnecessary treatment? Will these individuals experience harm from labeling or increased anxiety?
• Overdiagnosis: can this number be estimated? Are there individuals who will test positive but may never develop symptomatic disease, or who may have a natural history of disease that does not require preventive or other therapeutic intervention?

Fig. 2. Hypothetical performance of a screening test to detect preclinical rheumatoid arthritis (RA).14
10,000 persons at risk, assuming lifetime RA risk = 2.65%14: 265 with RA, 9,735 without RA.
Screening test, assuming 95% sensitivity and 95% specificity: true-positives 252, false-negatives 13, true-negatives 9,248, false-positives 487.
Positive predictive value (proportion testing positive who have disease) = 34%.
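The arithmetic behind Fig. 2 can be reproduced in a few lines. The following is a minimal sketch, assuming the figure's inputs (10,000 people, 2.65% lifetime RA risk, 95% sensitivity, 95% specificity) and rounding counts to whole persons; the function name `screening_outcomes` and the dictionary keys are hypothetical, introduced only for illustration.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Split a screened population into the four test-result groups.

    Counts are rounded to whole persons; the false-negative and
    false-positive counts are taken as remainders so the groups
    always add up to the population size.
    """
    diseased = round(population * prevalence)
    not_diseased = population - diseased
    true_pos = round(diseased * sensitivity)
    false_neg = diseased - true_pos
    true_neg = round(not_diseased * specificity)
    false_pos = not_diseased - true_neg
    # Positive predictive value: proportion of positive tests with disease.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "diseased": diseased,
        "not_diseased": not_diseased,
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        "ppv": ppv,
    }


# Inputs taken from Fig. 2 of the article.
fig2 = screening_outcomes(10_000, 0.0265, 0.95, 0.95)
for name, value in fig2.items():
    print(name, round(value, 2) if name == "ppv" else value)
```

With these inputs the sketch returns 252 true-positives, 13 false-negatives, 9248 true-negatives, and 487 false-positives, for a positive predictive value of about 0.34, matching the figure.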


Table 2
Hypothetical outcomes table for RA screening

Outcomes           10,000 People with Usual Care    10,000 People Screened for RA
RA                 265 (diagnosed clinically)       265
True-positives                                      252 people with early detection
True-negatives                                      9248 reassured they do not have RA
False-positives                                     487 with incorrect positive tests
False-negatives                                     13 with incorrect negative tests
Overdiagnosis                                       # not needing treatment (unknown)
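Because prevalence drives the predictive value of a positive test, it can be useful to repeat the outcomes calculation across populations at different levels of risk. The following is a minimal sketch, assuming the same 95% sensitivity and 95% specificity used in the hypothetical example; the 2.65% value is the lifetime RA risk used in the article, whereas the 0.5% and 10% prevalences are purely illustrative assumptions and are not estimates for any real population. The function name `positive_predictive_value` is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95):
    """Proportion of positive tests that are true positives."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# 0.0265 comes from the article; 0.005 and 0.10 are illustrative assumptions.
prevalences = [0.005, 0.0265, 0.10]
ppv_by_prevalence = {p: positive_predictive_value(p) for p in prevalences}

for p, ppv in ppv_by_prevalence.items():
    print(f"prevalence {p:.2%}: PPV {ppv:.0%}")
```

Holding test accuracy fixed, the positive predictive value rises as prevalence rises, which is the reasoning behind targeting screening at higher-risk groups.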

This exercise provides information that a clinician can use to determine whether a given screening process is likely to result in a net health benefit to those screened. It also provides guidance on what research is needed to answer the key questions in an analytical framework designed to create an evidence-based recommendation.

FUTURE CONSIDERATIONS/SUMMARY

Creating evidence-based screening recommendations is a complex process, and the process used by the USPSTF has been codified in the Affordable Care Act to the degree that the conclusions of the USPSTF have major coverage implications for insurers, providers, and patients. As ongoing research discovers promising evidence for the potential of preclinical identification of individuals with rheumatologic diseases, understanding the evidence-based process and using simple computational tools, such as outcomes tables, can help rheumatology researchers provide the evidence needed to identify potentially effective screening programs, and help determine when a program may not lead to improvements in individual and population health. Further information on USPSTF methods and procedures is available on the Task Force’s Web site (www.uspreventiveservicestaskforce.org/), as are all of its recommendations and supporting SERs, which provide examples of the processes outlined in this article.

REFERENCES

1. U.S. Preventive Services Task Force, Moyer VA. Screening for cervical cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2012;156:880–91.
2. James PA, Oparil S, Carter BL, et al. 2014 evidence-based guideline for the management of high blood pressure in adults: report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA 2014;311(5):507–20.
3. U.S. Preventive Services Task Force, Moyer VA. Screening for prostate cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2012;157:120–34.
4. Miller AB, Wall C, Baines CJ, et al. Twenty five year follow-up for breast cancer incidence and mortality of the Canadian National Breast Screening Study: randomised screening trial. BMJ 2014;348:g366.
5. Wilson JMG, Jungner G. Principles and practice of screening for disease. Geneva: World Health Organization; 1968.
6. Harris R, Sawaya GF, Moyer VA, et al. Reconsidering the criteria for evaluating proposed screening programs: reflections from 4 current and former members of the U.S. Preventive Services Task Force. Epidemiol Rev 2011;33(1):20–35.
7. H.R. 3590, One Hundred Eleventh Congress of the United States of America At the Second Session, “An Act Entitled The Patient Protection and Affordable Care Act,” United States Government Printing Office, 2010.
8. Guirguis-Blake J, Calonge N, Miller T, et al. Current Processes of the U.S. Preventive Services Task Force: refining evidence-based recommendation development. Ann Intern Med 2007;147:117–22.
9. Barton MB, Miller T, Wolff T, et al. How to read the new recommendation statement: methods update from the U.S. Preventive Services Task Force. Ann Intern Med 2007;147:123–7.
10. Sawaya GF, Guirguis-Blake J, LeFevre M, et al. Update on the methods: estimating certainty and magnitude of net benefit. Ann Intern Med 2007;147:871–5.
11. U.S. Preventive Services Task Force. Screening for breast cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2009;151:716–26.
12. Ferucci ED, Templin DW, Lanier AP. Rheumatoid arthritis in American Indians and Alaska Natives: a review of the literature. Semin Arthritis Rheum 2005;34(4):662–7.
13. Smolik I, Robinson DB, Bernstein CN, et al. First-degree relatives of patients with rheumatoid arthritis exhibit high prevalence of joint symptoms. J Rheumatol 2013;40(6):818–24.
14. Crowson CS, Matteson EL, Myasoedova E, et al. The lifetime risk of adult-onset rheumatoid arthritis and other inflammatory autoimmune rheumatic diseases. Arthritis Rheum 2011;63(3):633–9.
