The American Journal of Sports Medicine http://ajs.sagepub.com/

Clinical Assessment of Physical Examination Maneuvers for Rotator Cuff Lesions Lyndsay E. Somerville, Kevin Willits, Andrew M. Johnson, Robert Litchfield, Marie-Eve LeBel, Jaydeep Moro and Dianne Bryant Am J Sports Med 2014 42: 1911 originally published online June 16, 2014 DOI: 10.1177/0363546514538390 The online version of this article can be found at: http://ajs.sagepub.com/content/42/8/1911

Published by: http://www.sagepublications.com

On behalf of: American Orthopaedic Society for Sports Medicine

Additional services and information for The American Journal of Sports Medicine can be found at: Email Alerts: http://ajs.sagepub.com/cgi/alerts Subscriptions: http://ajs.sagepub.com/subscriptions Reprints: http://www.sagepub.com/journalsReprints.nav Permissions: http://www.sagepub.com/journalsPermissions.nav

>> Version of Record - Aug 1, 2014 OnlineFirst Version of Record - Jun 16, 2014 What is This?

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

5-in-5

Clinical Assessment of Physical Examination Maneuvers for Rotator Cuff Lesions Lyndsay E. Somerville,*yz PhD, Kevin Willits,z MD, Andrew M. Johnson,§ PhD, Robert Litchfield,z MD, Marie-Eve LeBel,z MD, Jaydeep Moro,|| MD, and Dianne Bryant,yz{ PhD Investigation performed at The University of Western Ontario, London, Ontario, Canada, and McMaster University, Hamilton, Ontario, Canada Background: Shoulder pain and disability pose a diagnostic challenge for clinicians owing to the numerous causes that exist. Unfortunately, the evidence in support of most clinical tests is weak or absent. Purpose: To determine the diagnostic validity of physical examination maneuvers for rotator cuff lesions. Study Design: Cohort study (diagnosis); Level of evidence, 1. Methods: Consecutive shoulder patients recruited for this study were referred to 2 tertiary orthopaedic clinics. A surgeon took a thorough history and indicated his or her certainty about each possible diagnosis. A clinician performed the physical examination for diagnoses where uncertainty remained. Arthroscopy was considered the reference standard for patients who underwent surgery, and MRI with arthrogram was considered the reference for patients who did not. The sensitivity, specificity, and likelihood ratios were calculated to investigate whether combinations of the top tests provided stronger predictions of the presence or absence of disease. Results: There were 139 participants. None of the tests were highly sensitive for diagnosing rotator cuff tears or tendinosis. Tests for subscapularis tears were all highly specific. No optimal combination of tests improved the ability to correctly diagnose rotator cuff tears. Closer analysis revealed the internal rotation and lateral rotation lag sign did not improve the ability to diagnose subscapularis or supraspinatus tears, respectively, although the lateral rotation lag sign demonstrated a discriminatory ability for tear size. Conclusion: No test in isolation is sufficient to diagnose a patient with rotator cuff damage. A combination of tests improves the ability to diagnose damage to the rotator cuff. It is recommended that the internal rotation and lateral rotation lag signs be removed from the gamut of physical examination tests for supraspinatus and subscapularis tears. Keywords: shoulder; diagnosis; physical examination; rotator cuff

Shoulder pain and disability are common symptoms that pose a diagnostic challenge for physicians owing to the

numerous causes and the potential for multiple disorders existing in the same patient. Included among these causes are lesions of the rotator cuff. Neer14,15 described rotator cuff disease as a spectrum of disorders categorized into 3 stages, progressing from edema and hemorrhage of the rotator cuff tendons to complete tears. These tears almost inevitably involve the supraspinatus tendon,3 but they may also involve the infraspinatus, teres minor, and/or the subscapularis. An accurate physical examination that can differentially diagnose rotator cuff injury can save a patient additional tests that may be costly and invasive. Unfortunately, the evidence in support of most physical examination tests is weak or absent.4,10 Most studies do not meet the methodological requirements to be useful in a clinical setting.12,22 Two systematic reviews concluded that a methodologically robust study was necessary to inform clinical practice.4,10 For a diagnostic study to meet these requirements, 3 criteria must be met. First, the sample of patients must be representative of patients for whom clinicians would face diagnostic uncertainty, including those with varying

*Address correspondence to Lyndsay E. Somerville, Division of Orthopaedic Surgery, University of Western Ontario & London Health Sciences Centre, University Campus, 339 Windermere Road, London, ON, Canada, N6A 5A5 (e-mail: [email protected]). y Faculty of Health Sciences, The University of Western Ontario, London, Ontario, Canada. z Department of Surgery, Schulich School of Medicine & Dentistry, The University of Western Ontario, London, Ontario, Canada. § School of Health Studies, Faculty of Health Sciences, The University of Western Ontario, London, Ontario, Canada. || Department of Surgery, Division of Orthopaedic Surgery, McMaster University, Hamilton, Ontario, Canada. { Department of Clinical Epidemiology & Biostatistics, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada. One or more of the authors has declared the following potential conflict of interest or source of funding: This study received funding through an internal source that paid for research assistant time to collect data. The American Journal of Sports Medicine, Vol. 42, No. 8 DOI: 10.1177/0363546514538390 Ó 2014 The Author(s)

1911 Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

1912 Somerville et al

The American Journal of Sports Medicine

severity of the disease, those with isolated disease, those with and without concomitant symptoms, and those with other diseases that have similar symptoms. When the sample selected does not represent the full spectrum of disease, the estimates of sensitivity or specificity produced from that study will not be valid within a typical practice. Second, the results of the diagnostic test cannot influence which patients undergo the gold standard test. To better understand this requirement, imagine a situation in which only those patients who have a positive test result undergo the gold standard. Since the likelihood of having the disease (a more severe case or isolated disease) is highest in patients with a positive test, sensitivity would be overestimated (verification bias). Third, the person interpreting the gold standard must be blind to the results of physical examination tests and other forms of testing (eg, other physical examination maneuvers, imaging). If blinding is not adequate, the interpretation of the gold standard is likely to be influenced by knowing the results of the other tests, most commonly having the effect of overestimating the performance of the test. The purpose of this study is to determine the diagnostic validity of physical examination maneuvers for rotator cuff injury (defined as full-thickness tears, partial-thickness tears, and tendinosis of any of the 3 rotator cuff tendons: supraspinatus, infraspinatus, and subscapularis) with the use of a methodologically rigorous and clinically applicable study.

MATERIALS AND METHODS Patient Population Between May 2007 and November 2008, we recruited consecutive patients from 4 clinicians with a subspecialty in shoulder surgery at 2 tertiary care orthopaedic centers. All participants came to the clinic for their first consultation for complaints about their shoulder. We excluded patients who were referred for shoulder replacement surgery. All patients gave informed consent. The study was approved by each center’s Health Sciences Research Ethics Board.

Identification of Physical Examination Tests We identified existing physical examination tests through a systematic review. In several instances, there were variations in the description of how each test was conducted and/or how a positive or negative result was defined. To reach consensus, we used a modified Delphi process,20 in which participating surgeons were asked to indicate their preference to include or exclude each test. The survey included the original description of the test and scoring and any modifications. Next, we tallied the results of this survey and included tests for which most surgeons indicated that the test should be included, excluded tests for which the majority of surgeons indicated that the test should be excluded, and produced a second survey for tests for which no majority was reached. A majority was defined

as at least 4 of the 5 participating surgeons indicating include or exclude. The second survey presented the results of the first survey and identified tests for which there were discrepancies between surgeons. This survey asked each surgeon to present arguments for why the test should or should not be included and to reaffirm his or her decision. If, after this second survey, any test was still without a majority decision, we presented surgeons with a document providing the arguments for and against, and a meeting with the surgeons was held and discussion continued until consensus was reached. We included the following physical examination tests: (1) supraspinatus test (Jobe test) and full can test (supraspinatus tears); (2) lift-off test, belly press test, and internal rotation lag sign (subscapularis tears); (3) lateral rotation lag sign (infraspinatus tears); and (4) painful arc, Hawkins-Kennedy sign, and Neer impingement sign (tendinosis) (see Appendix 1, available online at http://ajsm.sagepub.com/supplemental).

Clinical Examination Testing To adhere to the diagnostic process, patients completed a detailed questionnaire before their consultation that elicited demographic information, symptoms, and selfreported history of their disease. The physician was not provided with the completed questionnaire to avoid influencing his or her usual approach to history taking. Instead, the physician completed his or her usual history and indicated the pretest probability for each of the 8 shoulder lesions using a diagnostic threshold scale to denote his or her diagnostic uncertainty (Figure 1). The 8 diagnoses of interest were rotator cuff lesions, acromioclavicular joint injury, superior labrum anterior-posterior (SLAP) lesions, other labral lesions, and instability (anterior, posterior, inferior, or multidirectional; each represented by a separate scale). Using the scale in Figure 1, if the surgeon indicated his or her uncertainty was below the testing threshold (ie, certain that the lesion was not playing a role in the patient’s complaints) or above the treatment threshold (ie, certain that the lesion was a contributing factor such that no further testing was necessary), he or she did not perform the tests for that disease. Patients for whom the physician faced uncertainty in the diagnosis (ie, clinician rated as above the testing but below the treatment threshold) remained as part of the study group for that diagnosis, and the physical examination maneuvers specific to that diagnosis were then performed. For example, if the clinician was certain that the patient did not have tendinosis but was uncertain whether the diagnosis was more severe rotator cuff injury, the patient was included in the study group for rotator cuff tears but not for tendinosis. The exclusion of patients from analyses of a specific diagnosis if the clinician was certain the patient did not have that particular diagnosis is the key methodological feature that ensures that the results of this study are truly applicable in a clinical setting. To standardize the technique and scoring for each test, we constructed a glossary that was provided to clinicians. Each clinician was required to review the glossary and

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

Vol. 42, No. 8, 2014

Physical Examination for Rotator Cuff Lesions

Threshold for treatment

Threshold for testing

0%

100%

1913

to arthroscopy, and MRIa has been shown to be highly sensitive and specific for detecting both rotator cuff and labral injuries.1,21

Statistical Analysis No testing required as the disease probability is below the testing threshold

The probability is between test and treatment thresholds and therefore further testing is required

Testing is complete and treatment begins; because disease probability is above treatment threshold

Figure 1. Thresholds for the diagnostic process. ensure that the method of application matched the description provided. To assist with standardization, we included pictures that illustrate the technique. Furthermore, we used a standardized data collection form that includes the description of how each test is performed and scored. Finally, the graduate student was trained how to perform all physical examination tests and was familiarized with alternative techniques so that he or she could provide correction if the clinician was performing the test in a manner other than as described in the protocol. A research assistant was present to ensure that all tests were completed and to record the results of the test on the data collection form. We ensured that the physician performing the physical examination tests did not review any available imaging studies or reports before evaluating the patient. This study discusses the results of the physical examination for rotator cuff lesions.

Reference Standard Arthroscopy and magnetic resonance imaging with arthrogram (MRIa) were the main reference standards. The surgeon performed a systematic diagnostic arthroscopy, taking care to visualize and evaluate the integrity of all pertinent anatomy, and was required to complete a standardized checklist documenting any findings for each structure (Appendix 2, available online). We developed this to minimize differences between surgeons due to variations in methods of examination and to minimize any detection bias should the clinician recall the physical examination or imaging at the time of interpreting the surgical examination. Although most patients went on to have surgery, some were not referred for surgery or opted out. These patients underwent a standardized MRIa as the reference standard. All MRIs had an intra-articular injection of gadolinium done under fluoroscopy. The strength of the MRI magnet was 1.5 tesla, and the MRI sequences were all protocoled to provide optimal imaging of the lesions being investigated (axial T1-weighted and T1-weighted fat saturated [fat sat], coronal T1-weighted and T1-weighted fat sat, proton density fat sat, T2-weighted fat sat, and sagittal proton density fat sat sequences). All MRIs were done at a university center by radiologists with substantial expertise in musculoskeletal imaging. Images were interpreted by a fellowship-trained radiologist who was blinded to the physical examination results. There is good evidence to suggest that MRIa is a comparable reference standard

To determine sample size, we assumed a sensitivity and specificity of at least 0.85 and a 95% confidence interval with a width of 60.10,9 yielding an estimate of 50 patients in each disease category. We inflated the sample size by 10% to account for attrition. Sensitivity and specificity were calculated for each test individually, including 95% confidence intervals around these estimates. These values were used to calculate positive and negative likelihood ratios. A likelihood ratio is the likelihood that a test result (positive or negative) is elicited in a patient with the target disorder compared with the likelihood the same test result is elicited in a patient without the target disorder. Likelihood ratios indicate the extent that a given diagnostic test result will change the odds of having the target disorder. A likelihood ratio of 1 has no practical significance, since the clinician’s impression of the probability of the presence of the target disorder would not change. Likelihood ratios greater than 1 imply that the test result is associated with the disease: the greater the value, the more likely the disorder is present. Conversely, likelihood ratios less than 1 indicate that the test result is associated with absence of disease: the closer it is to 0, the more likely the disorder is absent. In addition, we assessed the ability of the tests to distinguish between differing sizes of tears by receiver operating characteristic (ROC) curve analysis. If this analysis demonstrates that the tests have the ability to differentiate tear size, we will determine the most appropriate cutoff point to define disease positive and disease negative. Next, because we are interested in identifying the tests that accurately diagnose lesions in patients who have surgery versus those who do not, we repeated this analysis after categorizing patients according to whether an existing rotator cuff lesion was surgically repaired or not. Repaired was defined as any manipulation of the rotator cuff that required suturing. Debridement was not considered a repair. For the 8 patients who underwent an MRIa, an experienced surgeon viewed the MRI and decided whether any existing rotator cuff lesion was repairable as defined above. We divided the tests into groups according to the disease they were being tested for (rotator cuff injury, instability [anterior, posterior, multidirectional], superior labral lesion, other labral lesions, and acromioclavicular abnormalities). We dummy-coded these sets of tests to indicate whether 1 test, 2 tests, or all tests were positive. We tested whether combinations of tests improve the ability to diagnose. We calculated the sensitivity, specificity, and likelihood ratio of all tests positive, 1 test positive, at least 1 test positive, and so on. Tests that were poor indicators of disease were removed from the analysis and the accuracy was reevaluated. This analysis determined the appropriate number and combinations of tests for each disease category that provided the greatest clinical yield.

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

1914 Somerville et al

The American Journal of Sports Medicine

TABLE 1 Distribution of Associated Disease for Supraspinatus Tearsa

FT tear (n = 53) PT tear (n = 14) Tendinosis (n = 16) No RC lesion (n = 56)

Biceps Tendon Lesion

Subscapularis Lesion

Anterior Labrum Bankart

Posterior Labrum Reverse Bankart

SLAP (Types II-V)

AC OA (ModerateSevere)

Other Mild Pathologic Abnormalities

No Other Pathologic Abnormality

25 0 1 2

10 2 2 5

4 2 4 15

1 1 0 5

5 2 0 12

14 2 3 5

30 10 7 15

6 3 5 12

a AC, acromioclavicular; FT, full thickness; OA, osteoarthritis; PT, partial thickness; RC, rotator cuff; SLAP, superior labrum anteriorposterior.

TABLE 2 Distribution of Associated Disease for Subscapularis Tearsa

FT tear (n = 8) PT tear (n = 12) Tendinosis (n = 21) No RC lesion (n = 98)

Supraspinatus Lesion

Biceps Tendon Lesion

Anterior Labrum Bankart

Posterior Labrum Reverse Bankart

SLAP (Types II-V)

AC OA (ModerateSevere)

Other Mild Pathologic Abnormalities

No Other Pathologic Abnormality

4 11 17 37

3 7 10 10

2 0 1 22

0 0 0 7

0 0 1 18

2 4 8 10

5 9 16 44

1 0 2 12

a AC, acromioclavicular; FT, full thickness; OA, osteoarthritis; PT, partial thickness; RC, rotator cuff; SLAP, superior labrum anteriorposterior.

RESULTS There were 189 participants. Of these, 15 patients refused to undergo one of the reference standard tests or canceled their test; therefore, the remaining 174 patients composed the study sample. Thirty-five patients were not suspected of having rotator cuff disease and therefore did not undergo rotator cuff testing. One hundred thirty patients underwent physical examination tests for both rotator cuff tears and tendinosis. Six patients only underwent tests for tendinosis, and 3 patients only underwent tests for rotator cuff tears. Therefore, there were 139 participants in this study. Of these, 101 were men and 38 were women, with an average age of 46.0 6 16.0 years. The injuries related to the rotator cuff are reported in Tables 1 and 2. The average supraspinatus tendon tear size was 2.4 6 1.4 cm from anterior to posterior (range, 0.5-6 cm), and the average subscapularis tendon tear size was 1.2 6 1.2 cm from anterior to posterior (range, 0.2-4 cm). Other injuries identified were anterior and posterior labrum lesions, SLAP lesions, biceps injury, and acromioclavicular joint abnormalities. The diagnostic validity measures for all of the studied physical examination tests are presented in Table 3. None of the tests were highly sensitive for diagnosing either rotator cuff tears or tendinosis. The supraspinatus test had the largest proportion of patients with a positive test result (n = 66), of whom 42 had lesions of the supraspinatus tendon. The lateral rotation lag sign had only 12 patients with a positive result, of whom 10 had lesions, but 67 additional patients had tears of the supraspinatus that were not identified by this test. This test’s ability

was improved when assessing the integrity of the infraspinatus tendon (sensitivity, 33.3%; specificity, 94.7%). Most patients in this sample had a positive Hawkins-Kennedy test (62.5%) and Neer impingement test (65%), although these tests had moderate levels of sensitivity. The tests for subscapularis tears were all highly specific and consequently had high likelihood ratios. Most patients had negative tests for the lift-off test (n = 120), belly press test (n = 121), and the internal rotation lag sign (n = 127). Of these, 15, 14, and 18 had lesions of the subscapularis tendon for each of the tests, respectively. When assessing the ability of the tests to detect tears of varying sizes, both the supraspinatus test and full can test were not effective at distinguishing tears of differing sizes, but the lateral rotation lag sign demonstrated a good discriminating ability. The ROC curve analysis revealed that the best cutoff for diagnosing a supraspinatus tear is 2.9 cm with a sensitivity of 80% and a specificity of 81.4% (Figure 2). The subscapularis tests could not differentiate tear sizes. The results of the assessment of combinations of tests are presented in Table 4. There was no optimal combination of tests that improved the sensitivity and specificity for diagnosing rotator cuff tears. If, however, at least 1 test for tendinosis was positive, the sensitivity for detecting full-thickness tears was 88.5%. Closer analysis revealed that removing both the internal rotation lag sign and the lateral rotation lag sign did not result in a loss of certainty for diagnosing subscapularis and supraspinatus tears, respectively (Table 5). The ability to detect tears was improved slightly when disease positive was defined as rotator cuff injury that was repairable, although

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

Vol. 42, No. 8, 2014

Physical Examination for Rotator Cuff Lesions

1915

TABLE 3 Diagnostic Values for the Physical Examination Tests for Rotator Cuff Injury Testa Supraspinatus tear Supraspinatus test FT tears FT and PT tears All disease Full can test FT tears FT and PT tears All disease Lateral rotation lag sign FT tears FT and PT tears All disease Infraspinatus tear Lateral rotation lag sign FT tears FT and PT tears All disease Subscapularis tear Lift-off test FT tears FT and PT tears All disease Belly press test FT tears FT and PT tears All disease Internal rotation lag sign FT tears FT and PT tears All disease Painful arc Tendinosis All disease FT and PT tears FT tears Hawkins-Kennedy Tendinosis All disease FT and PT tears FT tears Neer Tendinosis All disease FT and PT tears FT tears

Sensitivity

95% CI

Specificity

95% CI

Positive LR

Negative LR

71.7 65.6 57.8

58.4-82.0 52.2-74.6 47.1-67.9

64.6 64.6 63.3

53.6-74.2 52.5-75.1 49.3-75.3

2.02 1.81 1.57

0.44 0.55 0.67

64.2 58.2 55.4

50.7-75.7 46.3-69.3 44.7-65.6

67.1 67.7 71.4

56.2-76.5 55.6-77.8 57.6-82.2

1.95 1.80 1.94

0.53 0.62 0.62

19.6 15.4 12.4

11.0-32.5 8.6-26.1 6.9-21.3

97.5 97.0 96.0

91.3-99.3 89.6-99.2 86.5-98.9

7.84 5.08 3.09

0.83 0.87 0.91

33.3 26.1 24.4

16.3-56.3 12.6-46.5 12.8-41.0

94.7 94.5 96.0

89.0-97.6 88.6-97.5 90.1-98.4

6.33 4.74 6.00

0.70 0.78 0.79

25.0 21.1 17.5

7.2-59.1 8.5-433 8.8-32.0

95.0 96.3 98.9

89.5-97.7 90.9-98.6 93.8-99.8

4.96 5.68 15.2

0.79 0.82 0.84

50.0 30.0 19.5

21.5-78.5 14.6-51.9 10.2-34.0

95.9 97.3 98.9

90.8-98.3 92.3-99.1 93.9-99.8

12.2 11.0 17.4

0.52 0.72 0.81

96.0 96.5 100.0b

90.9-98.3 91.3-98.6 96.0-100

0.0 1.49

1.04 0.98 0.88

0.0 5.3 12.5

0.0-32.4 0.94-24.6 5.5-26.1

37.5 56.9 61.4 66.7

18.5-61.4 45.9-67.0 49.4-72.4 53.0-78.0

48.7 61.1 61.4 60.7

39.9-57.6 47.8-73.0 49.7-72.0 50.0-70.5

0.73 1.46 1.60 1.70

1.28 0.71 0.63 0.55

62.5 72.8 75.8 80.4

38.6-81.5 62.3-81.3 64.2-84.5 67.5-89.0

37.0 51.9 42.5 47.6

28.8-45.9 38.9-64.6 32.3-53.4 37.3-58.2

0.99 1.51 1.32 1.54

1.01 0.52 0.57 0.41

68.8 67.5 67.2 72.0

44.4-85.8 56.6-76.8 55.0-77.4 58.3-82.5

34.8 37.0 35.7 38.1

26.8-43.7 25.4-50.4 25.5-47.4 28.5-48.8

1.05 1.07 1.05 1.16

0.90 0.88 0.92 0.74

a

‘‘All disease’’ refers to full-thickness (FT) tear, partial-thickness tear (PT), or tendinosis. Not able to calculate positive likelihood ratio (LR).

b

this was minimal (Table 6 and Appendix 3, available online).

DISCUSSION Our study demonstrates that the physical examination tests for rotator cuff abnormalities are poor diagnostic indicators of disease. The supraspinatus test (Jobe test) had

the best combination of test properties for diagnosing full-thickness tears of the supraspinatus, with a sensitivity of 71.7% (95% CI, 58.4-82.0) and a specificity of 64.6% (95% CI, 53.6-74.2), although the Hawkins-Kennedy test for tendinosis had the highest sensitivity for full-thickness tears (80.4%). Both the supraspinatus test and the full can test had a positive likelihood ratio approaching 2 for the full range of disease severity. The lateral rotation test had the poorest combination of measurement properties for

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

1916 Somerville et al

The American Journal of Sports Medicine

Figure 2. Receiver operating characteristic (ROC) curve for lateral rotation lag sign.

supraspinatus tears, although when assessing the integrity of the infraspinatus tendon, these were improved. In fact, this test had a high discriminating ability when assessing tear size. The analysis revealed that 2.9 cm was the most appropriate cutoff point for discrimination. This means that this test is best at diagnosing tears larger than 2.9 cm. This is most likely due to the extension of the tear into the infraspinatus tendon. None of the tests for subscapularis tears were sensitive for diagnosing any stage of the disease, although they were all highly specific. The Neer impingement sign was the most sensitive for diagnosing tendinosis (sensitivity, 68.8%; 95% CI, 44.485.8), although the Hawkins-Kennedy sign was the most sensitive for all other degrees of rotator cuff disease. Mechanically, this makes sense, since previous research has demonstrated that this position results in greater subacromial contact of the rotator cuff than does the Neer sign.2,5,16 None of the tests for tendinosis demonstrated values of sensitivity and specificity to suggest that it is a useful test in the clinical setting. As the definition of disease positive became more inclusive (ie, from full-thickness tear only to include partialthickness and tendinosis), the sensitivity was reduced and the specificity was improved for all tests. This may be a result of an increase in false-positive results since less severe disease often demonstrates symptoms. For supraspinatus lesions, the likelihood ratio was (not surprisingly) highest when all tests were positive, compared with when only 1 test was positive. This means that when all tests are positive, the clinician can be certain that the patient has the disease—it does not, however, mean that patients with some negative tests do not have the disease. We demonstrated that both the lateral

rotation lag sign and internal rotation lag sign were very poor indicators of supraspinatus and subscapularis tears, respectively. By removing these tests from our analysis for supraspinatus and subscapularis tears, we found these tests do not improve diagnostic certainty—that despite their ability to rule in a disease (high specificity), their ability is no greater than that offered by other tests, and the sensitivity of the lateral rotation and internal rotation lag sign is too low to be useful in ruling out disease. Thus, we recommend abandoning these tests in a clinical setting for supraspinatus and subscapularis tears. Our values of sensitivity and specificity are lower than most reported values in the existing literature. Sackett and Haynes18 identified a continuum of diagnostic study architectures ranging from efficacy trials (phase I and II) to effectiveness trials (phase III). Phase I and II studies cannot evaluate a test’s suitability for use in a clinical setting because they evaluate diagnostic validity in patients with known disease (ie, disease negative or positive) and known test outcome (ie, test negative or positive), respectively. A phase III study investigates whether the diagnostic test can distinguish between patients with and without the disorder for whom it was clinically reasonable to suspect that the disease might be present. While phase I and II studies offer information about diagnostic validity under ideal conditions (efficacy), phase III studies offer practical diagnostic information about the validity of physical examination tests under usual conditions in a clinical setting (effectiveness).18 Most published studies assessing physical examination tests for rotator cuff lesions are phase II studies. A major contributing factor that defines whether a study is an efficacy or effectiveness trial is the sample included. Specifically, for a study to be defined as an effectiveness trial, the sample of patients must be representative of the population of patients with shoulder complaints for which the physician would face diagnostic uncertainty in a typical orthopaedic practice. This includes patients with the full spectrum of the disease of interest, including those with and without concomitant lesions and those with other shoulder injuries that have similar symptoms. One method to ensure a representative sample is to recruit patients consecutively as they present to the clinic.11 We found that most published diagnostic studies did not select an appropriate sample, and not surprisingly, the reported sensitivities and specificities between studies are as variable as their samples. One common error in the selection of participants for studies assessing the validity of a diagnostic test is to include patients for whom clinicians do not face diagnostic uncertainty. Many of the studies that we identified in the literature include a proportion of patients who are known to be free of the target disorder, considered to have normal shoulders, or known to have the target disorder.6,7,19 For example, in a study by Gerber et al6 to validate the belly press test for diagnosing subscapularis tears, the reported value of sensitivity was extremely encouraging (100%); however, the study included only patients with isolated subscapularis tears. Including patients known to have the target disorder will overestimate sensitivity of the test by increasing the

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

Vol. 42, No. 8, 2014

Physical Examination for Rotator Cuff Lesions

1917

TABLE 4 Diagnostic Validity of the Combination of Physical Examination Maneuversa

Supraspinatus tearsb FT tears At least 1 positive At least 2 positive All 3 positive All tears At least 1 positive At least 2 positive All 3 positive Subscapularis tearsc FT tears At least 1 positive At least 2 positive All 3 positive All tears At least 1 positive At least 2 positive All 3 positive Tendinosisd Tendinosis At least 1 positive At least 2 positive All 3 positive All disease At least 1 positive At least 2 positive All 3 positive FT tears At least 1 positive At least 2 positive All 3 positive

Sensitivity

95% CI

Specificity

95% CI

Positive LR

Negative LR

73.6 62.3 18.9

60.4-83.6 48.8-74.1 10.6-31.4

57.5 73.8 98.8

46.6-67.7 63.2-82.1 93.3-99.8

1.73 2.37 15.09

0.46 0.51 0.82

67.2 55.2 14.9

55.3-77.2 43.4-66.5 8.3-25.3

57.6 74.2 98.5

45.6-68.8 62.6-83.3 91.9-99.7

1.58 2.14 9.85

0.57 0.60 0.86

50.0 25.0 0.0

21.5-78.5 7.2-59.1 0.0-32.4

92.8 96.8 97.6

86.9-96.2 92.1-98.8 93.2-99.2

6.94 7.81 0.0

0.54 0.78 1.03

35.0 15.0 5.0

18.1-56.7 5.2-36.0 0.89-23.6

94.7 97.4 98.2

88.9-97.5 92.5-99.1 93.8-99.5

6.59 5.65 2.83

0.69 0.87 0.97

75.0 62.5 31.3

50.5-89.8 38.6-81.5 14.2-55.6

21.7 38.3 62.5

15.2-29.9 30.1-47.3 53.6-70.7

0.96 1.01 0.83

1.15 0.98 1.10

80.5 68.3 45.1

70.6-87.6 57.6-77.4 34.8-55.9

25.9 48.2 75.9

16.1-38.9 35.4-61.2 63.1-85.4

1.09 1.32 1.87

0.75 0.66 0.72

88.5 73.1 51.9

77.0-94.6 59.8-83.2 38.7-64.9

28.6 45.2 72.6

20.0-39.0 35.0-55.9 62.3-81.0

1.24 1.33 1.90

0.40 0.60 0.66

a

FT, full thickness; LR, likelihood ratio. Includes the supraspinatus test, full can test, and lateral rotation lag sign. c Includes lift-off test, belly press test, and internal rotation lag sign. d Includes painful arc, Neer test, and Hawkins-Kennedy test. b

TABLE 5 Diagnostic Validity of the Combination of Tests With the Lateral Rotation and Internal Rotation Lag Sign Removeda

Supraspinatus tears FT tears At least 1 positive 2 positive All tears At least 1 positive 2 positive Subscapularis tears FT tears At least 1 positive 2 positive All tears At least 1 positive 2 positive

Sensitivity

95% CI

Specificity

95% CI

Positive LR

Negative LR

73.6 62.3

60.4-83.6 48.8-74.1

57.5 74.7

46.6-67.7 64.1-83.0

1.73 2.46

0.46 0.51

67.2 55.2

55.3-77.2 43.4-66.5

57.6 75.8

45.6-68.8 64.2-84.5

1.58 2.28

0.57 0.59

50.0 25.0

21.5-78.5 7.2-59.1

93.6 97.6

87.9-96.7 93.2-99.2

7.81 10.42

0.53 0.77

35.0 15.0

18.1-56.7 5.2-36.0

95.6 98.2

90.1-98.1 93.8-99.5

7.91 8.48

0.68 0.87

a

FT, full thickness; LR, likelihood ratio.

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

1918 Somerville et al

The American Journal of Sports Medicine

TABLE 6 Diagnostic Values for the Physical Examination Tests for Surgically Repaired Rotator Cuff Lesionsa Test

Sensitivity

95% CI

Repairable rotator cuff tear Supraspinatus tests (no lateral rotation lag sign) At least 1 positive 70.7 58.0-80.8 2 positive 58.6 45.8-70.4 Subscapularis tests (no internal rotation lag sign) At least 1 positive 41.7 19.3-68.1 2 positive 16.7 4.7-44.8 Tendinosis tests At least 1 positive 86.0 74.7-92.7 At least 2 positive 71.9 59.2-81.9 3 positive 50.9 38.3-63.4

Specificity

95% CI

Positive LR

Negative LR

57.3 74.7

46.1-67.9 63.8-83.1

1.66 2.31

0.51 0.55

94.2 97.5

88.5-97.2 93.0-99.2

7.20 6.72

0.62 0.86

27.9 45.6 73.4

19.2-38.6 35.1-56.5 62.8-81.9

1.19 1.32 1.91

0.50 0.62 0.67

a

LR, likelihood ratio.

proportion of patients with a positive test. In contrast, we found the sensitivity of the belly press test to be 50%. A second common error is the exclusion of patients who do not undergo surgery. Although many studies that made this methodological error included consecutive patients, the sample is no longer representative once patients who do not undergo surgery are removed because it excludes those patients for whom the disease was suspected but for whom surgery was not recommended, or those who opted out of recommended surgery (all likely to have less severe lesions). By excluding these patients, one increases the proportion of patients who are more likely to have a positive test (since they have more severe pathologic lesions), thereby overestimating the sensitivity. For example, Park and colleagues17 reported the painful arc test to be relatively sensitive for diagnosing impingement syndrome (73.5%). Despite including a relatively broad spectrum of patients, they excluded patients who did not undergo diagnostic arthroscopy. In contrast, we found the sensitivity to be 54.5%. A third potential reason for our less encouraging results is our standardization of the arthroscopic investigation. Although we could not blind the clinicians performing arthroscopy to the diagnosis, we developed standardized data collection forms to ensure that all structures were evaluated and reported on. Without this standardization, surgeons may look more closely when interpreting the gold standard for pathologic lesions suggested by the physical examination and ignore the structures that tested negatively. Discriminatory evaluation of the gold standard will bias its interpretation in favor of the physical examination. For example, we found the sensitivity of the HawkinsKennedy test and Neer impingement test to be 62.5% and 68.8%, respectively, whereas MacDonald et al13 reported respective sensitivities of 88.9% and 77%. The greatest difference between our study and their study is the potential bias in the interpretation of the gold standard. A further difficulty in interpreting the findings of the majority of existing publications is that they solely report the sensitivity and specificity of the tests, which are simply properties of the test and do not assist with clinical decision making. Alternatively, likelihood ratios are good

summaries of diagnostic accuracy, are simpler to interpret, and are more useful to clinicians since they can be applied to individual patients. Despite their usefulness, likelihood ratios are rarely reported, and few clinicians are conscious of how often they use them implicitly in practice.8 For example, a clinician would, in theory, only send a patient for a test when he or she believes that there is useful information to be gained from knowing the results of that test. The advantage of likelihood ratios becomes apparent when looking at our study results for the lift-off test. Based on the value for sensitivity (\20%), one might suggest that the lift-off test is extremely poor at ruling out subscapularis disease; however, the likelihood ratio for this test exceeds 5 for every category of disease. In other words, if a patient has a subscapularis tear, he or she is 5 times more likely to test positive on the lift-off test compared with a patient who does not have a tear. This knowledge is extremely useful to a clinician. Similarly, the supraspinatus test had the best combination of sensitivity and specificity for supraspinatus tears, but these moderate values result in a likelihood ratio approaching 2—which generate only small shifts in uncertainty. The strengths of this study include its large sample size, which enabled us to provide precise measures of the specificity, sensitivity, and likelihood ratios of these tests, both individually and in combination. In addition, this study involved 4 surgeons in 2 different cities in Ontario, Canada, which increases the applicability of the results. Consequently, there is enormous potential for knowledge transfer, in that our results will be used to guide practice; teach medical students, residents, and fellows; and will create a more research-friendly atmosphere with the standardization of tests. The limitations of this study include the potential for detection bias, since the surgeon who completed the physical examination also completed the surgical evaluation. We have minimized this source of bias by creating a standardized protocol for the arthroscopy that all surgeons performed so that all structures were investigated carefully and reported in a standardized fashion. In addition, the time delay between the clinical examination and surgical evaluation and the large volume of patients included in

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

Vol. 42, No. 8, 2014

Physical Examination for Rotator Cuff Lesions

this study reduced the probability that the surgeon remembered the results at the time of surgical evaluation. In addition, for the secondary analysis, one of the surgeons was to decide whether the rotator cuff lesion was repairable based on the MRIa, which is a subjective assessment and therefore could result in errors in classification. However, excluding these patients from the analysis would result in less valid measurement properties of the tests because part of the population is being excluded. Furthermore, the clinician reviewing the images has more than 10 years of experience reading MRIs. On the basis of these study results, clinicians must understand that no test in isolation is sufficient to diagnose rotator cuff disease. In particular, we recommend removing the internal rotation and lateral rotation lag signs from the gamut of physical examination tests for studying supraspinatus and subscapularis lesions. Performing a combination of tests is more likely to reduce the uncertainty about diagnosing supraspinatus disease but not subscapularis disease, since any one test is as good as any combination of them. Finally, researchers should consider reporting likelihood ratios, which improve the interpretability of test results in a clinical setting.

REFERENCES 1. Applegate GR, Hewitt M, Snyder SJ, et al. Chronic labral tears: value of magnetic resonance arthrography in evaluating the glenoid labrum and labral-bicipital complex. Arthroscopy. 2004;20(9):959-963. 2. Brossmann J, Preidler KW, Pedowitz RA, et al. Shoulder impingement syndrome: influence of shoulder position on rotator cuff impingement— an anatomic study. AJR Am J Roentgenol. 1996;167(6):1511-1515. 3. Codman EA. The Shoulder. Boston, MA: Thomas Todd Company; 1934. 4. Dinnes J, Loveman E, McIntyre L, et al. The effectiveness of diagnostic tests for the assessment of shoulder pain due to soft tissue disorders: a systematic review. Health Technol Assess. 2003;7(29):iii, 1-166. 5. Flatow EL, Soslowsky LJ, Ticker JB, et al. Excursion of the rotator cuff under the acromion: patterns of subacromial contact. Am J Sports Med. 1994;22(6):779-788.

1919

6. Gerber C, Hersche O, Farron A. Isolated rupture of the subscapularis tendon. J Bone Joint Surg Am. 1996;78(7):1015-1023. 7. Gerber C, Krushell RJ. Isolated rupture of the tendon of the subscapularis muscle: clinical features in 16 cases. J Bone Joint Surg Br. 1991;73(3):389-394. 8. Grimes DA, Schulz KF. Refining clinical diagnosis with likelihood ratios. Lancet. 2005;365(9469):1500-1505. 9. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review. BMJ. 1999;318(7194):1322-1323. 10. Hegedus EJ, Goode A, Campbell S, et al. Physical examination tests of the shoulder: a systematic review with meta-analysis of individual tests. Br J Sports Med. 2008;42(2):80-92. 11. Knottnerus JA. The Evidence Base of Clinical Diagnosis. London, UK: BMJ Books; 2002. 12. Lijmer JG, Mol BW, Heisterkamp S, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999; 282(11):1061-1066. 13. MacDonald PB, Clark P, Sutherland K. An analysis of the diagnostic accuracy of the Hawkins and Neer subacromial impingement signs. J Shoulder Elbow Surg. 2000;9(4):299-301. 14. Neer CS II. Anterior acromioplasty for the chronic impingement syndrome in the shoulder: a preliminary report. J Bone Joint Surg Am. 1972;54(1):41-50. 15. Neer CS II. Impingement lesions. Clin Orthop Relat Res. 1983; 173:70-77. 16. Pappas GP, Blemker SS, Beaulieu CF, et al. In vivo anatomy of the Neer and Hawkins sign positions for shoulder impingement. J Shoulder Elbow Surg. 2006;15(1):40-49. 17. Park HB, Yokota A, Gill HS, et al. Diagnostic accuracy of clinical tests for the different degrees of subacromial impingement syndrome. J Bone Joint Surg Am. 2005;87(7):1446-1455. 18. Sackett DL, Haynes RB. The architecture of diagnostic research. BMJ. 2002;324(7336):539-541. 19. Scheibel M, Magosch P, Pritsch M, et al. The belly-off sign: a new clinical diagnostic sign for subscapularis lesions. Arthroscopy. 2005;21(10):1229-1235. 20. Somerville L, Bryant D, Willits K, et al. Protocol for determining the diagnostic validity of physical examination maneuvers for shoulder pathology. BMC Musculoskelet Disord. 2013;14:60. 21. Toyoda H, Ito Y, Tomo H, et al. Evaluation of rotator cuff tears with magnetic resonance arthrography. Clin Orthop Relat Res. 2005;439:109-115. 22. Whiting P, Rutjes AW, Reitsma JB, et al. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004;140(3):189-202.

For reprints and permission queries, please visit SAGE’s Web site at http://www.sagepub.com/journalsPermissions.nav

Downloaded from ajs.sagepub.com at MEMORIAL UNIV OF NEWFOUNDLAND on August 2, 2014

Clinical Assessment of Physical Examination Maneuvers for Rotator Cuff Lesions.

Shoulder pain and disability pose a diagnostic challenge for clinicians owing to the numerous causes that exist. Unfortunately, the evidence in suppor...
334KB Sizes 0 Downloads 3 Views