COPYRIGHT © 2015 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED

The Orthopaedic Forum

The Validity of Level of Evidence Ratings of Articles Submitted to JBJS

Nasir Hussain, BSc, MSc, Rosanna Yankanah, BSc, and James G. Wright, MD, MPH, FRCSC

Background: In 2003, The Journal of Bone & Joint Surgery (American Volume) implemented a requirement for submitted clinical research articles to include a level of evidence rating. The aim of this study was to analyze the agreement between authors and JBJS regarding the level of evidence rating of accepted clinical articles.

Methods: A random sample of 353 clinical research articles accepted by JBJS from 2010 to 2012 was analyzed; 188 had a level of evidence rating provided by the author. Articles were grouped by study type and subspecialty. An unweighted kappa value was calculated to measure agreement between the authors and the JBJS editor, whose rating was used as the gold standard. In a secondary analysis, the articles in each subspecialty were categorized according to the year of submission to evaluate temporal trends.

Results: Of the 353 articles, 69.4% (245) were classified by JBJS as representing a therapeutic study, 17.6% (sixty-two) were classified as representing Level-I evidence, and 25.2% (eighty-nine) dealt with arthroplasty. Agreement (unweighted kappa) between the authors and the JBJS editor was 0.79 (95% confidence interval [CI], 0.71 to 0.89; p < 0.001) for the study type, 0.62 (95% CI, 0.53 to 0.70; p < 0.001) for the level of evidence, and 0.65 (95% CI, 0.58 to 0.73; p < 0.001) for the full level of evidence rating (study type and level of evidence).

Conclusions: Level of evidence ratings suggested by authors from 2010 to 2012 had moderate to substantial agreement with the ratings assigned by the JBJS editor, which suggests that authors of articles published in JBJS properly understand the level of evidence rating system. However, because only 53% of the sampled authors reported a rating, JBJS needs to strictly enforce the requirement to submit a level of evidence rating at the time of manuscript submission.

Peer Review: This article was reviewed by the Editor-in-Chief and one Deputy Editor, and it underwent blinded review by two or more outside experts. The Deputy Editor reviewed each revision of the article, and it underwent a final review by the Editor-in-Chief prior to publication. Final corrections and clarifications occurred during one or more exchanges between the author(s) and copyeditors.

Disclosure: One or more of the authors received payments or services, either directly or indirectly (i.e., via his or her institution), from a third party in support of an aspect of this work. None of the authors, or their institution(s), have had any financial relationship, in the thirty-six months prior to submission of this work, with any entity in the biomedical arena that could be perceived to influence or have the potential to influence what is written in this work. Also, no author has had any other relationships, or has engaged in any other activities, that could be perceived to influence or have the potential to influence what is written in this work. The complete Disclosures of Potential Conflicts of Interest submitted by authors are always provided with the online version of the article.

J Bone Joint Surg Am. 2015;97:e8(1-5)
http://dx.doi.org/10.2106/JBJS.M.01491

THE JOURNAL OF BONE & JOINT SURGERY · JBJS.ORG · VOLUME 97-A · NUMBER 2 · JANUARY 21, 2015

Orthopaedic surgeons use the best evidence in the literature to help make treatment decisions for their patients. Different research designs have different strengths and weaknesses. A randomized controlled trial is generally considered to be the best design for a clinical study evaluating a therapy, whereas a prospective cohort study is considered to be the best design for investigating prognostic factors1. Clinical research studies can be classified according to their research design by means of a hierarchical "level of evidence" system. Although a full critical appraisal is required to assess the quality of a study, the level of evidence provides a rapid and succinct ranking of the strengths and weaknesses of its design relative to those of other studies2,3. In 2003, The Journal of Bone & Joint Surgery (American Volume) implemented a requirement for submitted clinical research articles to include a suggested level of evidence rating4. The current rating system contains five levels for each of four study types: therapeutic, prognostic, diagnostic, and economic or decision modeling. When submitting a clinical research article to JBJS, authors are expected to provide a rating that is subsequently confirmed by the JBJS Associate Editor for Evidence-Based Orthopaedics, but this requirement has not been strictly enforced. Although the overarching goal was to help readers place clinical research in context, the level of evidence system was also intended to guide authors, reviewers, and possibly patients regarding the merits of specific study designs4. Ten years after the system was introduced, and despite its wide use in many contexts, the validity of the level of evidence ratings assigned by JBJS authors remained unknown. High validity would suggest that authors who publish in JBJS have a good understanding of the levels of evidence and, if those authors are representative of the broader academic community, that the same is likely true of that community in general. The primary aim of the present study was to analyze the validity of the level of evidence ratings submitted by authors of clinical research articles accepted by JBJS.

A secondary aim was to evaluate the level of evidence ratings of the articles in each subspecialty according to the year of article submission, to obtain insight into temporal trends.

TABLE I General Characteristics of the 353 Studies*

                                  Level of Evidence (no. [%])
Study Type                        I            II           III          IV
Therapeutic                       51 (14.4)    32 (9.1)     57 (16.1)    105 (29.7)
Prognostic                        7 (2.0)      53 (15.0)    11 (3.1)     10 (2.8)
Diagnostic                        4 (1.1)      3 (0.8)      9 (2.5)      10 (2.8)
Economic or decision modeling     0 (0.0)      0 (0.0)      1 (0.3)      0 (0.0)

*As classified by the editor.

Materials and Methods

Sample

The sample size needed for this study was estimated on the basis of the intraclass correlation coefficient (ICC). Because the ICC is equivalent to the quadratically weighted κ (kappa), it serves as a reasonable approximation for a sample size determination. An a priori calculation based on an alpha of 0.05 and a power of 90% indicated that a total of 188 articles with both author and editor ratings would be required to distinguish between an ICC value of 0.20 (representing the null hypothesis of little agreement) and a value of 0.40. All clinical articles accepted by JBJS from 2010 to 2012 that had a level of evidence rating from the Associate Editor were assigned a random number by means of a computer-generated randomization sequence. These numbers were used to select a random sample containing an approximately equal number of editor-rated articles from each year, large enough to yield the desired total of 188 articles rated by both the authors and the editor. If an article in this random sample did not have a level of evidence rating suggested by the author, it was excluded from the primary analysis of agreement but was included in the secondary analysis of overall trends in submissions.
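The year-stratified random selection described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' procedure (the paper does not say which software generated the randomization sequence): it assumes the eligible articles are available as hypothetical `(article_id, year)` pairs, attaches a seeded random number to each article, and keeps the lowest-numbered articles within each year. The function name `stratified_random_sample`, the `per_year` parameter, and the article IDs are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_sample(articles, per_year, seed=None):
    """Pick up to `per_year` articles from each acceptance year (hypothetical helper).

    `articles` is an iterable of (article_id, year) pairs. Each article receives a
    computer-generated random number, and the lowest numbers in each year are kept.
    """
    rng = random.Random(seed)  # seeded so the selection can be reproduced
    by_year = defaultdict(list)
    for article_id, year in articles:
        by_year[year].append((rng.random(), article_id))
    sample = {}
    for year, keyed in by_year.items():
        keyed.sort()  # order this year's articles by their random number
        sample[year] = [article_id for _, article_id in keyed[:per_year]]
    return sample


# Hypothetical pool: 200 editor-rated articles per year, with made-up IDs.
pool = [(f"A{year}-{i:03d}", year) for year in (2010, 2011, 2012) for i in range(200)]
chosen = stratified_random_sample(pool, per_year=120, seed=2015)
print({year: len(ids) for year, ids in chosen.items()})
# → {2010: 120, 2011: 120, 2012: 120}
```

Sorting by an attached random number mirrors the "assign a random number, then select" wording of the Methods; `random.sample` applied per year would be an equivalent shortcut.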

Level of Evidence Ratings

The clinical articles included in the random sample that had level of evidence ratings from both the author and the JBJS editor were analyzed. The rating assigned by the JBJS editor was used as the gold standard. Thus, agreement between the ratings of the authors and the editor represented a form of criterion validity. Each JBJS level of evidence rating consists of the level of evidence (I through V) and the study type (therapeutic, prognostic, diagnostic, or economic or decision modeling). We also categorized the orthopaedic subspecialty of each article as arthroplasty, foot and ankle, sports medicine, spine, tumor, pediatrics, trauma, hand and upper extremity, or other. If a clinical article was relevant to more than one orthopaedic subspecialty, it was included in each.
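Because an article relevant to more than one subspecialty is counted in each, subspecialty counts can add up to more than the number of articles. The minimal Python sketch below illustrates that counting rule. The helper name `tally_subspecialties`, the article IDs, and their tags are hypothetical; only the Table II counts come from this study.

```python
from collections import Counter


def tally_subspecialties(article_tags):
    """Count each article once in every subspecialty it is tagged with (hypothetical helper).

    `article_tags` maps an article ID to a set of subspecialty names.
    """
    counts = Counter()
    for tags in article_tags.values():
        counts.update(tags)  # an article with two subspecialties adds 1 to each
    return counts


# Hypothetical articles, one of which spans two subspecialties.
tags = {
    "A1": {"Arthroplasty"},
    "A2": {"Trauma", "Hand and upper extremity"},
    "A3": {"Spine"},
}
counts = tally_subspecialties(tags)
for name, n in sorted(counts.items()):
    print(f"{name}: {n} ({100 * n / len(tags):.1f}%)")  # each line shows 33.3%

# Table II counts: subspecialty assignments for the 353 articles.
table_ii = {
    "Arthroplasty": 89, "Foot and ankle": 39, "Sports medicine": 22,
    "Spine": 21, "Tumor": 11, "Pediatrics": 35, "Trauma": 54,
    "Hand and upper extremity": 49, "Other": 38,
}
print(sum(table_ii.values()) - 353)  # → 5
```

Applied to Table II, the counts sum to 358 for 353 articles. That excess of five matches the table footnote if each of the five multi-subspecialty studies was assigned to exactly two subspecialties.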

Data Analysis

Data were analyzed with use of SPSS statistical software (version 20; IBM, Armonk, New York). Overall frequencies of the levels of evidence, study types, and subspecialties were calculated for each of the three years. Agreement between the JBJS editor and the authors was assessed by calculating the unweighted κ statistic, and κ values were interpreted with use of the cutoffs for agreement suggested for categorical data by Landis and Koch5. Overall agreement and agreement for each year were calculated for the level of evidence, the study type, and the full level of evidence rating. When possible, agreement was also calculated within each orthopaedic subspecialty. The numbers of raw agreements were cross-tabulated according to the study type and the level of evidence.

Source of Funding

In addition to internal funding from the Robert B. Salter Chair for Paediatric Surgical Research at The Hospital for Sick Children, this study received funding from the Hospital for Sick Children Foundation.

Results

The random sample consisted of 353 accepted clinical articles with a level of evidence rating assigned by the editor. A level of evidence rating suggested by the authors was present in 188 (53%) of these articles.

General Characteristics of the Selected Studies

Of the 353 articles, 245 (69.4%) represented studies of therapies (Table I). Only one represented economic or decision modeling. Sixty-two (17.6%) of the articles represented Level-I evidence, 125 (35.4%) represented Level-IV evidence, and none represented Level-V evidence. One hundred and five (29.7%) of the articles were classified by the editor as therapeutic Level IV, making this the most common full classification, and fifty-one (14.4%) were therapeutic Level I. The most common subspecialty category was arthroplasty (eighty-nine articles, 25.2%), followed by trauma (fifty-four, 15.3%) and hand and upper extremity (forty-nine, 13.9%). The least common subspecialty category was tumor (eleven, 3.1%) (Table II).

TABLE II Orthopaedic Subspecialties Represented in the 353 Studies*

Subspecialty                  No. (%)
Arthroplasty                  89 (25.2)
Foot and ankle                39 (11.0)
Sports medicine               22 (6.2)
Spine                         21 (5.9)
Tumor                         11 (3.1)
Pediatrics                    35 (9.9)
Trauma                        54 (15.3)
Hand and upper extremity      49 (13.9)
Other                         38 (10.8)

*Five studies fell in more than one subspecialty.

Agreement

An unweighted κ was calculated for the 188 articles in the random sample that had ratings by both the authors and the editor. For the articles from 2010 to 2012, κ was 0.79 (95% confidence interval [CI], 0.71 to 0.89; p < 0.001) for the study type, 0.62 (95% CI, 0.53 to 0.70; p < 0.001) for the level of evidence, and 0.65 (95% CI, 0.58 to 0.73; p < 0.001) for the full evidence rating consisting of the study type and level of evidence (Table III). All values represent "moderate to substantial" agreement for categorical data5. The number of raw agreements was greater for the study type (169, 89.9%) than for the level of evidence (135, 71.8%) or the full evidence rating (133, 70.7%) (Tables IV and V). The greatest discrepancy regarding the study type involved differentiating between therapeutic and prognostic studies: eight (4.3%) of the studies were misclassified by the authors as prognostic when they were actually therapeutic, and six (3.2%) were misclassified as therapeutic when they were actually prognostic. Regarding the level of evidence, nineteen (10.1%) of the studies were misclassified by the authors as representing Level-III evidence when the level was actually II. As shown in Table III, agreement between the authors and the editor was generally moderate to substantial in each of the three years.

TABLE III Level of Agreement

Year                 Rating               κ       95% CI
2010, n = 67         Study type           0.81    0.67 to 0.95
                     Level of evidence    0.59    0.44 to 0.74
                     Full rating          0.60    0.47 to 0.73
2011, n = 67         Study type           0.72    0.53 to 0.91
                     Level of evidence    0.65    0.51 to 0.79
                     Full rating          0.70    0.58 to 0.82
2012, n = 54         Study type           0.82    0.67 to 0.97
                     Level of evidence    0.60    0.44 to 0.76
                     Full rating          0.65    0.51 to 0.79
Overall, n = 188     Study type           0.79    0.71 to 0.89
                     Level of evidence    0.62    0.53 to 0.70
                     Full rating          0.65    0.58 to 0.73

TABLE IV Raw Agreement Regarding the Study Type (N = 188)

                     Author Rating (no. [%])
Editor Rating        Therapeutic    Prognostic    Diagnostic
Therapeutic          117 (62.2)     8 (4.3)       2 (1.1)
Prognostic           6 (3.2)        42 (22.3)     1 (0.5)
Diagnostic           1 (0.5)        1 (0.5)       10 (5.3)

TABLE V Raw Agreement Regarding the Level of Evidence (N = 188)

                     Author Rating (no. [%])
Editor Rating        I             II            III           IV
I                    28 (14.9%)    3 (1.6%)      0 (0%)        0 (0%)
II                   1 (0.5%)      30 (16.0%)    19 (10.1%)    3 (1.6%)
III                  1 (0.5%)      6 (3.2%)      25 (13.3%)    7 (3.7%)
IV                   2 (1.1%)      4 (2.1%)      7 (3.7%)      52 (27.7%)
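The unweighted κ used throughout compares the observed agreement p_o (the diagonal of the cross-tabulation) with the agreement p_e expected by chance from the marginal totals: κ = (p_o − p_e)/(1 − p_e). The minimal Python sketch below is not the SPSS procedure used in the study and computes no confidence intervals or p values. It applies the formula to the counts in Tables IV and V, read with editor ratings in rows and author ratings in columns, and labels each estimate with the Landis and Koch bands (0.20 or less, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; above 0.80, almost perfect). The function names are hypothetical.

```python
def cohen_kappa(table):
    """Return (raw agreement, unweighted Cohen's kappa) for a square table of counts.

    Rows hold one rater's categories and columns the other rater's, in the same order.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n  # proportion on the diagonal
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two raters rated independently.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return observed, (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Label a kappa value with the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Editor ratings in rows, author ratings in columns (Tables IV and V).
study_type = [
    [117, 8, 2],   # therapeutic
    [6, 42, 1],    # prognostic
    [1, 1, 10],    # diagnostic
]
level = [
    [28, 3, 0, 0],   # Level I
    [1, 30, 19, 3],  # Level II
    [1, 6, 25, 7],   # Level III
    [2, 4, 7, 52],   # Level IV
]

for name, table in (("Study type", study_type), ("Level of evidence", level)):
    p_o, kappa = cohen_kappa(table)
    print(f"{name}: raw agreement {p_o:.1%}, kappa {kappa:.2f} ({landis_koch(kappa)})")
# → Study type: raw agreement 89.9%, kappa 0.79 (substantial)
# → Level of evidence: raw agreement 71.8%, kappa 0.62 (substantial)
```

The sketch recovers the reported point estimates (0.79 and 0.62) and raw agreements (89.9% and 71.8%). Because κ subtracts chance agreement, it is noticeably lower than the raw agreement whenever one category, such as therapeutic studies, dominates the marginal totals.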


TABLE VI Level of Agreement Stratified by Orthopaedic Subspecialty

Subspecialty                        Rating               κ       95% CI
Arthroplasty, n = 53                Study type           0.73    0.50 to 0.95
                                    Level of evidence    0.62    0.46 to 0.78
Foot and ankle, n = 19              Study type           0.88    0.65 to 1.10
                                    Level of evidence    0.63    0.37 to 0.88
Sports medicine, n = 10             Study type           0.39    −0.14 to 0.92
                                    Level of evidence    0.43    0.05 to 0.81
Spine, n = 11                       Study type           0.84    0.55 to 1.10
                                    Level of evidence    0.51    0.14 to 0.88
Tumor, n = 4                        Study type           0.43    0.29 to 0.57
                                    Level of evidence    0.60    0.10 to 1.10
Pediatrics, n = 20                  Study type           0.90    0.71 to 1.10
                                    Level of evidence    0.72    0.48 to 0.96
Trauma, n = 25                      Study type           0.84    0.63 to 1.00
                                    Level of evidence    0.56    0.31 to 0.81
Hand and upper extremity, n = 27    Study type           1.00    1.00 to 1.00
                                    Level of evidence    0.84    0.67 to 1.01
Other, n = 19                       Study type           0.59    0.30 to 0.88
                                    Level of evidence    0.30    −0.04 to 0.64

Finally, an unweighted κ was calculated for each orthopaedic subspecialty (Table VI). Although the numbers are small, articles involving the hand and upper extremity had the highest κ (1.00 for the study type rating) and articles involving sports medicine had the lowest (0.39 for the study type rating).

Discussion

Author-reported level of evidence ratings had moderate to substantial agreement with those assigned by the editor. This relatively high level of agreement suggests that authors properly understand the level of evidence rating system, including both the levels of evidence and the study types. This pattern was consistent over the three years studied. If the authors evaluated in this study are representative of the broader academic community, our findings suggest that level of evidence ratings are generally understood and used in a valid manner by orthopaedic researchers. However, the authors of only 53% of the accepted articles included in the study provided a level of evidence rating. JBJS is considered to be a leading journal within orthopaedics and was the first to require authors to submit a level of evidence rating, yet nearly half of the authors of clinical research articles did not comply with this requirement at the time of submission. Given that one aim of introducing level of evidence ratings was to foster an appreciation of the importance of study quality, JBJS may benefit from strictly enforcing this requirement by making the level of evidence rating a mandatory field when authors enter general information about the manuscript online. Although the primary goal of our study was to evaluate agreement between the authors of JBJS articles and the editor regarding the level of evidence rating, other researchers have evaluated trends in the level of evidence of articles in a variety of orthopaedic journals. Zaidi et al.6 reviewed the level of evidence of articles published in the foot and ankle literature from 2000 to 2010. As in our study, they found that Level-IV evidence and a therapeutic study type were the most common. Wupperman et al.7 reported in 2007 that therapeutic studies and Level-IV evidence were the most common in Spine. Finally, in an overarching review of nine orthopaedic journals published in 2005, Obremskey et al.8 found that a majority of the 382 clinical articles from 2003 were Level IV (58.1%) and therapeutic (70.7%). Our study demonstrated a decline in the proportion of Level-IV studies in JBJS from 2010 to 2012. Although still the single most common category, studies with Level-IV evidence represented only a third of the studies accepted in 2012. Our study has potential limitations. First, it would have been more informative to evaluate author-editor agreement regarding the level of evidence over a longer period of time.
JBJS began requiring authors to submit a level of evidence rating in 2003, and evaluating temporal trends since that time would have provided greater insight into how authors' understanding of level of evidence ratings has changed; however, author-assigned ratings were available only for recent years. Second, only 53% of authors submitted a level of evidence rating, and the results cannot be generalized to authors who did not. Because authors who provide a rating are likely to be more interested in, and thus better informed about, levels of evidence than authors who do not, our results may overestimate the skill with which authors submitting to JBJS assign level of evidence ratings. Third, our analysis was based on the year in which each clinical article was accepted for publication; it might have been preferable to use the date of submission. Finally, our subgroup analyses of level of evidence ratings according to year and subspecialty were underpowered and should be considered preliminary. Overall, our study suggests that JBJS authors, who constitute opinion leaders within the orthopaedic community, are using level of evidence ratings in a valid manner. Authors should continue to assign a level of evidence rating when submitting clinical articles to further develop their familiarity with the level of evidence and study design classification system. The proportion of articles submitted without a level of evidence rating suggests that JBJS needs to strictly enforce this requirement during manuscript submission.


Nasir Hussain, BSc, MSc
Rosanna Yankanah, BSc
James G. Wright, MD, MPH, FRCSC
Division of Orthopaedic Surgery, The Hospital for Sick Children, 1218-555 University Avenue, Toronto, ON M5G 1X8, Canada. E-mail address for N. Hussain: [email protected]

References

1. Bhandari M, Giannoudis PV. Evidence-based medicine: what it is and what it is not. Injury. 2006 Apr;37(4):302-6. Epub 2006 Feb 17.
2. Brighton B, Bhandari M, Tornetta P 3rd, Felson DT. Hierarchy of evidence: from case reports to randomized controlled trials. Clin Orthop Relat Res. 2003 Aug;413:19-24.
3. Sprague S, McKay P, Thoma A. Study design and hierarchy of evidence for surgical decision making. Clin Plast Surg. 2008 Apr;35(2):195-205.
4. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am. 2003 Jan;85(1):1-3.
5. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.
6. Zaidi R, Abbassian A, Cro S, Guha A, Cullen N, Singh D, Goldberg A. Levels of evidence in foot and ankle surgery literature: progress from 2000 to 2010? J Bone Joint Surg Am. 2012 Aug 1;94(15):e112(1-10).
7. Wupperman R, Davis R, Obremskey WT. Level of evidence in Spine compared to other orthopedic journals. Spine (Phila Pa 1976). 2007 Feb 1;32(3):388-93.
8. Obremskey WT, Pappas N, Attallah-Wasif E, Tornetta P 3rd, Bhandari M. Level of evidence in orthopaedic journals. J Bone Joint Surg Am. 2005 Dec;87(12):2632-8.
