The American Journal of Surgery (2015) 210, 193-198

Editorial Opinion

Framework for a critical evaluation of surgical literature

The preamble

Online journal clubs to further the educational activities of students, residents, fellows, and practicing surgeons are becoming increasingly common. They are driven by efforts to increase educational opportunities, standardize content, improve educational efficiency, and, ultimately, improve patient care for geographically diverse learners.1 Widespread internet access and technology make these "clubs" possible, as does the plethora of widely available educational materials, including the peer-reviewed medical literature. Online journal clubs have problems, however. Learners distant from one another can struggle to establish a culture of learning or a common point of reference, and the absence of proximity detracts from collaborative learning and eliminates "sidebar" discussions. Nonetheless, online journal clubs offer the opportunity for high-quality learning experiences connecting geographically distant learners with geographically distant educators. Online journal clubs are being promoted by many, including the directors of fellowships in Surgical Oncology, Hepato-Pancreatico-Biliary Surgery, and Minimally Invasive Surgery. Additionally, online curricula, including SCORE, utilize monthly journal websites for supplemental education. To promote learning through online journal clubs and to give learners and educators common ground, we have developed the outline herein to evaluate and discuss medical literature, particularly in fields related, in the broadest terms, to General Surgery. To facilitate evaluation of and discussion about scholarly publications, this outline breaks manuscripts down into basic components (ie, Introduction, Methods, Results, and Discussion) and promotes evaluation of each component, so that manuscripts can be evaluated, as a whole, on their merit in reasonably objective and quantitative terms.2,3

Reading the medical literature requires some sophistication. This scoring sheet points out important issues to consider when reading medical literature. The considerations raised herein are not to be taken as absolute, but as guidelines, albeit strong ones. The more points a manuscript earns, the better it is (Table 1). It is impossible to write a strong paper from a bad dataset. A poor paper can be written from a great dataset. A great paper can only be written from a well-designed study and a great dataset. How interesting, meaningful, or important the paper is, however, may be another issue.

The authors recognize Dr Steven Shackford for his efforts in this project. The authors declare no conflicts of interest. * Corresponding author. Tel.: +1-813-615-7030; fax: 813-615-8350. E-mail address: [email protected]. Manuscript received January 27, 2015; revised manuscript February 20, 2015. 0002-9610/$ - see front matter © 2015 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.amjsurg.2015.02.003

Introduction

Why? Why was the work undertaken? The authors should provide sufficient justification for why the study was undertaken. The study should be original, relevant, and purposeful. This justification should be found in both the Introduction and, in a nonrepetitive manner, the Discussion section of the article. Bias in undertaking the study should be clearly stated by the authors and sought by the reviewers, and it may negatively impact the scoring of the article. In judging whether there is justification for undertaking the study, use the following criteria.

Clinical relevance (0 to 10 points)

Does the work address a clinical problem of relevance? Score: 0 to 10 points. In assigning this score, you must apply your own judgment as to the clinical relevance and magnitude of the clinical problem that the paper addresses. Consider how important and how interesting the paper is. Every paper should be important, not just a "so what, who cares" paper. An excessively abstruse or esoteric paper (eg, tuberculous peritonitis in patients with secondary sclerosing cholangitis) scores no points. If the authors do not make a cogent and concise argument in support of their manuscript, it negatively impacts the perceived importance of the manuscript.4 Also, consider the forum in which a paper is presented (eg, "newspaper," "throwaway" journal, or "high-impact" journal). Different forums may be associated with very different degrees of peer-review scrutiny, which will affect the importance and clinical relevance of the paper.5

Originality (0 to 5 points)

Ideally, the study reports work not previously done to address an important (and common, ie, clinically relevant) clinical problem. In assigning the score, you must determine not only the uniqueness of the paper but also its value in solving the clinical problem. The authors should make the case for their work in both the Introduction and Discussion sections in a nonrepetitive manner. If the work is not original, it must validate, confirm, refute, or provide another approach to previously published and cited work; this will generate a lower score for originality. You need to know the body of medical literature to accurately score originality; for a thorough evaluation, a MEDLINE (or similar) search of related articles should be undertaken to determine originality. Another paper about a common problem (eg, bile duct injuries after laparoscopic cholecystectomy) scores no points unless it is somehow unique (very large size, new approach, very long follow-up, etc). In assigning a score for originality, you must judge how effective and important the work is in confirming or refuting previous work in its area and how relevant this is. A favorable example of work that confirms previous work might be a study with a larger sample size including different types of patients, a study that prolongs the experimental period, or a study using a new (ie, different) test or imaging study during follow-up. Be critical, especially of "copycat" papers. A copycat paper scores no points, as it makes no meaningful contribution.

Purpose (0 to 5 points)

Is the purpose of the paper clearly stated? The purpose of the study should be very clear. In assigning this score, look for phrases such as "we undertook this study to" or "the purpose of this study was to." If you do not find such explicit wording, then the purpose is implied and only 1 to 2 points can be given, depending on how much work you have to do to determine the implied purpose. Do not hesitate to score zero if you have to work hard to figure out why the study was undertaken. Look for bias. Bias can be acceptable if the design of the study allows for an independent outcome, that is, if bias cannot affect the study outcome. If the study is funded by an interested party, this should be clearly noted and acknowledged. An irreconcilable conflict should make a paper inadmissible for review; an example would be a paper about a new technology written by an officer of a company promoting that technology. Always look for potential inappropriate bias (eg, funding source). Additionally, the authors' relationships with industry should be specifically listed.

Hypothesis (0 or 5 points)

Is the hypothesis clearly stated? If yes, score 5 points; if no, score 0 points; this score is binary. This is one of the most important parts of a scientific paper. The authors must state the hypothesis that the work is intended to prove or disprove. The hypothesis is very important because it reveals the bias of the authors in undertaking, analyzing, and presenting their study. The hypothesis should be explicitly stated; look for phrases such as "we hypothesized that" or "our hypothesis was." Implied hypotheses get no score. Preferably, the hypothesis should not be a null hypothesis.

Materials and Methods

How? How did the authors do their work? You should be critical in assessing the methodology and study design. If the paper describes animal work in which variables can be very well controlled, you should expect a well-designed and controlled study. If the paper is about a very large clinical study, there will be inherent problems different from those of a small controlled clinical trial. You should consider the following issues:

Design (0 to 15 points)

Is the design of the study appropriate to test the hypothesis? Are the variables that the authors propose to measure appropriate to test the hypothesis? Score: 0 to 15 points. If the study involves humans, the variables are difficult to control and scoring can be a little more liberal. With research involving humans, there are some special considerations. First, was Institutional Review Board approval obtained and noted? This provides some assurance of proper design. Then, determine the study design. High scores should be given to prospective, randomized, controlled, and double-blinded studies; in fairness, however, not all studies can be blinded or double-blinded. Determine whether there is a control group that receives similar treatment in all aspects but the intended therapy or insult. Note how patients were randomized; this should be clear. Lower scores go to retrospective chart reviews, "clinical series," or "case" reports. In general, these latter designs score few points, although the paper may have considerable clinical interest (but that is a different issue; note A.1.).

Consider how the patients were identified. The criteria for entry of a patient into the study should be explicitly defined; otherwise, "garbage in, garbage out." Consider how many patients remain at risk at the later time points of the paper. For example, a study of 200 patients might really be a study of 50 patients or fewer at 2 years. Is the duration of the study long enough (or too long) to observe the outcome, and long enough (or too long) to be clinically applicable? For example, a paper studying the salutary benefits of a laparoscopic approach to colectomy (vs an "open" approach) with 20-year follow-up is probably using an endpoint too distant from the therapeutic intervention.

Consider the quality of the database. If a large "national" database is used, there will be holes in the database. What are they, and are they important? Consider who entered the data into the database. Are the clinical variables explicitly defined? This is critically important in any retrospective or prospective study. Clinical endpoints should also be explicitly defined. For example, if the work is a study of gastroesophageal reflux disease (GERD), how is GERD defined? How is "infection" defined? How is "pancreatic cancer" defined, and which cell types are included? How is a "positive margin" defined: by microscopy or by gene analysis? Do the authors use International Classification of Diseases (ICD-9) codes, molecular analysis, and so on? Who coded the diagnoses? What is the database, how good is it, and how current? If the study compares treatments based on morbidities, how well are the morbidities defined? For example, if "pneumonia" is denoted, how is pneumonia defined? Be critical. If deep venous thrombosis is denoted, how was it defined? How relevant are the variables to the hypothesis? Is the endpoint the authors propose appropriate and relevant to the clinical situation?

This is a very important point. For example, in cancer research the "effectiveness" of a given therapy is often measured in terms of survival. Clearly, the effectiveness of any therapy can be judged in terms of survival, but survival depends on multiple variables. Therapeutic regimens should also be judged in terms of their "efficacy" in treating relevant variables such as tumor markers, quality of life, cost, disease-free survival, and need for further intervention. Survival might not always be the best endpoint given the patients and the disorder or disease being studied. Similarly, and more obviously, survival would not be an appropriate endpoint for therapy for GERD. A study on mastectomy for breast cancer cannot focus just on survival, which again depends on many factors; in this example, quality of life and morbidity will be major issues. A therapy studied in patients terminally ill with cancer might best be judged by its impact on reducing a cancer-specific mediator. Simply stating that a given therapy does not improve survival is not necessarily an indictment of the therapy, which may be intended to improve physiology, symptoms, quality of life, or cost of care, or to preserve resources, and so on. Any critical analysis of a therapy should look at many issues and endpoints (and survival, as indicated).

Consistent with our thoughts, the Center for Evidence-Based Medicine has developed a methodology for grading evidence. This methodology aids in grading study design, that is, the design through which the data were obtained. For this project, we have adapted one of the Center for Evidence-Based Medicine's tables (Table 2). Each study can be assigned a level, and this level should weigh in your score.
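The shrinking-denominator point above (eg, 200 patients enrolled but perhaps only 50 evaluable at 2 years) can be checked mechanically when per-patient follow-up durations are reported. The sketch below is our own illustration, with a hypothetical helper name and made-up follow-up times:

```python
def patients_at_risk(followup_years, horizon_years):
    """For each whole year up to the horizon, count how many patients
    were still being followed (ie, remain at risk) at that time point."""
    return {year: sum(1 for f in followup_years if f >= year)
            for year in range(1, horizon_years + 1)}

# Made-up follow-up durations (in years) for 6 enrolled patients:
followup = [0.5, 1.2, 2.0, 3.5, 0.8, 2.4]
print(patients_at_risk(followup, 3))  # at-risk counts shrink year by year
```

A study reporting outcomes "at 3 years" with only one patient still at risk at that point is telling you far less than its enrollment number suggests.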

Methodology (0 to 10 points)

Are the methodology and techniques for measurement standardized and accepted? Score: 0 to 10 points. In assigning a value, consider: do the authors employ a standard methodology that has been proven to effectively measure or quantify the variable(s) they are using? Is the methodology used clinically? Is the methodology current and "state-of-the-art"? Is it available to you? Are the authors proposing a new technology or assay to measure a variable? Are there better ways to measure the variable than those they use? If they are proposing a new measure of a variable, how accurate or valid is the method, and do they provide accuracy or validation data? Be critical and thoughtful. For example, if deep venous thrombosis was "measured," how was it detected? How was its extent determined? How was the thrombosis followed?

Analyses (0 to 10 points)

Were statistical analyses undertaken? Score: 0 to 10 points. In assigning a value, judge whether analyses were done or whether only descriptive statistics (mean age, mean injury severity score, mean body mass index, number of men or women, etc) were presented. If only descriptive statistics are presented, assign a value of 0. If the dataset or study design requires only descriptive statistics, this is not a strong paper, although it may be interesting (ie, another issue: note A.1).

Were the data determined to be parametric or nonparametric? Parametric data should be presented as mean ± standard deviation, not mean ± standard error of the mean, although the latter looks better. Nonparametric data should be presented as the median with the range of the data.

Was the sample size sufficient to detect a clinically significant difference? Is the study adequately powered? As a rule of thumb, if there is a 10% difference in outcomes without statistical significance, there is a good probability the study was underpowered.

Were proper analyses undertaken? Statistical analyses of the data should be appropriate for the study design. Were accommodations made for multiple (ie, repeated) analyses, measures, or comparisons? The authors should also explicitly state the level of significance for their study. Generally, authors will accept a 5% chance of a type I error (rejecting the null hypothesis when it is, in fact, correct).
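The underpowering rule of thumb above can be made concrete with the standard normal-approximation sample-size formula for comparing two proportions. This is a sketch of that generic formula, not a method from the article; the function name is our own:

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    two proportions (normal approximation, two-sided type I error alpha)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # eg, about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # eg, about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Detecting a drop in complications from 20% to 10% needs roughly
# 200 patients per arm; a trial with 50 per arm is likely underpowered.
print(n_per_group(0.20, 0.10))
```

A reader who runs this kind of back-of-the-envelope check can often tell at a glance whether a "negative" comparison ever had a realistic chance of reaching significance.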

Table 1   Scoring sheet

                              Points possible   Points given
  Introduction
    Clinical relevance              10
    Originality                      5
    Purpose                          5
    Hypothesis                       5
  Materials and methods
    Design                          15
    Methodology                     10
    Analyses                        10
  Results
    What                            15
    Tables & figures                 5
    Statistical analyses            10
  Discussion
    So what?                        10
  Total                            100

Statistical analysis (0 to 10 points)

Consider the sample sizes, measures of variance, 95% confidence limits, and P values of the data and the analyses. Are the data parametric or nonparametric, and were the appropriate statistical analyses undertaken? Do the authors provide appropriate measures of variance, and do the analyses reflect this? Is it clearly stated which analyses were used? The reader is encouraged to think about the statistical methodology used and to ask for assistance as necessary. In evaluating the statistical methodology, the reader should be able to defend or refute the authors if called upon. Was covariance possible and considered? If so, how was it controlled? If you perceive that there are confounding variables that might affect the results, do the authors provide data or considerations about those confounding variables?
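The presentation rule noted earlier (mean ± standard deviation for parametric data, median with range for nonparametric data) can be sketched as a small formatting helper; the function name and output format are our own illustration:

```python
import statistics

def summarize(values, parametric):
    """Format a sample as the text recommends: mean ± SD for parametric
    data, median (range) for nonparametric data. Never the SEM."""
    if parametric:
        return f"{statistics.mean(values):.1f} ± {statistics.stdev(values):.1f}"
    return (f"{statistics.median(values):.1f} "
            f"(range {min(values):.1f}-{max(values):.1f})")

ages = [42, 55, 61, 47, 58, 53, 49]
print(summarize(ages, parametric=True))
print(summarize(ages, parametric=False))
```

Note that `statistics.stdev` computes the sample standard deviation, which is what belongs in a manuscript; dividing it by the square root of n would give the standard error of the mean, which looks smaller but understates the spread of the data.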

Results

What? (0 to 15 points)

This involves the Results section of the article. Score this section 0 to 15 points. Do the authors clearly and succinctly present all the data that they promised in the Methods section? Make sure data are presented fairly, openly, clearly, completely, and without bias. The data of the study should be presented so clearly that you could analyze the data or redo the statistical analyses yourself; that point is very important. Were animals or subjects eliminated or lost (do all the numbers add up)? Pay particular attention to patients lost to follow-up or an ever-decreasing number of patients available for follow-up (ie, patients at risk). If animals or subjects were eliminated or lost, do the authors provide the sample sizes for each of the analyses undertaken at the time points studied? Be critical. In assigning a score, determine how much work you had to do to find the relevant data that prove or disprove the hypothesis. The more work you had to do to understand, analyze, decipher, and follow the results of the study, the lower the score should be. In sum, the data should be believable and presented so that you could redo the analyses and test the conclusions.

Tables and figures (0 to 5 points)

Look at the tables and figures and decide whether they are clear and add to the presentation. The tables and figures should not be redundant with the text. Confusing or excessively detailed tables and figures detract from a manuscript and from the study itself. They should provide data that improve your ability to analyze the study.

Discussion

So what! (0 to 10 points)

This is the Discussion section of the article. Score: 0 to 10 points; a poor discussion gets 0 points. Herein, the authors provide the relative importance of their work compared with previously published work; they present the importance of their work in addressing the clinical problem or question studied. The first paragraph of the Discussion is the best place for the authors to strongly, succinctly, and clearly state why their paper should have been published, but they should clearly state this somewhere. If the authors cannot or do not make a strong case, can you? This is very important: if they do not believe their manuscript is important, why or how can you? Simply put, the authors should justify publication of their paper; if they do not, you are doing their work.

After the first paragraph, the subsequent few paragraphs should discuss ALL the data, concisely putting the results of the study in perspective. Much data (eg, demographic data) can be succinctly summarized and discussed, often in one sentence. Do the authors' data justify their conclusions? Are the data and conclusions consistent? Is the clinical relevance explicitly stated? Do the authors compare their work with relevant, important, or landmark studies in the field? Are the references current (within the last 5 years), adequate in number (generally more than 20), and from respected journals? Are important or landmark references missing? The limitations of the study should be noted and discussed, although not necessarily in detail. Is the sample size small? Is there a conflict of interest for the authors? Were patients lost to follow-up? Is the study outcome plausible given the biology or biological mechanisms behind the disorder or disease being studied? Can the results be mechanistically explained?

Is there a summary or conclusion that clearly relates to the work the authors did and clearly refers to the importance of their work? Does their conclusion denote the significance of their work? Are the authors critical enough of their own work? For a Discussion, "long" is not the same as "good"; a short Discussion often lacks important discussion, but the Discussion does not need to be long to be good.

Table 2   Levels of evidence for primary research question

Study types:
- Therapeutic studies: investigating the results of treatment
- Prognostic studies: investigating the effect of a patient characteristic on the outcome of disease
- Diagnostic studies: investigating a diagnostic test
- Economic and decision analyses: developing an economic or decision model

Level I
- Therapeutic: High-quality randomized controlled trial with a statistically significant difference, or no statistically significant difference but narrow confidence intervals; systematic review of level I randomized controlled trials
- Prognostic: High-quality prospective study (all patients enrolled at the same point in their disease, with 90% follow-up of enrolled patients); systematic review of level I studies
- Diagnostic: Testing of previously developed diagnostic criteria in a series of consecutive patients (with a universally applied reference "gold" standard); systematic review of level I studies
- Economic: Sensible costs and alternatives; values obtained from many studies; multiway sensitivity analyses; systematic review of level I studies

Level II
- Therapeutic: Lesser-quality randomized controlled trial (eg, <80% follow-up, no blinding, or improper randomization); prospective comparative study; systematic review of level II studies or of level I studies with inconsistent results
- Prognostic: Retrospective study; untreated controls from a randomized controlled trial; lesser-quality prospective study (eg, patients enrolled at different points in their disease or <80% follow-up); systematic review of level II studies
- Diagnostic: Development of diagnostic criteria on the basis of consecutive patients (with a universally applied reference "gold" standard); systematic review of level II studies
- Economic: Sensible costs and alternatives; values obtained from limited studies; multiway sensitivity analyses; systematic review of level II studies

Level III
- Therapeutic: Case–control study; retrospective comparative study; systematic review of level III studies
- Prognostic: Case–control study (patients identified for the study on the basis of their outcome are compared with those who did not have the outcome)
- Diagnostic: Study of nonconsecutive patients (without a consistently applied reference "gold" standard); systematic review of level III studies
- Economic: Analyses based on limited alternatives and costs; poor estimates; systematic review of level III studies

Level IV
- Therapeutic: Case series
- Prognostic: Case series
- Diagnostic: Case–control study; poor reference standard
- Economic: No sensitivity analyses

Level V
- Therapeutic: Expert opinion
- Prognostic: Expert opinion
- Diagnostic: Expert opinion
- Economic: Expert opinion

Adapted from DeVries and Berlet.1
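For quick leveling during a journal club, the therapeutic-studies column of Table 2 can be distilled into a small lookup. The mapping below is our paraphrase of the table, not an official coding, and the names are our own:

```python
# Our paraphrase of the therapeutic-studies column of Table 2.
THERAPEUTIC_LEVELS = {
    "I": ["high-quality randomized controlled trial",
          "systematic review of level I randomized controlled trials"],
    "II": ["lesser-quality randomized controlled trial",
           "prospective comparative study",
           "systematic review of level II studies"],
    "III": ["case-control study",
            "retrospective comparative study",
            "systematic review of level III studies"],
    "IV": ["case series"],
    "V": ["expert opinion"],
}

def evidence_level(design):
    """Return the evidence level (I-V) for a therapeutic study design,
    or None if the design is not listed in the table."""
    design = design.strip().lower()
    for level, designs in THERAPEUTIC_LEVELS.items():
        if design in designs:
            return level
    return None

print(evidence_level("Case series"))     # IV
print(evidence_level("Expert opinion"))  # V
```

A design that maps to level IV or V should prompt the journal club to ask whether the paper's conclusions are stated with appropriately limited confidence.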

Summary
- Don't help the authors; you shouldn't have to help them too much in reading their paper.
- Be critical.
- Answer "Why was this paper published?" It should be easy to answer.
- Is this paper important? Does anyone care? Will it stand up to scrutiny and time? Will other authors reference this paper?
- Judge the manuscript as follows (Table 1):
    80 to 100: Strong paper
    60 to 79: OK
    40 to 59: Caution
    <40: A waste of paper (a shame they cut down trees to publish it)

Alexander Rosemurgy, M.D., F.A.C.S.*
Sharona Ross, M.D., F.A.C.S.
Carrie Ryan, M.S.
Florida Hospital Tampa, 3000 Medical Park Drive, Tampa, FL 33613, USA

Randall Harris, M.D., Ph.D.
The Ohio State University, College of Public Health
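The arithmetic of Table 1, combined with the grading bands from the Summary, can be sketched as a scoring helper. The dictionary keys and function name below are our own illustration:

```python
# Maximum points per item, from Table 1.
MAX_POINTS = {
    "clinical_relevance": 10, "originality": 5, "purpose": 5, "hypothesis": 5,
    "design": 15, "methodology": 10, "analyses": 10,
    "what": 15, "tables_figures": 5, "statistical_analyses": 10,
    "so_what": 10,
}

def grade(scores):
    """Total the component scores and apply the Summary's grading bands."""
    for item, points in scores.items():
        if not 0 <= points <= MAX_POINTS[item]:
            raise ValueError(f"{item}: {points} outside 0-{MAX_POINTS[item]}")
    total = sum(scores.values())
    if total >= 80:
        verdict = "Strong paper"
    elif total >= 60:
        verdict = "OK"
    elif total >= 40:
        verdict = "Caution"
    else:
        verdict = "A waste of paper"
    return total, verdict

print(grade(dict(MAX_POINTS)))  # a perfect score: (100, 'Strong paper')
```

Keeping the per-item caps explicit makes it obvious when a reviewer's tally exceeds what the scoring sheet allows for that component.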

References

1. DeVries JG, Berlet GC. Understanding levels of evidence for scientific communication. Foot Ankle Spec 2010;3:205–9.
2. Hofmeyer A, Newton M, Scott C. Valuing the scholarship of integration and the scholarship of application in the academy for health sciences scholars: recommended methods. Health Res Policy Syst 2007;5:1–8.
3. Atkins D, Best D, Briss P, et al. Grading quality of evidence and strength of recommendations. BMJ 2004;328:1490–4.
4. Juni P, Altman DG, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323:42–6.
5. Viswanathan M, Ansari MT, Berkman ND, et al. Assessing the risk of bias of individual studies in systematic reviews of health care interventions. In: Methods Guide for Effectiveness and Comparative Effectiveness Reviews. Rockville, MD: Agency for Healthcare Research and Quality; 2008. p. 1–15.
