XML Template (2014) [22.5.2014–5:41pm] [1–16] //blrnas3/cenpro/ApplicationFiles/Journals/SAGE/3B2/SMMJ/Vol00000/140062/APPFile/SG-SMMJ140062.3d (SMM) [PREPRINTER stage]

Article

Fully non-parametric receiver operating characteristic curve estimation for random-effects meta-analysis

Statistical Methods in Medical Research 0(0) 1–16 © The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0962280214537047 smm.sagepub.com

Pablo Martínez-Camblor

Abstract
Meta-analyses, broadly defined as the quantitative review and synthesis of the results of related but independent comparable studies, make it possible to establish the state of the art of a given topic. Since the amount of available literature has grown in almost all fields and, specifically, in biomedical research, their popularity has increased drastically during the last decades. In particular, different methodologies have been developed in order to perform meta-analytic studies of diagnostic tests for both fixed- and random-effects models. From a parametric point of view, these techniques often compute a bivariate estimation for the sensitivity and the specificity by using only one threshold per included study. Frequently, an overall receiver operating characteristic curve based on a bivariate normal distribution is also provided. In this work, the author deals with the problem of estimating an overall receiver operating characteristic curve from a fully non-parametric approach when the data come from a meta-analysis study, i.e. only certain information about the diagnostic capacity is available. Both fixed- and random-effects models are considered. In addition, the proposed methodology makes it possible to use the information of all cut-off points available (not only one of them) in the selected original studies. The performance of the method is explored through Monte Carlo simulations. The observed results suggest that the proposed estimator is better than the reference one when the reported information is related to a threshold based on the Youden index and when information for two or more points is provided. Real data illustrations are included.

Keywords
meta-analysis, random-effects, receiver operating characteristic curve, sensitivity, specificity

Oficina de Investigación Biosanitaria de Asturies (OIB-FICYT) and Universidad de Oviedo, Oviedo, Spain
Corresponding author: Pablo Martínez-Camblor, Oficina de Investigación Biosanitaria de Asturies (OIB-FICYT), Calle Matemático Pedrayes, 25 Entresuelo, Oviedo 33005, Asturies, Spain. Email: [email protected]

1 Introduction
In statistics, meta-analysis is understood as the technique for combining estimated results from multiple independent comparable studies. Originally devoted to summarizing the conclusions of

Downloaded from smm.sagepub.com by guest on November 14, 2015


related clinical trials,1,17 nowadays meta-analysis applications extend beyond clinical trials, and even beyond the medicine and health areas, to include observational studies and disciplines ‘from astronomy to zoology’.3 Systematic reviews and meta-analyses make it possible to establish the state of the art, strengths and weaknesses of a considered topic. In addition, they have been identified as one of the most cited types of research paper in medical science.4 There also exist a number of methodological papers which deal with different meta-analysis issues. Sutton and Higgins5 provided an ample review, with almost 300 references, of recent developments in meta-analysis in medical research. In addition, several guidelines have been published, most notably the two tutorials published in Statistics in Medicine6,7 for inference and multivariate methods, respectively.

On the other hand, the receiver operating characteristic (ROC) curve is frequently used in order to report the sensitivity (or true-positive (TP) rate, i.e. the ability of the test to classify diseased subjects as diseased) and 1-specificity (or false-positive (FP) rate, i.e. the inability of the test to recognize normal subjects as normal) for all possible thresholds of a diagnostic test. Obviously, the method employed for a meta-analysis of diagnostic tests depends on the type of available data. The most common situation is that, for each reference selected in the systematic review, only one (or a few) estimated pair of sensitivity and specificity for one (or a few) particular threshold is reported. For instance, Vouloumanou et al.,8 in a systematic review, selected a total of 16 works about the relationship between serum procalcitonin (PCT) and neonatal sepsis. Table 1 shows the selected thresholds in order to classify the children as positive (with neonatal sepsis) or negative (without

Table 1. First author surname (complete reference information is available in reference 8), PCT threshold used, TP, FP, FN, TN and diagnostic OR with the respective 95% confidence interval.

| First author | PCT threshold | TP | FP | FN | TN | OR (95% CI) |
|---|---|---|---|---|---|---|
| 1. Cetinkaya | 0.5 | 92 | 0 | 31 | 40 | 238 (14.2–3983) |
| 2. Jacquot | 0.6 | 30 | 15 | 0 | 28 | 112 (6.41–1962) |
| 3. Bender | 5.75 | 20 | 31 | 9 | 63 | 4.35 (1.80–10.5) |
| 4. Boo | 2 | 16 | 24 | 2 | 45 | 12.2 (2.97–50.6) |
| 4.1 Boo (a) | 0.5 | 16 | 41 | 2 | 28 | 4.53 (1.10–18.6) |
| 4.2 Boo (a) | 10 | 13 | 17 | 5 | 52 | 7.36 (2.38–22.8) |
| 5. Santuz | 1 | 11 | 22 | 8 | 108 | 6.52 (2.41–17.7) |
| 6. Savagner | 0.8 | 11 | 1 | 3 | 25 | 55.8 (7.30–427) |
| 7. Bustos-Betanzo | 1 | 38 | 5 | 12 | 17 | 9.80 (3.10–30.9) |
| 8. Isidor | 0.5 | 38 | 8 | 7 | 123 | 74.6 (26.2–212) |
| 9. López-Sastre | 0.55 | 43 | 41 | 14 | 107 | 7.77 (3.88–15.5) |
| 10. Pastor-Peidró | 2 | 7 | 21 | 0 | 95 | 66.6 (3.66–1211) |
| 11. Vazzalwar | 0.5 | 35 | 3 | 1 | 12 | 84.5 (11.2–636) |
| 11.1 Vazzalwar (a) | 1 | 26 | 0 | 10 | 15 | 78.2 (4.28–1429) |
| 12. Ballot | 0.5 | 51 | 60 | 14 | 58 | 3.43 (1.73–6.81) |
| 13. Chiesa | 1 | 15 | 6 | 4 | 109 | 58.0 (15.6–216) |
| 14. Resch | 2 | 34 | 11 | 7 | 16 | 6.60 (2.22–19.6) |
| 14.1 Resch (a) | 6 | 32 | 2 | 9 | 25 | 34.9 (7.89–154) |
| 15. Franz | 0.5 | 26 | 39 | 20 | 77 | 2.54 (1.27–5.07) |
| 16. Lapillone | 5 | 16 | 65 | 3 | 66 | 4.79 (1.44–15.9) |

(a) This information was not considered in the original meta-analysis.8
PCT: procalcitonin; TP: true-positive; FP: false-positive; FN: false-negative; TN: true-negative; OR: odds ratio; CI: confidence interval.


neonatal sepsis), true-positives (TPs), false-positives (FPs), false-negatives (FNs), true-negatives (TNs) and the diagnostic odds ratios (ORs) with the respective 95% confidence interval (CI) (computed by using the R package mada). Several algorithms have been proposed in order to estimate the global sensitivity and specificity from a bivariate approach for both fixed-9 and random-effects models (see, for instance, Menke10 and references therein). In addition, different software and packages have been developed. The mada package, previously mentioned and used to compute the diagnostic OR in Table 1, can be taken as an example. In the considered PCT meta-analysis,8 the package MIDAS, for the popular commercial statistical software Stata, was used to fit a hierarchical summary ROC (HSROC) curve. The global reported sensitivity and specificity were 0.810 (0.740–0.870) and 0.790 (0.690–0.870), respectively. Figure 1 depicts, at left, the TP rate (sensitivity) and the FP rate (1-specificity) for each study and the computed summary ROC (SROC) curve. Global sensitivity and specificity with the elliptical 95% confidence region are also highlighted. However, in a non-exhaustive review of the 16 papers considered by Vouloumanou et al.,8 we have detected that at least three of them have additional information. In particular, Boo et al.11 contains information about three different thresholds, and the papers of Vazzalwar et al.12 and Resch et al.13 contain information on two different thresholds. Table 1 includes this information (the rows marked a). In addition, Figure 1, at right, depicts all these points with each linearly interpolated ROC curve, taking into account that all ROC curves begin at (0, 0) and finish at (1, 1).

In this work, in order to estimate the ROC curve when the available data are from a meta-analysis study, a fully non-parametric method is proposed. It is assumed that, for each included reference, only the numbers of TPs, FNs, TNs and FPs for one or a few thresholds are known, so only this information will be used. This is a common problem which has usually been addressed by assuming bivariate normality on the logit-transformed sensitivity and specificity.14 Our approach is completely different. We focus on estimating the global ROC curve from weighted means of the individual interpolated ROC curves. These curves are built by simple linear interpolation between the

Figure 1. (left) Estimated HSROC curve for the Neonatal data (similar to the original paper). The points are the pairs of the true-positive and false-positive rates used in the original meta-analysis. (right) All available points (see Table 1). Black line stands for the fixed-effects model nPSROC (see Section 2).


available points, although, of course, other methods could also be used with this goal. Note that, since ROC curves always start at (0, 0) and finish at (1, 1), only one point must be added in order to interpolate a convex curve. The weights are computed for both the fixed- and the random-effects models.

The outline of the paper is as follows. In Section 2, the non-parametric summary ROC (nPSROC) curve is introduced. The method estimates one individual ROC curve for each selected reference. Two different weighting schemes are proposed: one which only considers the within-study variability (fixed-effects model), and another which also takes into account the between-study variability (random-effects model). In Section 3, the performance of the proposed method is studied in different scenarios via Monte Carlo simulations. The results suggest that the proposed method obtains better estimations than the reference one when the studies report information about thresholds based on the Youden index (i.e. the threshold, $T$, is such that $T = \arg\max_{t \in \mathbb{R}} \{\mathrm{sensitivity}(t) + \mathrm{specificity}(t)\}$) and when some of the included references report information about more than one threshold; in practice, these are the most frequent situations. In Section 4, a real meta-analysis study is considered and, finally, we present our conclusions.
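As a small illustration of the quantities reported in Table 1, the following sketch computes the TP rate, the FP rate, the Youden index and a diagnostic OR from the four cell counts of one study. The function name `diagnostic_summary` is hypothetical, and adding 0.5 to every cell before forming the OR is an assumption: it happens to reproduce the ORs printed in Table 1, but the text does not state which correction the mada package applied.

```python
def diagnostic_summary(tp, fp, fn, tn, correction=0.5):
    """Return (TP rate, FP rate, diagnostic odds ratio) for one 2x2 table.

    The rates use the raw counts. The odds ratio adds `correction` to every
    cell (an assumption that matches the ORs printed in Table 1).
    """
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    a, b, c, d = (x + correction for x in (tp, fp, fn, tn))
    return tpr, fpr, (a * d) / (b * c)


# Three rows of Table 1, given as (TP, FP, FN, TN).
studies = {
    "Cetinkaya": (92, 0, 31, 40),
    "Bender": (20, 31, 9, 63),
    "Boo": (16, 24, 2, 45),
}
for name, counts in studies.items():
    tpr, fpr, dor = diagnostic_summary(*counts)
    # Youden index J = sensitivity + specificity - 1 = TPR - FPR.
    print(f"{name}: TPR={tpr:.3f} FPR={fpr:.3f} J={tpr - fpr:.3f} OR={dor:.2f}")
```

Each (FP rate, TP rate) pair obtained this way is one point of the ROC plane; these are the points that the proposed method interpolates.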

2 nPSROC curve
As is well known, the ROC curve is a graphical method which displays the TP rate against the FP rate for all possible thresholds. It is recognized as a useful and valuable tool for evaluating and comparing the effectiveness of diagnostic systems, and it plays a central role in the performance evaluation of a classification rule, in particular in medical diagnosis, early detection of cancer, gene expression microarrays, proteomic patterns, etc. (Baker15). Of course, there exists a vast literature about it; the reader is referred to Martínez-Camblor16 for a recent review.

Let $\chi$ and $\xi$ be two independent random variables representing the values of the considered diagnostic test for the negative (without the studied characteristic/normal) and the positive (with the characteristic/diseased) subjects, respectively. Conventionally, it is assumed (without loss of generality) that larger values of the measure (marker) suggest larger confidence that a given subject is diseased. Therefore, for a fixed threshold (or cut-off point) $t$, the ROC curve is defined by

$$R(t) = 1 - F_\xi\left(F_\chi^{-1}(1 - t)\right) \quad \forall t \in [0, 1]$$

where $F_\chi$ and $F_\xi$ denote the cumulative distribution functions (CDFs) of the random variables $\chi$ and $\xi$, respectively. Its natural non-parametric estimator, $\hat R\ (= \hat R_{n,m})$, is the result of replacing the unknown CDFs by the empirical cumulative distribution functions (ECDFs), i.e. given samples of positives, $X_n$, and negatives, $Y_m$, where $n$ and $m$ stand for the respective sample sizes, $\hat R$ is defined by

$$\hat R(t) = 1 - \hat F_n\left(X_n, \hat F_m^{-1}(Y_m, 1 - t)\right) \quad \forall t \in [0, 1]$$

where $\hat F_n(X_n, \cdot)$ stands for the ECDF referred to the sample $X_n$ and $\hat F_m^{-1}(Y_m, \cdot) = \inf\{x : \hat F_m(Y_m, x) \ge \cdot\}$ denotes the empirical quantile function of $Y_m$. Given a number of independent but related studies, $S$, the focus of this paper is to estimate the global (average) ROC curve

$$\hat R_A(t) = \frac{1}{W(t)} \sum_{j=1}^{S} w_j(t) \cdot \hat R_j(t) \quad \forall t \in [0, 1]$$


where $\hat R_j(\cdot)$ is the estimated ROC curve for the $j$th study, $w_j(\cdot)$ is the weight of this study in the meta-analysis ($1 \le j \le S$) and $W(\cdot) = \sum_{j=1}^{S} w_j(\cdot)$. Since the involved studies are independent, $\hat R_A(\cdot)$ may be a non-monotone function. By defining $s\hat R_A(0) = 0$, we can introduce the following slight modification

$$s\hat R_A(t) = \max\left\{\sup_{z \in [0, t)} s\hat R_A(z),\ \hat R_A(t)\right\} \quad \forall t \in [0, 1]$$

Unfortunately, in the meta-analytic context the whole curve is usually unknown. In most cases, only the information referred to one (or a few) thresholds is reported. In this work, we propose approximating the ROC curve by a simple and direct linear interpolation, i.e. let $\hat R_{j,1}, \ldots, \hat R_{j,l_j}$ be the provided ROC curve values at the points $t_{j,1}, \ldots, t_{j,l_j}$ in the $j$th study ($1 \le j \le S$); for this study, the proposed ROC curve approximation will be

$$\hat R_{L,j}(t) = \frac{\hat R_{j,i} - \hat R_{j,i-1}}{t_{j,i} - t_{j,i-1}} \cdot t + \frac{\hat R_{j,i-1} \cdot t_{j,i} - \hat R_{j,i} \cdot t_{j,i-1}}{t_{j,i} - t_{j,i-1}}, \quad t \in [t_{j,i-1}, t_{j,i}] \ \text{and} \ i \in 1, \ldots, l_j + 1$$

where $(t_{j,0}, \hat R_{j,0}) = (0, 0)$ and $(t_{j,l_j+1}, \hat R_{j,l_j+1}) = (1, 1)$ for $j \in 1, \ldots, S$. Of course, other functional approximations with the appropriate properties would also be valid at this point. The estimator of the average ROC curve is straightforward

$$s\hat R_{A,L}(t) = \max\left\{\sup_{z \in [0, t)} s\hat R_{A,L}(z),\ \hat R_{A,L}(t)\right\} \quad \forall t \in [0, 1] \tag{1}$$

where $s\hat R_{A,L}(0) = 0$ and $\hat R_{A,L}(\cdot) = W^{-1}(\cdot) \cdot \sum_{j=1}^{S} w_j(\cdot) \cdot \hat R_{L,j}(\cdot)$. The values of the weights depend on the type of model selected. The fixed-effects model assumes that there is one true effect size (in the current context, one fixed theoretical ROC curve) which underlies all the studies in the analysis; therefore, the observed differences are due exclusively to sampling error, i.e. the only variability taken into account is the within-study one. However, the studies may differ in terms of patient characteristics, employed methodologies, etc. In univariate meta-analyses, the DerSimonian and Laird2 random-effects model is frequently used in order to deal with both within-study and inter-study variability. In the following subsections, we develop methods for the two different approaches.
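The construction in equation (1) can be sketched in a few lines: interpolate each study linearly through (0, 0), its reported points and (1, 1), average the curves pointwise with some weights, and take a running maximum so that the result is non-decreasing. This is only an illustrative sketch; the names `interpolated_roc` and `average_roc` are hypothetical, the curve is evaluated on a finite grid (so the supremum becomes a running maximum over grid points), and the equal weights used in the example are a placeholder rather than the weights developed in the following subsections.

```python
from bisect import bisect_right


def interpolated_roc(points):
    """Piecewise-linear ROC through (0, 0), the reported points, and (1, 1).

    `points` holds the (FP rate, TP rate) pairs reported by one study.
    """
    knots = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    ts = [t for t, _ in knots]

    def roc(t):
        # Locate the segment [t_{i-1}, t_i] that contains t.
        i = min(max(bisect_right(ts, t), 1), len(knots) - 1)
        (t0, r0), (t1, r1) = knots[i - 1], knots[i]
        if t1 == t0:  # vertical segment, e.g. a reported FP rate of 0 or 1
            return max(r0, r1)
        return r0 + (r1 - r0) * (t - t0) / (t1 - t0)

    return roc


def average_roc(curves, weight_fns, grid):
    """Weighted pointwise mean on an increasing grid, then a running maximum."""
    smoothed, running = [], 0.0
    for t in grid:
        w = [wf(t) for wf in weight_fns]
        r_a = sum(wi * c(t) for wi, c in zip(w, curves)) / sum(w)
        # sR(0) = 0 by definition; afterwards keep the largest value so far.
        running = 0.0 if t == 0.0 else max(running, r_a)
        smoothed.append(running)
    return smoothed


# Rows 3 (Bender) and 4, 4.1, 4.2 (Boo) of Table 1 as (FP rate, TP rate).
bender = interpolated_roc([(31 / 94, 20 / 29)])
boo = interpolated_roc([(24 / 69, 16 / 18), (41 / 69, 16 / 18), (17 / 69, 13 / 18)])
grid = [k / 100 for k in range(101)]
equal = [lambda t: 1.0, lambda t: 1.0]  # placeholder weights, not equation (2)
curve = average_roc([bender, boo], equal, grid)
print(curve[20], curve[50])
```

Passing weight functions of $t$, rather than fixed numbers, mirrors the fact that the weights $w_j(t)$ in the text depend on the point at which the curve is evaluated.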

2.1 The fixed-effects model

In the fixed-effects model, the weights are usually taken to be $w_j = v_j^{-2}$, where $v_j^2$ is the variability in the $j$th study ($1 \le j \le S$). In the present context, we take them depending on $t$, and then $w_j(t) = \left(V[\hat R_j(t)]\right)^{-1}$. Hsieh and Turnbull18 proved that, if (i) both $F_\chi$ and $F_\xi$ have continuous densities, $f_\chi$ and $f_\xi$, respectively, (ii) $f_\xi(F_\chi^{-1}(t))/f_\chi(F_\chi^{-1}(t))$ is bounded in every subinterval $(a, b)$ of $(0, 1)$ and (iii) $n/m \to \lambda > 0$ when $\min(n, m) \to \infty$, then there exists a probability space on which one can define two sequences of independent versions of Brownian bridges, $\{B_1^{(m)}(t)\}_{\{0 \le t \le 1\}}$ and $\{B_2^{(n)}(t)\}_{\{0 \le t \le 1\}}$, such that

$$\sqrt{n}\left\{\hat R(t) - R(t)\right\} = \lambda^{1/2} \cdot \frac{\partial R(t)}{\partial t} \cdot B_1^{(m)}(1 - t) + B_2^{(n)}(1 - R(t)) + o(1) \quad \text{a.s.}$$


uniformly on $[a, b]$. Hence, for each $t \in [0, 1]$ and for $j \in 1, \ldots, S$, the variance of $\hat R_{L,j}(t)$ can be approximated by

$$V[\hat R_j(t)] = \frac{1}{m} \cdot \left(\frac{\partial R(t)}{\partial t}\right)^2 \cdot (1 - t) \cdot t + \frac{1}{n} \cdot [1 - R(t)] \cdot R(t)$$

Of course, both the function $R$ and its derivative are unknown and, as usual, they can (and must) be replaced by sample estimates. The plug-in method is direct for replacing $R$ by $\hat R_{L,j}$ ($1 \le j \le S$). However, this replacement is not so easy for $R'$ (the first derivative of $R$). On one hand, for each $j \in 1, \ldots, S$, $\hat R_{L,j}$ is not a smooth function on $[0, 1]$. Therefore, $R'$ is not defined at every $t \in [0, 1]$ (this concern could easily be avoided by using more sophisticated interpolation functions); in particular, $R'$ is not defined at the provided points $t_{j,i}$, $1 \le i \le l_j$. On the other hand, and more importantly, if $\hat R_{L,j}(s) = 1$, the variability will be zero for $t \ge s$ (in practice, this is a really common problem shared by practically all the previously existing algorithms) and the individual weight would be infinite (for instance, in the previously considered meta-analysis of the Neonatal data, in the studies labelled as 2. Jacquot and 10. Pastor-Peidró, see Table 1). Hence, we propose to set $R' = 1$ (by assuming the worst situation, $R(t) = t$) in order to approximate $V[\hat R_j(t)]$ ($1 \le j \le S$) by

$$\hat v_j^2(t) = \frac{1}{m} \cdot (1 - t) \cdot t + \frac{1}{n} \cdot \left[1 - \hat R_{L,j}(t)\right] \cdot \hat R_{L,j}(t) \quad \forall t \in [0, 1] \tag{2}$$

The proposed overall ROC curve estimator, $s\hat R_{A,L}(\cdot)$, is the result of taking the above weights in equation (1). In addition, the standard error of $\hat R_{A,L}(t)$ can be approximated by $\widehat{\mathrm{se}}[\hat R_{A,L}(t)] = W(t)^{-1/2}$. The black line in the right panel of Figure 1 depicts this estimation for the Neonatal data.

2.2 The random-effects model

In the random-effects model, the total variability used for computing the weights is the sum of the within-study and the inter-study variability. This procedure takes into account the different sources of heterogeneity and assumes that the effect size has a random behaviour. DerSimonian and Kacker17 reviewed several procedures to compute the inter-study variability. The usual general method of moments applied to the estimation of the considered inter-study variability, $\tau^2(t)$, leads to the following estimator

$$\hat\tau_M^2(t) = \frac{1}{W(t)} \sum_{j=1}^{S} w_j(t) \cdot \left\{\hat R_{L,j}(t) - \hat R_{A,L}(t)\right\}^2 \tag{3}$$

which is based on the DerSimonian and Laird estimator. Therefore, the weights for the random-effects model are $w_j(t) = [\hat v_j^2(t) + \hat\tau_M^2(t)]^{-1}$ for $j \in 1, \ldots, S$, where $\hat v_j^2(\cdot)$ was


defined in equation (2). For each $t \in (0, 1)$, the respective standard error can be approximated by

$$\begin{aligned}
\widehat{\mathrm{se}}\left[\hat R_{A,L}(t)\right]^2 &= V\left[\frac{1}{W(t)} \sum_{j=1}^{S} w_j(t) \cdot \hat R_{L,j}(t)\right] = \frac{1}{W^2(t)} \sum_{j=1}^{S} w_j^2(t) \cdot V\left[\hat R_{L,j}(t)\right] \\
&= \frac{1}{W^2(t)} \sum_{j=1}^{S} w_j^2(t) \cdot \hat v_j^2(t) = \frac{1}{W^2(t)} \left\{\sum_{j=1}^{S} w_j(t) - \hat\tau_M^2(t) \cdot \sum_{j=1}^{S} w_j^2(t)\right\} \\
&= \frac{1}{W(t)} \left\{1 - \hat\tau_M^2(t) \cdot \frac{1}{W(t)} \sum_{j=1}^{S} w_j^2(t)\right\}
\end{aligned}$$

where, in this case, $w_j(t) = [\hat v_j^2(t) + \hat\tau_M^2(t)]^{-1}$ ($1 \le j \le S$) and $W(t) = \sum_{j=1}^{S} w_j(t)$. Different methods for estimating the inter-study variability have been proposed; the interested reader is referred to DerSimonian and Kacker17 for a recent review and some new algorithms.

The left panel in Figure 2 depicts the fixed-effects nPSROC curve estimation with a 95% confidence band for the Neonatal data (Table 1). The estimated 95% CIs for the TP and the FP rates (which were 0.695 and 0.200, respectively) for the threshold based on the Youden index were (0.646–0.733) and (0.166–0.264), respectively. The right panel shows the same for the random-effects model. The 95% CIs for the TP and the FP rates (their optimal estimated values, in the Youden index sense, were 0.622 and 0.192, respectively) were (0.638–0.726) and (0.157–0.248), respectively. Note that the computed specificities are similar to each other and similar to the one reported in the original paper, while the sensitivities are a little worse than the one reported by Vouloumanou et al.8 The difference between the fixed- and random-effects ROC curves is almost negligible. The $\hat\tau_M^2(\cdot)$ values are close to zero for all $t \in (0, 1)$, especially for the largest $t$-values.
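The random-effects computation at a single point $t$ can be sketched as follows. The function name `random_effects_pool` is hypothetical. One assumption is made explicit in the code: the weights and the pooled value entering equation (3) are taken to be the fixed-effects ones, which is one reading of the text rather than something it states outright. The standard error follows the last line of the derivation above.

```python
import math


def random_effects_pool(r_values, variances):
    """Random-effects pooling of ROC values at one point t.

    `r_values` are the interpolated ROC values at t and `variances` the
    within-study variances from equation (2). Returns the pooled value,
    its standard error, and the inter-study variance from equation (3).
    """
    # Assumption: equation (3) uses the fixed-effects weights and mean.
    w_fe = [1.0 / v for v in variances]
    total_fe = sum(w_fe)
    r_fe = sum(w * r for w, r in zip(w_fe, r_values)) / total_fe
    tau2 = sum(w * (r - r_fe) ** 2 for w, r in zip(w_fe, r_values)) / total_fe

    # Random-effects weights add tau2 to every within-study variance.
    w = [1.0 / (v + tau2) for v in variances]
    total = sum(w)
    pooled = sum(wi * r for wi, r in zip(w, r_values)) / total

    # se^2 = (1 / W) * {1 - tau2 * sum(w_j^2) / W}
    se2 = (1.0 - tau2 * sum(wi ** 2 for wi in w) / total) / total
    return pooled, math.sqrt(se2), tau2


# Illustrative values only.
print(random_effects_pool([0.5, 0.7], [0.01, 0.01]))
```

When all studies agree at $t$, the estimated inter-study variance is zero and the result reduces to the fixed-effects one, which is consistent with the near-identical curves reported for the Neonatal data.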

Figure 2. At left, estimated nPSROC curve and a 95% confidence band for the Neonatal data when a fixed-effects model is considered. At right, the same for the random-effects model.


3 Simulation study
In order to explore the behaviour of the proposed methods, a Monte Carlo simulation study was carried out. We considered three different scenarios for the (bio)marker distribution: normal, heavy-tailed and skewed. Different numbers of studies, $S$, were considered, in particular $S = 10$, 20 and 30. The numbers of positive and negative subjects per study were drawn from Poisson distributions with parameters P: 100, 30, 30, 20, 20, 15, 50, 45, 55 and 10 for the positives and N: 40, 45, 100, 70, 130, 25, 20, 130, 150 and 120 for the negatives (they are based on the sample sizes observed in the Neonatal data, see Table 1). One, two and three studies per sample size were drawn in order to obtain $S = 10$, 20 and 30, respectively. Two different criteria were considered to obtain the reported thresholds: either all reported estimations were computed by using the Youden criterion (those points that optimize the sum of the TP and TN rates), or they were selected completely at random. In addition, for the nPSROC curve, the situation where both of these points were reported per study was also considered.
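The sample-size part of this design can be sketched as below. The Poisson means are the ones listed in the text; everything else is an assumption made for illustration: the names `poisson` and `draw_sample_sizes` are hypothetical, and the Poisson draws use Knuth's multiplication method only to keep the sketch free of external dependencies, not because the original study is known to have used it.

```python
import math
import random

# Poisson means for positives (P) and negatives (N), as listed in the text.
P_MEANS = [100, 30, 30, 20, 20, 15, 50, 45, 55, 10]
N_MEANS = [40, 45, 100, 70, 130, 25, 20, 130, 150, 120]


def poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_sample_sizes(n_studies, rng):
    """Return (positives, negatives) per study; n_studies is 10, 20 or 30.

    One, two or three studies are drawn for each pair of Poisson means.
    A drawn size of zero is possible in principle and is not handled here.
    """
    repeats = n_studies // len(P_MEANS)
    return [
        (poisson(p, rng), poisson(n, rng))
        for _ in range(repeats)
        for p, n in zip(P_MEANS, N_MEANS)
    ]


rng = random.Random(2014)  # arbitrary seed, for reproducibility only
for n_pos, n_neg in draw_sample_sizes(10, rng):
    print(n_pos, n_neg)
```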

3.1 Normally distributed (bio)marker

In this case, the considered marker is normally distributed in both the positive and negative populations. In particular, in the $j$th study ($1 \le j \le S$), it follows a $N(\mu_{P,j}, \sigma_{P,j})$ distribution in the positive subjects and a $N(\mu_{N,j}, \sigma_{N,j})$ distribution in the negative subjects, where $\mu_{P,j}$, $\sigma_{P,j}$, $\mu_{N,j}$ and $\sigma_{N,j}$ are also random variables following the distributions $\mu_{P,j} \sim N(\theta, \Delta)$, $\mu_{N,j} \sim N(0, \Delta)$, $\log(\sigma_{P,j}^2) \sim N(\omega, \varphi)$ and $\log(\sigma_{N,j}^2) \sim N(0, \varphi)$. Table 2 depicts the observed means and standard deviations (mean ± SD) in 10,000 iterations for 100 times the integrated square error between the real ROC curve, $R(\cdot)$, and the approximated one, $\hat R(\cdot)$, i.e. the mean and the SD of the expression $100 \cdot \int \{\hat R(t) - R(t)\}^2\, dt$. The proposed random-effects nPSROC curve estimator was compared with the bivariate random-effects model proposed by Riley et al.19 The function Riley from the R package metamisc (freely available on CRAN) was used for this purpose. This model achieves similar results to the ones proposed by Rutter and Gatsonis20 and Reitsma et al.21 The considered simulations reflected two different ROC curve shapes, both with an area under the curve (AUC) of 0.7. Figure 3 depicts the real ROC curves (black lines) and the points obtained in one of the simulated iterations, in particular for the fixed-model case ($\Delta = 0$ and $\varphi = 0$) and $S = 10$.

In the first considered situation (Figure 3, left), where the variances are equal for both the positive and the negative populations ($\sigma = 1$), in general, the random-effects nPSROC obtained better results than the Riley method when the provided information referred to the Youden index threshold (Youden case), but the method failed when the reported information referred to a randomly selected threshold (Random case; note that this situation is really unusual in practice). In addition, the nPSROC method takes advantage of all available data, and when information about the two cut-off points was provided (Both case), it achieved the best results.

Although the quality of the obtained estimations changed with the shape of the real ROC curve and with the variability of the selected parameters (the cases $\Delta = 0$ and $\varphi = 0$ stand for a fixed-effects model), the pattern of results was always the same: nPSROC was clearly better than Riley in the Youden case and clearly worse in the Random case, and nPSROC based on both points achieved the best results. In addition, Riley performed better when it was based on randomly selected thresholds and nPSROC performed better when it was based on Youden index-based thresholds. The observed results were similar but worse when the variances were different (Figure 3, right). In this case, the nPSROC method


based on both the randomly selected and the Youden index-based cut-off points was, again, the clear winner. The results obtained by the Riley method when the provided cut-off points were selected at random (quite unusual in practice, as mentioned) were better than the results obtained by the nPSROC based on the Youden index and quite similar to those of the nPSROC in the Both case. Again, the nPSROC method was better when the information was related to cut-off points based on the Youden index; these results were slightly improved when information about both criteria was available, and the Riley method was better in the Random scheme.
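The ingredients of this scenario can be checked numerically with a short sketch: the true binormal ROC curve in the fixed-effects case, its AUC, the FP rate at the Youden point, and the error criterion $100 \cdot \int \{\hat R(t) - R(t)\}^2\, dt$. The function names are hypothetical, the integral is approximated with a midpoint rule, and the negatives are assumed to follow $N(0, 1)$ while the positives follow a normal distribution with mean 0.74 and variance 1, or mean 1.17 and variance 4 (so `sigma_pos` is 1 or 2), which is how the parameter values in Table 2 are read here.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def binormal_roc(t, theta, sigma_pos=1.0):
    """True ROC when negatives ~ N(0, 1) and positives ~ N(theta, sigma_pos^2)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    cutoff = STD.inv_cdf(1.0 - t)  # threshold whose FP rate equals t
    return 1.0 - STD.cdf((cutoff - theta) / sigma_pos)


def integrate(f, steps=2000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))


def ise100(estimate, truth):
    """100 times the integrated square error between two ROC curves."""
    return 100.0 * integrate(lambda t: (estimate(t) - truth(t)) ** 2)


def youden_fp_rate(roc, steps=1000):
    """FP rate on a grid maximizing TP rate + TN rate, i.e. roc(t) - t."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda t: roc(t) - t)


def equal_var(t):
    return binormal_roc(t, 0.74, 1.0)


def unequal_var(t):
    return binormal_roc(t, 1.17, 2.0)


print(integrate(equal_var), integrate(unequal_var))  # both AUCs are close to 0.7
print(youden_fp_rate(equal_var))
print(ise100(lambda t: t, equal_var))  # error of the chance diagonal
```

In a full simulation, `estimate` would be the nPSROC curve built from the simulated study points, and the criterion would be averaged over the Monte Carlo iterations.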

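Table 2. Observed means and standard deviations (mean ± SD) for $100 \cdot \int (\hat R(t) - R(t))^2\, dt$ in 10,000 Monte Carlo iterations. The proposed nPSROC curve estimation in three different scenarios and the Riley method have been considered in order to estimate $R(\cdot)$, the real ROC curve from which the data were drawn. The marker is normally distributed in both the positive and the negative subjects.

| $S$ | $\theta$ | $\omega$ | $\Delta$ | $\varphi$ | Youden: nPSROC | Youden: Riley | Random: nPSROC | Random: Riley | Both: nPSROC |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.74 | 0.00 | 0.00 | 0.00 | 0.137 ± 0.07 | 1.304 ± 1.18 | 1.252 ± 0.49 | 0.253 ± 0.53 | 0.125 ± 0.07 |
| 10 | 0.74 | 0.00 | 0.00 | 0.10 | 0.137 ± 0.07 | 1.278 ± 1.10 | 1.243 ± 0.49 | 0.212 ± 0.26 | 0.125 ± 0.07 |
| 10 | 0.74 | 0.00 | 0.10 | 0.00 | 0.149 ± 0.08 | 1.432 ± 1.21 | 1.254 ± 0.51 | 0.236 ± 0.29 | 0.138 ± 0.09 |
| 10 | 0.74 | 0.00 | 0.10 | 0.10 | 0.149 ± 0.08 | 1.448 ± 1.21 | 1.260 ± 0.51 | 0.245 ± 0.30 | 0.140 ± 0.09 |
| 10 | 0.74 | 0.00 | 0.20 | 0.10 | 0.191 ± 0.14 | 2.023 ± 1.48 | 1.299 ± 0.58 | 0.326 ± 0.42 | 0.183 ± 0.15 |
| 10 | 1.17 | log(4) | 0.00 | 0.00 | 0.428 ± 0.13 | 1.506 ± 1.63 | 2.474 ± 0.79 | 0.450 ± 0.43 | 0.331 ± 0.12 |
| 10 | 1.17 | log(4) | 0.00 | 0.10 | 0.428 ± 0.14 | 1.544 ± 1.64 | 2.464 ± 0.78 | 0.450 ± 0.41 | 0.332 ± 0.12 |
| 10 | 1.17 | log(4) | 0.10 | 0.00 | 0.432 ± 0.14 | 1.506 ± 1.62 | 2.457 ± 0.78 | 0.445 ± 0.40 | 0.335 ± 0.12 |
| 10 | 1.17 | log(4) | 0.10 | 0.10 | 0.433 ± 0.14 | 1.502 ± 1.61 | 2.474 ± 0.78 | 0.452 ± 0.41 | 0.337 ± 0.13 |
| 10 | 1.17 | log(4) | 0.20 | 0.10 | 0.442 ± 0.15 | 1.469 ± 1.55 | 2.466 ± 0.80 | 0.463 ± 0.44 | 0.349 ± 0.14 |
| 20 | 0.74 | 0.00 | 0.00 | 0.00 | 0.110 ± 0.04 | 0.964 ± 0.51 | 1.217 ± 0.34 | 0.135 ± 0.12 | 0.097 ± 0.04 |
| 20 | 0.74 | 0.00 | 0.00 | 0.10 | 0.111 ± 0.04 | 1.027 ± 0.52 | 1.228 ± 0.36 | 0.145 ± 0.14 | 0.099 ± 0.05 |
| 20 | 0.74 | 0.00 | 0.10 | 0.00 | 0.115 ± 0.05 | 1.089 ± 0.57 | 1.232 ± 0.36 | 0.152 ± 0.14 | 0.103 ± 0.05 |
| 20 | 0.74 | 0.00 | 0.10 | 0.10 | 0.114 ± 0.05 | 1.096 ± 0.56 | 1.227 ± 0.36 | 0.152 ± 0.14 | 0.103 ± 0.05 |
| 20 | 0.74 | 0.00 | 0.20 | 0.10 | 0.136 ± 0.08 | 1.498 ± 0.69 | 1.250 ± 0.42 | 0.204 ± 0.18 | 0.125 ± 0.08 |
| 20 | 1.17 | log(4) | 0.00 | 0.00 | 0.415 ± 0.09 | 0.908 ± 1.20 | 2.445 ± 0.56 | 0.364 ± 0.25 | 0.311 ± 0.08 |
| 20 | 1.17 | log(4) | 0.00 | 0.10 | 0.418 ± 0.09 | 0.929 ± 1.21 | 2.432 ± 0.54 | 0.358 ± 0.24 | 0.313 ± 0.08 |
| 20 | 1.17 | log(4) | 0.10 | 0.00 | 0.415 ± 0.09 | 0.898 ± 1.14 | 2.423 ± 0.55 | 0.358 ± 0.25 | 0.311 ± 0.08 |
| 20 | 1.17 | log(4) | 0.10 | 0.10 | 0.418 ± 0.09 | 0.911 ± 1.16 | 2.439 ± 0.55 | 0.358 ± 0.25 | 0.314 ± 0.08 |
| 20 | 1.17 | log(4) | 0.20 | 0.10 | 0.423 ± 0.10 | 0.902 ± 1.07 | 2.436 ± 0.57 | 0.360 ± 0.26 | 0.322 ± 0.09 |
| 30 | 0.74 | 0.00 | 0.00 | 0.00 | 0.102 ± 0.03 | 0.892 ± 0.40 | 1.204 ± 0.28 | 0.117 ± 0.10 | 0.087 ± 0.03 |
| 30 | 0.74 | 0.00 | 0.00 | 0.10 | 0.099 ± 0.03 | 0.897 ± 0.39 | 1.210 ± 0.28 | 0.119 ± 0.11 | 0.086 ± 0.03 |
| 30 | 0.74 | 0.00 | 0.10 | 0.00 | 0.105 ± 0.03 | 1.025 ± 0.47 | 1.211 ± 0.29 | 0.128 ± 0.11 | 0.091 ± 0.03 |
| 30 | 0.74 | 0.00 | 0.10 | 0.10 | 0.103 ± 0.03 | 1.024 ± 0.46 | 1.212 ± 0.30 | 0.129 ± 0.11 | 0.090 ± 0.03 |
| 30 | 0.74 | 0.00 | 0.20 | 0.10 | 0.118 ± 0.06 | 1.472 ± 0.62 | 1.231 ± 0.34 | 0.169 ± 0.14 | 0.106 ± 0.06 |
| 30 | 1.17 | log(4) | 0.00 | 0.00 | 0.410 ± 0.07 | 0.687 ± 0.92 | 2.428 ± 0.44 | 0.337 ± 0.19 | 0.304 ± 0.06 |
| 30 | 1.17 | log(4) | 0.00 | 0.10 | 0.413 ± 0.07 | 0.666 ± 0.89 | 2.429 ± 0.44 | 0.338 ± 0.20 | 0.306 ± 0.06 |
| 30 | 1.17 | log(4) | 0.10 | 0.00 | 0.413 ± 0.07 | 0.681 ± 0.87 | 2.419 ± 0.45 | 0.333 ± 0.19 | 0.306 ± 0.06 |
| 30 | 1.17 | log(4) | 0.10 | 0.10 | 0.415 ± 0.07 | 0.651 ± 0.81 | 2.426 ± 0.45 | 0.332 ± 0.19 | 0.308 ± 0.06 |
| 30 | 1.17 | log(4) | 0.20 | 0.10 | 0.418 ± 0.08 | 0.719 ± 0.82 | 2.423 ± 0.46 | 0.328 ± 0.20 | 0.312 ± 0.07 |

nPSROC: non-parametric summary receiver operating characteristic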


Figure 3. (left) Real ROC curve when $\theta = 0.74$ and $\omega = 0$. Points were simulated from a fixed-effects model and selected by the Youden index criterion (black) and at random (grey). (right) Real ROC curve when $\theta = 1.17$ and $\omega = \log(4)$. Points were simulated from a fixed-effects model and selected by the Youden index criterion (black) and at random (grey).

3.2 Heavy-tailed distributed (bio)marker

In the second case, the considered marker follows a heavy-tailed distribution in both the positive and the negative populations. In particular, in the $j$th study ($1 \le j \le S$), the distributions were $t_{1, \mu_{P,j}}$ ($t_{f,c}$ stands for a Student $t$ distribution with $f$ degrees of freedom, where $c$ is the non-centrality parameter) and $t_{1, \mu_{N,j}}$ for the positive and the negative subjects, respectively. The parameters $\mu_{P,j}$ and $\mu_{N,j}$ are random variables following the distributions $\mu_{P,j} \sim N(\delta, \Delta)$ and $\mu_{N,j} \sim N(0, \Delta)$. Two different values for $\delta$ were explored: $\delta = \sqrt{2}$ (AUC = 0.8) and $\delta = \sqrt{7}$ (AUC = 0.9). Figure 4, left, depicts these ROC curves. Table 3 is similar to Table 2 for the above scenario. The observed results are similar to the previous ones. nPSROC was better than Riley when the reported thresholds were the ones referred to the Youden index, and it was clearly worse when they were randomly selected. The best results were achieved when both points were available (only the nPSROC method). In this case, the results obtained by the nPSROC method based on the Youden index were better than the ones obtained by the Riley method based on randomly selected thresholds.

3.3 Skewed distributed (bio)marker

Finally, in the third considered case, the marker follows a skewed distribution in both the positive and the negative subjects. The distribution of the negative subjects in the $j$th study ($1 \le j \le S$) is the right-skewed distribution $\exp\{|1 + \mu_{N,j}|\}$ (we denote by $\exp\{\lambda\}$ the exponential distribution with parameter $\lambda$), while the positive subjects follow an $\exp\{|\mu_{P,j}|\}$ distribution. The parameters $\mu_{P,j}$ and $\mu_{N,j}$ are random variables following the distributions $\mu_{P,j} \sim N(\gamma, \Delta)$ and $\mu_{N,j} \sim N(0, \Delta)$. Two different values for $\gamma$ were explored: $\gamma = 0.43$ (AUC = 0.7) and $\gamma = 0.25$ (AUC = 0.8). Figure 4, right, depicts these ROC curves. Table 4, for the right-skewed marker distribution, is similar to the previous Tables 2 and 3. The observed results confirm the previous conclusions; nPSROC worked better than Riley when the


Figure 4. Real ROC curve when the marker distribution is heavy-tailed (left) and right-skewed (right).

Table 3. Observed means and standard deviations (mean ± SD) for $100 \cdot \int (\hat{R}(t) - R(t))^2\,dt$ in 10,000 Monte Carlo iterations. The proposed nPSROC curve estimation in three different scenarios and the Riley method have been considered in order to estimate $R(\cdot)$, the real ROC curve from which the data were drawn, by $\hat{R}(\cdot)$. The marker is heavy-tailed distributed in both the positive and the negative subjects.

S    δ     Δ      Youden: nPSROC   Youden: Riley   Random: nPSROC   Random: Riley   Both: nPSROC
10   √2    0.00   0.127 ± 0.09     2.904 ± 1.61    2.908 ± 1.13     0.508 ± 0.46    0.119 ± 0.09
10   √2    0.10   0.137 ± 0.09     3.241 ± 1.58    2.869 ± 1.13     0.650 ± 0.53    0.129 ± 0.10
10   √2    0.20   0.161 ± 0.13     3.621 ± 1.68    2.955 ± 1.21     0.737 ± 0.89    0.157 ± 0.13
10   √7    0.00   0.218 ± 0.14     4.188 ± 3.19    5.334 ± 2.12     1.301 ± 2.71    0.179 ± 0.12
10   √7    0.10   0.219 ± 0.14     4.201 ± 3.08    5.308 ± 2.10     1.310 ± 2.69    0.182 ± 0.13
10   √7    0.20   0.226 ± 0.16     4.513 ± 2.97    5.438 ± 2.18     1.308 ± 2.58    0.195 ± 0.15
20   √2    0.00   0.108 ± 0.06     2.746 ± 1.56    2.816 ± 0.81     0.423 ± 0.24    0.091 ± 0.05
20   √2    0.10   0.110 ± 0.06     3.026 ± 1.51    2.824 ± 0.80     0.438 ± 0.23    0.095 ± 0.06
20   √2    0.20   0.119 ± 0.08     3.720 ± 1.54    2.839 ± 0.85     0.491 ± 0.28    0.106 ± 0.08
20   √7    0.00   0.188 ± 0.09     3.280 ± 1.30    5.095 ± 1.47     0.679 ± 0.27    0.148 ± 0.08
20   √7    0.10   0.186 ± 0.10     3.437 ± 1.23    5.112 ± 1.46     0.682 ± 0.27    0.148 ± 0.09
20   √7    0.20   0.181 ± 0.11     3.692 ± 0.94    5.111 ± 1.52     0.706 ± 0.28    0.149 ± 0.10
30   √2    0.00   0.097 ± 0.04     2.821 ± 1.18    2.711 ± 0.66     0.396 ± 0.17    0.078 ± 0.04
30   √2    0.10   0.103 ± 0.05     2.822 ± 1.19    2.772 ± 0.66     0.409 ± 0.18    0.083 ± 0.05
30   √2    0.20   0.107 ± 0.06     3.604 ± 1.27    2.776 ± 0.69     0.451 ± 0.19    0.090 ± 0.06
30   √7    0.00   0.177 ± 0.07     3.169 ± 1.04    4.940 ± 1.19     0.641 ± 0.20    0.138 ± 0.07
30   √7    0.10   0.173 ± 0.08     3.303 ± 0.89    4.959 ± 1.20     0.645 ± 0.20    0.136 ± 0.07
30   √7    0.20   0.162 ± 0.08     3.686 ± 0.67    4.877 ± 1.24     0.664 ± 0.19    0.131 ± 0.07

nPSROC: non-parametric summary receiver operating characteristic
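The error measure used in the table caption can be approximated numerically. The Python sketch below evaluates $100 \cdot \int (\hat{R}(t) - R(t))^2\,dt$ with a midpoint rule. The integration limits 0 and 1, the grid size and the helper names are assumptions made for illustration, and `empirical_roc` is a generic step estimator built from raw marker values, not the nPSROC estimator.

```python
import math


def empirical_roc(pos, neg):
    """Return the step function t -> share of positives above the empirical
    (1 - t) quantile of the negatives."""
    neg_sorted = sorted(neg)
    n = len(neg_sorted)

    def roc(t):
        k = math.ceil(n * (1.0 - t))  # negatives at or below the threshold
        if k <= 0:
            return 1.0  # threshold below every value: everyone is positive
        c = neg_sorted[k - 1]
        return sum(x > c for x in pos) / len(pos)

    return roc


def integrated_squared_error(estimate, truth, grid_size=10_000):
    """Approximate 100 * integral over (0, 1) of (estimate(t) - truth(t))**2 dt
    using the midpoint rule on an equally spaced grid."""
    step = 1.0 / grid_size
    total = 0.0
    for i in range(grid_size):
        t = (i + 0.5) * step
        total += (estimate(t) - truth(t)) ** 2
    return 100.0 * total * step


# Purely illustrative curves: a chance-line "estimate" against R(t) = sqrt(t).
error = integrated_squared_error(lambda t: t, lambda t: math.sqrt(t))
```

For the two illustrative curves the exact integral is 1/30, so `error` is close to 3.33; values of this order of magnitude are what the table entries summarize over the Monte Carlo iterations.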


Table 4. Observed means and standard deviations (mean ± SD) for $100 \cdot \int (\hat{R}(t) - R(t))^2\,dt$ in 10,000 Monte Carlo iterations. The proposed nPSROC curve estimation in three different scenarios and the Riley method have been considered in order to estimate $R(\cdot)$, the real ROC curve from which the data were drawn, by $\hat{R}(\cdot)$. The marker is right-skewed distributed in both the positive and the negative subjects.

S    λ      Δ      Youden: nPSROC   Youden: Riley   Random: nPSROC   Random: Riley   Both: nPSROC
10   0.43   0.00   0.210 ± 0.10     1.093 ± 1.42    1.496 ± 0.54     0.210 ± 0.29    0.186 ± 0.10
10   0.43   0.05   0.212 ± 0.10     1.181 ± 1.50    1.511 ± 0.55     0.216 ± 0.26    0.190 ± 0.10
10   0.43   0.10   0.232 ± 0.13     1.583 ± 1.79    1.460 ± 0.58     0.240 ± 0.28    0.210 ± 0.12
10   0.25   0.00   0.478 ± 0.17     1.178 ± 1.59    3.536 ± 1.11     0.365 ± 0.37    0.415 ± 0.16
10   0.25   0.05   0.467 ± 0.18     1.343 ± 1.73    3.527 ± 1.17     0.370 ± 0.38    0.409 ± 0.18
10   0.25   0.10   0.428 ± 0.23     2.590 ± 2.83    3.437 ± 1.26     0.397 ± 0.42    0.386 ± 0.22
20   0.43   0.00   0.184 ± 0.06     0.661 ± 0.63    1.463 ± 0.37     0.132 ± 0.12    0.159 ± 0.05
20   0.43   0.05   0.185 ± 0.06     0.727 ± 0.69    1.477 ± 0.38     0.134 ± 0.13    0.161 ± 0.06
20   0.43   0.10   0.186 ± 0.07     1.070 ± 0.97    1.441 ± 0.42     0.141 ± 0.14    0.164 ± 0.07
20   0.25   0.00   0.474 ± 0.12     0.647 ± 0.98    3.534 ± 0.80     0.307 ± 0.22    0.397 ± 0.11
20   0.25   0.05   0.447 ± 0.13     0.754 ± 0.93    3.467 ± 0.82     0.296 ± 0.23    0.376 ± 0.12
20   0.25   0.10   0.381 ± 0.16     1.999 ± 1.75    3.345 ± 0.91     0.277 ± 0.24    0.329 ± 0.14
30   0.43   0.00   0.179 ± 0.05     0.537 ± 0.35    1.467 ± 0.31     0.117 ± 0.12    0.151 ± 0.04
30   0.43   0.05   0.177 ± 0.05     0.538 ± 0.34    1.464 ± 0.31     0.115 ± 0.15    0.150 ± 0.04
30   0.43   0.10   0.178 ± 0.05     0.613 ± 0.43    1.463 ± 0.31     0.115 ± 0.11    0.151 ± 0.05
30   0.25   0.00   0.477 ± 0.10     0.459 ± 0.66    3.494 ± 0.65     0.287 ± 0.17    0.392 ± 0.09
30   0.25   0.05   0.451 ± 0.11     0.611 ± 0.66    3.460 ± 0.66     0.272 ± 0.18    0.372 ± 0.10
30   0.25   0.10   0.363 ± 0.13     1.881 ± 1.43    3.301 ± 0.76     0.238 ± 0.19    0.311 ± 0.11

nPSROC: non-parametric summary receiver operating characteristic

available thresholds were based on the Youden criterion and worse when they were randomly selected. The approximation based on both points improved on the results based on the Youden index alone. In this setting, the Riley estimator based on randomly selected points performed really well. It obtained the best results (smallest mean squared error) in most of the cases: practically always for $\lambda = 0.25$, and also for $\lambda = 0.43$ with $S \ge 20$. However, its observed variability was larger than that observed for the nPSROC procedure.
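The AUC values quoted for the skewed scenario can be checked by hand in the simplest case. Assume, for illustration only, that there is no inter-study variability and that the exponential parameter is a rate, and write $X_N$ and $X_P$ for a negative and a positive marker value with rates 1 and $\lambda$, respectively. Then

```latex
\[
\mathrm{AUC} = P(X_P > X_N)
  = \int_0^{\infty} P(X_P > x)\, e^{-x}\, dx
  = \int_0^{\infty} e^{-\lambda x}\, e^{-x}\, dx
  = \frac{1}{1 + \lambda}
\]
```

which gives $1/1.43 \approx 0.70$ for $\lambda = 0.43$ and $1/1.25 = 0.80$ for $\lambda = 0.25$, in agreement with the reported values. Under the same assumptions, the ROC curve itself is $R(t) = t^{\lambda}$ for $0 \le t \le 1$.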

4 Interleukin 6 in the early-onset diagnosis of neonatal sepsis

In order to illustrate the practical behaviour of the proposed method, we present the work which motivated this research. It consists of a meta-analysis exploring the diagnostic value of Interleukin 6 (IL6) for the early detection of neonatal sepsis. The aim is therefore to assess the capacity of IL6 levels to discriminate between neonates with sepsis and neonates who have some pathology but no sepsis. Once the corresponding systematic review was performed (see Costa-Romero22), a total of nine papers were finally selected. Table 5 shows the same information as Table 1 for the IL6 data. In bold, we mark the cut-off points, among those provided in the original papers, which are optimal in the Youden index sense. Curiously, the thresholds, sensitivities and specificities reported by the papers labelled 1 (Abdollahi et al.23) and 7 (Resch et al.13) are exactly the same, although the total number of subjects is different.


Table 5. First author surname (complete reference information is available as supplementary material), IL6 threshold used (in pg/ml), TP, FP, FN, TN and diagnostic OR with the respective 95% confidence interval.

No.   First author   IL6 threshold   TP   FP   FN   TN    OR (95% CI)
1.    Abdollahi      10              35   5    14   11    5.12 (1.56–16.8)
1.    Abdollahi      60              26   0    23   16    37.2 (2.11–655)
1.    Abdollahi      150             23   0    26   16    29.3 (1.66–515)
2.    Bender         12              20   27   8    65    5.74 (2.30–14.3)
2.    Bender         250             16   5    12   80    19.3 (6.21–60.1)
3.    Chiesa         200             14   13   5    102   20.0 (6.44–62.2)
4.    Doellner       20              19   28   5    70    8.77 (3.09–24.8)
4.    Doellner       50              15   24   9    74    4.96 (1.96–12.5)
5.    Martin         160             12   6    0    14    55.8 (2.85–1091)
6.    Doellner       20              19   51   5    91    6.30 (2.30–17.2)
6.    Doellner       400             8    10   16   132   6.50 (2.30–18.4)
7.    Resch          10              29   9    12   18    4.60 (1.65–12.8)
7.    Resch          60              22   0    19   27    63.5 (3.63–1110)
7.    Resch          150             19   0    22   27    47.7 (2.72–834)
8.    Rite-Gracia    17.8            12   11   0    11    25.0 (1.32–474)
8.    Rite-Gracia    50              12   6    0    16    63.5 (3.26–1235)
8.    Rite-Gracia    55              11   5    1    17    24.4 (3.46–171)
8.    Rite-Gracia    75              10   0    2    22    189 (8.32–4295)
9.    Silveira       32              59   29   7    22    6.05 (2.37–15.4)

IL6: Interleukin 6; TP: true-positive; FP: false-positive; FN: false-negative; TN: true-negative; OR: odds ratio
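The odds ratios and intervals in Table 5 can be recomputed from the four counts. The tabulated numbers appear consistent with adding 0.5 to every cell and using a Wald interval on the log scale; this reading is inferred from the values themselves, not stated in the text, and the function name and defaults in the Python sketch below are illustrative.

```python
import math


def diagnostic_or(tp, fp, fn, tn, correction=0.5, z=1.96):
    """Diagnostic odds ratio with a Wald confidence interval on the log scale.

    ASSUMPTION: `correction` is added to all four cells (continuity
    correction), which also keeps the ratio finite when a cell is zero.
    """
    a, b, c, d = (x + correction for x in (tp, fp, fn, tn))
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper


# First row of Table 5 (Abdollahi, threshold 10 pg/ml).
or_value, ci_low, ci_high = diagnostic_or(tp=35, fp=5, fn=14, tn=11)
```

For this row the sketch returns approximately 5.12 (1.56 to 16.8), matching the first line of the table.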

Figure 5. (left) ROC curve estimates obtained with the Riley et al.19 (black) and the Reitsma et al.21 (gray) random-effects models for the IL6 data. (right) Random-effects nPSROC for the same dataset.

Conventionally, in order to apply one of the existing meta-analytic models, one of the provided points is selected; in most cases, this point is the best one in the Youden sense, i.e. the one that maximizes the sum of the sensitivity and the specificity (in bold in the previous table). Using this criterion, the random-effects Riley method19 provided a global sensitivity and specificity of 0.723


(0.613–0.811) and 0.853 (0.694–0.937), respectively. The Reitsma algorithm,21 which is equivalent to the HSROC proposed by Rutter and Gatsonis20 (see Harbord et al.24), provided similar global sensitivity and specificity: 0.719 (0.593–0.819) and 0.841 (0.658–0.938), respectively (in this case, the R package used, mada, did not report CIs, which were therefore computed via resampling). Figure 5, left, depicts the curves obtained by these two methods. The observed AUCs were 0.838 and 0.840 for the Riley and Reitsma random-effects models, respectively. The proposed random-effects nPSROC curve provided a whole ROC curve estimation. The optimal sensitivity and specificity (in the Youden index sense) were 0.737 (0.668–0.806) and 0.690 (0.580–0.730), respectively. Figure 5, right, depicts the random-effects nPSROC curve. The observed AUC was 0.766.
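The conventional pre-processing step described above, keeping one point per study according to the Youden criterion, can be sketched in Python as follows. The rows are the Table 5 counts for two of the studies; the function names and the tuple layout are illustrative assumptions.

```python
def youden_index(tp, fp, fn, tn):
    """Sensitivity + specificity - 1 from the four cell counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def best_by_youden(rows):
    """Keep, for each study, the row with the largest Youden index.

    Each row is (study, threshold, tp, fp, fn, tn).
    """
    best = {}
    for row in rows:
        study = row[0]
        if study not in best or youden_index(*row[2:]) > youden_index(*best[study][2:]):
            best[study] = row
    return best


rows = [
    ("Abdollahi", 10, 35, 5, 14, 11),
    ("Abdollahi", 60, 26, 0, 23, 16),
    ("Abdollahi", 150, 23, 0, 26, 16),
    ("Rite-Gracia", 17.8, 12, 11, 0, 11),
    ("Rite-Gracia", 50, 12, 6, 0, 16),
    ("Rite-Gracia", 55, 11, 5, 1, 17),
    ("Rite-Gracia", 75, 10, 0, 2, 22),
]
selected = best_by_youden(rows)
```

Every row that is not selected here is information that the bivariate methods leave unused, whereas the nPSROC approach keeps all the reported points.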

5 Main conclusions

Meta-analysis allows effect estimates from different independent comparable studies to be combined. Due to the increase in the available scientific literature and in spite of its detractors, these analyses have become increasingly popular in all areas but, especially, in medical research where, sometimes, the information derived from different studies is inconclusive or even inconsistent. In particular, the study of diagnostic tests is a relevant topic in biomedical research. In this context, the ROC curve plays a central role.15 Different methods to compute SROC curves have been introduced as a way to assess diagnostic accuracy in meta-analyses.25–28 Of course, the value of each ROC curve point is not reported in the research papers, and most meta-analyses of diagnostic test accuracy report their results based on solely one pair {sensitivity, specificity} per study. This bivariate method is also used to estimate SROC curves for random-effects models.10,14,19–21 However, many papers report information about two or more thresholds. This information is frequently discarded in the meta-analysis. The author has found only one paper which deals with this problem. In particular, Hamza et al.29 generalized the usual bivariate random-effects model to situations where information about k different thresholds per study is available (k must be a fixed value). In addition, Hamza et al. provided code for the commercial software SAS. Our focus is direct ROC curve estimation; with this approach, all the information reported in the original sources included in the systematic review is used. It is important to highlight that the developed method, the nPSROC curve, does not need a fixed number of thresholds per study; it takes whatever is available. Both fixed- and random-effects models were explored.
The proposed inter-study variability estimator is simple and easy to compute; following the same philosophy, other estimators for this variability could be implemented (see, for instance, DerSimonian and Laird2). Empirically, we have observed that the differences between the proposed fixed- and random-effects models are, in general, quite small. However, the fixed-effects curves obtained are often smoother than the random-effects ones (see Figure 2). The simulation results suggest that the nPSROC algorithm obtains better estimates than the Riley method in the most frequent practical case, when the information reported in the original papers relates to the threshold which leads to the Youden index. It obtained worse results when the provided thresholds were randomly selected (note that this is not a realistic assumption). The best results were always obtained when the two available points (Youden index based and randomly selected) were used for the estimation. Moreover, in general, the observed variability in the computed errors is smaller for the nPSROC estimates than for the Riley ones. In this paper, the nPSROC estimator has been applied to two different real meta-analyses. In both cases, the nPSROC estimate was more pessimistic than the one computed by the usual bivariate methods (the HSROC, Riley and Reitsma algorithms were considered as reference). Note that the bivariate methods do not take into account that the available information is often


selected to be the best possible one (in the simulation study, the Riley method obtained better results when the reported thresholds were randomly chosen) and, therefore, in practice, the computed SROC usually overestimates the quality of the diagnostic procedure. Both the AUC and the optimal sensitivity/specificity values are often overestimated. Finally, we want to remark on some additional advantages of considering a whole ROC curve estimation. When different ROC curves are involved in the meta-analysis, and therefore different ROC curves must be estimated, the general Bootstrap Algorithm (gBA) proposed by Martínez-Camblor and Corral30 can be used to compare them in both unpaired31 and paired32 cases, in a type of meta-regression generalization. Further research is needed to adapt these procedures to the meta-analysis context.

Acknowledgement

The author is grateful to Marta Costa-Romero for providing the IL6 data which motivated this research. He is also grateful to Susana Díaz-Coto, Cristina Méndez-Quintana and Jacobo de Uña-Álvarez for revising the manuscript. The author would also like to thank the reviewers for their comments and suggestions, which have improved this manuscript.

Funding

This work was supported by Grant MTM2011-23204 of the Spanish Ministry of Science and Innovation (FEDER support included).

Conflict of interest

None declared.

References

1. Pearson K. Report on certain enteric fever inoculation statistics. Br Med J 1904; 3: 1243–1246.
2. DerSimonian R and Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986; 7: 177–187.
3. Petticrew M. Systematic reviews from astronomy to zoology: myths and misconceptions. Br Med J 2001; 322: 98–101.
4. Patsopoulos NA, Analatos AA and Ioannidis JP. Relative citation impact of various study designs in the health sciences. J Am Med Assoc 2005; 18: 2362–2366.
5. Sutton AJ and Higgins JPT. Recent developments in meta-analysis. Stat Med 2008; 27: 625–650.
6. Normand SLT. Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med 1999; 18: 321–359.
7. Van Houwelingen HC, Arends LR and Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002; 21: 589–624.
8. Vouloumanou EK, Plessa E, Karageorgopoulos DE, et al. Serum procalcitonin as a diagnostic marker for neonatal sepsis: a systematic review and meta-analysis. Intensive Care Med 2011; 37: 747–762.
9. Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Davey Smith G and Altman DG (eds) Systematic reviews in health care: meta-analysis in context. London: BMJ Publishing Group, 2001, pp.157–162.
10. Menke J. Bivariate random-effects meta-analysis of sensitivity and specificity with SAS PROC GLIMMIX. Methods Inf Med 2010; 1: 54–64.
11. Boo NY, Nor Azlina AA and Rohana J. Usefulness of a semi-quantitative procalcitonin test kit for early diagnosis of neonatal sepsis. Singapore Med J 2008; 49: 204–208.
12. Vazzalwar R, Pina-Rodrigues E, Puppala BL, et al. Procalcitonin as a screening test for late-onset sepsis in preterm very low birth weight infants. J Perinatol 2005; 25: 397–402.
13. Resch B, Gusenleitner W and Müller WD. Procalcitonin and interleukin-6 in the diagnosis of early-onset sepsis of the neonate. Acta Paediatrica 2003; 92: 243–245.
14. Hamza TH, Reitsma JB and Stijnen T. Meta-analysis of diagnostic studies: a comparison of random intercept, normal-normal, and binormal-normal bivariate summary ROC approaches. Med Decis Making 2008; 28: 639–649.
15. Baker SG. The central role of Receiver Operating Characteristic (ROC) curves in evaluating tests for the early detection of cancer. J Natl Cancer Inst 2003; 95: 511–515.
16. Martínez-Camblor P. Area under the ROC curve comparison in the presence of missing data. J Korean Stat Soc 2013; 42: 431–442.
17. DerSimonian R and Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials 2007; 28: 105–114.
18. Hsieh F and Turnbull BW. Nonparametric and semiparametric estimation of the receiver operating characteristic curve. Ann Stat 1996; 24: 25–50.
19. Riley RD, Thompson JR and Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 2008; 9: 172–186.
20. Rutter C and Gatsonis C. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 2001; 20: 2865–2884.
21. Reitsma J, Glas A, Rutjes A, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005; 58: 982–990.
22. Costa-Romero M. Interleukina 6 en el diagnóstico precoz de sepsis neonatal de transmisión vertical. Technical report, Hospital Universitario Central de Asturias, 2012.
23. Abdollahi A, Shoar S, Nayyeri F, et al. Diagnostic value of simultaneous measurements of procalcitonin, interleukin 6 and HS-CRP in prediction of early-onset neonatal sepsis. Mediterr J Hematol Infect Dis 2012; 4: e2012028.
24. Harbord R, Deeks J, Egger M, et al. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 2007; 8: 239–251.
25. Littenberg B and Moses LE. Estimating diagnostic accuracy from multiple conflicting reports – a new meta-analytic method. Med Decis Making 1993; 13: 313–321.
26. Moses LE, Shapiro D and Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993; 12: 1293–1316.
27. Irwig L, Tosteson ANA, Gatsonis C, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 2001; 20: 667–676.
28. Midgette AS, Stukel TA and Littenberg B. A meta-analytic method for summarizing diagnostic test performances: receiver-operating-characteristic-summary point estimates. Med Decis Making 1993; 13: 253–256.
29. Hamza TH, Arends LR, Van Houwelingen HC, et al. Multivariate random effects meta-analysis of diagnostic tests with multiple thresholds. BMC Med Res Methodol 2009; 9: 1–15.
30. Martínez-Camblor P and Corral N. A general bootstrap algorithm for hypothesis testing. J Stat Plann Inference 2012; 142: 589–600.
31. Martínez-Camblor P, Carleos C and Corral N. Powerful nonparametric statistics to compare k-independent ROC curves. J Appl Stat 2011; 38: 1317–1332.
32. Martínez-Camblor P, Carleos C and Corral N. General nonparametric ROC curves comparison. J Korean Stat Soc 2013; 42: 71–81.
