Journal of Applied Psychology, 2015, Vol. 100, No. 1, 194–202

© 2014 American Psychological Association 0021-9010/15/$12.00 DOI: 10.1037/a0036635

RESEARCH REPORT

Small Sample Mediation Testing: Misplaced Confidence in Bootstrapped Confidence Intervals

Joel Koopman, Michael Howe, and John R. Hollenbeck
Michigan State University

Hock-Peng Sin
Florida International University

Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use with samples of 20–80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials.

Keywords: mediation, bootstrapping, permutation, Bayes

Supplemental materials: http://dx.doi.org/10.1037/a0036635.supp

New statistical methodology is often applied promiscuously, more so if it is complicated, computer-based, and hard to check. The process of connecting it back to the basic principles of statistical inference comes later, but in the long run, no methodology can survive if it flouts these principles.
Efron (2000, p. 1294)

A scientific field can progress only as fast as its theories can be empirically tested. As those theories become increasingly complicated and sophisticated, so too must the statistical tools used to test their propositions. However, it is important to heed Efron's advice and critically evaluate the tools by which we make statistical inferences, because theoretical progress depends on both the quality of those tools and the manner in which they are applied (Edwards, 2008; Williams, Edwards, & Vandenberg, 2003). This article represents one such effort to critically evaluate bootstrapping:1 a statistical tool that is "rapidly becoming common in the literature" for testing mediation (Young, Bauman, Chen, & Bastardi, 2012, p. 92).

In recent years, bootstrapping has been argued to be a useful and powerful tool for testing the significance of mediation hypotheses within psychology, particularly with small samples. Such advocacy emanates from Shrout and Bolger's (2002, p. 424) assertion that bootstrapping "can be applied even when sample sizes are moderate or small, that is, in the range of 20–80 cases" (hereafter, a small sample refers to a sample with 20–80 cases) and has been sustained by similar recommendations from other noted methodologists (e.g., Preacher & Hayes, 2004). Researchers have heeded this suggestion, and we note an increasing trend in the use of bootstrapping to test mediation with 20–80 cases.

This article was published Online First April 14, 2014.
Joel Koopman, Michael Howe, and John R. Hollenbeck, Department of Management, Michigan State University; Hock-Peng Sin, Department of Management and International Business, Florida International University.
The authors gratefully thank John D. Kammeyer-Mueller, Emilija Djurdjevic, and Samantha Conroy for their helpful comments and suggestions on an earlier version of this article. We would also like to thank the High Performance Computing Center at Michigan State University for providing the necessary computing power to conduct this research.
Correspondence concerning this article should be addressed to Joel Koopman, Department of Management, Michigan State University, North Business College Complex, 632 Bogue Street, Room N475, East Lansing, MI 48824-1122. E-mail: [email protected]

1 Our use of the term bootstrapping refers exclusively to the "sample-based" or "case-based" method in which resampling is conducted using the raw data (cf. J. Wang, Carpenter, & Kepler, 2006). This is the most frequently utilized method in psychological research and was the focus of research by Shrout and Bolger (2002), Preacher and Hayes (2004), and others. Thus, our comments and conclusions should not be extended to other bootstrapping methods (e.g., the "parameter-based" bootstrap, which creates a sampling distribution using the parameter estimate and not the raw data; Preacher, Zyphur, & Zhang, 2010; M. Wang et al., 2013).


We contend that there is cause for concern regarding both the Type I and Type II error rates associated with using bootstrapping to test mediation in small samples. Bootstrapping's purported advantage in this domain is largely based on a statistical power advantage relative to other tests, such that bootstrapping is more powerful (i.e., less likely to result in a Type II error) than previously utilized methods (e.g., the Sobel test). However, this overlooks a substantial concern with the absolute Type II error rate of bootstrapping. For example, MacKinnon, Lockwood, and Williams (2004) found that the best-case Type II error rate for bootstrapping, averaged across various effect size conditions with N = 100, was 0.38 (i.e., nearly double the recommended Type II error rate of 0.20, corresponding to statistical power of 0.80; Cohen, 1992). Our simulations show that, in many instances, researchers utilizing sample sizes in the range endorsed by Shrout and Bolger (2002) have an even larger chance of committing a Type II error.

Additionally, methodologists have recently expressed apprehension regarding bootstrapping's tendency to exhibit an inflated Type I error rate (i.e., in excess of the 5% threshold associated with a 95% confidence interval; Fritz, Taylor, & MacKinnon, 2012). For example, Cheung and Lau (2008) demonstrated in large samples (i.e., Ns ranging from 100 to 500 cases) that bootstrapping consistently exceeds 5% Type I error. As we discuss later, bootstrapping exhibits this same tendency in the range of small samples advocated by Shrout and Bolger (2002). Interestingly, recent articles published in the Journal of Applied Psychology have insinuated the opposite: that bootstrapping is preferred because it has a lower Type I error rate than other methods.

In light of these concerns, our article makes two important contributions. First, by conducting a detailed examination of bootstrapping in the small samples advocated by Shrout and Bolger (2002), we find that bootstrapping routinely exceeds established thresholds for both types of error in many of the conditions evaluated. Though we are not the first to examine these issues with bootstrapping in general, we are the first to focus extensively on the range of 20–80 cases advocated by Shrout and Bolger (2002). This is important because, as we discuss later, researchers have heeded this advocacy, and sample sizes in this range represent a growing segment of mediation research conducted within applied psychology. Thus, one contribution of this article is to amend an important misconception regarding testing mediation.

Our second contribution is an exploratory investigation of the performance of two emerging methods for testing mediation in small sample contexts. We consider an alternative resampling method that relies on residual permutation (Taylor & MacKinnon, 2012) as well as a Bayesian approach (Yuan & MacKinnon, 2009). While there are reasons for concern regarding small samples generally (e.g., Schmidt, Hunter, & Pearlman, 1981; Tversky & Kahneman, 1974), researchers may sometimes have little recourse (particularly at higher levels of analysis; Shen et al., 2011). As such, Koopman, Howe, and Hollenbeck (in press) recently called for investigations of alternatives to bootstrapping in small sample contexts. Accordingly, our goal was to identify potential alternatives to bootstrapping in small samples.


Bootstrapping in Small Samples

Early Advocacy

In their 2002 article in Psychological Methods, Shrout and Bolger argued that researchers seeking to test mediation hypotheses (specifically, the significance of the indirect effect, or α*β, in Figure 1) should use bootstrapping. Bootstrapping is a nonparametric statistical technique that does not require an a priori assumption about the shape of the sampling distribution for this product (Efron, 1979); instead, bootstrapping empirically estimates the sampling distribution by repeatedly resampling the data, and confidence interval end points can be obtained directly. Shrout and Bolger (2002) viewed this as a substantial advantage, arguing that bootstrapping is ideal for testing the indirect effect because the sampling distribution of the indirect effect tends to be nonnormal, particularly in small samples (Stone & Sobel, 1990). In contrast, the prevailing method for testing the indirect effect at the time, the Sobel (1982) test, requires an assumption that this distribution is normal. When this assumption is not satisfied, the Sobel test tends to exhibit higher Type II error rates than bootstrapping (Preacher & Hayes, 2004). On this basis, Shrout and Bolger (2002, p. 429) concluded that bootstrapping has "more power" in small samples compared with the Sobel test.

This statement represents a generally valid assertion of relative statistical power. However, recall that these authors further asserted that bootstrapping can be applied in the range of 20–80 cases. We view this second assertion as an unwarranted extension, given that a relative power advantage does not imply that bootstrapping can consistently reject a false null hypothesis at an acceptable rate. As an example, if one statistical method demonstrates 20% power and another 30%, then the second method can correctly be described as relatively more powerful than the first, but neither test approaches the broadly recommended threshold of 80% (Cohen, 1992).

Recent advocacy aside, Bradley Efron, the creator of bootstrapping and the source of our opening quote, has consistently argued against such small sample applications. Efron (1990, p. 4) remarked on the inaccuracy of bootstrapped confidence intervals in small samples, noting that this inaccuracy could be problematic for hypothesis testing because "small errors in the end points can change the verdict of the test." Efron and Tibshirani (1993, p. 178) echoed this in their comment that bootstrapped confidence intervals "can still be erratic for small sample sizes."
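To make the contrast between the two tests concrete, here is a minimal R sketch; it is our own illustration under stated assumptions, not the authors' supplemental syntax, and the simulated data and variable names (x, m, y) are hypothetical. It computes the Sobel (1982) z statistic from its first-order standard error and then a case-based percentile bootstrap confidence interval with the boot package:

```r
# Illustrative sketch: test a single indirect effect a*b two ways.
library(boot)

set.seed(0)
n <- 60                                  # a sample in the 20-80 range
x <- rnorm(n)
m <- 0.3 * x + rnorm(n)                  # mediator model
y <- 0.3 * m + rnorm(n)                  # outcome model (full mediation)

# Sobel test: assumes the sampling distribution of a*b is normal.
fit_m <- lm(m ~ x)
fit_y <- lm(y ~ m + x)
a  <- coef(fit_m)["x"]; sa <- coef(summary(fit_m))["x", "Std. Error"]
b  <- coef(fit_y)["m"]; sb <- coef(summary(fit_y))["m", "Std. Error"]
z  <- (a * b) / sqrt(b^2 * sa^2 + a^2 * sb^2)
2 * pnorm(-abs(z))                       # two-tailed p value

# Case-based bootstrap: resample rows with replacement, re-estimate a*b.
indirect <- function(dat, idx) {
  d <- dat[idx, ]
  coef(lm(m ~ x, d))["x"] * coef(lm(y ~ m + x, d))["m"]
}
bt <- boot(data.frame(x, m, y), indirect, R = 5000)
boot.ci(bt, conf = 0.95, type = "perc")  # percentile 95% CI
```

When the sampling distribution of a*b is nonnormal, the two approaches can reach different verdicts, which is the crux of the power comparison above.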

Figure 1. A partially mediated model.


Evidence From the Journal of Applied Psychology

To document the prevalence with which bootstrapping is used to test mediation in small samples, we examined every article published in the Journal of Applied Psychology (JAP) from 2006 through 2012. We chose the years 2006–2012 to provide sufficient time for researchers to become aware of bootstrapping and utilize it in their analyses, as we consider the general entry of bootstrapping into psychological research to be Shrout and Bolger's (2002) seminal article, with augmentation from Preacher and Hayes (2004, 2008). JAP represented an ideal choice for this investigation because of its visibility among scholars and its reputation for publishing high-quality empirical research. Zickar and Highhouse (2001) ranked JAP as the most important journal for industrial-organizational psychologists, as research published there generally represents best practices in applied psychology (Shen et al., 2011, p. 1055). JAP's current Institute for Scientific Information impact factor is approximately 4.758, and the journal is ranked Number 2 in the applied psychology category. Moreover, JAP is likely seen as an important source for best practices in research method and design, given its stated emphasis on publishing "rigorously conducted empirical investigations" as a cornerstone of the journal's reputation (Kozlowski, 2009, p. 1).

Efron's arguments notwithstanding, our review reveals a misconception within psychology that bootstrapping represents a rigorous test of mediation in small samples. Lance (2011, p. 281) argued that amending such misconceptions is a critical research activity because, if left unchecked, they may "establish normative research conduct and reinforce beliefs about what's what and what's not." Indeed, the evidence we present in what follows suggests that such norms are established. While we highlight quotes from recent research published in JAP that illustrate the presence of these norms, our intent is not to imply wrongdoing. To the contrary, based on accepted wisdom at the time, these authors likely made the best analytical choice. As our goal was only to substantiate our argument about the pervasiveness of this belief, we have omitted these authors' identities.

In a 2012 JAP article, the authors stated that they tested mediation using bootstrapping in their sample of 77 undergraduate students "because of the relatively small sample." Another research team noted in their 2013 JAP article that bootstrapping was implemented in their sample of 50 district managers "because of the small sample size of this study." In a 2011 JAP article, a research team testing mediation with 67 work units summed up this point in their assertion that bootstrapping "is regarded the most powerful test of the size and significance of indirect effects." While the prevailing advocacy for bootstrapping is based on statistical power advantages, an alternate rationale has recently emerged in JAP based on a supposed advantage in terms of Type I error as well. The authors of a 2011 JAP article noted that bootstrapping was preferred for their sample of 79 management teams based not only on statistical power but also "superior" Type I error rates. A 2012 JAP article by different authors echoed this point for their sample of 32 student teams.
When taken in conjunction with the previous statements supporting bootstrapping in small samples based on statistical power, a clear signal is being sent by JAP that small sample research testing mediation is acceptable so long as the analysis uses bootstrapping.

As further evidence, for each of the articles we examined in JAP from 2006 through 2012 in which mediation was tested in a single-level primary empirical study, we coded whether the analysis utilized bootstrapping (i.e., percentile, bias-corrected, or bias-corrected and accelerated methods for constructing bootstrap confidence intervals), the Sobel (1982) test, or an alternative method such as structural equation modeling goodness-of-fit tests (e.g., James & Brett, 1984; James, Mulaik, & Brett, 2006) or the causal steps approach (e.g., Baron & Kenny, 1986). We do not focus further on either the causal steps or goodness-of-fit methods because these methods do not provide an effect size or confidence interval for the indirect effect.

Table 1 summarizes these results. In a given year, the total number of articles in JAP that tested mediation in a single-level primary investigation ranged between 21 and 37. Table 1 also shows the number of those articles that tested mediation with 80 or fewer cases (i.e., the upper bound specified by Shrout and Bolger, 2002). In 2006, only 8% of articles (two out of 25) testing mediation hypotheses relied on 80 or fewer cases. However, this proportion had increased fourfold by 2012, when 32% of articles (eight out of 25) testing mediation employed sample sizes of 80 or fewer cases. In Figure 2, we have plotted the proportion of articles relying on small samples to test mediation to visually demonstrate this increasing trend.

In Figure 3, we focused on articles testing mediation in small samples to compare the proportions that used bootstrapping versus the Sobel test. In both 2006 and 2007, none of the articles testing mediation with 80 or fewer cases used bootstrapping. In contrast, from 2010 to 2012, bootstrapping was the method used in the majority of mediation tests, culminating in 2012 with seven tests out of eight (88%). In addition, of all the small sample articles examined, those using bootstrapping tended to have 15% fewer cases in their analyses (58 cases on average, compared with 68 cases on average for the Sobel test). Our analysis reveals two important trends: first, the proportion of articles with 80 or fewer cases relative to the total number of articles testing mediation is rising, and second, the proportion of those articles that used bootstrapping is likewise increasing while the proportion using the Sobel test is decreasing.

Table 1
Examination of Mediation Testing in the Journal of Applied Psychology

           Mediation testing with 80 or fewer cases
Year    Bootstrapping   Sobel test   Other   Total     Total testing mediation
2006          0              1         1       2                 25
2007          0              2         4       6                 37
2008          3              1         0       4                 21
2009          2              3         2       7                 26
2010          3              2         0       5                 21
2011          6              1         0       7                 24
2012          7              0         1       8                 25

Note. Table presents a count of articles testing mediation in a primary, single-level investigation in the Journal of Applied Psychology, 2006-2012. The total number of such articles is provided, as well as a breakdown of those that tested mediation with 80 or fewer cases and the analytical method used.


Combs (2010) recently noted that, in general, sample sizes in organizational research are trending upward. However, our analysis of recent mediation articles in JAP suggests that this trend may not be uniform. While technological improvements (e.g., e-mail, MTurk, Survey Response, and so on) have made identifying potential participants easier than ever, Shen et al. (2011, p. 1058) noted that obtaining a sizeable number of cases at higher levels of analysis (e.g., departments, work units, and so on) is a "substantial practical achievement given all of the additional difficulties inherent in data collection and maintaining sample sizes at these higher level units." In light of this, we consider two emerging methods in the hope of identifying a preferable alternative to bootstrapping when it is necessary to test mediation using small to moderate samples.

Figure 2. Trend in the proportion of mediation tests with 80 or fewer cases. The figure displays the increasing proportion of articles in the Journal of Applied Psychology, 2006-2012, that tested for mediation using 80 or fewer cases.

Figure 3. Comparison of bootstrapping and Sobel test usage for mediation tests with 80 or fewer cases. The figure compares the proportions of articles in the Journal of Applied Psychology, 2006-2012, that used the Sobel (1982) test versus bootstrapping to test for mediation with 80 or fewer cases.

Alternatives to Bootstrapping

Permutation method. The permutation method is, like bootstrapping, an established resampling technique.2 We consider this test based on general recommendations to use resampling techniques with nonnormally distributed data (e.g., Bishara & Hittner, 2012) and for testing mediation in particular (e.g., Taylor & MacKinnon, 2012). This test differs from bootstrapping in the manner in which the sampling distribution is estimated. While bootstrapping resamples observed cases in their entirety with replacement, permutation resamples only the residuals from the prediction of the mediator and dependent variable, without replacement (Good, 2005). By conducting a large number of permutations, researchers can empirically sample the indirect effect sampling distribution, allowing confidence interval end points to be selected directly (as with bootstrapping). An important caveat associated with this method is that while proper model specification is always important (mediation is, after all, a causal argument; Stone-Romero & Rosopa, 2008), this issue is particularly relevant to the permutation method. Because the indirect effect sampling distribution is estimated from the residuals of the regression models, a misspecification of these models would lead to incorrect residuals and likely produce biased estimates of the indirect effect.

Bayesian method. Given the forthcoming "revolution" in social science being driven by Bayesian3 methods (Kruschke, Aguinis, & Joo, 2012, p. 722), typified in part by the forthcoming special issue on this topic in the Journal of Management, we decided to investigate the efficacy of a Bayesian approach to testing mediation hypotheses in small samples. In Bayesian analysis, a specified prior distribution that encompasses a priori beliefs about the nature of the distribution is combined with the parameter distribution estimated from the observed data (i.e., the likelihood) to yield a posterior distribution that represents the sample-adjusted estimate of the probability distribution for the parameter of interest (Zyphur & Oswald, in press). The "strength" of the prior is generally based on the information available about the nature of the relationship under consideration. Strong (i.e., informative) priors "shrink" (Gelman, Carlin, Stern, & Rubin, 2003) or pull the posterior estimate toward the prior (and potentially away from the likelihood estimate), while weak (i.e., uninformative) priors exert very little influence, allowing the likelihood to dominate (Yuan & MacKinnon, 2009). For our purposes, a Bayesian analysis can conduct a familiar form of hypothesis testing by examining the Bayesian "credibility interval" that captures 95% of the posterior distribution obtained from the analysis (see Zyphur & Oswald, in press). If a specified null value (i.e., 0) falls outside this interval, then, using a logic similar to null-hypothesis significance testing, the hypothesis can be rejected. In reporting the results of this method, we refer to the probability of the credibility interval correctly excluding zero as representing statistical power and the probability of the credibility interval incorrectly excluding zero as Type I error (while these are primarily terms associated with a frequentist perspective, such analyses and conclusions are permissible within a Bayesian framework; Zyphur & Oswald, in press).

2 This method is also referred to as the residual bootstrap (Pituch & Stapleton, 2008; Pituch, Stapleton, & Kang, 2006). However, to avoid confusion with the more established case-based bootstrap discussed thus far in the article, we have adopted the "permutation" terminology of Taylor and MacKinnon (2012) to refer to the noniterative permutation method that they recommend for testing mediation effects.

3 Given space limitations, a detailed discussion of the differences between Bayesian analytic techniques and the other methods utilized in this article is infeasible. For our purposes, Bayesian results can be interpreted similarly to those of the other methods presented; however, we stress that this is a simplification, and for more detail, the reader is directed to Bayarri and Berger (2004), Kruschke, Aguinis, and Joo (2012), and Zyphur and Oswald (in press).
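To make the two alternatives concrete, the following R sketch illustrates the logic just described under stated assumptions. The residual-permutation step is a simplified illustration of the general idea only (Taylor and MacKinnon's, 2012, noniterative procedure differs in its details), the Bayesian step mirrors the diffuse-prior setup used later in the Method section, and the simulated data and variable names are hypothetical, not the authors' supplemental code.

```r
# Illustrative sketch of the two alternatives to bootstrapping.
library(MCMCpack)  # provides MCMCregress for the Bayesian step

set.seed(0)
n <- 60
x <- rnorm(n)
m <- 0.3 * x + rnorm(n)
y <- 0.3 * m + rnorm(n)
fit_m <- lm(m ~ x)
fit_y <- lm(y ~ m + x)

# Permutation-style resampling: shuffle residuals without replacement,
# rebuild m and y from the fitted models, and re-estimate a*b each time.
# (A simplified illustration; see Taylor & MacKinnon, 2012, for the
# noniterative procedure the article actually follows.)
b_y <- coef(fit_y)
ab_perm <- replicate(2000, {
  m_star <- fitted(fit_m) + sample(resid(fit_m))
  y_star <- b_y[1] + b_y["m"] * m_star + b_y["x"] * x + sample(resid(fit_y))
  coef(lm(m_star ~ x))["x"] * coef(lm(y_star ~ m_star + x))["m_star"]
})
quantile(ab_perm, c(0.025, 0.975))  # CI end points selected directly

# Bayesian credibility interval: diffuse normal priors centered on zero
# with variance 10^6 (precision B0 = 1e-6); the posterior of the indirect
# effect is the product of the posterior draws for the two paths.
post_m <- MCMCregress(m ~ x, burnin = 1000, mcmc = 10000,
                      b0 = 0, B0 = 1e-6, seed = 0)
post_y <- MCMCregress(y ~ m + x, burnin = 1000, mcmc = 10000,
                      b0 = 0, B0 = 1e-6, seed = 0)
ab_post <- post_m[, "x"] * post_y[, "m"]
quantile(ab_post, c(0.025, 0.975))  # 95% credibility interval
```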


Method

We conducted a simulation study to examine the performance of bootstrapping, permutation, and Bayesian methods in the range of 20-100 cases. We chose this range based on Shrout and Bolger's (2002) advocacy for bootstrapping in small to moderate samples, adding the condition where N = 100 in search of a lower bound for a recommended sample size for mediation analyses. We conducted all simulations using R (R Core Team, 2012). We simulated the mediated relationship shown in Figure 1, modeling the relationship between X and Y as fully mediated by M (i.e., the total effect of X on Y is equal to the indirect effect, and the direct effect c' in Figure 1 is equal to zero). This is a meaningful specification and represents the model most commonly used in other investigations of bootstrapping4 (e.g., MacKinnon et al., 2004).

Two separate analyses were conducted for statistical power and Type I error. Statistical power is represented by the proportion of times that the confidence or credibility interval (for brevity, hereafter we refer only to confidence intervals) correctly excludes zero when the population effect is specified to be nonzero. Type I error is represented by the proportion of times that the confidence interval incorrectly excludes zero when the population effect is specified to be zero. For the statistical power analyses, we modeled population effect sizes for both paths that make up the indirect effect (i.e., the α and β paths from Figure 1) as small (0.10), medium (0.30), and large (0.50) standardized effects (Cohen, 1992). We evaluated six combinations of these effect sizes: small/small (α = 0.10, β = 0.10), small/medium (α = 0.10, β = 0.30), small/large (α = 0.10, β = 0.50), medium/medium (α = 0.30, β = 0.30), medium/large (α = 0.30, β = 0.50), and large/large (α = 0.50, β = 0.50). For the Type I error analyses, we specified the effect size of the α path to be zero; therefore, the population indirect effect for these analyses was always zero. We evaluated three combinations of effect size for this analysis: zero/small (α = 0.00, β = 0.10), zero/medium (α = 0.00, β = 0.30), and zero/large (α = 0.00, β = 0.50).

For each of the nine conditions, we simulated 1,000 random samples unique to this study, drawing from a standard multivariate normal population using the MASS package (Venables & Ripley, 2002). The size of the random samples ranged from 20 to 100, increasing in increments of 20. For the bootstrapping analyses, we drew 5,000 resamples (e.g., Preacher & Hayes, 2004) using the boot package (Canty & Ripley, 2013). We created 95% confidence intervals for the bootstrap using percentile, bias-corrected, and bias-corrected and accelerated methods as described in Efron and Tibshirani (1993) and Cheung and Lau (2008). For the permutation analysis, we followed the procedure for noniterative permutation confidence intervals for the indirect effect described by Taylor and MacKinnon (2012). Following their recommendation, we estimated the sampling distribution using 2,000 permuted samples. The Bayesian analysis was conducted using the Markov chain Monte Carlo (MCMC) procedure to characterize the posterior distribution using MCMCpack (Martin, Quinn, & Park, 2011). Following Yuan and MacKinnon (2009), we initially discarded 1,000 burn-in iterations for each sample and used 10,000 subsequent iterations for the analysis. We modeled an uninformative prior (Kruschke et al., 2012), operationalized as an extremely wide normal distribution (i.e., variance of 10^6) centered on zero (see also Yuan & MacKinnon, 2009; Zyphur & Oswald, in press). To eliminate the additional variance introduced in resampling simulations by the selection of a random seed (Gleser, 1996; Koopman et al., in press), we specified a starting seed value of zero across all methods. For researchers wishing to use these methods in their own research, we have provided an R syntax file and instructions in the online supplemental materials.
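As a concrete illustration of one cell of this design, the sketch below is our own reconstruction from the description above (not the authors' supplemental syntax). The population correlation matrix encodes standardized paths α = β = 0.30 with no direct effect, and power is estimated for the percentile bootstrap:

```r
# Reconstructed sketch of one power cell: alpha = beta = 0.30, n = 100.
library(MASS)   # mvrnorm for multivariate normal samples
library(boot)

alpha <- 0.3; beta <- 0.3; n <- 100
# Implied population correlations for standardized paths with c' = 0:
# cor(X, M) = alpha, cor(M, Y) = beta, cor(X, Y) = alpha * beta.
Sigma <- matrix(c(1,            alpha,  alpha * beta,
                  alpha,        1,      beta,
                  alpha * beta, beta,   1), nrow = 3)

indirect <- function(dat, idx) {
  d <- dat[idx, ]
  coef(lm(M ~ X, d))["X"] * coef(lm(Y ~ M + X, d))["M"]
}

set.seed(0)  # fixed seed, as in the text
excludes_zero <- replicate(1000, {  # computationally heavy; reduce to demo
  d <- as.data.frame(mvrnorm(n, mu = rep(0, 3), Sigma = Sigma))
  names(d) <- c("X", "M", "Y")
  ci <- boot.ci(boot(d, indirect, R = 5000), type = "perc")$percent[4:5]
  ci[1] > 0 || ci[2] < 0            # TRUE when the 95% CI excludes zero
})
mean(excludes_zero)                  # empirical power for this cell
```

Looping this over all nine effect size conditions, five sample sizes, and six interval methods reproduces the structure of Tables 2 and 3.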

Results

Table 2 details the results of our statistical power simulation; each cell represents the proportion of the 1,000 replicate samples for each sample-size/effect-size combination in which the confidence or credibility interval correctly excluded zero. Focusing first on the three conditions in which at least one effect was small (i.e., 0.10), bootstrapping (in particular, the bias-corrected and bias-corrected and accelerated methods) tended to outperform both the permutation and Bayesian methods in a relative sense. However, the highest level of statistical power attained in any condition was 22%, far short of the 80% level recommended by Cohen (1992). Additionally, the relative differences between bootstrapping and the other tests appear minor relative to how far any method is from exhibiting a sufficient level of statistical power. Turning to the condition in which both paths of the indirect effect were medium (i.e., 0.30), while no method exceeded 80% power, we note that greater than 70% of intervals for each method (excluding the Sobel test) correctly excluded zero when N = 100. So long as researchers expect moderate effect sizes, these results suggest that 100 cases may be a justifiable minimum number of cases (see also Maxwell, 2004). Again, we also find little relative difference among bootstrapping, permutation, and Bayesian analysis, but as expected, the Sobel test remained underpowered here. In the final two conditions, where at least one path was large (i.e., 0.50), all methods approached or exceeded 80% power with at least 80 cases, and relative differences between methods continued to be minor.

Table 2
Comparative Statistical Power Analysis of Six Methods for Testing the Significance of Indirect Effects

Effect size/sample size    SBL      PCT      BC       BCa      BYS      PMT

α = .10, β = .10
  20                       0.000    0.007    0.016    0.017    0.003    0.007
  40                       0.001    0.009    0.021    0.021    0.006    0.011
  60                       0.001    0.009    0.019    0.018    0.006    0.010
  80                       0.004    0.025    0.045    0.043    0.020    0.025
  100                      0.003    0.021    0.052    0.055    0.020    0.027

α = .10, β = .30
  20                       0.006    0.020    0.033    0.031    0.011    0.020
  40                       0.006    0.039    0.074    0.077    0.026    0.046
  60                       0.022    0.064    0.123    0.124    0.057    0.073
  80                       0.032    0.099    0.164    0.162    0.098    0.106
  100                      0.060    0.138    0.200    0.196    0.132    0.142

α = .10, β = .50
  20                       0.019    0.059    0.101    0.097    0.042    0.076
  40                       0.032    0.082    0.113    0.111    0.061    0.081
  60                       0.071    0.133    0.174    0.167    0.121    0.126
  80                       0.115    0.160    0.189    0.191    0.144    0.153
  100                      0.145    0.183    0.216    0.216    0.183    0.183

α = .30, β = .30
  20                       0.019    0.052    0.101    0.098    0.040    0.080
  40                       0.082    0.212    0.302    0.298    0.186    0.236
  60                       0.207    0.383    0.498    0.493    0.366    0.401
  80                       0.397    0.567    0.666    0.658    0.570    0.597
  100                      0.546    0.711    0.785    0.782    0.716    0.730

α = .30, β = .50
  20                       0.063    0.148    0.231    0.220    0.123    0.200
  40                       0.290    0.428    0.527    0.528    0.424    0.471
  60                       0.524    0.649    0.715    0.703    0.644    0.662
  80                       0.691    0.756    0.798    0.794    0.762    0.767
  100                      0.826*   0.862*   0.889*   0.888*   0.867*   0.870*

α = .50, β = .50
  20                       0.194    0.319    0.436    0.427    0.286    0.402
  40                       0.663    0.763    0.824*   0.815*   0.753    0.795
  60                       0.901*   0.930*   0.951*   0.952*   0.937*   0.945*
  80                       0.980*   0.989*   0.994*   0.994*   0.989*   0.992*
  100                      0.998*   0.999*   0.999*   0.999*   0.999*   0.999*

Note. Values represent the proportion of correct rejections of the null hypothesis over 1,000 replications. α and β refer to the population effect sizes for the paths of the indirect effect in a mediation model. Values marked with an asterisk represent conditions with power that exceeds 80%. SBL = Sobel test; PCT = percentile bootstrap confidence intervals; BC = bias-corrected bootstrap confidence intervals; BCa = bias-corrected and accelerated bootstrap confidence intervals; BYS = Bayesian credibility intervals calculated with an uninformative, diffuse prior; PMT = confidence intervals calculated by creating a sampling distribution using permuted outcome values.

Table 3 details the results of our Type I error simulation; each cell represents the proportion of the 1,000 replicate samples for each sample-size/effect-size combination in which the confidence interval incorrectly excluded zero. In psychology, a 5% Type I error rate is generally expected with the reporting of a 95% confidence interval. However, in the case of the indirect effect, this rate can actually deviate from the specified value (see Fritz et al., 2012). As expected, our analyses reveal that bootstrapping has a strong tendency to exceed 5% Type I error in a number of conditions. Specifically, when the modeled nonzero path is moderate or large, at least one bootstrapping method exceeds 5% Type I error in nine out of 10 conditions. Of note, while neither the permutation nor the Bayesian method was perfect here, their effective error rates were substantially lower than those of bootstrapping. Across the same 10 conditions, permutation exceeded 5% Type I error four times, and the Bayesian method did so only twice. Moreover, the maximum Type I error rate for bootstrapping was nearly 50% larger than that of permutation and 80% larger than that of the Bayesian method (9.2% for bootstrapping compared with 6.3% for permutation and 5.2% for Bayes).

Table 3
Comparative Type I Error Analysis of Six Methods for Testing the Significance of Indirect Effects

Effect size/sample size    SBL      PCT      BC       BCa      BYS      PMT

α = .00, β = .10
  20                       0.000    0.002    0.007    0.008    0.000    0.007
  40                       0.001    0.004    0.014    0.013    0.002    0.003
  60                       0.001    0.007    0.013    0.010    0.004    0.006
  80                       0.000    0.005    0.020    0.020    0.003    0.007
  100                      0.002    0.009    0.019    0.019    0.007    0.007

α = .00, β = .30
  20                       0.002    0.016    0.030    0.029    0.005    0.016
  40                       0.008    0.032    0.059*   0.058*   0.020    0.036
  60                       0.010    0.035    0.062*   0.059*   0.030    0.037
  80                       0.011    0.031    0.069*   0.068*   0.022    0.037
  100                      0.016    0.055*   0.089*   0.089*   0.043    0.051*

α = .00, β = .50
  20                       0.008    0.027    0.059*   0.059*   0.016    0.038
  40                       0.028    0.058*   0.087*   0.086*   0.046    0.063*
  60                       0.035    0.068*   0.091*   0.092*   0.052*   0.056*
  80                       0.037    0.058*   0.072*   0.078*   0.052*   0.056*
  100                      0.030    0.053*   0.071*   0.070*   0.038    0.044

Note. Values represent the proportion of incorrect rejections of the null hypothesis over 1,000 replications. α and β refer to the population effect sizes for the paths of the indirect effect in a mediation model. Values marked with an asterisk represent conditions with Type I error that exceeds 5%. SBL = Sobel test; PCT = percentile bootstrap confidence intervals; BC = bias-corrected bootstrap confidence intervals; BCa = bias-corrected and accelerated bootstrap confidence intervals; BYS = Bayesian credibility intervals calculated with an uninformative, diffuse prior; PMT = confidence intervals calculated by creating a sampling distribution using permuted outcome values.

In summary, while the bootstrapping, permutation, and Bayesian methods were generally comparable in statistical power, they exhibited important differences relative to one another in terms of Type I error. Finally, it bears noting that although the Sobel test tends to be underpowered in these contexts, unlike the other methods considered, it did not exceed 5% Type I error in any condition.

4 As pointed out by a reviewer for this article, this represents the "textbook example" of mediation; mediation models are frequently more complicated. We agree with this point and note that this specification represents essentially a best-case scenario. With more complicated models, statistical power is likely lower than what we present here (although future research investigating more complicated models would be valuable). For example, Taylor, MacKinnon, and Tein (2008) presented a mediated model with two in-line intervening variables. In their model, for bias-corrected bootstrapping when N = 100 and all three paths are moderate, statistical power is 0.336. In our simulation, when the two paths of the indirect effect are moderate and N = 100, statistical power is 0.785.

Discussion

It is well known that parameter estimates and confidence intervals derived from small samples are affected by sampling error (Cohen, 1990; Schmidt et al., 1981; Tversky & Kahneman, 1974). Although no sample is perfectly representative of the population, chance deviations are minimized in large samples but can be extreme in small samples (Tversky & Kahneman, 1971). Sampling error is an important issue because it causes not only variance in parameter estimates across studies but also differences in the significance of results that are entirely attributable to random error.

In spite of well-known concerns regarding small samples, in recent years bootstrapping has been suggested as a powerful analytical tool that researchers can apply to test mediation in small samples (Shrout & Bolger, 2002). As we demonstrated, this recommendation has been heeded by researchers, as evidenced by the increasing number of small sample studies utilizing bootstrapping that are being published in JAP. The purpose of this article was to investigate the empirical performance of bootstrapping in small samples, as well as to suggest two potential alternatives.

Past research on bootstrapping has predominantly focused on comparisons with the Sobel (1982) test across a wide range of sample sizes, with little attention paid specifically to the small sample context where bootstrapping was explicitly endorsed and is increasingly applied. For example, MacKinnon et al. (2004) conducted a limited investigation and reported only averaged results, limiting the ability of readers to examine variations across conditions. Cheung and Lau (2008) focused exclusively on larger samples in their structural equation modeling comparison of bootstrapping and the Sobel test, and Taylor et al. (2008) considered a mediation model with two intervening variables. Biesanz, Falk, and Savalei (2010) simulated missing data to compare bootstrapping and the Sobel test, while Koopman et al. (in press) highlighted the consequences associated with changing the random seed in the bootstrap analysis. While the results of these prior simulations did raise some of our same concerns with bootstrapping in different contexts, with the exception of Koopman et al. (in press), these authors generally concluded that bootstrapping is the preferred method for testing mediation. Our results, focusing specifically on the sample size range advocated by Shrout and Bolger (2002), led us to strongly disagree with those conclusions. We therefore conducted an exploratory comparison of bootstrapping with two emerging methods for testing mediation.

Our results illustrate several concerns with the use of bootstrapping to test mediation in small samples. First, the likelihood of committing a Type II error is notably high in many of our conditions. In particular, if at least one effect size is expected to be small, the likelihood of demonstrating mediation with a small sample, regardless of the method employed, is very low. Similarly, moderate effect sizes seem to warrant sample sizes (i.e., at least 100 cases) in excess of the range advocated by Shrout and Bolger (2002), and even large effect sizes require samples corresponding to the upper end of Shrout and Bolger's recommended range. However, research utilizing sample sizes below these thresholds is increasingly being published in JAP. This is troublesome because research in JAP is seen as being at the forefront of methodological rigor. If researchers regularly design their studies like those we described previously, more often than not they may be misled by sampling error. Tversky and Kahneman (1971, p. 107) noted that underpowered research "makes for frustrated scientists and inefficient research. The investigator who tests a valid hypothesis but fails to obtain significant results cannot help but regard nature as untrustworthy or even hostile." Research resulting in such Type II errors represents wasted resources and missed opportunities.

Additionally, the results of our Type I error analysis add further cause for concern regarding bootstrapping. Not only does bootstrapping frequently exceed the expected 5% Type I error rate, but some conditions exhibited rates nearly double those expected (see also Cheung & Lau, 2008; Fritz et al., 2012).
Moreover, this error rate is indicative of what should be expected for a single hypothesis test; in actuality, investigators in many studies in psychology conduct multiple hypothesis tests, increasing the overall experiment-wise error. This experiment-wise increase in error rates is problematic enough when using tests with the expected Type I error rate (e.g., Hollenbeck, DeRue, & Mannor, 2006), but the problem is likely compounded with the application of bootstrapping. Given the frequently elevated Type I error rates for bootstrapping, we recommend against the use of this test in small samples, especially given that at least two alternative methods exist with comparable statistical power.

Despite concerns over small samples in general, they are sometimes unavoidable (Shen et al., 2011). In addition, sometimes a particularly unique or interesting sample becomes available in which the number of cases is somewhat small; for example, in their sample of firefighters, Colquitt, LePine, Zapata, and Wild (2011) had only 70 cases for which objective performance data were available. In these situations, the results of our analysis encourage the use of one of the two alternative methods for testing mediation as opposed to bootstrapping. Our intention is not to broadly recommend small sample mediation testing using these alternatives; we agree with Cohen (1990, p. 1304) that "less is more, except of course for sample size." However, in those instances where the sample is small and simply cannot be reasonably increased, and researchers might thus feel compelled to utilize bootstrapping, we argue that either the permutation test or the Bayesian method is a better alternative. Both methods have statistical power comparable to bootstrapping while exhibiting far less propensity to produce a Type I error.

Of the two, we advocate more strongly for the Bayesian method, not only because it performed adequately in our simulation but also because of the additional opportunities to improve upon the results presented here using this method (Zyphur & Oswald, in press). We utilized an uninformative prior in our simulations to present a broadly applicable version of this method. However, this test is conservative in small samples because the likelihood (i.e., the data) dominates the posterior distribution (Kruschke et al., 2012). Instead, if researchers are able to identify and justify an informative prior based on theory or other research, then Bayesian analysis becomes a notably more potent statistical tool, particularly for small samples (Zyphur & Oswald, in press). Indeed, similar arguments for Bayesian approaches have been made regarding other types of analyses that tend to be constrained to small samples (e.g., when validating local employee selection batteries; Newman, Jacobs, & Bartram, 2007).

Overall, we make the following recommendations regarding the testing of mediation hypotheses in small samples:

1. Advocacy for the use of bootstrapping in small samples based on relative statistical power is a red herring given the lack of power demonstrated in most contexts in our simulation. Such assertions obscure the more general problem of attempting to draw valid inferences about population indirect effects in these contexts. We recommend that, whenever possible, researchers design studies with sample sizes adequate to deliver a reasonable level of power in accordance with the results of our simulation and a priori expectations for the expected effect size.

2. When collecting additional cases is infeasible, researchers should avoid bootstrapping and instead consider either the permutation or Bayesian method. If the investigation is sufficiently guided by theory and previous findings, then we recommend using the Bayesian method with an informative prior to capitalize on this knowledge; a sketch of what such a prior specification might look like follows these recommendations.
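As a minimal illustration of recommendation 2, the sketch below encodes a hypothetical prior belief that the M-to-Y path is near 0.3 (prior SD of 0.15) in MCMCpack's MCMCregress. The data, variable names, and prior values are illustrative assumptions, not drawn from the article:

```r
# Sketch of an informative prior on the M -> Y path; values are assumptions.
library(MCMCpack)

set.seed(0)
n <- 70                                   # a small sample, as discussed
d <- data.frame(X = rnorm(n))
d$M <- 0.3 * d$X + rnorm(n)
d$Y <- 0.3 * d$M + rnorm(n)

# Prior means and precisions for (Intercept, M, X): diffuse on the
# intercept and X (precision 1e-6), informative on M (mean 0.3, SD 0.15,
# so precision 1 / 0.15^2).
b0 <- c(0, 0.3, 0)
B0 <- diag(c(1e-6, 1 / 0.15^2, 1e-6))

post_y <- MCMCregress(Y ~ M + X, data = d, burnin = 1000, mcmc = 10000,
                      b0 = b0, B0 = B0, seed = 0)
quantile(post_y[, "M"], c(0.025, 0.975))  # 95% credibility interval for b
```

The informative prior pulls the posterior for the M path toward 0.3, which is what lends the method its added potency in small samples.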


References

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182. doi:10.1037/0022-3514.51.6.1173
Bayarri, M. J., & Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58-80. doi:10.1214/088342304000000116
Biesanz, J. C., Falk, C. F., & Savalei, V. (2010). Assessing mediational models: Testing and interval estimation for indirect effects. Multivariate Behavioral Research, 45, 661-701. doi:10.1080/00273171.2010.498292
Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods, 17, 399-417. doi:10.1037/a0028087
Canty, A., & Ripley, B. (2013). boot: Bootstrap R (S-Plus) functions (R package version 1.3-9). Vienna, Austria: R Foundation for Statistical Computing.
Cheung, G. W., & Lau, R. S. (2008). Testing mediation and suppression effects of latent variables: Bootstrapping with structural equation models. Organizational Research Methods, 11, 296-325. doi:10.1177/1094428107300343
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312. doi:10.1037/0003-066X.45.12.1304
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159. doi:10.1037/0033-2909.112.1.155
Colquitt, J. A., LePine, J. A., Zapata, C. P., & Wild, R. E. (2011). Trust in typical and high-reliability contexts: Building and reacting to trust among firefighters. Academy of Management Journal, 54, 999-1015. doi:10.5465/amj.2006.0241
Combs, J. G. (2010). Big samples and small effects: Let's not trade relevance and rigor for power. Academy of Management Journal, 53, 9-13. doi:10.5465/AMJ.2010.48036305
Edwards, J. R. (2008). To prosper, organizational psychology should . . . overcome methodological barriers to progress. Journal of Organizational Behavior, 29, 469-491. doi:10.1002/job.529
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26. doi:10.1214/aos/1176344552
Efron, B. (1990). Six questions raised by the bootstrap (Technical Report No. 139). Stanford, CA: Stanford University, Division of Biostatistics.
Efron, B. (2000). The bootstrap and modern statistics. Journal of the American Statistical Association, 95, 1293-1296. doi:10.1080/01621459.2000.10474333
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall/CRC. doi:10.1007/978-1-4899-4541-9
Fritz, M. S., Taylor, A. B., & MacKinnon, D. P. (2012). Explanation of two anomalous results in statistical mediation analysis. Multivariate Behavioral Research, 47, 61-87. doi:10.1080/00273171.2012.640596
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis (2nd ed.). New York, NY: Chapman & Hall/CRC.
Gleser, L. J. (1996). Comment on "Bootstrap confidence intervals" by T. J. DiCiccio and B. Efron. Statistical Science, 11, 219-221.
Good, P. (2005). Permutation, parametric, and bootstrap tests of hypotheses (3rd ed.). New York, NY: Springer-Verlag.
Hollenbeck, J. R., DeRue, D. S., & Mannor, M. (2006). Statistical power and parameter stability when subjects are few and tests are many: Comment on Peterson, Smith, Martorana, and Owens (2003). Journal of Applied Psychology, 91, 1-5. doi:10.1037/0021-9010.91.1.1
James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307-321. doi:10.1037/0021-9010.69.2.307
James, L. R., Mulaik, S. A., & Brett, J. M. (2006). A tale of two methods. Organizational Research Methods, 9, 233-244. doi:10.1177/1094428105285144
Koopman, J., Howe, M., & Hollenbeck, J. R. (in press). Pulling the Sobel test up by its bootstraps. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends. New York, NY: Routledge.
Kozlowski, S. W. J. (2009). Editorial. Journal of Applied Psychology, 94, 1-4. doi:10.1037/a0014990
Kruschke, J. K., Aguinis, H., & Joo, H. (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722-752. doi:10.1177/1094428112457829
Lance, C. E. (2011). More statistical and methodological myths and urban legends. Organizational Research Methods, 14, 279-286. doi:10.1177/1094428110391814
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99-128. doi:10.1207/s15327906mbr3901_4
Martin, A. D., Quinn, K. M., & Park, J. H. (2011). MCMCpack: Markov chain Monte Carlo in R. Journal of Statistical Software, 42, 1-21.
Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147-163. doi:10.1037/1082-989X.9.2.147
Newman, D. A., Jacobs, R. R., & Bartram, D. (2007). Choosing the best method for local validity estimation: Relative accuracy of meta-analysis versus a local study versus Bayes-analysis. Journal of Applied Psychology, 92, 1394-1413. doi:10.1037/0021-9010.92.5.1394
Pituch, K. A., & Stapleton, L. M. (2008). The performance of methods to test upper-level mediation in the presence of nonnormal data. Multivariate Behavioral Research, 43, 237-267. doi:10.1080/00273170802034844
Pituch, K. A., Stapleton, L. M., & Kang, J. Y. (2006). A comparison of single sample and bootstrap methods to assess mediation in cluster randomized trials. Multivariate Behavioral Research, 41, 367-400. doi:10.1207/s15327906mbr4103_5
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36, 717-731. doi:10.3758/BF03206553
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879-891. doi:10.3758/BRM.40.3.879
Preacher, K. J., Zyphur, M. J., & Zhang, Z. (2010). A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods, 15, 209-233. doi:10.1037/a0020141
R Core Team. (2012). R: A language and environment for statistical computing (Version 2.15.1) [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences as moderators of aptitude test validity in selection: A red herring. Journal of Applied Psychology, 66, 166-185. doi:10.1037/0021-9010.66.2.166
Shen, W. N., Kiger, T. B., Davies, S. E., Rasch, R. L., Simon, K. M., & Ones, D. S. (2011). Samples in applied psychology: Over a decade of research in review. Journal of Applied Psychology, 96, 1055-1064. doi:10.1037/a0023322
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7, 422-445. doi:10.1037/1082-989X.7.4.422
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290-312. doi:10.2307/270723
Stone, C. A., & Sobel, M. E. (1990). The robustness of estimates of total indirect effects in covariance structure models estimated by maximum likelihood. Psychometrika, 55, 337-352. doi:10.1007/BF02295291
Stone-Romero, E. F., & Rosopa, P. J. (2008). The relative validity of inferences about mediation as a function of research design characteristics. Organizational Research Methods, 11, 326-352. doi:10.1177/1094428107300342
Taylor, A. B., & MacKinnon, D. P. (2012). Four applications of permutation methods to testing a single-mediator model. Behavior Research Methods, 44, 806-844. doi:10.3758/s13428-011-0181-x
Taylor, A. B., MacKinnon, D. P., & Tein, J. Y. (2008). Tests of the three-path mediated effect. Organizational Research Methods, 11, 241-269. doi:10.1177/1094428107300344
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110. doi:10.1037/h0031322
Tversky, A., & Kahneman, D. (1974, September 27). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131. doi:10.1126/science.185.4157.1124
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York, NY: Springer. doi:10.1007/978-0-387-21706-2
Wang, J., Carpenter, J. R., & Kepler, M. A. (2006). Using SAS to conduct nonparametric residual bootstrap multilevel modeling with a small number of groups. Computer Methods and Programs in Biomedicine, 82, 130-143. doi:10.1016/j.cmpb.2006.02.006
Wang, M., Liu, S. Q., Liao, H., Gong, Y. P., Kammeyer-Mueller, J., & Shi, J. Q. (2013). Can't get it out of my mind: Employee rumination after customer mistreatment and negative mood in the next morning. Journal of Applied Psychology, 98, 989-1004. doi:10.1037/a0033656
Williams, L. J., Edwards, J. R., & Vandenberg, R. J. (2003). Recent advances in causal modeling methods for organizational and management research. Journal of Management, 29, 903-936. doi:10.1016/S0149-2063(03)00084-9
Young, M. J., Bauman, C. W., Chen, N., & Bastardi, A. (2012). The pursuit of missing information in negotiation. Organizational Behavior and Human Decision Processes, 117, 88-95. doi:10.1016/j.obhdp.2011.09.004
Yuan, Y., & MacKinnon, D. P. (2009). Bayesian mediation analysis. Psychological Methods, 14, 301-322. doi:10.1037/a0016972
Zickar, M. J., & Highhouse, S. (2001). Measuring prestige of journals in industrial-organizational psychology. The Industrial-Organizational Psychologist, 38, 29-36.
Zyphur, M. J., & Oswald, F. L. (in press). Bayesian estimation and inference: A user's guide. Journal of Management.

Received July 20, 2013
Revision received March 13, 2014
Accepted March 18, 2014

215KB Sizes 1 Downloads 3 Views