PSYCHOMETRIKA

2013 DOI: 10.1007/s11336-013-9386-5

EMPIRICAL CORRECTION TO THE LIKELIHOOD RATIO STATISTIC FOR STRUCTURAL EQUATION MODELING WITH MANY VARIABLES

KE-HAI YUAN
UNIVERSITY OF NOTRE DAME

YUBIN TIAN
BEIJING INSTITUTE OF TECHNOLOGY

HIROKAZU YANAGIHARA
HIROSHIMA UNIVERSITY

Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T_ML rejects the correct model too often when p is not too small. Various corrections to T_ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correcting T_ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that the empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T_ML as reported in the literature, and they perform well.

Key words: Bartlett correction, Bayesian information criterion, maximum likelihood, type I errors.

1. Introduction

Structural equation modeling (SEM) is one of the most widely used methods in social and behavioral research. With typical questionnaires containing many items, SEM is a statistical tool for modeling the covariances of the observed items/variables through a few underlying latent variables. The most widely used statistic for evaluating a SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. A recent study by Moshagen (2012) showed that, with p = 60 and N = 200, referring T_ML to χ²_df rejects the correct model 98 % of the time while ideally it should reject only 5 %. The purpose of this paper is to propose an empirical approach to correcting T_ML so that the corrected statistic is better approximated by χ²_df with a large p and/or small N. In particular, we aim to develop statistics that can provide reasonable control of type I errors whenever N ≥ max(50, 2p). In addition, we will also provide a systematic evaluation of existing corrections to T_ML together with the newly obtained corrections.

Requests for reprints should be sent to Ke-Hai Yuan, Department of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA. E-mail: [email protected]

© 2013 The Psychometric Society


Let S be the sample covariance matrix based on a sample of size N, and Σ(θ) be the covariance structure model, with θ being the vector of q unknown free parameters. The statistic T_ML can be written as

T_ML = (N − 1)F_ML,   (1)

where F_ML = F_ML(θ̂) with

F_ML(θ) = tr[SΣ⁻¹(θ)] − log|SΣ⁻¹(θ)| − p,

and θ̂ is the estimator of θ that minimizes F_ML(θ). When data are normally distributed and the null hypothesis E(S) = Σ = Σ(θ₀) holds, T_ML converges in distribution to χ²_df with df = p(p + 1)/2 − q. However, with a relatively large p, referring T_ML to χ²_df tends to reject the correct model at a much higher rate than the nominal level. Various corrections to T_ML have been proposed that replace N − 1 in (1) by a smaller number so that the resulting statistic follows χ²_df more closely. We briefly review these proposals to put our study in context.

Pioneering corrections to likelihood ratio statistics were proposed and studied by Bartlett (1937, 1954), Box (1949) and Lawley (1956). The idea in these studies is to identify a scalar c such that E(T_ML) = (1 + c/N)df + O(1/N²) and to correct the statistic by (1 + ĉ/N)⁻¹T_ML, where ĉ is estimated using the same sample as for obtaining T_ML. Such an improvement to test statistics is generally called the Bartlett correction. Although the correction only makes the mean of the resulting statistic closer to df, Barndorff-Nielsen and Hall (1988) showed that the corrected statistic enjoys higher-order accuracy in controlling type I errors. Wakaki, Eguchi and Fujikoshi (1990) obtained the Bartlett correction for a class of covariance structure models. This correction was studied by Okada, Hoshino and Shigemasu (2007), and their results indicate that the corrected statistic controls type I errors better than T_ML. A Bartlett correction for T_ML was also obtained by Ogasawara (2010) for nonnormally distributed data. Although the Bartlett correction is principled, the formulation of ĉ is rather complicated even for a very simple model, and thus it is hard to apply the correction to general SEM models.
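As a concrete illustration of the definitions above, the discrepancy function and test statistic can be sketched in a few lines of NumPy. The function names `f_ml` and `t_ml` are ours, and the fitted model covariance Σ(θ̂) is taken as given rather than obtained by minimizing F_ML:

```python
import numpy as np

def f_ml(S, Sigma):
    """Discrepancy F_ML(theta) = tr[S Sigma^{-1}(theta)] - log|S Sigma^{-1}(theta)| - p."""
    p = S.shape[0]
    M = S @ np.linalg.inv(Sigma)
    # slogdet avoids overflow/underflow in the determinant for larger p
    sign, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p

def t_ml(S, Sigma, N):
    """T_ML = (N - 1) F_ML, referred to chi-square with df = p(p+1)/2 - q."""
    return (N - 1) * f_ml(S, Sigma)
```

At Σ(θ̂) = S the discrepancy is exactly zero; any misfit makes F_ML, and hence T_ML, positive.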
Two heuristic corrections to T_ML come from the context of exploratory factor analysis (EFA), where the likelihood ratio statistic can also be expressed as T_ML as given in (1). Bartlett (1951) proposed to replace N − 1 in (1) by N_b = N − (2p + 11)/6 − 2m/3, with m being the number of latent factors (see also Lawley & Maxwell, 1971, p. 36). Using Monte Carlo simulation, Geweke and Singleton (1980) showed that T_MLb = N_b F_ML ∼ χ²_df works well for EFA with small N. This corrected statistic is implemented as the default option for EFA in popular software such as SPSS. Because the corrected statistic T_MLb is easy to implement, Fouladi (2000) proposed to apply T_MLb to SEM. However, studies by Nevitt and Hancock (2004) and Herzog, Boomsma and Reinecke (2007) indicate that type I errors with T_MLb ∼ χ²_df in SEM tend to be much lower than the nominal level. This is because in EFA the number of factor loadings increases proportionally with m, whereas in confirmatory factor analysis (CFA) or SEM the number of factor loadings does not change with m in most applications, due to each variable typically loading on one factor (Anderson & Gerbing, 1988). Considering that the number of parameters in SEM increases at a much lower speed than that in EFA when m increases, Yuan (2005) proposed to replace N − 1 in (1) by N_y = N − (2p + 13)/6 − m/3. However, this proposal is not statistically justified either, and the study by Herzog and Boomsma (2009) indicates that type I errors with T_MLy = N_y F_ML ∼ χ²_df are still lower than the nominal level, although it performs better than T_MLb ∼ χ²_df. Another heuristic correction to T_ML was given by Swain (1975), who proposed to replace N − 1 in (1) by

N_s = N − 1 − [p(2p² + 3p − 1) − h_q(2h_q² + 3h_q − 1)]/(12df),


where h_q = [(1 + 8q)^{1/2} − 1]/2. The statistic T_MLs = N_s F_ML has been evaluated by Fouladi (2000), Herzog et al. (2007) and Herzog and Boomsma (2009). They found that type I errors with T_MLs ∼ χ²_df are much closer to the nominal level than with T_MLb ∼ χ²_df. In particular, Herzog and Boomsma (2009) found that T_MLs performs the best among T_ML, T_MLb, T_MLy and T_MLs. Thus, new statistics need to perform better than T_MLs before they can be recommended for wide use. Notice that when m = 1, EFA and CFA models are identical. However, N_s ≠ N_b when m = 1. Therefore, Swain's corrected statistic T_MLs does not achieve the higher-order accuracy of a Bartlett correction.

We would like to note that each of the existing corrections to T_ML aims to yield a statistic that is more closely approximated by χ²_df than T_ML. None of the corrected statistics literally follows χ²_df. This is because, except for a linear model with normally distributed errors, there does not exist a transformation of the likelihood ratio statistic such that the resulting statistic literally follows χ²_df (see Fujikoshi, 2000). Along this line of research, we aim to obtain simple corrections that are easy to implement and whose resulting statistics follow χ²_df more closely than existing ones. Let c_e be a function of certain characteristics of the data and model. Following the idea of the Bartlett correction, we propose to empirically estimate c_e so that E(T_MLe) is as close to df as possible, where T_MLe = ĉ_e T_ML = N_e F_ML with N_e = ĉ_e(N − 1). Thus, the principle of our correction is the same as that of the Bartlett correction. We will refer to existing corrections to T_ML as analytical corrections, and to corrections involving the estimation of c_e as empirical ones. The approach of implementing our empirical correction can be compared to the parametric bootstrap, where critical values or limits of confidence intervals are estimated by replications from an estimated population under a given model (Efron & Tibshirani, 1993).
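The three replacement multipliers reviewed above are simple closed forms. A sketch, with our own function names:

```python
def n_bartlett(N, p, m):
    """Bartlett's (1951) EFA correction: N_b = N - (2p + 11)/6 - 2m/3."""
    return N - (2 * p + 11) / 6 - 2 * m / 3

def n_yuan(N, p, m):
    """Yuan's (2005) proposal for SEM: N_y = N - (2p + 13)/6 - m/3."""
    return N - (2 * p + 13) / 6 - m / 3

def n_swain(N, p, q):
    """Swain's (1975) correction; q is the number of free model parameters."""
    df = p * (p + 1) / 2 - q
    h = ((1 + 8 * q) ** 0.5 - 1) / 2  # h_q
    return N - 1 - (p * (2 * p**2 + 3 * p - 1)
                    - h * (2 * h**2 + 3 * h - 1)) / (12 * df)
```

For example, for a 1-factor model with p = 6 (q = 12) and N = 100, N_b = 95.5 while N_s ≈ 96.2, illustrating that Swain's multiplier differs from Bartlett's even at m = 1 where the EFA and CFA models coincide.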
Because our interest is in correcting T_ML for general SEM models, we estimate c_e using independent samples from many conditions with normally distributed populations. Our data are partially from the literature, where values of T_ML or its average are reported. We also conduct new simulations for various combinations of N, p, m, and q to obtain additional values of T_ML. As we shall see, the empirically corrected statistics indeed perform a lot better than T_MLs according to established statistical criteria, and they also control type I errors more accurately.

In addition to T_ML, there are other statistics for SEM, and some of them account for nonnormally distributed data. The reason for us to focus on T_ML is that it is the most widely used and, many times, the only test statistic being reported. An improvement to T_ML has the potential to affect the practice of the SEM community most. Also, under certain conditions, T_ML may asymptotically follow χ²_df even when data are nonnormally distributed, a property called asymptotic robustness (Amemiya & Anderson, 1990; Kano, 1992; Mooijaart & Bentler, 1991; Satorra & Bentler, 1990; Yuan & Bentler, 1999a). Another reason for us to choose to work on T_ML is that multiple proposals for its correction exist. For a new methodology such as empirical correction, we would like to know how it works when compared with existing corrections of T_ML before applying the methodology to other statistics.

Section 2 contains the methodology used to obtain empirically corrected statistics. Section 3 obtains several empirically corrected statistics and compares them against analytically corrected ones. Type I error control of the empirically and analytically corrected statistics is examined in Section 4 by Monte Carlo. These statistics are further compared by their performance in predicting type I errors of T_ML in Section 5, using data reported in the literature. Examples contrasting different statistics with real data are given in Section 6. Discussion and recommendations are given in the concluding section.


2. Models for Empirical Correction, Their Estimation and Evaluation

This section contains the elements of empirical correction, which include the formulation of the correction factor c_e and the method for its estimation. Criteria for comparing different formulations of c_e are also described.

Let c_e = g(β) be a function that may depend on the characteristics of the sample and model. The idea of empirical correction is to estimate c_e such that the corresponding statistic T_MLe = ĉ_e T_ML approximately satisfies E(T_MLe) = df. Thus, the model behind the empirical correction is

E(T_ML) = df/g(β).   (2)

Let T̄_ML be the sample average of T_ML across N_r replications. It follows from (2) that we may regard T̄_ML as following the nonlinear regression model

T̄_ML = df/g(β) + ε,   (3)

where ε is an error term satisfying E(ε) = 0. Notice that, because the g(β) in (3) does not contain any random variable, T̄_ML and ε have the same variance. Denote the variance of T_ML as v². When the number of replications N_r is large enough, it follows from the standard central limit theorem (see, e.g., p. 236 of Casella & Berger, 2002) that ε ∼ N(0, v²/N_r), or

T̄_ML ∼ N(df/g(β), v²/N_r).   (4)

Because a chi-square distribution can be well approximated by a normal distribution when df is large (see Evans, Hastings & Peacock, 2000, p. 54), and a large p typically corresponds to a large df in the context of SEM, T_ML already approximately follows a normal distribution when p is large. We expect the approximation in (4) to work well when N_r is not too small.

In principle, the g(β) in (2)–(4) can be an arbitrary function, and the parameter vector β can include the structural parameter θ among its elements. But for reasons of parsimony we will only consider forms in which (N − 1)g(β) is a linear function of N, p, m,¹ and the number of model parameters q. Another reason for choosing such a form of g(β) is that the N_b = (N − 1)g(β) corresponding to Bartlett's correction for EFA models is also a linear function of N, p and m. Because, under the null hypothesis of a correct model, the distribution of T_ML does not depend on θ asymptotically, and Monte Carlo results in Jackson (2001) suggest that E(T_ML) practically does not depend on θ, we do not let θ be part of β. In our development of empirical correction with large p, we assume that the SEM model is formulated through latent variables and that the factor loadings are unknown in advance. Under such an assumption, the number of parameters is no less than that of a 1-factor model with q = 2p. Thus q₁ = q − 2p is the number of parameters beyond that of a 1-factor model. Since p and 2p are linearly related, we will consider g(β) in the form

g(β) = 1 − (β₀ + a₁β₁ + ··· + a_k β_k)/(N − 1) = 1 − a′β/(N − 1),   (5)

where a = (1, a₁, . . . , a_k)′, with p, m and q₁ being the candidates for the a_j. With a proper estimate β̂ = (β̂₀, β̂₁, . . . , β̂_k)′ of β, the corresponding corrected statistic is

T_MLe = [N − 1 − (β̂₀ + a₁β̂₁ + ··· + a_k β̂_k)]F_ML
      = [N − (α̂ + a₁β̂₁ + ··· + a_k β̂_k)]F_ML,   (6)

where α̂ = 1 + β̂₀.

¹ If the model possesses a hierarchical factor structure, m will be the number of factors at the first level only.


We next consider how to estimate the β in (4) or (5) by ML. Suppose we have obtained T̄_ML for N_c conditions. The conditions include different models and samples with different numbers of variables and sizes from normally distributed populations. Let T̄_i, p_i, m_i, q_1i, df_i, N_i, N_ri and v_i² denote the T̄_ML, number of variables, number of latent factors, number of model parameters beyond that of a 1-factor model, degrees of freedom, sample size, number of replications, and the variance of T_ML for the ith condition, respectively. Let n_i = N_i − 1 and g_i(β) = 1 − a_i′β/n_i. Then the average of the log likelihood function of β based on (4) is given by

l(β) = (1/N_c) Σ_{i=1}^{N_c} l_i(β),

where

l_i(β) = −(1/2) log(v_i²/N_ri) − {N_ri/[2g_i²(β)v_i²]}[g_i(β)T̄_i − df_i]².

Taking the differential of l_i(β) yields

l̇_i(β) = ∂l_i(β)/∂β = {N_ri df_i/[n_i g_i³(β)v_i²]}[(T̄_i − df_i)a_i − (T̄_i/n_i)a_i a_i′β],

and the ML estimate (MLE) β̂ of β satisfies the normal estimating equation

(1/N_c) Σ_{i=1}^{N_c} l̇_i(β̂) = 0.   (7)

Notice that (7) involves the variance parameter v_i², which needs to be estimated before we solve the equation. Because we do not know the distribution of T_ML, the maximum likelihood estimate of v_i² cannot be obtained. Since v_i² is only a nuisance parameter in empirically correcting T_ML, we can substitute it by a consistent estimate, which will not affect the asymptotic efficiency of the β̂ that satisfies (7) (Cox & Hinkley, 1974). With the sample variance s_i² of T_i across the N_ri replications being available, we estimate v_i² by s_i². Let w_i(β) = N_ri df_i/[n_i g_i³(β)s_i²]. Then Equation (7) can be solved iteratively using

β^(l+1) = [Σ_{i=1}^{N_c} (w_il/n_i) T̄_i a_i a_i′]⁻¹ Σ_{i=1}^{N_c} w_il (T̄_i − df_i)a_i,

where w_il = w_i(β^(l)) = N_ri df_i/[n_i g_i³(β^(l))s_i²].

It is clear that β̂ depends on g(β). Among viable candidates for g(β), we need to find the one that fits (2)–(4) the best. For a given g(β), let

Q(β) = E[l(β)],

where the expectation is defined over T_ML across all conditions of normally distributed sample and model. With a continuously differentiable g(β) and under standard regularity conditions, there exists a β* that maximizes Q(β), and, as the number of conditions N_c properly increases, the MLE β̂ converges to β* with probability 1.0 (see White, 1982). The process of our empirical correction is to approximate β* by the β̂ corresponding to the g(β) that fits the model in (2)–(4) the best. We will compare different formulations of g(β) using the Bayesian information criterion (BIC). Since there are N_c + k + 1 parameters being estimated, the BIC associated with (4) is given by

BIC = Σ_{i=1}^{N_c} {log(s_i²/N_ri) + [N_ri/(g_i²(β̂)s_i²)][g_i(β̂)T̄_i − df_i]²} + (N_c + k + 1) log(N_c).   (8)

Notice that the first term within the curly brackets in (8), as well as N_c log(N_c), does not change with different formulations of g_i(β). We may drop them and rewrite the BIC as

BIC = Σ_{i=1}^{N_c} [N_ri/(g_i²(β̂)s_i²)][g_i(β̂)T̄_i − df_i]² + (k + 1) log(N_c).   (9)
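The weighted iteration for solving (7) is straightforward to code. The following sketch (function and variable names are ours) recovers β exactly whenever T̄_i = df_i/g_i(β) holds without noise, since the estimating equation is then satisfied identically:

```python
import numpy as np

def fit_beta(Tbar, A, df, n, Nr, s2, n_iter=10):
    """Iteratively solve the normal estimating equation (7).

    Tbar, df, n, Nr, s2 : length-Nc arrays of condition summaries
    A : (Nc, k+1) design matrix whose ith row is a_i = (1, a_i1, ..., a_ik)
    """
    beta = np.zeros(A.shape[1])  # start at g_i = 1, i.e., no correction
    for _ in range(n_iter):
        g = 1 - A @ beta / n
        w = Nr * df / (n * g**3 * s2)                 # weights w_il
        lhs = A.T @ (A * (w * Tbar / n)[:, None])     # sum (w/n) Tbar a a'
        rhs = A.T @ (w * (Tbar - df))                 # sum w (Tbar - df) a
        beta = np.linalg.solve(lhs, rhs)
    return beta
```

With noiseless data generated from a known β, one pass of the weighted least-squares step already returns the true coefficients, and further iterations leave them unchanged.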

Also notice that the standardized residuals corresponding to (3) and (4) are given by

u_i = (N_ri^{1/2}/s_i)(T̄_i − df_i/g_i) = [N_ri^{1/2}/(g_i s_i)](g_i T̄_i − df_i),   i = 1, 2, . . . , N_c.

Thus, the BIC in (9) is simply the sum of squared standardized residuals plus a constant. We could obtain standard errors (SE) from the corresponding information matrix to judge the significance of each β̂_j. But our interest is mainly in obtaining a proper g(β). In particular, the significance of β̂_j is closely related to the value of N_c, and the effect of β̂_j on the T_MLe in (6) operates through the value of a_j β̂_j rather than through the significance of β̂_j. Thus, we will not judge the contribution of β̂_j by its statistical significance level. In addition to the BIC for each corrected statistic, we will also examine the mean of residuals (mean_res), the mean of standardized residuals (mean_sres), the mean sum of squared residuals (MSS_res), and the mean sum of squared standardized residuals (MSS_sres). These are, respectively, given by

mean_res = (1/N_c) Σ_{i=1}^{N_c} [g_i(β̂)T̄_i − df_i],

mean_sres = (1/N_c) Σ_{i=1}^{N_c} {N_ri^{1/2}/[g_i(β̂)s_i]}[g_i(β̂)T̄_i − df_i],

MSS_res = {1/[N_c − (k + 1)]} Σ_{i=1}^{N_c} [g_i(β̂)T̄_i − df_i]²,

and

MSS_sres = {1/[N_c − (k + 1)]} Σ_{i=1}^{N_c} [N_ri/(g_i²(β̂)s_i²)][g_i(β̂)T̄_i − df_i]².

According to (4), we would expect MSS_sres ≈ 1.0 if the correction is perfect. If MSS_sres is substantially above 1.0, then the corresponding correction can be systematically improved. Since g(β) is applied directly to T_ML, not to a weighted version of it, we would like MSS_res to be as small as possible when comparing different corrections. Of course, a good correction should also correspond to both mean_res and mean_sres being close to zero.

Although existing analytical corrections to T_ML were not developed according to the model in (2)–(4), the means of the corrected statistics need to equal df if they follow χ²_df. Consequently,


the nonlinear regression model in (3) or the normal distribution approximation in (4) holds equally for an analytically corrected statistic if it follows χ²_df. Thus, the model in (2)–(4) offers a platform for comparing different corrections to T_ML by putting them in the format g(β)T_ML. The widely used T_ML corresponds to g = 1.0. The well-known Bartlett correction in exploratory factor analysis corresponds to

g_b = 1 − [(2p + 5)/6 + 2m/3]/(N − 1).   (10)

The correction of Swain (1975) corresponds to

g_s = 1 − [p(2p² + 3p − 1) − h_q(2h_q² + 3h_q − 1)]/[12(N − 1)df].   (11)

The correction given in Yuan (2005) corresponds to

g_y = 1 − [(2p + 7)/6 + m/3]/(N − 1).   (12)

We will evaluate the fit of each of these analytical corrections to the model in (2)–(4) in the next section. In addition, we also include the Bartlett correction to T_ML for EFA with m = 1 in our study. This is because the number of factor loadings in a SEM or CFA model often equals that of a 1-factor model. The g(β) corresponding to this correction is given by

g_b1 = 1 − (p/3 + 3/2)/(N − 1),   (13)

and we denote the resulting statistic by T_MLb1.

Obviously, none of the corrections in (10)–(13) involves unknown parameters. Thus, with k + 1 = 0, the BIC in (9) further reduces to

BIC = Σ_{i=1}^{N_c} [N_ri/(g_i²s_i²)](g_i T̄_i − df_i)²,

which is just the sum of the squared standardized residuals u_i.

In summary, empirical correction obtains statistics whose means are as close to df as possible. Since any statistic in SEM has to be endorsed by Monte Carlo studies before it can be recommended for routine use, and the empirically corrected statistics directly utilize the Monte Carlo results, we expect the empirically corrected statistics to be better approximated by χ²_df than the analytically corrected ones. In particular, T_ML is asymptotically pivotal; that is, its asymptotic distribution under the null hypothesis does not depend on any unknown parameters of the population distribution. For asymptotically pivotal statistics, Monte Carlo correction achieves the same order of accuracy as the Bartlett correction. Readers interested in the relationship between the accuracy of Monte Carlo correction and pivotal statistics are referred to Beran (1988) and Hall (1992).
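The evaluation criteria of this section reduce to a few array operations over the conditions. A sketch with our own naming; setting k = −1 (so k + 1 = 0) covers the analytical corrections, whose correction factors involve no estimated parameters:

```python
import numpy as np

def evaluate_correction(g, Tbar, df, Nr, s2, k=-1):
    """Compute mean_res, mean_sres, MSS_res, MSS_sres and the reduced BIC in (9).

    g : correction factors g_i per condition (g = 1 everywhere reproduces T_ML)
    k : number of estimated slope coefficients; k = -1 for analytical corrections
    """
    Nc = len(Tbar)
    res = g * Tbar - df                           # residuals g_i Tbar_i - df_i
    u = np.sqrt(Nr) / (g * np.sqrt(s2)) * res     # standardized residuals u_i
    return {
        "mean_res": res.mean(),
        "mean_sres": u.mean(),
        "MSS_res": np.sum(res**2) / (Nc - (k + 1)),
        "MSS_sres": np.sum(u**2) / (Nc - (k + 1)),
        "BIC": np.sum(u**2) + (k + 1) * np.log(Nc),
    }
```

A perfect correction (g_i T̄_i = df_i for all i) yields zero for every residual-based criterion; with noisy T̄_i, MSS_sres near 1.0 is the target.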

3. Empirically Corrected Statistics Versus Analytically Corrected Ones

In this section we first describe the conditions under which our data for T̄_ML were obtained. Then we evaluate the fit of the analytically corrected statistics with respect to the model in (2)–(4). We next empirically estimate several formulations of g(β) and evaluate their fit to the model in (2)–(4). Last, standardized residuals corresponding to the empirically and analytically corrected statistics are examined graphically.


Part of our data are from published simulation studies on T_ML with normally distributed populations. Since our aim is to correct T_ML as defined in (1), not a truncated² T_ML, we need to avoid studies in which not all replications converged or in which certain values of T_ML were artificially deleted due to improper solutions. Our search of the literature found six values of T̄_ML based on N_r = 200 replications from Table 3 of Hu, Bentler and Kano (1992); seven values of T̄_ML based on N_r = 500 from Table 3 of Bentler and Yuan (1999); four values of T̄_ML based on N_r = 200 from Table 1 of Curran, West and Finch (1996); and four values based on N_r = 200 from Table 3 of Fan, Thompson and Wang (1999). The model in Fan et al. (1999) contains cross-loadings and the structural model is not saturated, whereas the measurement models in the other three studies are all unidimensional and the latent factors are freely correlated. These 21 conditions are listed in the first four rows of Table 1.

In addition to the above 21 values of T̄_ML, we also obtained 321 values of T̄_ML based on normally distributed populations with N_r = 500; p = 6, 8, 12, 16, 24, 32, 40, 48; and the number of factors ranging from m = 1 to m = p/2. The number of indicators on each factor is the same. Because a SEM model can be equivalently represented as a CFA model (see, e.g., Browne et al., 2002), we only considered CFA models for the 321 conditions. Since the distribution or expectation of T_ML practically does not depend on the value of the structural parameters (see Jackson, 2001), all the population factor loadings are set at 0.70. The factors are freely correlated with population correlations set at 0.50, and the errors are independent with unique variances at 1 − 0.70² = 0.51. Notice that, because each factor needs to have at least two indicators for the model to be identified, m attains its smallest value at 1 and its maximum value at p/2 in our design.

When not all replications reached convergence (mostly at m = p/2 and small N), we increased the factor loadings from 0.70 to 0.80 to 0.85, and continued up to 0.95, until all replications reached convergence.³ The fact that larger factor loadings yield a better convergence rate was noted by Boomsma (1982) and Anderson and Gerbing (1984). A condition was excluded if the number of convergences was less than 500 when all the population factor loadings were at 0.95. We could do a literature review to find the range of N that has been used in published studies. Considering that the SEM literature has repeatedly warned against small N (e.g., Boomsma, 1982; Anderson & Gerbing, 1984; MacCallum & Austin, 2000), it is very likely that, when N was not large enough, researchers were discouraged from using SEM or could not get their manuscripts published. Since our purpose is to correct T_ML with large p and/or small N, we choose N as small as possible under the condition that all replications converge. In particular, for a given p, we aim to obtain corrected statistics that are applicable whenever N ≥ max(50, 2p). Our smallest N approximately equals 2p when p > 25, or 50 when p ≤ 25. The conditions of N for the 321 newly simulated values of T̄_ML are listed in Table 1, and so are their corresponding case IDs in parentheses. In summary, our data come from N_c = 342 conditions with 14 × 200 + 328 × 500 = 166,800 independent values of T_ML.

Standard deviations of T_ML are reported in Hu et al. (1992) and Bentler and Yuan (1999). These are further transformed to sample variances s² and used to estimate the corresponding v² in (4). The sample variance of T_ML across the 500 replications for each of the 321 new conditions is also recorded and used to estimate the corresponding v² in (4). For the studies in Curran et al. (1996) and Fan et al. (1999), only the values of T̄_ML are reported. Since T_ML ∼ χ²_df asymptotically and Var(χ²_df) = 2df, 2T̄_ML is consistent for v² and is used to estimate the corresponding v² for these 8 conditions.

² In our experience, there exist systematic differences between converged and non-converged replications, as well as between samples resulting in proper and improper solutions. This was also noted by Yuan and Hayashi (2003) in the context of bootstrap simulation.
³ Convergence is defined as the maximum difference, within 300 iterations, over all elements of θ between two consecutive iterations being smaller than 0.0001.
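The population covariance matrix of the simulated CFA conditions (equal numbers of indicators per factor, loadings 0.70, factor correlations 0.50, unique variances 0.51) can be sketched as Σ = ΛΦΛ′ + Ψ; the function name is ours:

```python
import numpy as np

def cfa_population_cov(p, m, loading=0.70, phi=0.50):
    """Sigma = Lambda Phi Lambda' + Psi for a CFA model with p/m indicators
    per factor and each indicator loading on exactly one factor."""
    per = p // m
    Lam = np.zeros((p, m))
    for j in range(m):
        Lam[j * per:(j + 1) * per, j] = loading
    Phi = np.full((m, m), phi)            # factor correlations
    np.fill_diagonal(Phi, 1.0)
    Psi = (1 - loading**2) * np.eye(p)    # unique variances 1 - 0.70^2 = 0.51
    return Lam @ Phi @ Lam.T + Psi
```

With the default values, variances are 1.0, within-factor covariances are 0.70² = 0.49, and between-factor covariances are 0.70 × 0.50 × 0.70 = 0.245.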

TABLE 1.
342 conditions on T̄_ML and the corresponding case ID.

(Case ID)    N                                    p   m   q    q1   df    Nr
(1–6)        150,250,500,1000,2500,5000           15  3   33   3    87    200
(7–13)       60,70,80,90,100,110,120              15  3   33   3    87    500
(14–17)      100,200,500,1000                     9   3   21   3    24    200
(18–21)      100,200,500,1000                     12  4   31   7    47    200
(22–29)      50,60,80,100,150,200,300,500         6   1   12   0    9     500
(30–37)      50,60,80,100,150,200,300,500         6   2   13   1    8     500
(38–43)      80,100,150,200,300,500               6   3   15   3    6     500
(44–51)      50,60,80,100,150,200,300,500         8   1   16   0    20    500
(52–59)      50,60,80,100,150,200,300,500         8   2   17   1    19    500
(60–65)      80,100,150,200,300,500               8   4   22   6    14    500
(66–73)      50,60,80,100,150,200,300,500         12  1   24   0    54    500
(74–81)      50,60,80,100,150,200,300,500         12  2   25   1    53    500
(82–89)      50,60,80,100,150,200,300,500         12  3   27   3    51    500
(90–97)      50,60,80,100,150,200,300,500         12  4   30   6    48    500
(98–103)     80,100,150,200,300,500               12  6   39   15   39    500
(104–111)    50,70,100,150,200,300,500,1000       16  1   32   0    104   500
(112–119)    50,70,100,150,200,300,500,1000       16  2   33   1    103   500
(120–127)    50,70,100,150,200,300,500,1000       16  4   38   6    98    500
(128–133)    100,150,200,300,500,1000             16  8   60   28   76    500
(134–141)    50,70,100,150,200,300,500,1000       24  1   48   0    252   500
(142–149)    50,70,100,150,200,300,500,1000       24  2   49   1    251   500
(150–157)    50,70,100,150,200,300,500,1000       24  3   51   3    249   500
(158–165)    50,70,100,150,200,300,500,1000       24  4   54   6    246   500
(166–173)    50,70,100,150,200,300,500,1000       24  6   63   15   237   500
(174–181)    50,70,100,150,200,300,500,1000       24  8   76   28   224   500
(182–187)    100,150,200,300,500,1000             24  12  114  66   186   500
(188–195)    60,80,100,150,200,300,500,1000       32  1   64   0    464   500
(196–203)    60,80,100,150,200,300,500,1000       32  2   65   1    463   500
(204–211)    60,80,100,150,200,300,500,1000       32  4   70   6    458   500
(212–219)    60,80,100,150,200,300,500,1000       32  8   92   28   436   500
(220–225)    100,150,200,300,500,1000             32  16  184  120  344   500
(226–233)    80,100,150,200,300,500,1000,1500     40  1   80   0    740   500
(234–241)    80,100,150,200,300,500,1000,1500     40  2   81   1    739   500
(242–249)    80,100,150,200,300,500,1000,1500     40  4   86   6    734   500
(250–257)    80,100,150,200,300,500,1000,1500     40  5   90   10   730   500
(258–265)    80,100,150,200,300,500,1000,1500     40  8   108  28   712   500
(266–273)    80,100,150,200,300,500,1000,1500     40  10  125  45   695   500
(274–279)    150,200,300,500,1000,1500            40  20  270  190  550   500
(280–287)    100,120,150,200,300,500,1000,3000    48  1   96   0    1080  500
(288–295)    100,120,150,200,300,500,1000,3000    48  2   97   1    1079  500
(296–303)    100,120,150,200,300,500,1000,3000    48  4   102  6    1074  500
(304–311)    100,120,150,200,300,500,1000,3000    48  6   111  15   1065  500
(312–319)    100,120,150,200,300,500,1000,3000    48  8   124  28   1052  500
(320–327)    100,120,150,200,300,500,1000,3000    48  12  162  66   1014  500
(328–335)    100,120,150,200,300,500,1000,3000    48  16  216  120  960   500
(336–342)    120,150,200,300,500,1000,3000        48  24  372  276  804   500

TABLE 2.
BICs, means and mean sums of squares corresponding to T_ML and four analytically corrected statistics (based on 342 conditions).

Statistic    T_ML         T_MLb1     T_MLb       T_MLy      T_MLs
BIC          375237.630   4039.988   32116.465   5164.734   4571.313
mean_res     38.571       2.757      −5.807      −1.525     2.910
mean_sres    21.266       1.584      −4.122      −1.182     1.807
MSS_res      4934.330     37.787     239.876     40.007     40.182
MSS_sres     1097.186     11.813     93.908      15.102     13.366
α(1)                      2.500      1.833       2.167
β(p)                      0.333      0.333       0.333
β(m)                                 0.667       0.333

With the data described above, we first obtained the BIC, mean_res, mean_sres, MSS_res, and MSS_sres for each of the four analytically corrected statistics T_MLb, T_MLs, T_MLy and T_MLb1, corresponding to the g_b, g_s, g_y and g_b1 in (10)–(13), respectively. These are reported in Table 2. Because the statistic T_ML plays a central role for each of the corrections, we also include in Table 2 the results for T_ML for comparison purposes. Table 2 also contains the coefficients α and β corresponding to the predictors 1, p and m, obtained by putting the statistics T_MLb, T_MLy and T_MLb1 in the format of (6). They will be compared with the coefficients obtained empirically.

Among the four analytical corrections, T_MLb1 fits the model in (2)–(4) the best according to the BIC criterion, followed by T_MLs, T_MLy and T_MLb. T_MLb1 also performs the best according to MSS_res and MSS_sres. However, T_MLy corresponds to the smallest absolute mean_res and mean_sres. This is because each of the analytical corrections was proposed heuristically, not obtained by minimizing any of the criteria in Table 2. Considering that T_MLs has been shown to work best and the g_s in (11) is much more complicated than the g_b1 in (13), it is a little surprising that T_MLs does not perform as well as T_MLb1. The results under mean_res and mean_sres in Table 2 suggest that the g_b and g_y in (10) and (12) are too small and overcorrect the behavior of T_ML. The results also suggest that the correction in either T_MLs or T_MLb1 is not enough. In particular, although T_MLb1 has the smallest BIC, the corresponding MSS_sres = 11.813 suggests that there is still substantial room for improvement in correcting T_ML.
We next fit the model in (2)–(4) empirically for the g(β) in (5) corresponding to a = (1, p), (1, p, q1), (1, p, m) and (1, p, m, q1), respectively.⁴ The values of BIC, mean_res, mean_sres, MSS_res, and MSS_sres, as well as the MLEs α̂ and β̂ corresponding to each of the four formulations of g(β), are reported in Table 3. The results suggest that

T_MLe(1, p, q1) = [N − (2.381 + 0.367p + 0.003q₁)]F_ML,

T_MLe(1, p, m) = [N − (2.299 + 0.365p + 0.038m)]F_ML,

and

T_MLe(1, p, m, q1) = [N − (2.262 + 0.365p + 0.052m − 0.002q₁)]F_ML

are very comparable in fitting the model in (2)–(4), with T_MLe(1, p, m) having slightly smaller BIC and MSS_sres while T_MLe(1, p, m, q1) has slightly smaller MSS_res as well as slightly smaller

⁴ We also obtained the results of including p² as a predictor in (5). But the corresponding T_MLe did not perform substantially better with respect to BIC. Actually, the Bartlett correction for EFA also does not include a p² term.

TABLE 3.
BICs, means, mean sums of squares, and estimated coefficients (β̂) corresponding to four empirically corrected statistics (based on 342 conditions).

Statistic   TMLe(1,p)   TMLe(1,p,q1)   TMLe(1,p,m)   TMLe(1,p,m,q1)
BIC         681.117     636.557        618.561       622.820
meanres     −0.370      −0.403         −0.403        −0.400
meansres    −0.310      −0.345         −0.346        −0.343
MSSres      4.588       3.904          3.679         3.671
MSSsres     1.969       1.826          1.773         1.774
α̂(1)        2.305       2.381          2.299         2.262
β̂(p)        0.371       0.367          0.365         0.365
β̂(m)                                   0.038         0.052
β̂(q1)                   0.003                        −0.002

FIGURE 1.
Standardized residuals corresponding to TML versus case id (as listed in Table 1); 292 of the 342 residuals are greater than 1.96 in absolute value; the 10 largest are in the order of 312, 320, 296, 280, 304, 328, 288, 242, 250, 226, ranging from 104.58 to 90.03.

absolute meanres and meansres. Notice that the coefficient β̂(p) for each of the corrections in Table 3 is much larger than the corresponding β̂(q1) or β̂(m), suggesting that p affects the behavior of TML more than m or q1 does. The negative coefficient for q1 in TMLe(1, p, m, q1) suggests that the effect of m is confounded with that of q1 in modeling the behavior of TML. Also notice that the correction factor in TMLe(1, p, m) does not account for the number of parameters in θ. Since, for a given m, the number of parameters can vary a lot in either CFA or SEM, we may expect TMLe(1, p, q1) to be more robust than TMLe(1, p, m) in practice. Comparing Table 3 with Table 2 suggests that even TMLe(1, p) fits the model in (2)–(4) substantially better than any of the four analytically corrected statistics. In particular, the undercorrection (positive meanres) in TMLb1 occurs because the coefficient associated with p is too small, while the overcorrections (negative meanres) in TMLb and TMLy occur because the coefficients associated with m are too large.

FIGURE 2.
Standardized residuals corresponding to T̄MLb1 versus case id (as listed in Table 1); 96 of the 342 residuals are greater than 1.96 in absolute value; the 10 largest are in the order of 320, 328, 266, 212, 312, 336, 258, 250, 280, ranging from 15.06 to 11.08.

FIGURE 3.
Standardized residuals corresponding to T̄MLs versus case id (as listed in Table 1); 115 of the 342 residuals are greater than 1.96 in absolute value; the 10 largest are in the order of 320, 280, 250, 212, 296, 242, 312, 226, 258, 204, ranging from 13.33 to 11.62.

The results in Tables 2 and 3 suggest that TMLs and TMLb1 are the most promising analytically corrected statistics, while TMLe(1, p, m), TMLe(1, p, q1) and TMLe(1, p, m, q1) are the most promising empirically corrected ones. We next examine their fit to the model in (2)–(4) graphically. The statistic TML is also included for comparison purposes. Figures 1–6 contain the plots of the standardized residuals ui against case IDs as listed in Table 1, corresponding to TML, TMLb1, TMLs, TMLe(1, p, q1), TMLe(1, p, m) and TMLe(1, p, m, q1), respectively. Most significant ui in Figure 1 are associated with p = 48 and N = 100, p = 40 and N = 80, p = 48 and N = 120, and p = 32 and N = 60, indicating that TML is problematic with small N and/or large p. Figure 2 suggests that most significant ui corresponding to TMLb1 are positive, and they are associated with p = 48 and N = 100, p = 40 and N = 80, and p = 32 and N = 60. Similarly, most significant ui in Figure 3 are positive, and they also correspond to large p with small N, implying that there exists a systematic discrepancy between TMLs and df in Swain's correction. Corresponding

FIGURE 4.
Standardized residuals corresponding to T̄MLe(1, p, q1) versus case id (as listed in Table 1); 42 of the 342 residuals are greater than 1.96 in absolute value; 5 of the 10 largest are positive and 5 are negative; the 5 largest positive ones are, in order, 212, 320, 266, 328, 166, ranging from 4.95 to 3.10; and the 5 largest negative ones are, in order, 282, 245, 338, 221, 82, ranging from −4.49 to −3.11.

FIGURE 5.
Standardized residuals corresponding to T̄MLe(1, p, m) versus case id (as listed in Table 1); 42 of the 342 residuals are greater than 1.96 in absolute value; 4 of the 10 largest are positive and 6 are negative; the 4 largest positive ones are, in order, 212, 320, 266, 204, ranging from 4.30 to 3.01; and the 6 largest negative ones are, in order, 341, 339, 337, 336, 335, 334, ranging from −4.05 to −3.10.

to the three empirically corrected statistics, the significant residuals in Figures 4, 5 and 6 are much smaller in both magnitude and number. Except for a few outstanding ui on the right side of each figure, the residuals are about evenly distributed around the zero line. With Nc = 342, we expect about 17 cases to have |ui| > 1.96. There are 42, 42, and 41 significant ui in Figures 4–6, respectively, indicating that none of the empirically corrected statistics literally follows χ²df. The discrepancy between the distribution of each TMLe and χ²df is also reflected by the values of MSSsres in Table 3, which are about 1.8 rather than 1.0. According

FIGURE 6.
Standardized residuals corresponding to T̄MLe(1, p, m, q1) versus case id (as listed in Table 1); 41 of the 342 residuals are greater than 1.96 in absolute value; 4 of the 10 largest are positive and 6 are negative; the 4 largest positive ones are, in order, 212, 320, 266, 204, ranging from 4.10 to 2.96; and the 6 largest negative ones are, in order, 282, 245, 221, 314, 82, 255, ranging from −3.94 to −3.11.

to Tables 2 and 3 or Figures 1–6, what the empirical correction has achieved is statistics that are much better approximated by χ²df than the existing ones. It is clear from the model in (2)–(4) that the empirically corrected statistics are obtained by systematically accounting for the difference between TML and df through p, m, and q1, not by simply rescaling TML. The effect of the different formulations of g(β) is reflected by the values of MSSres and MSSsres reported in Tables 2 and 3. The ranges of the standardized residuals in Figures 1–6 are, respectively, 106.57, 17.80, 16.01, 9.43, 8.35, and 8.04, which further illustrates that the empirically corrected statistics account for the difference TML − df systematically, rather than simply redrawing the zero line in Figure 1.

4. Control of Type I Errors

The results in the previous section indicate that, when evaluated by well-established statistical criteria, the three empirically corrected statistics TMLe(1, p, q1), TMLe(1, p, m), and TMLe(1, p, m, q1) fit the model in (2)–(4) substantially better than the most promising analytically corrected statistics TMLb1 and TMLs. In this section we use Monte Carlo simulation to evaluate the performance of the five statistics in controlling type I errors. We first describe the models and the conditions for p and N before presenting the results.

Figure 7 represents a structural model with 5 latent variables and 3 prediction errors. Eight conditions of p are used in the study: 15, 20, . . . , 50. For a given p, each latent variable is measured by p/5 indicators (without cross-loadings), a number that ranges from 3 to 10. In the population, all the factor loadings are set at 1.0; all the path coefficients in Figure 7 are set at 0.5; the variances of ξ1, ξ2, ζ1, ζ2, and ζ3 are set at 1.0; the covariance between ξ2 and ξ1 is set at 0.5; and each of the p measurement error/uniqueness variances is set at 0.3. These values lead to a population covariance matrix Σ for each p. In specifying the model for estimation, the first loading for each latent variable is set at 1.0 and the rest of the loadings are subject to estimation. Other model parameters include the p


FIGURE 7.
A path diagram for the structural model used in evaluating type I errors and empirical means of six test statistics.

measurement error variances; the 6 path coefficients in Figure 7; and the variances of ξ1, ξ2, ζ1, ζ2, and ζ3. So there are a total of q = 2p − 5 + 11 = 2p + 6 free parameters in the structural model (θ). Thus, we have 8 models corresponding to the 8 conditions of p. We would like to note that each of the 8 models is different from any of those listed in Table 1. Since our main interest is the performance of the statistics with large p and relatively small N, sample sizes are chosen as N = 80, 100, 120, 150, 200.

For each pair of p and N, 1000 independent replications of samples of size N are drawn from Np(0, Σ), and each sample covariance matrix is fitted to the model in Figure 7 by minimizing FML(θ). All 1000 replications converged for all the conditions of p and N. At the convergence of a replication, the five statistics TMLb1, TMLs, TMLe(1, p, q1), TMLe(1, p, m) and TMLe(1, p, m, q1) are evaluated. Each statistic is compared against the critical values of χ²df at the levels of 0.05 and 0.10, respectively, and the numbers of significant replications are recorded. At level 0.05, each statistic should ideally reject the model 50 times out of the 1000 replications.

Table 4 contains the numbers of rejections of Tb1 = TMLb1, Ts = TMLs, Te1 = TMLe(1, p, q1), Te2 = TMLe(1, p, m) and Te3 = TMLe(1, p, m, q1) at each pair of p and N. For each p, the averaged absolute difference (AAD5) between the number of rejections and the ideal number 50 across the 5 conditions of N is also included in Table 4, and the largest AAD5 among the five statistics is underlined. Due to sampling/systematic errors, few entries in Table 4 are exactly 50. At N = 80, 100 and 120, the entries corresponding to Tb1 and Ts tend to be much greater than 50 as p increases. At p = 50 and N = 80, the entries corresponding to Te1, Te2 and Te3 are also a lot greater than 50. When N ≥ 2p, entries under the three empirically corrected statistics are reasonably close to 50.
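The population covariance matrix used above follows the usual SEM decomposition Σ = ΛΦΛ′ + Ψ, with Λ containing the unit loadings (p/5 per factor, no cross-loadings) and Ψ = 0.3·I. A pure-Python sketch of its assembly; note that the compound-symmetric Φ below is only a placeholder, since the actual 5 × 5 latent covariance matrix Φ is determined by the path coefficients of Figure 7:

```python
def population_cov(p, Phi, err_var=0.3):
    # Sigma = Lambda Phi Lambda' + Psi for the simulation design: 5 factors,
    # p/5 unit loadings per factor (no cross-loadings), Psi = err_var * I.
    n_fac = len(Phi)
    per = p // n_fac
    lam = [[1.0 if i // per == k else 0.0 for k in range(n_fac)]
           for i in range(p)]
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            s = sum(lam[i][k] * Phi[k][l] * lam[j][l]
                    for k in range(n_fac) for l in range(n_fac))
            sigma[i][j] = s + (err_var if i == j else 0.0)
    return sigma

# Placeholder latent covariance matrix for illustration only.
Phi = [[1.0 if k == l else 0.5 for l in range(5)] for k in range(5)]
Sigma = population_cov(15, Phi)
```

With unit loadings and error variance 0.3, each diagonal element of Σ equals the corresponding factor variance plus 0.3.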
According to AAD5, TMLs performs the worst in 7 out of the 8 conditions of p. The averaged absolute difference (AAD40) between the number of rejections and the ideal number 50 across the 40 conditions of (N, p) is given at the bottom of Table 4; the AAD40 corresponding to TMLs is about 4 times that corresponding to each of the empirically corrected statistics. According to AAD40, the three empirically corrected statistics perform about the same, with TMLe(1, p, q1) doing slightly better than the other two, although the BIC corresponding to TMLe(1, p, q1) in Table 3 is slightly greater.

Table 5 contains the numbers of significant replications of TMLb1, TMLs, TMLe(1, p, q1), TMLe(1, p, m), and TMLe(1, p, m, q1) at the nominal level 0.10. The performance of each statistic is similar to that in Table 4. In particular, according to AAD5, TMLs performs the worst in 7 out of the 8 conditions of p. When N ≥ 2p, the three empirically corrected statistics perform reasonably well, with TMLe(1, p, q1) performing slightly better according to AAD40. Although each of the empirically corrected statistics performs much better than the two analytically corrected ones in Tables 4 and 5, their type I error rates are still substantially higher than

TABLE 4.
Number of rejections for the model in Figure 7 by Tb1 = TMLb1, Ts = TMLs, Te1 = TMLe(1, p, q1), Te2 = TMLe(1, p, m) and Te3 = TMLe(1, p, m, q1) at the nominal level of 0.05. With 1000 replications, the ideal number is 50.

p    N      Tb1     Ts      Te1     Te2     Te3
15   80     54      59      50      46      46
15   100    50      54      47      45      45
15   120    45      51      41      41      41
15   150    40      41      37      37      37
15   200    55      56      55      55      54
15   AAD5   4.8     5.8     6.0     7.2     7.0
20   80     72      77      61      60      60
20   100    59      65      55      55      55
20   120    51      53      48      47      46
20   150    51      54      47      46      45
20   200    55      57      51      51      51
20   AAD5   7.6     11.2    4.4     4.6     5.0
25   80     64      72      44      44      44
25   100    64      68      56      52      52
25   120    56      66      48      48      48
25   150    55      56      47      46      46
25   200    59      62      53      53      53
25   AAD5   9.6     14.8    4.0     3.4     3.4
30   80     60      76      38      38      38
30   100    68      79      51      50      48
30   120    53      57      36      36      36
30   150    49      54      43      41      39
30   200    59      62      49      49      49
30   AAD5   8.2     15.6    7.0     7.2     8.0
35   80     92      98      53      52      51
35   100    63      73      37      37      37
35   120    74      80      44      43      43
35   150    66      68      57      57      57
35   200    56      59      43      43      42
35   AAD5   20.2    25.6    7.2     7.2     7.2
40   80     117     136     65      65      65
40   100    85      93      53      53      53
40   120    82      90      57      57      56
40   150    73      77      53      53      53
40   200    65      68      48      48      48
40   AAD5   34.4    42.8    6.0     6.0     5.8
45   80     172     195     70      70      70
45   100    112     130     52      52      52
45   120    94      102     49      49      49
45   150    65      71      31      31      31
45   200    46      54      34      34      34
45   AAD5   49.4    60.4    11.6    11.6    11.6
50   80     308     347     129     130     129
50   100    142     164     46      46      46
50   120    118     129     57      58      57
50   150    68      79      33      33      33
50   200    64      70      39      39      39
50   AAD5   90.0    107.8   23.6    24.0    23.6
AAD40       28.0    35.5    8.7     8.9     9.0

the nominal rates when N < 2p. This may reflect the difficulty of modeling high-dimensional data with a small N. With the variance of FML being rather large at small N and large p, it is not an easy task to find a statistic of the form Ne FML whose distribution can be sufficiently well approximated by χ²df.
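The AAD summaries in Tables 4 and 5 are plain averages of absolute deviations from the ideal rejection count; for example, for Tb1 at p = 15 in Table 4:

```python
def aad(counts, ideal):
    # Averaged absolute difference between rejection counts and the ideal
    # number (50 at level 0.05, 100 at level 0.10, with 1000 replications)
    return sum(abs(c - ideal) for c in counts) / len(counts)

# T_b1 at p = 15 for N = 80, 100, 120, 150, 200 (Table 4, level 0.05):
print(aad([54, 50, 45, 40, 55], 50))  # -> 4.8, the AAD5 entry in Table 4
```

The same function with ideal = 100 reproduces the AAD5 entries of Table 5, e.g., 6.2 for Tb1 at p = 15.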

5. Predicting Type I Errors of TML

Type I errors of TML in testing SEM models are of great interest. In particular, Monte Carlo studies in the literature have repeatedly shown that TML tends to reject correct models too often at small N and large p. In this section we show that, using the corrected statistics in the previous section, type I errors of TML can be predicted without conducting simulation studies. The discrepancy between the predicted and empirical type I errors also conveys information about the accuracy of approximating the distribution of the involved statistic by χ²df.

TABLE 5.
Number of rejections for the model in Figure 7 by Tb1 = TMLb1, Ts = TMLs, Te1 = TMLe(1, p, q1), Te2 = TMLe(1, p, m) and Te3 = TMLe(1, p, m, q1) at the nominal level of 0.10. With 1000 replications, the ideal number is 100.

p    N      Tb1     Ts      Te1     Te2     Te3
15   80     103     108     99      98      96
15   100    99      103     97      94      94
15   120    94      94      89      89      89
15   150    83      84      80      80      80
15   200    104     105     101     101     101
15   AAD5   6.2     7.6     7.2     8.0     8.4
20   80     116     121     105     103     102
20   100    109     112     97      96      96
20   120    102     108     96      96      95
20   150    109     113     99      98      98
20   200    96      97      92      92      91
20   AAD5   8.0     11.4    4.2     4.2     4.4
25   80     129     140     110     107     107
25   100    119     126     106     105     105
25   120    115     122     100     99      98
25   150    97      101     85      85      85
25   200    122     126     111     111     111
25   AAD5   17.6    23.0    8.4     7.8     8.0
30   80     135     144     99      98      97
30   100    125     134     99      99      99
30   120    107     110     87      86      86
30   150    114     127     90      90      88
30   200    112     116     100     100     99
30   AAD5   18.6    26.2    5.0     5.4     6.2
35   80     156     175     99      98      96
35   100    121     136     88      86      84
35   120    139     151     115     113     113
35   150    122     127     98      97      97
35   200    108     113     90      90      90
35   AAD5   29.2    40.4    8.0     8.4     9.2
40   80     202     229     113     113     112
40   100    157     169     95      93      93
40   120    150     159     98      98      97
40   150    133     142     107     107     107
40   200    120     122     95      95      93
40   AAD5   52.4    64.2    6.4     6.8     7.2
45   80     288     316     132     132     132
45   100    197     217     110     110     108
45   120    159     175     102     102     102
45   150    135     150     89      89      88
45   200    119     127     83      83      82
45   AAD5   79.6    97.0    14.4    14.4    14.4
50   80     432     475     213     216     213
50   100    248     274     118     118     118
50   120    207     217     112     112     112
50   150    142     154     81      81      80
50   200    116     123     84      84      84
50   AAD5   129.0   148.6   35.6    36.2    35.8
AAD40       42.6    52.3    11.2    11.4    11.7

Suppose Ta = Na FML follows χ²df, but Th = Nh FML is referred to χ²df for model inference. We will call Na and Nh the working sample sizes for Ta and Th, respectively. If Nh > Na, then Th will be stochastically greater than a random variable that follows the nominal chi-square distribution, and Th will commit a larger type I error when compared against the critical value c1−α from χ²df. This type I error is given by

P(Th > c1−α) = P(Na FML > Na c1−α/Nh) = 1 − Gdf(Na c1−α/Nh),   (14)

where Gdf(t) is the cumulative distribution function of χ²df with Gdf(c1−α) = 1 − α. According to Table 3, the working sample sizes for TMLe(1, p, q1), TMLe(1, p, m) and TMLe(1, p, m, q1) are, respectively,

Na = Ne = N − (2.381 + 0.367p + 0.003q1),
Na = Ne = N − (2.299 + 0.365p + 0.038m),

TABLE 6.
Predicted versus empirical type I errors of TML (target nominal type I error is 5 %); the predictions are according to the statistics TMLb1, TMLs, TMLe1 = TMLe(1, p, q1), TMLe2 = TMLe(1, p, m), and TMLe3 = TMLe(1, p, m, q1), respectively; empirical type I errors are from Table 1 of Moshagen (2012), with N = 200 and based on Nr = 1000 replications. The last row gives the averages of Ee, Ep, and |Dpe| across the 10 models.

Conditions                      TMLb1            TMLs             TMLe1            TMLe2            TMLe3
p    m    q1    df     Ee      Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe
15   3    3     87     0.069   0.079  0.010     0.077  0.008     0.082  0.013     0.081  0.012     0.081  0.012
15   5    10    80     0.070   0.078  0.008     0.077  0.007     0.080  0.010     0.080  0.010     0.081  0.011
30   3    3     402    0.225   0.218  −0.007    0.211  −0.014    0.239  0.014     0.239  0.014     0.239  0.014
30   10   45    360    0.228   0.205  −0.023    0.210  −0.018    0.227  −0.001    0.230  0.002     0.230  0.002
45   3    3     942    0.646   0.600  −0.046    0.587  −0.059    0.663  0.017     0.660  0.014     0.660  0.014
45   15   105   840    0.678   0.560  −0.118    0.588  −0.090    0.634  −0.044    0.637  −0.041    0.636  −0.042
60   3    3     1707   0.981   0.955  −0.026    0.951  −0.030    0.977  −0.004    0.977  −0.004    0.977  −0.004
60   20   190   1520   0.973   0.934  −0.039    0.953  −0.020    0.970  −0.003    0.970  −0.003    0.969  −0.004
90   3    3     3912   1.000   1.000  −0.000    1.000  −0.000    1.000  −0.000    1.000  −0.000    1.000  −0.000
90   30   435   3480   1.000   1.000  −0.000    1.000  −0.000    1.000  −0.000    1.000  −0.000    1.000  −0.000
Average                0.587   0.563  0.028     0.565  0.025     0.587  0.010     0.587  0.010     0.587  0.010

and

Na = Ne = N − (2.262 + 0.365p + 0.052m − 0.002q1);

the working sample size for TMLb1 is Na = Nb1 = N − (5/2 + p/3), and that for TMLs is Na = Ns. To predict type I errors of Th = TML, we let Nh = N − 1. Instead of conducting new simulations, we use published literature on type I errors of TML to evaluate P(Th > c1−α), and denote such empirical type I errors by Ee. Equation (14) allows us to predict Ee by Ep = 1 − Gdf(Na c1−α/Nh). Let Dpe = Ep − Ee, which contains only sampling errors if Ta ∼ χ²df; otherwise, Dpe also contains systematic errors in addition to sampling errors. Thus, the value of Dpe also reflects the accuracy of the approximation of the distribution of Ta by χ²df. Notice that Ep is simply the probability for a χ²df random variable to be above the new critical value c*1−α = Na c1−α/Nh. Since c*1−α changes as Na/Nh varies, the values of Dpe allow us to evaluate the distribution of Ta in a much wider range than just the probability on the right tail, as reported in the previous section.

Table 1 of Moshagen (2012) contains empirical type I errors of TML for 10 different models with the nominal level at 5 %. These errors were obtained based on Nr = 1000 replications at


sample size N = 200. Moshagen's Ee's, together with p, m, q1 and df for the 10 models, are reported on the left side of our Table 6. The right side of Table 6 contains the predicted type I errors Ep and the differences Dpe according to the statistics TMLb1, TMLs, TMLe1 = TMLe(1, p, q1), TMLe2 = TMLe(1, p, m), and TMLe3 = TMLe(1, p, m, q1), respectively. The largest entry of Dpe for each statistic among the 10 models is underlined. The averages of Ep, Ee and |Dpe| across the 10 models are included at the bottom of the table. The Ee's are predicted well by the Ep's corresponding to TMLe1, TMLe2, and TMLe3 for most of the conditions. A nonzero Dpe can be due to sampling error in Ee or to approximation error arising from the imperfection of the empirical corrections. The values of the largest Dpe in each column as well as the average of |Dpe| imply that the three empirically corrected statistics work about equally well, with TMLe2 performing slightly better than TMLe1 and TMLe3; TMLs and TMLb1 do not work as well.

Tables 2, 3 and 4 of Herzog et al. (2007) contain empirical type I errors of TML for sample sizes N = 200, 400 and 800, respectively. The conditions on p, m, q1 and df as well as Ee are included on the left side of our Table 7. The right side of Table 7 contains the predicted type I errors Ep and Dpe corresponding to TMLb1, TMLs, TMLe1 = TMLe(1, p, q1), TMLe2 = TMLe(1, p, m), and TMLe3 = TMLe(1, p, m, q1), respectively. According to the average of |Dpe| and the largest Dpe in each column, the three empirically corrected statistics perform about equally well, and the two analytically corrected ones do not perform as well.
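Equation (14) can be evaluated directly. The sketch below (ours, standard library only) implements the chi-square cdf via the regularized incomplete gamma function and reproduces the prediction for, e.g., Moshagen's condition p = 15, m = 3, q1 = 3, df = 87 under TMLe(1, p, q1), for which Table 6 reports Ep = 0.082:

```python
import math

def _lower_gamma_reg(a, x):
    # Regularized lower incomplete gamma function P(a, x), computed by the
    # standard series (x < a + 1) or continued-fraction (x >= a + 1) method.
    if x <= 0.0:
        return 0.0
    lg = math.lgamma(a)
    if x < a + 1.0:
        term = 1.0 / a
        total = term
        n = a
        for _ in range(10000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(-x + a * math.log(x) - lg)
    # Lentz's algorithm for the continued fraction of Q(a, x)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(-x + a * math.log(x) - lg) * h

def chi2_cdf(t, df):
    # G_df(t), the chi-square cumulative distribution function
    return _lower_gamma_reg(df / 2.0, t / 2.0)

def chi2_quantile(p, df):
    # Inverse cdf by bisection; the bracket covers the quantiles needed here
    lo, hi = 0.0, 2.0 * df + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def predicted_type1_error(N, df, Na, alpha=0.05):
    # Equation (14): Ep = 1 - G_df(Na * c_{1-alpha} / Nh), with Nh = N - 1
    c = chi2_quantile(1.0 - alpha, df)
    return 1.0 - chi2_cdf(Na * c / (N - 1.0), df)

# Moshagen's condition p = 15, m = 3, q1 = 3, df = 87, N = 200, using the
# working sample size of T_MLe(1, p, q1); Table 6 gives Ep = 0.082.
Na = 200.0 - (2.381 + 0.367 * 15 + 0.003 * 3)
ep = predicted_type1_error(N=200, df=87, Na=Na)
```

The same function with Na = N − 1 and a misspecified working size recovers the inflated rejection rates of TML reported in the literature.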

6. Contrasting Different Statistics with Real Data

In this section, we further compare the different statistics through real data examples. The statistics being compared include TML, TMLs, TMLb1, TMLe1 = TMLe(1, p, q1), TMLe2 = TMLe(1, p, m), and TMLe3 = TMLe(1, p, m, q1). Because an F-statistic proposed in Yuan and Bentler (1999b) has been shown to work reasonably well with small samples (Bentler & Yuan, 1999), part of our comparison also includes this F-statistic, TF.

The study of Holzinger and Swineford (1939) contains test scores of N = 145 students on p = 24 subjects/variables. The first 4 subjects (visual perception, cubes, paper form board, lozenges) are designed to measure spatial ability (f1); the next 5 subjects (general information, paragraph comprehension, sentence completion, word classification, word meaning) are designed to measure verbal ability (f2); the following 4 subjects (addition, code, counting dots, straight-curved capitals) are designed to measure speed (f3) in performing the tasks; the next 6 subjects (word recognition, number recognition, figure recognition, object number, number figure, figure-word) are designed to measure memory (f4); and the last 5 subjects (deduction, numerical puzzles, problem reasoning, series completion, Woody–McCall mixed fundamentals form I) are designed to measure the students' mathematical ability (f5). The standardized measure of sample multivariate kurtosis (Mardia, 1970) for this data set is 1.729. Thus, we may regard the data as approximately following a multivariate normal distribution.

Following the original design of Holzinger and Swineford (1939), we first fit the sample by a unidimensional CFA model (M1), which has 242 degrees of freedom. The corresponding TML = 355.700 is highly significant when referred to χ²242. Other test statistics are reported in Table 8, and all are highly significant, although their corresponding p-values (pv) are greater than that corresponding to TML.
This might be expected, since a 9-variable subset (visual perception, cubes, lozenges; paragraph comprehension, sentence completion, word meaning; addition, counting dots, straight-curved capitals) of the sample cannot be adequately fitted by a unidimensional 3-factor model (Jöreskog, 1969). As with the analysis of the 9-variable subset (Sörbom, 1989), we next fit alternative models following the suggestions of model modification. For the unidimensional 5-factor model and the 24 variables, the default of the Lagrange Multiplier (LM)

TABLE 7.
Predicted versus empirical type I errors of TML (target nominal type I error is 5 %); the predictions are according to the statistics TMLb1, TMLs, TMLe1 = TMLe(1, p, q1), TMLe2 = TMLe(1, p, m), and TMLe3 = TMLe(1, p, m, q1), respectively; empirical type I errors are from Tables 2, 3, 4 of Herzog, Boomsma and Reinecke (2007), based on Nr = 1200 replications. The last row of each panel gives the averages of Ee, Ep, and |Dpe| across the 7 conditions.

(a) Sample size N = 200

Conditions                     TMLb1            TMLs             TMLe1            TMLe2            TMLe3
p    m    q1    df    Ee      Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe
12   4    6     48    0.082   0.068  −0.014    0.066  −0.016    0.069  −0.013    0.069  −0.013    0.069  −0.013
18   6    15    120   0.099   0.092  −0.007    0.091  −0.008    0.096  −0.003    0.096  −0.003    0.096  −0.003
24   8    28    224   0.147   0.135  −0.012    0.135  −0.012    0.144  −0.003    0.146  −0.001    0.146  −0.001
30   10   45    360   0.253   0.205  −0.048    0.210  −0.043    0.227  −0.026    0.230  −0.023    0.230  −0.023
36   12   66    528   0.383   0.316  −0.067    0.328  −0.055    0.358  −0.025    0.361  −0.022    0.361  −0.022
42   14   91    728   0.559   0.470  −0.089    0.494  −0.065    0.535  −0.024    0.539  −0.020    0.538  −0.021
48   16   120   960   0.757   0.652  −0.105    0.684  −0.073    0.730  −0.027    0.733  −0.024    0.731  −0.026
Average               0.326   0.277  0.049     0.287  0.039     0.309  0.017     0.311  0.015     0.310  0.015

(b) Sample size N = 400

Conditions                     TMLb1            TMLs             TMLe1            TMLe2            TMLe3
p    m    q1    df    Ee      Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe
12   4    6     48    0.076   0.058  −0.018    0.058  −0.018    0.059  −0.017    0.059  −0.017    0.059  −0.017
18   6    15    120   0.081   0.068  −0.013    0.068  −0.013    0.070  −0.011    0.070  −0.011    0.070  −0.011
24   8    28    224   0.086   0.084  −0.002    0.084  −0.002    0.088  0.002     0.088  0.002     0.088  0.002
30   10   45    360   0.115   0.108  −0.007    0.109  −0.006    0.115  −0.000    0.115  0.000     0.115  0.000
36   12   66    528   0.164   0.142  −0.022    0.146  −0.018    0.155  −0.009    0.156  −0.008    0.156  −0.008
42   14   91    728   0.260   0.191  −0.069    0.199  −0.061    0.214  −0.046    0.215  −0.045    0.215  −0.045
48   16   120   960   0.310   0.259  −0.051    0.273  −0.037    0.295  −0.015    0.297  −0.013    0.296  −0.014
Average               0.156   0.130  0.026     0.134  0.022     0.142  0.014     0.143  0.014     0.143  0.014

TABLE 7. (Continued)

(c) Sample size N = 800

Conditions                     TMLb1            TMLs             TMLe1            TMLe2            TMLe3
p    m    q1    df    Ee      Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe       Ep     Dpe
12   4    6     48    0.064   0.054  −0.010    0.054  −0.010    0.054  −0.010    0.054  −0.010    0.054  −0.010
18   6    15    120   0.072   0.059  −0.013    0.058  −0.014    0.059  −0.013    0.059  −0.013    0.059  −0.013
24   8    28    224   0.081   0.065  −0.016    0.065  −0.016    0.067  −0.014    0.067  −0.014    0.067  −0.014
30   10   45    360   0.069   0.075  0.006     0.075  0.006     0.077  0.008     0.077  0.008     0.077  0.008
36   12   66    528   0.106   0.087  −0.019    0.088  −0.018    0.091  −0.015    0.092  −0.014    0.092  −0.014
42   14   91    728   0.107   0.104  −0.003    0.106  −0.001    0.111  0.004     0.111  0.004     0.111  0.004
48   16   120   960   0.138   0.125  −0.013    0.130  −0.008    0.136  −0.002    0.137  −0.001    0.137  −0.001
Average               0.091   0.081  0.011     0.082  0.010     0.085  0.009     0.085  0.009     0.085  0.009

TABLE 8.
Twenty-four variables of Holzinger and Swineford (1939) fitted by five confirmatory factor models: M1 is a unidimensional confirmatory factor model with m = 5 and df = 242; M2 is obtained by adding factor loading λ13,1 to M1; M3 by adding λ24,3 to M2; M4 by adding λ21,3 to M3; M5 by adding λ10,1 to M4.

Model              TML          TMLs         TMLb1        TMLe1        TMLe2        TMLe3
M1          T      355.700      332.844      332.234      330.458      330.384      330.352
            pv     2.6 × 10⁻⁶   9.5 × 10⁻⁵   1.0 × 10⁻⁴   1.3 × 10⁻⁴   1.4 × 10⁻⁴   1.4 × 10⁻⁴
M2 (λ13,1)  T      339.756      317.887      317.342      315.638      315.575      315.549
            pv     2.8 × 10⁻⁵   6.5 × 10⁻⁴   7.0 × 10⁻⁴   8.7 × 10⁻⁴   8.8 × 10⁻⁴   8.8 × 10⁻⁴
M3 (λ24,3)  T      322.298      301.516      301.035      299.413      299.359      299.339
            pv     3.1 × 10⁻⁴   4.3 × 10⁻³   4.5 × 10⁻³   5.5 × 10⁻³   5.5 × 10⁻³   5.5 × 10⁻³
M4 (λ21,3)  T      309.322      289.343      288.915      287.352      287.307      287.291
            pv     0.001        0.014        0.015        0.018        0.018        0.018
M5 (λ10,1)  T      294.680      275.614      275.239      273.745      273.707      273.696
            pv     0.007        0.047        0.049        0.056        0.056        0.056

test in EQS (Bentler, 2008) suggests that letting the variable straight-curved capitals load on f1 would improve the fit of the model the most. Such a parameter is also identified by the model modification index in the analysis of the 9-variable subset of the sample in Sörbom (1989). The necessity of this parameter can be explained by noting that a student needs good spatial ability in order to be fast on the test of straight-curved capitals, which is the 13th variable. Denote this model by M2(λ13,1). Test statistics for this new model and the corresponding pv are also given in Table 8. Since all the statistics are statistically significant at the 0.05 level when referred to χ²241, three more models M3(λ24,3), M4(λ21,3), M5(λ10,1) are estimated in Table 8, where λij is the additional loading of the ith variable on the jth factor added to the previous model, as suggested by the LM test in EQS.

Clearly, all the statistics for models M1 to M4 are statistically significant at the 0.05 level. The p-values for model M5 corresponding to TML, TMLs and TMLb1 are smaller than 0.05, while those corresponding to TMLe1, TMLe2 and TMLe3 are slightly above 0.05. Notice that, although each of the models in Table 8 is quite different from those used in evaluating the performance of the statistics in Tables 2–7, the relative performances of the statistics are about the same. That is, TMLe1, TMLe2 and TMLe3 are very close in value; TML tends to be a lot greater than the other statistics; and TMLs and TMLb1 tend to be somewhat greater than the three empirically corrected statistics. The sample size N is about 6 times p in this example. With a larger p or a smaller N/p ratio, we would most likely observe greater differences among the test statistics.

We did not include the F-statistic in Table 8 because its evaluation requires the sample size N to be greater than the model degrees of freedom.
With N = 145 and the smallest degrees of freedom (for model M5) being df = 238, the F-statistic in Yuan and Bentler (1999b) is not defined for any of the models in Table 8. We next fit a subset of the 24 variables in order to compare TF with the other statistics. By removing the last variable (word meaning) for the verbal factor f2, the 4 variables for the speed factor f3, and the last variable (figure-word) for the memory factor f4, we end up with p = 18 variables and a model with m = 4 factors. The degrees of freedom of the 18-variable unidimensional CFA model are df = 129. Fitting the resulting sample to the model yields TF = 0.874, with a p-value of 0.677 when referred to the F-distribution with 129 and 16 degrees of freedom, indicating that the model fits the data exceptionally well. However, all the other statistics in Table 9(a) suggest that the model fits the data only marginally. Notice that, unlike the other statistics, the formulation of TF involves, in addition to the sample covariance matrix, a matrix of all the sample fourth-order moments. The

TABLE 9.
(a) Eighteen variables of Holzinger and Swineford (1939) fitted by a unidimensional confirmatory factor model with m = 4 and df = 129.

       TML       TMLs      TMLb1     TMLe1     TMLe2     TMLe3     TF
T      159.918   151.959   151.589   151.028   151.010   151.002   0.874
pv     0.034     0.082     0.085     0.090     0.090     0.090     0.677

(b) Ten independent replications of TF with samples of size N = 145 drawn from a multivariate normal distribution and then transformed to have the same sample covariance matrix as the 18-variable subsample of Holzinger and Swineford (1939).

TF     1.624    2.850        1.547    1.478    5.401        1.025    1.648    2.528    1.567    0.962
pv     0.133    9.8 × 10⁻³   0.160    0.187    2.0 × 10⁻⁴   0.513    0.126    0.018    0.152    0.580

observed difference in significance between TF and the other statistics can thus be due to the sample fourth-order moments. In order to see the effect of the sample fourth-order moments on TF, we obtained 10 independent replications of TF. Each sample is drawn from a normally distributed population with p = 18 and N = 145, and is then transformed so that its sample covariance matrix is identical to that of the 18-variable subset of Holzinger and Swineford. The values of the ten TF's and the corresponding p-values are reported in Table 9(b), where the ten p-values vary from 2.0 × 10⁻⁴ to 0.580. Thus, we may conclude that the observed difference in p-values in Table 9(a) is due to the variability in the matrix of the sample fourth-order moments, which is of order p(p + 1)/2. The necessity for TF to involve the sample fourth-order moments is to account for a possibly nonnormal distribution of the sample, whereas TML is derived under the assumption of a normally distributed population. While TF performs reasonably well with small N (see Bentler & Yuan, 1999), it may not work well with large p, due to the need to estimate additional features of the population distribution.
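The covariance-matching transformation behind Table 9(b) can be sketched as follows; this is our implementation of a standard rescaling (the paper's exact procedure may differ), and the equicorrelated target S0 is only a placeholder for the actual 18-variable sample covariance matrix:

```python
import numpy as np

def match_sample_cov(Z, S0):
    # Linearly transform data Z (N x p) so that its sample covariance matrix
    # equals S0 exactly, leaving higher-order features otherwise intact:
    # Y = Zc L^{-T} L0^T with S = L L' and S0 = L0 L0' (Cholesky factors).
    Zc = Z - Z.mean(axis=0)
    S = Zc.T @ Zc / (len(Zc) - 1)
    L = np.linalg.cholesky(S)
    L0 = np.linalg.cholesky(S0)
    return Zc @ np.linalg.inv(L).T @ L0.T

rng = np.random.default_rng(0)
p, N = 18, 145
# Placeholder target covariance matrix (equicorrelated), for illustration.
S0 = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))
Y = match_sample_cov(rng.standard_normal((N, p)), S0)
```

Because only second-order moments are matched, the fourth-order moments still vary freely across replications, which is exactly what Table 9(b) exploits.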

7. Conclusion and Discussion

As the default statistic in essentially all SEM software, TML tends to be significant in practice, which can be due to multiple reasons. One of them is a relatively large p combined with a small N. The aim of this paper is to find simple corrections that explicitly account for the effect of large p and/or small N. Our results indicate that the three empirically corrected statistics TMLe(1, p, q1), TMLe(1, p, m), and TMLe(1, p, m, q1) perform about equally well in controlling type I errors. Each of them performs much better than TML and any of the analytically corrected statistics. Because TMLe(1, p, q1) accounts for the variation in the number of parameters, we recommend it over TMLe(1, p, m) in practice when there is no obvious violation of normality and the model is formulated through latent variables with unknown factor loadings. Although TMLe(1, p, q1) and TMLe(1, p, m) cannot control type I errors properly when N < 2p, they perform a lot better than other available statistics. While the statistic TMLe(1, p, m, q1) also accounts for the variation in the number of parameters, the results in Sections 4 and 5 suggest that it has no obvious advantage over TMLe(1, p, q1) or TMLe(1, p, m).

Because the sample covariance matrices corresponding to non-converged replications tend to be farther from the structural model than those corresponding to the converged ones (Yuan & Hayashi, 2003), our data for obtaining the empirically corrected statistics had to be based on conditions where all replications converged. For most of the intended conditions of p and N with N ≥ max(50, 2p), we managed to get all replications to converge by increasing the values of the factor loadings. Since the behavior of TML under the null hypothesis is practically not


affected by the values of model parameters (Jackson, 2001), the applicability of the empirically corrected statistics is not limited to conditions with large factor loadings. However, the ratios of factor loadings to error variances do affect the likelihood of reaching a converged solution, and in practice these ratios are determined by the given sample rather than being subject to manipulation. When non-convergence occurs, one may provide proper starting values for each of the free model parameters if the software permits, and good starting values can be obtained by fitting smaller models to subsets of the involved variables (see Yuan & Chan, 2002). If the model cannot be estimated even with good starting values, then one cannot apply the empirically corrected statistics or any others. In such a case, one may need to reformulate the model or to collect more data in order to continue the research using SEM.

We have shown the performance of the empirically corrected statistics in controlling type I errors and in predicting the type I errors of TML. Routine use of the empirically corrected statistics also enhances the validity of model evaluation by fit indices. In particular, the most widely used fit indices are based on δ̂ = TML − df (see, e.g., Bentler, 1990; Bentler & Bonett, 1980; McDonald & Marsh, 1990), an estimate of the noncentrality parameter δ in χ²_df(δ). When E(TML) is much greater than df under the null hypothesis, interpreting the value of δ̂ as a noncentrality parameter does not make sense. Since E(TMLe) ≈ df under the null hypothesis, TMLe − df can be regarded as an unbiased estimate of the part of E(TMLe) due to model misspecification. Replacing δ̂ by TMLe − df corrects the positive bias due to E(TML) > df under the null hypothesis. More discussion of the relationship between fit indices and test statistics is given in Yuan (2005).
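As a worked illustration with the Table 9(a) values (taking TMLe3 as the corrected statistic), the bias correction to δ̂ is simple arithmetic. The RMSEA formula below is the standard noncentrality-based index √(max(δ̂, 0)/(df·(N − 1))) — it is not derived in this paper and is used here only to show how the positive bias in δ̂ propagates into a fit index.

```python
import math

# Values from Table 9(a): TML = 159.918, corrected TMLe3 = 151.002, df = 129, N = 145.
T_ml, T_mle, df, N = 159.918, 151.002, 129, 145

delta_ml = T_ml - df    # positively biased when E(TML) > df under the null
delta_mle = T_mle - df  # approximately unbiased, since E(TMLe) ≈ df

def rmsea(delta, df, N):
    # Standard noncentrality-based fit index: sqrt(max(delta, 0) / (df * (N - 1))).
    return math.sqrt(max(delta, 0.0) / (df * (N - 1)))

assert round(delta_ml, 3) == 30.918
assert round(delta_mle, 3) == 22.002
# The inflated noncentrality estimate inflates the fit index as well:
assert rmsea(delta_ml, df, N) > rmsea(delta_mle, df, N)
```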
The obtained statistics TMLe(1, p, q1), TMLe(1, p, m), and TMLe(1, p, m, q1) might be improved by including more conditions and more replications in the simulated data. Considering that 342 conditions with 166,800 independent TML are used to estimate 3 parameters in TMLe(1, p, q1) or TMLe(1, p, m), we do not expect substantial changes in α̂, β̂(p), β̂(m), and β̂(q1) even if more conditions are used.

In addition to a large p, heavy tails of the underlying population distribution also cause TML to be significant. Except for TF, none of the statistics studied in this paper explicitly accounts for heavy tails. Other statistics that account for nonnormally distributed data and also perform reasonably well with small N include the rescaled statistic (Satorra & Bentler, 1994) and the corrected residual-based asymptotically distribution-free statistic (Yuan & Bentler, 1998). However, these statistics may not be applicable with large p, due to explicitly estimating fourth-order moments (Bentler & Yuan, 1999). Empirical corrections to these statistics may result in better-performing statistics that account for both nonnormality and large p. Considering that TML is asymptotically robust against certain departures from normality, we would expect TMLe(1, p, q1) and TMLe(1, p, m) to behave reasonably well under the asymptotic robustness conditions, but further study is needed.

When asymptotics or large-sample statistical theory cannot provide accurate model evaluation, bootstrapping is a competitive alternative for statistical inference. Bootstrap testing for functions of covariance matrices was introduced to the SEM literature by Bollen and Stine (1993). Yung and Chan (1999) suggest that the bootstrap may be used for model evaluation with small samples. However, Yung and Bentler (1996) observed the problem of non-convergence even when N = 145 and p = 9 (N > 16p), due to duplicated observations within a bootstrap sample.
The same problem of non-convergence with bootstrap inference for covariance structure models was also noted in Ichikawa and Konishi (1995) and Yuan and Hayashi (2003). Because non-converged replications cannot be regarded as following the same distribution as the converged ones, a bootstrap that simply ignores the non-converged replications is unable to control type I or type II errors properly (Yuan & Hayashi, 2003). Thus, the bootstrap is not a viable option for SEM with small N or large p.

The development of the paper centers on correcting TML so that the mean of the resulting statistic equals the nominal degrees of freedom. Alternatively, we can also estimate β by matching the empirical percentiles or critical values of TMLe = g(β̂)TML with those of the nominal chi-square distribution. Such corrected statistics may work well for model-testing purposes, but they may work less well when used to derive model fit indices. More study is needed in this direction.

Finally, we would like to note that, although we aim to correct TML with a large p, we did not intend to address the problem when p ≥ N. Actually, TML is not defined when p ≥ N due to S being singular. Similarly, the Bartlett correction does not apply to the likelihood ratio statistic in EFA when p ≥ N. Readers interested in problems with p ≥ N are referred to Yuan and Chan (2008) for SEM and to Schott (2007) and Srivastava and Yanagihara (2010) for testing the equality of covariance matrices.
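A minimal sketch of this percentile-matching idea, under the simplifying assumption that g(β̂) reduces to a single scalar g (the paper's g is a richer function of p, m, q1, and N): choose g by least squares so that the empirical percentiles of g·TML line up with the nominal chi-square percentiles. The Wilson–Hilferty approximation stands in for a chi-square quantile function to keep the sketch dependency-light; the simulated TML values are stand-ins, not the paper's data.

```python
import math
from statistics import NormalDist

import numpy as np

def chi2_ppf(q, df):
    # Wilson-Hilferty approximation to the chi-square quantile function.
    z = NormalDist().inv_cdf(q)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

df = 129
rng = np.random.default_rng(0)
# Stand-in for replicated TML values whose distribution is inflated by 20 %:
T = 1.2 * rng.chisquare(df, size=100_000)

qs = [0.05 * k for k in range(1, 20)]            # percentiles 0.05, ..., 0.95
t_q = np.quantile(T, qs)                         # empirical percentiles of TML
c_q = np.array([chi2_ppf(q, df) for q in qs])    # nominal chi-square percentiles

# Least-squares match through the origin: minimize sum over q of (g*t_q - c_q)^2.
g = float(t_q @ c_q) / float(t_q @ t_q)

# g should roughly undo the 20 % inflation, i.e., g is close to 1/1.2.
assert abs(g - 1 / 1.2) < 0.02
```

Matching percentiles in this way targets the tail behavior used in model testing, whereas the mean-matching correction developed in the paper targets E(TMLe) = df, which is what fit indices built on TMLe − df require.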

Acknowledgements

The research was supported in part by a grant from the National Natural Science Foundation of China (31271116), and in part by a grant from the Ministry of Education, Science, Sports, and Culture of Japan (22650058). We would like to thank Drs. Peter Bentler, Yutaka Kano, and Haruhiko Ogasawara for comments on earlier versions of this manuscript.

References

Amemiya, Y., & Anderson, T.W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 18, 1453–1463.
Anderson, J.C., & Gerbing, D.W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155–173.
Anderson, J.C., & Gerbing, D.W. (1988). Structural equation modeling in practice: a review and recommended two-step approach. Psychological Bulletin, 103, 411–423.
Barndorff-Nielsen, O.E., & Hall, P. (1988). On the level-error after Bartlett adjustment of the likelihood ratio statistic. Biometrika, 75, 374–378.
Bartlett, M.S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 160, 268–282.
Bartlett, M.S. (1951). The effect of standardization on a χ² approximation in factor analysis. Biometrika, 38, 337–344.
Bartlett, M.S. (1954). A note on the multiplying factors in various χ² approximations. Journal of the Royal Statistical Society, Series B, 16, 296–298.
Bentler, P.M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P.M. (2008). EQS 6 structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P.M., & Bonett, D.G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bentler, P.M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: test statistics. Multivariate Behavioral Research, 34, 181–197.
Beran, R. (1988). Prepivoting test statistics: a bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83, 687–697.
Bollen, K.A., & Stine, R. (1993). Bootstrapping goodness of fit measures in structural equation models. In K.A. Bollen & J.S. Long (Eds.), Testing structural equation models (pp. 111–135). Newbury Park: Sage.
Boomsma, A. (1982). The robustness of LISREL against small sample sizes in factor analysis models. In K.G. Jöreskog & H. Wold (Eds.), Systems under indirect observation: causality, structure, prediction (Part I) (pp. 149–173). Amsterdam: North-Holland.
Box, G.E.P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.
Browne, M.W., MacCallum, R.C., Kim, C.-T., Andersen, B.L., & Glaser, R. (2002). When fit indices and residuals are incompatible. Psychological Methods, 7, 403–421.
Casella, G., & Berger, R.L. (2002). Statistical inference (2nd ed.). Pacific Grove: Duxbury Press.
Cox, D.R., & Hinkley, D.V. (1974). Theoretical statistics. Boca Raton: Chapman and Hall.
Curran, P.J., West, S.G., & Finch, J.F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
Evans, M., Hastings, N., & Peacock, B. (2000). Statistical distributions (3rd ed.). New York: Wiley.
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation methods, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56–83.
Fouladi, R.T. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356–410.
Fujikoshi, Y. (2000). Transformations with improved chi-squared approximations. Journal of Multivariate Analysis, 72, 249–263.
Geweke, J.F., & Singleton, K.J. (1980). Interpreting the likelihood ratio statistic in factor models when sample size is small. Journal of the American Statistical Association, 75, 133–137.
Hall, P. (1992). The bootstrap and Edgeworth expansion. New York: Springer.
Herzog, W., & Boomsma, A. (2009). Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling, 16, 1–27.
Herzog, W., Boomsma, A., & Reinecke, S. (2007). The model-size effect on traditional and modified tests of covariance structures. Structural Equation Modeling, 14, 361–390.
Holzinger, K.J., & Swineford, F. (1939). Supplementary educational monographs: Vol. 48. A study in factor analysis: the stability of a bi-factor solution. Chicago: University of Chicago Press.
Hu, L.T., Bentler, P.M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Ichikawa, M., & Konishi, S. (1995). Application of the bootstrap methods in factor analysis. Psychometrika, 60, 77–93.
Jackson, D.L. (2001). Sample size and number of parameter estimates in maximum likelihood confirmatory factor analysis: a Monte Carlo investigation. Structural Equation Modeling, 8, 205–223.
Jöreskog, K.G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Kano, Y. (1992). Robust statistics for test-of-independence and related structural models. Statistics & Probability Letters, 15, 21–26.
Lawley, D.N. (1956). A general method for approximating to the distribution of likelihood ratio criteria. Biometrika, 43, 295–303.
Lawley, D.N., & Maxwell, A.E. (1971). Factor analysis as a statistical method (2nd ed.). New York: Elsevier.
MacCallum, R.C., & Austin, J.T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201–226.
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
McDonald, R.P., & Marsh, H.W. (1990). Choosing a multivariate model: noncentrality and goodness of fit. Psychological Bulletin, 107, 247–255.
Mooijaart, A., & Bentler, P.M. (1991). Robustness of normal theory statistics in structural equation models. Statistica Neerlandica, 45, 159–171.
Moshagen, M. (2012). The model size effect in SEM: inflated goodness-of-fit statistics are due to the size of the covariance matrix. Structural Equation Modeling, 19, 86–98.
Nevitt, J., & Hancock, G.R. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research, 39, 439–478.
Ogasawara, H. (2010). Asymptotic expansions of the null distributions of discrepancy functions for general covariance structures under nonnormality. American Journal of Mathematical and Management Sciences, 30, 385–422.
Okada, K., Hoshino, T., & Shigemasu, K. (2007). Bartlett correction of test statistics in structural equation modeling. Japanese Journal of Educational Psychology, 55, 382–392 (in Japanese).
Satorra, A., & Bentler, P.M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: applications for developmental research (pp. 399–419). Thousand Oaks: Sage.
Schott, J.R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample size. Computational Statistics & Data Analysis, 51, 6535–6542.
Sörbom, D. (1989). Model modification. Psychometrika, 54, 371–384.
Srivastava, M.S., & Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. Journal of Multivariate Analysis, 101, 1319–1329.
Swain, A.J. (1975). Analysis of parametric structures for variance matrices. Doctoral dissertation, University of Adelaide, Australia.
Wakaki, H., Eguchi, S., & Fujikoshi, Y. (1990). A class of tests for a general covariance structure. Journal of Multivariate Analysis, 32, 313–325.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
Yuan, K.-H. (2005). Fit indices versus test statistics. Multivariate Behavioral Research, 40, 115–148.
Yuan, K.-H., & Bentler, P.M. (1998). Normal theory based test statistics in structural equation modeling. British Journal of Mathematical & Statistical Psychology, 51, 289–309.
Yuan, K.-H., & Bentler, P.M. (1999a). On normal theory and associated test statistics in covariance structure analysis under two classes of nonnormal distributions. Statistica Sinica, 9, 831–853.
Yuan, K.-H., & Bentler, P.M. (1999b). F-tests for mean and covariance structure analysis. Journal of Educational and Behavioral Statistics, 24, 225–243.
Yuan, K.-H., & Chan, W. (2002). Fitting structural equation models using estimating equations: a model segregation approach. British Journal of Mathematical & Statistical Psychology, 55, 41–62.
Yuan, K.-H., & Chan, W. (2008). Structural equation modeling with near singular covariance matrices. Computational Statistics & Data Analysis, 52, 4842–4858.
Yuan, K.-H., & Hayashi, K. (2003). Bootstrap approach to inference and power analysis based on three statistics for covariance structure models. British Journal of Mathematical & Statistical Psychology, 56, 93–110.
Yung, Y.-F., & Bentler, P.M. (1996). Bootstrapping techniques in analysis of mean and covariance structures. In G.A. Marcoulides & R.E. Schumacker (Eds.), Advanced structural equation modeling: techniques and issues (pp. 195–226). Hillsdale: Erlbaum.
Yung, Y.-F., & Chan, W. (1999). Statistical analyses using bootstrapping: concepts and implementation. In R.H. Hoyle (Ed.), Statistical strategies for small sample research (pp. 81–105). Thousand Oaks: Sage.

Manuscript Received: 17 NOV 2012
