This article was downloaded by: [University of Wisconsin - Madison] On: 25 April 2013, At: 09:20 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Multivariate Behavioral Research Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hmbr20

Cattell's Scree Test In Relation To Bartlett's Chi-Square Test And Other Observations On The Number Of Factors Problem John L. Horn & Robert Engstrom Version of record first published: 10 Jun 2010.

To cite this article: John L. Horn & Robert Engstrom (1979): Cattell's Scree Test In Relation To Bartlett's Chi-Square Test And Other Observations On The Number Of Factors Problem, Multivariate Behavioral Research, 14:3, 283-300 To link to this article: http://dx.doi.org/10.1207/s15327906mbr1403_1


Multivariate Behavioral Research, 1979, 14, 283-300

CATTELL'S SCREE TEST IN RELATION TO BARTLETT'S CHI-SQUARE TEST AND OTHER OBSERVATIONS ON THE NUMBER OF FACTORS PROBLEM

JOHN L. HORN and ROBERT ENGSTROM
University of Denver


ABSTRACT

It is demonstrated that Cattell's scree test and Bartlett's chi-square test for the number of factors are both based on the same rationale, so the former reflects statistical (subject sampling) variability and the latter usually involves psychometric (variable sampling) influences. If the alpha level (implicit in the scree test) is set the same, the two tests should lead to the same conclusions. Analyses with some examples suggest that if the alpha level for the Bartlett test is set (explicitly) in the neighborhood of .0003 for sample Ns of 100 to 150, the results from applications of this test will indicate approximately the same number of factors as estimated on the basis of a scree test determined on a much larger (N > 600) sample. Used in this way, the Bartlett test may yield fairly good "population" estimates of the number of factors. Relationships between the Bartlett test, hence the scree test, and tests for a common factor model and for the significance of a correlation matrix are explicated.

INTRODUCTION

Henry Kaiser once said (personal communication) that a solution to the number-of-factors problem in factor analysis is easy, that he used to make one up every morning before breakfast. But the problem, of course, is to find the solution, or at least a solution that others will regard quite highly if not as the best. At this point in history (after some 50 years of work and the invention of no fewer than 50 tests) the problem, conceived of in this way, is still not solved. We don't purport to solve it here; our purpose is much more modest. We aim simply to point out that Cattell's (1966) scree test, which is rather highly regarded, can be viewed as a kind of geometric analogue of Bartlett's (1950) chi-square test. This latter has several advantages relative to the scree test, not the least of which is that it requires no skill in laying a straight edge along an array of points (and no knowledge of geology or mountaineering). It also has a statistical basis, although, in view of a recent scathing criticism of maximum likelihood factoring (Kaiser, 1976), one is hesitant to say this in an article that begins with mention of Kaiser's name.

This paper is dedicated to Raymond B. Cattell in recognition of his seminal and pioneering contributions to behavioral science. It is an effort to accommodate the spirit of a suggestion that the recipients of the Cattell award for distinguished multivariate research from the Society of Multivariate Experimental Psychology should prepare a paper for publication in Multivariate Behavioral Research. Preparation of the paper was aided by grants from the National Institute of Aging (NIA), Number P R 0 1 AGOOS83, and NIMH, Number 1 R01 MH28455. We thank Jack McArdle for some very helpful suggestions made with respect to an earlier draft of this paper.


RATIONALE FOR THE SCREE TEST AND ITS STATISTICAL NATURE

Although in the most commonly used versions of factor analysis (namely, simple structure, common-factor analysis) the objective is not to estimate the principal components or principal axes factors,¹ still for a number of good and bad reasons it has become customary to approach solution of the number-of-factors problem by attending to the latent roots produced by these variance-maximizing procedures. Similarly, it has become customary to assume that the factors of largest variance represent common factors and the factors of small variance represent sampling error and/or measurement error.² Given this assumption, then if factors are considered in descending order of variance, the curve for the variances of the common factors will be distinguishable from the curve for the variances of the factors that represent error; as the slope of rubble at the base of a mountain is distinguishable from the slope of the mountain itself, so the scree line for the variances of factors that represent error is distinguishable from the slope line for the variances of the common factors of a study.

A problem with this way of thinking is that it is extremely difficult to design a study to ensure that there is a mountain of common factors and a rubble of error factors. As Cattell (1958) noted, well before he proposed the scree test, variables usually have many, many things in common, more things than the number of variables; thus there are usually more common factors than variables. This means that there are common factors that are not anticipated in the design of a study and the factors that are anticipated involve error variance. How can a rationale for the scree test abide with these conditions?

The answer to this question seems to be that a scree line can result if many unanticipated common influences produce variance that has the character of error (as discussed above and in footnotes 3 and 4), and the factors that are planned for involve relatively large amounts of common variance. Unanticipated common factors will have the character of error if the number of variables affected by each common influence is small. If these small influences are of about the same magnitude, the curve representing the variances of the corresponding factors would be a horizontal line except for the fact that variance-maximizing factoring procedures force relatively more variance into early-calculated than into late-calculated factors; thus, the horizontal line can be elevated to a scree line. On the other hand, if the factors that a study is designed to indicate involve systematic variance in common with four or more variables (as suggested in the design guidelines based on the analyses of Horn, 1967; Horn & Knapp, 1973, 1974; and Humphreys, Ilgen, McGrath & Montanelli, 1969), then the slope for the curve representing the variances for such factors can be notably steeper than the scree line, rather like the slope for a mountain.

1. Hereafter the term "factor" will be used to represent a component as well as a factor when the reference is to a variance-maximizing procedure such as principal components or principal axes factoring.

2. In theory, and occasionally in practice (when good estimates of reliabilities exist), it is possible to separate measurement and sampling error variance. In most applications of factoring methods, however, this separation is not reasonably effected (because reliability estimates are not readily at hand, or are not good enough to justify this use, or the factoring procedure one uses does not contain good provisions for making the separation). Usually, therefore, the two sources of error are confounded. The two kinds of error are thus regarded as combined, and referred to simply as error, in most approaches to the number-of-factors question. This is the assumption made here.

3. As a weed is a flower out of place, some of what is regarded as error "is systematic variance out of place" (Horn, 1971).

4. If a study were very carefully designed to involve m, and only m, common factors operating among n variables, then there would be no need to invoke the assumption that unanticipated common factors will have the character of error. They would indeed have this character, and the scree test would be expected to yield the same results as tests based on the assumption that slope of the root curve should change dramatically after the correct number of factors has been calculated. That is, under these conditions there should be a notable drop in the root curve just after the mth factor (component). This would be expected because the root for the mth factor would represent both common variance and error variance, whereas the root for the (m+1)th factor would represent only error variance. Several number-of-factor tests are based on this rationale. However, the reasoning outlined in the body of this paper suggests that the (m+1)th factor would involve common-factor variance and no clear inflection in the root curve need occur between the mth and (m+1)th factor. It seems that often, in psychological studies at least, the root curve is smooth (relative to the small inflections produced by algorithms that only imperfectly approximate variance maximization). Such an outcome need not indicate that the inflection-change rules rest on a chimerical base; the results can be interpreted as indicating that the studies in question are not well designed to involve m and only m common factors. Nevertheless, it seems that in many applications it is not reasonable to depend upon number-of-factors rules that require one to detect a notable inflection change in the root curve.
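The rationale developed in this section can be illustrated numerically. The following Python sketch is not part of the original analyses; it builds a hypothetical correlation matrix in which three planned factors each load on eight variables while twenty-four small unanticipated influences each touch only two variables. The function name, the loading values (.70 and .25), and the number of variables are all assumptions chosen for illustration.

```python
import numpy as np

def illustrative_roots(n_vars=24, n_major=3, major_loading=0.7, minor_loading=0.25):
    """Latent roots of a hypothetical correlation matrix (illustration only)."""
    # Planned factors: factor j loads on variables j, j + n_major, j + 2*n_major, ...
    major = np.zeros((n_vars, n_major))
    for j in range(n_major):
        major[j::n_major, j] = major_loading
    # Unanticipated influences: influence k touches only variables k and k + 1.
    minor = np.zeros((n_vars, n_vars))
    for k in range(n_vars):
        minor[k, k] = minor_loading
        minor[(k + 1) % n_vars, k] = minor_loading
    common = major @ major.T + minor @ minor.T
    # Unique variance is whatever is needed to give each variable unit variance.
    unique = 1.0 - np.diag(common)
    corr = common + np.diag(unique)
    # eigvalsh returns ascending roots; reverse to get order of extraction.
    return np.linalg.eigvalsh(corr)[::-1]

roots = illustrative_roots()
for order, root in enumerate(roots, start=1):
    print(f"{order:2d}  {root:.3f}")
```

Under these assumptions the first three roots stand well above the rest (the "mountain"), and the remaining twenty-one roots decline gently even though every one of them contains common variance, which is the situation the scree rationale is meant to cover.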


One conclusion we derive from this view of the matter is that the scree test will work well only when a study has been adequately designed to overdetermine common factors that do indeed represent notable systematic influences. The Cattell-Vogelman (1977) study seems to exemplify these conditions. The scree test provided clear and reasonable solutions to the number-of-factors problem in this study.

This development of a rationale for a scree test also indicates that the test is based (in part) on statistical considerations in contrast to purely psychometric or mathematical considerations. The distinctions between these three ways of approaching the number-of-factors problem are not always clearly drawn, even in major textbooks on factor analysis. Briefly the distinctions may be indicated as follows:

• Mathematical approaches are based on considerations of linear dependence among vectors and questions about the number of independent elements that must be present in data in order to estimate factor and factor-covariance coefficients. Such considerations are indicated in the early work of Albert (1944) and Lederman (1937) and in modern treatments of issues pertaining to identification (e.g. Duncan, 1972; Jöreskog, 1969).

• Psychometric approaches involve considerations of the ways in which measurement error (but see footnote 2) and the sampling of variables affect the number of factors. Such matters are well indicated in the developments (described by Horn, 1965) of Guttman (1954), Kaiser (1959) and Dickman (1960) on which the familiar root-one test is based.

• Statistical tests pertain to the ways in which the sampling of subjects can affect the number of factors. This approach is perhaps best exemplified by the pioneering work of Lawley and Maxwell (e.g. Lawley, 1940; Lawley & Maxwell, 1963; Maxwell, 1959). Today it is largely associated with the developments of Jöreskog (1969) and his co-workers (e.g. Jöreskog & Sörbom, 1978).
In accordance with these distinctions and the rationale outlined previously, the scree test can be seen to be based on both statistical and psychometric considerations.

The variances of the small factors of the scree are nonzero partly in consequence of the fact that a sample of subjects is less than infinite and less than perfectly representative of a universe of inference; the correlations are inflated as a result, and there is some capitalization on chance also in the calculation of both factor coefficients and factor variances. This means that for any given sampling (design) of variables the scree test can be expected to give different answers in small samples of subjects than in large samples. In small samples the variance produced by errors of sampling can be expected to be large relative to the comparable variance in large samples, but the common variance produced by a particular selection of variables and reliability of measurement can be the same under these two conditions of sampling subjects. Thus the proportion of error variance in the calculated factors can be expected to be larger in the small sample than in a big one.

How will this affect the results obtained with the scree test? It seems that the answer to this question must be that the test will indicate too many factors in small samples. This follows from the rationale developed in Horn (1965) for the root-one test. The reasoning is that variance-maximizing factoring procedures capitalize on sampling error, thus elevating the variances of early-calculated factors, and pushing back the point (in factor extraction) where the root curve will take on the character of a scree.

There are other ways of looking at this matter, of course. But the major point is that the scree test has some of the properties of a statistical rule. This means that just as one should weigh size of sample in thinking about a result that is reported to be significant or not significant at a given alpha level, so one should consider size of sample when thinking about a scree test result indicating that the number of factors is such-and-so. This is as it should be, of course, for a test that is based upon statistical considerations, but it seems that this point is not always well understood.
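The capitalization on chance described here can be seen in a small simulation. The sketch below is an illustration added under stated assumptions, not a reanalysis of any data in this paper: it draws scores for 31 mutually uncorrelated variables (so that every population root is 1.0) and compares the average largest sample root for a small and a large sample of subjects. The function name, the sample sizes of 100 and 800, the number of replications, and the seed are hypothetical choices.

```python
import numpy as np

def mean_largest_chance_root(n_subjects, n_vars=31, n_reps=20, seed=0):
    """Average largest latent root of sample correlation matrices computed
    from uncorrelated normal scores (all population roots equal 1.0)."""
    rng = np.random.default_rng(seed)
    largest = []
    for _ in range(n_reps):
        scores = rng.standard_normal((n_subjects, n_vars))
        corr = np.corrcoef(scores, rowvar=False)
        # eigvalsh returns roots in ascending order, so the last is the largest.
        largest.append(np.linalg.eigvalsh(corr)[-1])
    return float(np.mean(largest))

small_sample_root = mean_largest_chance_root(100)
large_sample_root = mean_largest_chance_root(800)
print(f"N = 100: {small_sample_root:.2f}   N = 800: {large_sample_root:.2f}")
```

Both averages exceed 1.0 because the variance-maximizing procedure gathers chance correlation into the early roots, and the excess is larger in the small sample. This is the sense in which the beginning of the scree is pushed back when N is small.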
It seems that often the scree test is assumed to indicate the number of factors independently of sample size.

The need to consider overdetermination of factors (in a common factor study) and the presence of unanticipated, unwanted (but nonchance) common factors indicates the psychometric nature of the scree test. In the usual applications of factor analysis the scree is not determined by sampling error alone. It also involves variance associated with the particular sampling of variables, this producing innumerable patterns of correlations and thus many "smallish" (hopefully) common factors.

The statistical qualities of the scree test provide a basis for comparing it with other statistical tests. Such comparisons can be useful for a number of reasons, one of which is that they can suggest how statistical tests sometimes provide solutions to the wrong problem, or at least inadequately formulated conceptions of the problem. Kaiser's (1976) review of Lawley and Maxwell (1971) indicates the problems in this regard more dramatically than most accounts of the issues (but see also Horn, 1965). Before taking up the cudgels for this battle, however, let us look briefly at the Bartlett (1951) test. This has been around for some time and is now being used commonly with maximum likelihood procedures.

This test is based on the latent roots of the residual matrix

[1]   $$R_r = R - MM'$$

remaining after an R matrix⁵ has been reproduced by a variance-maximizing (principal components) procedure using the factors (components) represented in M. The basic element of the test is what we shall refer to (following Mulaik, 1972) as the B ratio (from Bartlett, 1950, 1951),

[2]   $$B = \frac{\lambda_{m+1}\,\lambda_{m+2}\cdots\lambda_{n}}{\left[(\lambda_{m+1}+\lambda_{m+2}+\cdots+\lambda_{n})/(n-m)\right]^{\,n-m}}$$

where the numerator is the product of the last (n-m), that is, the smallest, of the ordered latent roots of an n-by-n correlation matrix, R, and the denominator contains the sum of these last (n-m) roots divided by their number and raised to the (n-m) power. Bartlett found that for large N the following function of this B ratio is distributed approximately as chi-square:

[3]   $$\chi^2 = -K \ln B$$

where ln stands for natural logarithm, and

[4]   $$K = N - 1 - \frac{2n+5}{6} - \frac{2m}{3}$$

He showed the degrees of freedom to be

[5]   $$df = \tfrac{1}{2}(n-m+2)(n-m-1)$$

5. The R matrix here is most likely a correlation matrix because usually only factor analysis in one sample of subjects is under consideration. Not infrequently, however, an investigator is interested in factoring a covariance matrix, particularly when more than one sample of subjects is at hand and when developmental problems are being considered. Both of the tests considered here are applicable to these kinds of situations, although additional qualifications are needed in some applications (e.g. repeated measures).
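For readers who wish to compute the statistic, a minimal Python sketch follows. It takes a list of latent roots, the number m of roots already removed, and the number of subjects, and returns the chi-square and its degrees of freedom. The function and argument names are hypothetical, and the multiplier K is written in the form usually attributed to Bartlett (1950, 1951); that form should be treated as an assumption of the sketch rather than as a definitive statement of the computations used for this paper.

```python
import math

def bartlett_chi_square(roots, m, n_subjects):
    """Chi-square and df for the hypothesis that the last n - m roots are equal."""
    ordered = sorted(roots, reverse=True)
    n = len(ordered)
    last = ordered[m:]          # the n - m smallest roots
    q = n - m
    mean_root = sum(last) / q
    # ln B = ln(product of last roots) - q * ln(average root); logs avoid underflow.
    ln_b = sum(math.log(root) for root in last) - q * math.log(mean_root)
    # Assumed multiplier: K = N - 1 - (2n + 5)/6 - 2m/3.
    k = n_subjects - 1 - (2 * n + 5) / 6 - 2 * m / 3
    chi_square = -k * ln_b
    df = (q + 2) * (q - 1) // 2
    return chi_square, df

# Hypothetical roots: two large roots followed by four equal small ones.
chi_equal, df_equal = bartlett_chi_square([3.0, 1.0, 0.5, 0.5, 0.5, 0.5], m=2, n_subjects=100)
print(chi_equal, df_equal)
```

Working with the sum of logarithms rather than the raw product is a deliberate choice: with thirty or so roots smaller than 1.0, the product itself becomes very small and loses precision.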

A significant chi-square obtains for this test when MM' does not adequately approximate R. Let us consider these matters more closely in company with the rationale for the scree test.

Comparison of the Bartlett and Scree Test

The denominator of the B-ratio is built from the average of the last (n-m) ordered roots. An average root is a typical root. Call this $\bar{\lambda}$. This is raised to the (n-m) power, which is to say that $\bar{\lambda}$ is multiplied by itself (n-m) times. The last (n-m) roots of R are the roots of the residual matrix. Thus the denominator of the B-ratio is the product of (n-m) typical roots of the residual matrix. The numerator of the B-ratio is the product of the last (n-m) roots, as such, i.e., not the typical root. The chi-square of the Bartlett test is therefore directly determined by the ratio of the product of the (n-m) last roots, as such, to the similar product of the typical root. If the roots are approximately equal, the product of the actual roots should be approximately equal to the product of the typical root, and the B-ratio will be approximately 1.0. The logarithm of 1.0 is zero. Thus as the (n-m) last-calculated roots become approximately equal, the chi-square approaches a near-zero, insignificant value.

It was noted previously that a rationale for the scree test is based on reasoning that the last-calculated (unanticipated common) factors should have approximately equal variances. In the present context, where only error (not unanticipated common factor) variance is being considered, the reasoning is that if the rank of R is m except for error, the last (n-m) roots should be zero within the bounds of chance; they should be nonzero only in consequence of correlation resulting by chance, and they should differ mainly only because they have been computed by an algorithm that crams as much variance as possible into early-calculated as compared to late-calculated factors. Thus Cattell's reasoning that the root curve for the last (n-m) roots should slope as a scree is formally the same as Bartlett's reasoning that the last (n-m) roots should represent only random variations of a typical root. The slope of the scree thus can be seen to represent only the variance-maximizing features of an algorithm applied to estimate (n-m) roots all of which are merely random fluctuations of an average root that well represents them all.

How will the Bartlett test work if there are unwanted common factors? As noted, if the rank of R is m, the typical root of the residual matrix should be zero except for random influences. However, if the number of common factors is as large as, or larger than, the number of variables, as Cattell argued in 1958, then the typical root of the residual matrix is the average of a number of common factor variances. These could be expected to be small, but even if they were not, the Bartlett test could indicate that the number of factors was m if the last (n-m) roots were approximately equal, i.e. if their product was not notably different from the comparable product (power) of their average. Again, these are the conditions stipulated in the rationale for the scree test: If the roots for (n-m) smallish common factors are similar, then the root curve will take on the character of a scree.
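The behavior of the B-ratio described in this comparison can be checked with a few made-up sets of "last" roots. In the sketch below, which is illustrative only, all three sets have the same average (.50); the root values and the function name are assumptions introduced for the example.

```python
import math

def ln_b_ratio(last_roots):
    """Natural log of the B-ratio: ln(product of roots) - q * ln(average root)."""
    q = len(last_roots)
    mean_root = sum(last_roots) / q
    return sum(math.log(root) for root in last_roots) - q * math.log(mean_root)

equal_roots = [0.50, 0.50, 0.50, 0.50]        # a perfectly flat line
scree_like_roots = [0.56, 0.52, 0.48, 0.44]   # a gentle, even slope
standout_roots = [1.10, 0.32, 0.30, 0.28]     # one root well above the rest

ln_b_equal = ln_b_ratio(equal_roots)
ln_b_scree = ln_b_ratio(scree_like_roots)
ln_b_standout = ln_b_ratio(standout_roots)
print(ln_b_equal, ln_b_scree, ln_b_standout)
```

Because the product of a set of positive numbers can never exceed the same power of their average, ln B is zero for equal roots and negative otherwise. A gentle scree-like slope leaves it close to zero, whereas a single root standing above the line drives it strongly negative, and the chi-square (which is -K ln B) becomes correspondingly large.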
On the other hand, if one or more of the roots is (are) notably different from the rest, the average in the denominator of the B-ratio will not well represent this root (these roots), the ratio will be notably different from 1.0, the scree line will fall below this root (these roots), and more than m factors will be indicated.

Thus the Bartlett test and the scree test are based on very similar lines of reasoning. An alpha level is explicit in the Bartlett test, however, whereas it is only implicit in the scree test. The tests could yield quite different results as a function of selection of different alpha levels for one in comparison to the other, but if this feature of the tests could be held constant, then the two should lead to the same conclusions about the number of factors.

The difficulty with setting the two alphas equal is that there are no known procedures, or even guidelines, for setting an alpha level for the scree test. The alpha level for the Bartlett test, on the other hand, can be easily and objectively set and studied relative to the number of subjects, the conditions of sampling variables and other features of study design. It might be useful, therefore, to see that this test is based on essentially the same rationale as the highly regarded scree test. For example, by studying how choice of alpha level in use of the Bartlett test affects the number-of-factors decision one might gain some insight into the implicit alpha level adopted in using the scree test.

Relation of the Bartlett Test to Some Other Well Known Tests

Some of the features of the Bartlett test can be indicated by comparing it with a similar test developed by Lawley and Maxwell (1963). This latter is based on maximum likelihood estimation of the parameters of a common factor model

[6]   $$\hat{R} = \hat{A}\hat{A}' + \hat{\Psi}$$

in which $\hat{A}$ is an n-by-m set of estimated common factor coefficients, $\hat{\Psi}$ is the n-by-n diagonal matrix of estimated uniquenesses, and $\hat{R}$ is the estimated covariance (correlation) matrix reproduced from them.⁶ With a few simplifications resulting from considerations of when $\mathrm{tr}(R\hat{R}^{-1})$ is approximately n, it was found that

[7]   $$\chi^2 = N\left(\ln|\hat{R}| - \ln|R|\right)$$

is distributed as chi-square with degrees of freedom

[8]   $$df_2 = \tfrac{1}{2}\left[(n-m)^2 - (n-m)\right] = \tfrac{1}{2}(n-m)(n-m-1)$$

6. The model could, of course, be represented as well by

$$\hat{R} = \hat{A}\hat{\Phi}\hat{A}' + \hat{\Psi}$$

where $\hat{\Phi}$ is the m-by-m matrix of estimated covariances among the common factors, but for purposes here at hand nothing is lost by assuming $\hat{\Phi} = I$, so [6] is a quite adequate model representation.


Here it may be noted that $df_2$ is (n-m-1) smaller than the df for the Bartlett test, as given in [5].⁷

To see more explicitly how the Bartlett test relates to the Lawley-Maxwell test for m common factors, first let $|\bar{R}_r|$ represent the product of the average roots of the residual matrix. This is the denominator of the Bartlett test. Here it has been suggested that this is a determinant for an average residual matrix. Next recognize that the numerator of the Bartlett test is in fact the determinant of the residual matrix, $R_r$. The Bartlett test may then be written as

[9]   $$\chi^2 = K\left(\ln|\bar{R}_r| - \ln|R_r|\right)$$

Thus except for the fact that K ≠ N and the fact that the degrees of freedom for [7] and [3] differ, this is the same as the test in [7] when $\bar{R}_r$ replaces $\hat{R}$ and $R_r$ replaces R. The Bartlett test thus can be seen to be a likelihood ratio test for a model of the residual matrix. That is, $\bar{R}_r$ in the Bartlett test is as the reproduced $\hat{R}$ in the Lawley-Maxwell test, while $R_r$ in the Bartlett test is as R itself in the test for fit of maximum-likelihood estimators of a common-factor model. A significant chi-square in the Bartlett test thus implies that an "average model" for the residual matrix is not a good model. More factors are indicated in this test in the same sense that improper estimation of the number of factors may be indicated by the likelihood ratio for estimation in the Lawley-Maxwell test (of course features of the estimation other than number of factors may indicate poor fit of a common-factor model).

7. In generating a rationale for this test Bartlett first developed a test that a correlation matrix, as such, represents nonchance interrelationships (i.e., is statistically significant). He showed that the determinant of R can be calculated both as

[a]   $$|R| = (1 - r^2_{12})(1 - r^2_{3.12})(1 - r^2_{4.123}) \cdots (1 - r^2_{n.12\ldots(n-1)})$$

and as

[b]   $$|R| = (\lambda_1)(\lambda_2)\cdots(\lambda_n)$$

and that $-K \ln|R|$ has a chi-square distribution when N is large. Then he reasoned that just as successive tests might be run to evaluate the significance of multiple correlations, as in [a], so successive tests can be developed for the latent roots of [b]. The denominator in the B-ratio is a rescaling of the numerator variances to a mean of unity. Bartlett suggested that this extension of the test is sound if the first m factors are well determined.


The difference between the Bartlett test and the Lawley-Maxwell test for the common factor model is represented in the differences between the K and N multipliers of [3] and [7] and the differences between the degrees of freedom of [5] and [8]. These differences will not be considered in any detail here.

It is of some interest to notice that when m is (assumed to be) zero in the model test, it reduces to

[10]   $$\chi^2 = N\left(\ln|I| - \ln|R|\right) = -N \ln|R|$$

This is true because $\hat{R} = \hat{A}\hat{A}' + \hat{\Psi} = [0][0]' + I = I$ and $|I| = 1$ under these conditions. When there are no common factors, the communalities are zero and the uniquenesses are 1.0. The degrees of freedom for [10] under these conditions will be

[11]   $$df_2 = \tfrac{1}{2}\left[n^2 - n\right] = \tfrac{1}{2}\,n(n-1)$$

The Bartlett test in this case is

[12]   $$\chi^2 = -K \ln|R|$$

(see also footnote 7). Degrees of freedom are found to be the same here as in [11]. Thus under these conditions of no common factors the Lawley-Maxwell test can be seen to be a test of an hypothesis that the model is the identity matrix, and except for the multiplier in front of ln|R|, this is the Bartlett test in [12] that the correlation matrix is significantly different from an identity matrix.

The determinant of a correlation matrix is a generalized variance representing the proportion of n-space spanned by the variable space (Green & Carroll, 1976). It can be seen in [10] and [12] that the tests for significance of the correlation matrix rest upon a comparison of a generalized variance, based on I, that is not affected by the correlations with a generalized variance that is affected by the correlations (cf. Horn, 1963). As noted in the development of a rationale for the Bartlett test, and as can be seen clearly in [12], the essential comparison is between a product of the roots calculated by a variance-maximizing procedure that tends to make the roots as dissimilar as possible and a product (power) of the average of the roots; in the present application this average is n/n = 1.0. Thus if the roots are notably different, the correlation matrix will be significant when judged in accordance with the tests of [10] and [12].
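The zero-factor case can be made concrete with a short Python sketch. It computes ln|R| as the sum of the logarithms of the latent roots and then forms the two statistics that differ only in their multiplier, as in equations [10] and [12]. The function name and the small 3-by-3 example matrix are hypothetical, and the Bartlett multiplier N - 1 - (2n + 5)/6 is the form commonly attributed to Bartlett, assumed here for illustration.

```python
import math
import numpy as np

def zero_factor_tests(corr, n_subjects):
    """Chi-squares for the hypothesis that R is an identity matrix (m = 0)."""
    n = corr.shape[0]
    roots = np.linalg.eigvalsh(corr)
    # The determinant is the product of the roots, so ln|R| is the sum of their logs.
    ln_det = float(np.sum(np.log(roots)))
    chi_model = -n_subjects * ln_det                  # multiplier N, as in [10]
    k = n_subjects - 1 - (2 * n + 5) / 6              # assumed Bartlett multiplier
    chi_bartlett = -k * ln_det                        # multiplier K, as in [12]
    df = n * (n - 1) // 2
    return chi_model, chi_bartlett, df

# Hypothetical correlations among three variables, N = 100.
example = np.array([[1.0, 0.5, 0.3],
                    [0.5, 1.0, 0.4],
                    [0.3, 0.4, 1.0]])
chi_model, chi_bartlett, df = zero_factor_tests(example, n_subjects=100)
print(chi_model, chi_bartlett, df)
```

For an identity matrix every root is 1.0, ln|R| is zero, and both statistics vanish; the more unequal the roots, the smaller the determinant and the larger both chi-squares.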

So what does all this mean? Well, for one thing it means that one who likes the scree test could find some good reasons for using the Bartlett test, and vice versa. Both are tests of whether the (n-m) roots remaining after m roots have been used to reproduce R are notably different from the average of the lot.

The fact that the Bartlett and Cattell tests are based on the same reasoning does not mean that the two will, or should, yield the same answer for a number-of-factors question when each is used in a conventional manner, or in the ways that have been used in the past. Using conventional alpha levels, for example, the Bartlett test can indicate many more factors than one would ordinarily want to estimate (Gorsuch, 1974, p. 139; Harris & Harris, 1971). On the other hand, Browne (1968) found that the test yielded reasonable answers. Similarly, Tucker, Koopman and Linn (1969), and Cliff (1970), as well as Cattell and Jaspers (1967), found that the scree test provided reasonable solutions to the number-of-factors problem under several conditions, but Linn (1968) observed several instances in which the test did not work at all well. We will not try to integrate these findings here.


A couple of examples will be presented,, however, to indicate ways in which a conclusion reached using one of the tests can be quite different from the conclusion arrived a t using tlie other test. The results shown in Table 1 and Figure 1 are based on data that have been rather fully described by Cattell and Horn (1978). The total sample consists of 833 eighth grade children of which 829 were used in the present analysis. The scree ,test for this group indicates seven factors (see Figure 1). Thra chisquare ?or the Bartlett test for eight factors is 756 with 299 degrees of freedom. With degrees of freedom larger than 30: the expretssion

(where Z is a normal deviate and f represents degrees of freedom) is approximately normally distributed (Beyer, 1968). For eight factors this value is 15.8, significant well beyond conventional levels. In fact even twenty-six factors are significant by this test at the .01 level. This indicates that in order for the Bartlett test to indicate the same number of factors as Cattell's test one would need to set the alpha level at a very small value indeed. Looked at another way, this indicates that the number of factors indicated by the scree test is a conservative estimate when considered statistically; many more factors could be extracted if the statistical test were used with conventional alpha levels.

On the other hand, if small samples are drawn from the large sample, the scree test tends to indicate more factors than indicated in the large sample (thus revealing the statistical qualities of the test), and when the Bartlett test is used with conventional alpha levels, it indicates fewer factors than it had indicated in the large sample. For the N = 103 subjects, drawn from a subsample of 209 lower class children of the total sample, the scree test suggests ten factors. The normal deviate for the chi-square for eleven factors is 2.32, indicating significance, to be sure, but at the .01 level. The chi-square test for the 19th factor is not significant at the .05 level. Thus, if the alpha level had been set at .007, say, the test for significance on the eleventh factor would not indicate significance, but the test on the 10th factor would indicate significance, and the results of the Bartlett test would agree with the conclusions based on the scree test. Moreover, if the alpha level were set at .002 (or similar values), the Bartlett test would indicate significance for seven factors, but not for eight factors, a conclusion that agrees with the scree test results for the total sample.
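The normal-deviate conversion used in these examples can be retraced with a short sketch. It assumes the large-df approximation Z = sqrt(2 chi-square) - sqrt(2f - 1) and a one-tailed normal probability; the function names are illustrative and are not taken from the article. The chi-square values are read from Table 1.

```python
import math


def fisher_z(chi_square: float, df: int) -> float:
    """Normal deviate for a chi-square with df > 30 (large-df approximation)."""
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)


def upper_tail_alpha(z: float) -> float:
    """One-tailed probability that a standard normal deviate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Test for eleven factors in the N = 103 subsample:
# chi-square 282 on 230 degrees of freedom (Table 1, ten roots out).
z = fisher_z(282, 230)
print(round(z, 2), round(upper_tail_alpha(z), 3))
```

Under these assumptions the deviate rounds to 2.32 and the tail probability to about .01, in line with the values quoted in the text for the eleventh factor.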

Figure 1. Root Curves for Three Samples

[Plot of the roots against order of root in extraction for three samples: L = subsample of N = 103 lower class children; U = subsample of N = 132 upper class children; total sample of N = 829 children.]

N.B. To prevent confusing overlap of the curves the root values have been offset. The roots for the total sample are as plotted, but the roots for the U-group have been increased by 0.5 and those for the L-group have been increased by 1.0.


Table 1
Roots (λ), Chi-squares, Standard Deviates (Z), Alpha Levels for Factor Extractions in Three Sets of Data

Ord. = order of root; Out = number of roots out.

                 Total Sample (N = 829)      U-Sample (N = 132)         L-Sample (N = 103)
Ord.   Out   df      λ    χ²      Z       α      λ    χ²     Z       α      λ    χ²     Z       α
   1     0  495   9.12  9983  119.9           8.32  1724  27.3           8.13  1308  19.7
   2     1  464   2.98  4039   59.4           3.06   960  13.4           2.84   747   8.2
   3     2  434   1.93  2523   41.6           2.26   738   9.0           1.75   601   5.2
   4     3  405   1.45  1750   30.7           1.49   587   5.8           1.47   538   4.4
   5     4  377   1.19  1296   23.5           1.28   518   4.7           1.42   492   3.9  .00004
   6     5  350    .94   999   18.2           1.14   464   4.0  .00003   1.23   446   3.4   .0003
   7     6  324    .83   851   15.8           1.07   421   3.6   .0002   1.16   410   3.2   .0007
   8     7  299    .76   756   14.5            .99   382   3.2   .0007   1.10   376   3.0    .001
   9     8  275    .74   686   13.6            .94   346   2.9    .002   1.03   343   2.8    .003
  10     9  252    .70   620   12.8  10^-37    .92   312   2.6    .005    .99   312   2.6    .005
  11    10  230    .69   564   12.2  10^-34    .83   276   2.1     .02    .86   282   2.3     .01
  12    11  209    .68   509   11.5  10^-31    .78   245   1.8     .04    .85   259   2.3     .01
  13    12  189    .66   453   10.7  10^-27    .74   220   1.6     .06    .78   234   2.2     .01
  14    13  170    .65   400    9.9  10^-23    .71   195   1.3     .09    .73   212   2.1     .02
  15    14  152    .63   344    8.8  10^-19    .69   169   1.0      .2    .71   191   2.1     .02
  16    15  135    .59   291    7.7  10^-15    .62   142                  .67   170   2.0     .02
  17    16  119    .58   251    7.0  10^-12    .59   119                  .63   149   1.9     .03
  18    17  104    .54   207    5.9   10^-9    .55    96                  .58   129   1.7     .04
  19    18   90    .51   177    5.4   10^-8    .49    75                  .54   111   1.5     .06
  20    19   77    .50   153    5.1   10^-7    .46    60                  .51    94   1.3      .1
  21    20   65    .48   130    4.8   10^-7    .41    46                  .45    77   1.0      .2
  22    21   54    .47   110    4.5   10^-6    .37    35                  .39    63    .8      .2
  23    22   44    .46    90    4.1  .00002    .35    27                  .37    53
  24    23   35    .44    68    3.3   .0004    .33    19                  .35    43
  25    24   27    .41    48    2.5    .006    .30    13                  .31    32
  26    25   20    .41    37    2.4    .008    .26     7                  .25    23
  27    26   14    .38    22    1.4     .08    .25     5                  .23    18
  28    27    9    .37    12     .8      .2    .22     2                  .22    14
  29    28    5    .31     1                   .21     1                  .18     8
  30    29    2    .30     0                   .19     0                  .15     4
  31    30         .29     0                   .17     0                  .09     0

In a sample of 132 subjects drawn from a subsample of 624 upper class children of the total sample, the scree indicates ten factors, and the normal deviate of 2.09 for eleven factors is significant at the .018 level. Here, however, the test for the 13th factor is not significant at the .05 level. In this case the alpha level needs to be set in the vicinity of .01 if the Bartlett test results are to agree with the conclusion based on the scree test in this sample; the alpha level needs to be set at about .0003 if the Bartlett-test results for this small subsample are to agree with the scree-test conclusions determined in the total sample.

These examples thus support the analytic conclusions suggesting that the scree test would tend to yield an overestimate of the number of factors when samples are small. The example results also suggest that if the alpha level is set at a low value of about .0002 to .0005 in samples of 100 to 150, application of the Bartlett test may indicate approximately the same number of factors as would be estimated on the basis of a scree test determined on the basis of sampling roughly six times as many subjects. Generally, the alpha level must be set at values that are considerably smaller than those that are most commonly used if the Bartlett and scree test results are to agree.

The Bartlett test can be useful because it is convenient, objective, and explicit relative to the scree test. The laying of a straight edge to determine a scree can be more art than science. True, Cattell and Vogelman (1977) have presented evidence that there can be substantial consistency in the answers to the number-of-factors question obtained by different users of the scree test on a given set of data. But it is one thing to establish that agreement can occur when different users apply the scree test in very well designed studies in which the answer is quite clear and quite another thing to suppose that subjectivity does not enter when researchers use the test with somewhat "dirty data" for which they have rather strong preconceptions about the number of factors. It is difficult for a reader of results based upon application of the scree test to know just how much subjectivity might have entered into the laying of the straight edge to determine the scree. In contrast, subjectivity in use of the Bartlett test can be seen explicitly in the choice of alpha level. Although subjectivity resides in how one uses the test, the obtaining of the test statistic, as such, is entirely objective. It is difficult to gather information on the scree test by accumulating results, or even experience, over several studies, but for the Bartlett test one can look at several different alpha levels for different degrees of freedom to record how it works from one study to another. Given a particular alpha level, there is perfect consistency from one user to another in determination of the number of factors for a given set of data.
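The way the Bartlett result moves with the chosen alpha level can be illustrated with a minimal sketch based on the U-sample chi-squares of Table 1. Two things here are assumptions rather than statements from the text: the degrees-of-freedom expression (n - m - 1)(n - m + 2)/2, which reproduces the df column of Table 1 for n = 31 variables, and the stopping rule of taking the first non-significant test as the number of factors. All function names are hypothetical.

```python
import math


def bartlett_df(n_vars: int, m: int) -> int:
    """Assumed df with m roots out; matches the df column of Table 1 for n_vars = 31."""
    k = n_vars - m
    return (k - 1) * (k + 2) // 2


def upper_tail_alpha(chi_square: float, df: int) -> float:
    """One-tailed normal probability via Z = sqrt(2*chi2) - sqrt(2*df - 1), df > 30."""
    z = math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def factors_indicated(chi_squares, n_vars: int, alpha: float) -> int:
    """Smallest number of roots out whose residual chi-square is not significant."""
    for m, chi_square in enumerate(chi_squares):
        if upper_tail_alpha(chi_square, bartlett_df(n_vars, m)) >= alpha:
            return m
    return len(chi_squares)


# U-sample (N = 132) chi-squares with 0 through 12 roots out, from Table 1.
U_CHI = [1724, 960, 738, 587, 518, 464, 421, 382, 346, 312, 276, 245, 220]

for alpha in (0.05, 0.01, 0.0003):
    print(alpha, factors_indicated(U_CHI, 31, alpha))
```

With these inputs the sketch indicates twelve factors at the .05 level, ten at .01, and seven at .0003, which matches the pattern described for this subsample. The same alpha always yields the same count, which is the sense in which the test is consistent from one user to another.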
There is also conceptual and computational convenience in knowing that essentially the same rationale and calculations are used in the chi-square tests of [3], [7], [9], [10], and [11] to examine questions about how well m principal components reproduce R, the fit of a model in common factor analysis, the fit of a model of a residual matrix, and the significance of a correlation matrix. Finally, too, there is value in seeing the Bartlett test embedded in, and amenable to the calculations deriving from, a general theory of statistics. Granted that there is much to be wary about in using this theory, as indicated in Kaiser's (1976) thoughtful criticisms of maximum-likelihood factoring procedures, still it is desirable to use the theory to the extent that it can be intelligently used and to work to develop modifications of the theory that can better serve scientific needs.

But the scree test is useful. There is little doubt about it. For one thing it forces the user to consider the overall pattern for the ordered roots. No particular root among the (n-m) late-calculated roots determines the scree; all must be considered. If one is out of line, it shows up. And the late-calculated roots are seen in contrast to those calculated early in the extraction process. In the Bartlett test, on the other hand, the information of the first m roots is not used. Also each new hypothesis about the number of factors is a new set of calculations for the Bartlett test. But the scree is set once and for all when the roots have been calculated and plotted. The scree provides an overall view of the pattern of the ordered roots.

Thus both tests can be of use. It is desirable to see that in many respects they can tell the same story about the number of factors.

REFERENCE NOTES

1. M c k w n, K. W. Factorial validity of a rating instrument. Unpublished doctoral dissertation, University of Illinois, 1960.
2. Kaiser, H. F. The application of electronic computers to factor analysis. Paper presented at a symposium on application of computers to psychological problems. American Psychological Association Meetings, 1959.

REFERENCES

Albert, A. A. The minimum rank of a correlation matrix. Proceedings, National Academy of Sciences, 1944, 30, 144-146.
Bartlett, M. S. Tests of significance in factor analysis. British Journal of Psychology, Statistics Section, 1950, 3, 77-85.
Bartlett, M. S. A further note on tests of significance in factor analysis. British Journal of Psychology, Statistical Section, 4, 1-2.
Beyer, W. B. CRC Handbook of Tables for Probability and Statistics (2nd Edition). Cleveland: Chemical Rubber, 1968.
Browne, M. W. A comparison of factor analytic techniques. Psychometrika, 1968, 33, 267-334.
Cattell, R. B. Extracting the correct number of factors in factor analysis. Educational and Psychological Measurement, 1958, 18, 691-738.
Cattell, R. B. The scree test for the number of factors. Multivariate Behavioral Research, 1966, 1, 245-276.
Cattell, R. B. The meaning and strategic use of factor analysis. Chapter 6 in Handbook of Multivariate Experimental Psychology, R. B. Cattell (Ed.). Chicago: Rand McNally & Co., 1966.
Cattell, R. B., & Horn, J. L. A check on the theory of fluid and crystallized intelligence with description of new subtest designs. Journal of Educational Measurement, 1978, 15, 139-164.
Cattell, R. B., & Jaspers, J. A general plasmode for factor analytic exercises and research. Multivariate Behavioral Research Monographs, 1967, 3, 1-212.
Cattell, R. B., & Vogelman, S. A comprehensive trial of the scree and KG criteria for determining the number of factors. Multivariate Behavioral Research, 1977, 12, 289-325.
Cliff, N. The relation between sample and population characteristic vectors. Psychometrika, 1970, 35, 163-178.
Cliff, N., & Hamburger, C. Study of sampling errors in factor analysis by means of artificial experiments. Psychological Bulletin, 1967, 68, 430-445.
Duncan, O. D. Unmeasured variables in linear models for panel analysis. In H. L. Costner (Ed.): Sociological Methodology. San Francisco: Jossey-Bass, 1972, 36-82.
Gorsuch, R. L. Factor Analysis. Philadelphia: Saunders, 1974.
Green, P. E., & Carroll, J. D. Mathematical Tools for Applied Multivariate Analysis. New York: Academic Press, 1976.
Guttman, R. Some necessary conditions for common-factor analysis. Psychometrika, 1954, 19, 149-161.
Harris, M. L., & Harris, C. W. A factor analytic interpretation strategy. Educational and Psychological Measurement, 1971, 31, 589-606.
Horn, J. L. Equations representing combinations of components in scoring psychological variables. Acta Psychologica, 1963, 21, 184-217.
Horn, J. L. A rationale and test for the number of factors in factor analysis. Psychometrika, 1965, 30, 178-185.
Horn, J. L. On subjectivity in factor analysis. Educational and Psychological Measurement, 1967, 27, 811-820.
Horn, J. L. Integration of concepts of reliability and standard error of measurement. Educational and Psychological Measurement, 1971, 31, 57-74.
Horn, J. L., & Knapp, J. R. On the subjective character of the empirical base of Guilford's structure-of-intellect model. Psychological Bulletin, 1973, 80, 33-43.
Horn, J. L., & Knapp, J. R. Thirty wrongs do not make a right: A reply to Guilford. Psychological Bulletin, 1974, 81, 502-504.
Humphreys, L. G., & Ilgen, D. Note on a criterion for the number of common factors. Educational and Psychological Measurement, 1969, 29, 571-578.
Humphreys, L. G., Ilgen, D., McGrath, D., & Montanelli, R. Capitalization on chance in rotation of factors. Educational and Psychological Measurement, 1969, 29, 259-271.
Jöreskog, K. G. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 1969, 34, 183-202.
Jöreskog, K. G., & Sörbom, D. LISREL IV: Linear Structural Equations by the Maximum Likelihood Method. Chicago: International Educational Services, 1978.
Kaiser, H. F. Review of Lawley, D. N., & Maxwell, A. E. Factor Analysis As a Statistical Method. Educational and Psychological Measurement, 1976, 36, 586-589.
Lawley, D. N. The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 1940, 60, 64-82.
Lawley, D. N., & Maxwell, A. E. Factor Analysis As a Statistical Method. London: Butterworth, 1963.
Lawley, D. N., & Maxwell, A. E. Factor Analysis As a Statistical Method. New York: Elsevier, 1971.
Lederman, W. On the rank of the reduced correlation matrix in multiple-factor analysis. Psychometrika, 1937, 2, 85-93.
Linn, R. L. A Monte Carlo approach to the number of factors problem. Psychometrika, 1968, 33, 37-72.
Maxwell, A. E. Statistical methods in factor analysis. Psychological Bulletin, 1959, 56, 228-235.
Mulaik, S. A. Foundations of Factor Analysis. New York: McGraw-Hill, 1972.
Tucker, L. R., Koopman, R. F., & Linn, R. L. Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika, 1969, 34, 421-460.
