This article was downloaded by: [Ohio State University Libraries] On: 08 June 2012, At: 23:48 Publisher: Psychology Press Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Multivariate Behavioral Research Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hmbr20

Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure Wayne F. Velicer & Douglas N. Jackson Available online: 10 Jun 2010

To cite this article: Wayne F. Velicer & Douglas N. Jackson (1990): Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure, Multivariate Behavioral Research, 25:1, 1-28. To link to this article: http://dx.doi.org/10.1207/s15327906mbr2501_1


Multivariate Behavioral Research, 25(1), 1-28. Copyright © 1990, Lawrence Erlbaum Associates, Inc.

Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure Wayne F. Velicer


University of Rhode Island

Douglas N. Jackson University of Western Ontario

Should one do a component analysis or a factor analysis? The choice is not obvious, because the two broad classes of procedures serve a similar purpose, and share many important mathematical characteristics. Despite many textbooks describing common factor analysis as the preferred procedure, principal component analysis has been the most widely applied. Here we summarize relevant information for the prospective factor/component analyst. First, we discuss the key algebraic similarities and differences. Next, we analyze a number of theoretical and practical issues. The more practical aspects include: the degree of numeric similarity between solutions from the two methods, some common rules for the number of factors to be retained, effects resulting from overextraction, problems with improper solutions, and comparisons in computational efficiency. Finally, we review some broader theoretical issues: the factor indeterminacy issue, the differences between exploratory and confirmatory procedures, and the issue of latent versus manifest variables.

This research was partially supported by Grant CA 27821 from the National Cancer Institute and Canada Council Research Grant No. 575-0495. Send requests for reprints or further information to Wayne F. Velicer, Psychology Department, University of Rhode Island, Kingston, RI 02881. An early version of the article was presented at the Society for Multivariate Behavioral Research meeting, Evanston, IL, November 1984. The authors would like to thank Lisa Harlow, James Steiger, and Roderick McDonald for their comments on earlier versions of this article.

Factor analysis and component analysis are two broad classes of procedures that share a common goal: to reduce a set of p observed variables to a set of m new variables (m < p). This reduction serves two different purposes. First, the pattern matrix A can be interpreted to describe the relationship between the original variables and the new variables. Second, scores for the m new variables can be derived to replace the original observed scores. These scores can be interpreted or employed as the basis of a subsequent analysis. A number of writers have drawn careful distinctions between these two broad classes of methods. After reviewing the distinctions, most writers recommend the use of factor analysis. In spite of these recommendations, principal component
analysis, a type of component analysis, remains the most widely employed of the techniques (Glass & Taylor, 1966; Pruzek & Rabinowitz, 1981). The purpose of this article is to review some of the issues involved in the selection of one of these two classes of procedures. The first section will describe the algebraic similarities and differences at the sample level. Four issues will then be discussed in the context of the analysis of data: (a) the degree of similarity between alternative solutions, (b) issues relating to the number of components retained, (c) problems with improper solutions in factor analysis, and (d) comparisons with respect to computational efficiency. Three broader theoretical issues will also be considered: (a) the factor indeterminacy problem, (b) the distinction between exploratory and confirmatory analysis, and (c) the contrast between latent and manifest variables. This article will not consider these issues within the broader context of structural analysis, primarily because the theory is still developing and the limited number of empirical examples do not provide an adequate basis for discussing these issues. However, the choice between methods discussed in this article seems to involve the same issues (Bentler, 1980; Fornell & Bookstein, 1982) that would be involved in selecting between a factor analysis based approach such as LISREL (Joreskog, 1970, 1978, 1981), EQS (Bentler, 1985), or COSAN (McDonald, 1978, 1980) and a component analysis based approach such as the PLS (Partial Least Squares) approach (Wold, 1966, 1982). It is clear that, at the present time, general procedures for structural analysis based on the factor analysis model are better developed and have been employed more extensively.

Algebraic Relations

An examination of the mathematical representations of the two approaches will serve to highlight the similarities and differences. This section will present only a brief description. Readers interested in a more detailed and rigorous derivation are referred to standard texts such as Gorsuch (1983), Lawley and Maxwell (1971), McDonald (1985), Meredith and Millsap (1985), Mulaik (1972), or the Schonemann and Steiger (1976) article on component analysis. The basic problem for both factor analysis and component analysis involves the description of a set of p random variables η′ = (Y1, Y2, . . ., Yp) in terms of m < p random variables ξ′ = (X1, X2, . . ., Xm) and p residuals ε′ = (e1, e2, . . ., ep). Both factor analysis and component analysis can be expressed as a model of the form

η = Aξ + ε,    (1)


where A is the p × m multiple regression pattern for optimally predicting the p variates in η from the m variates in ξ. In component analysis, ξ must be expressible as ξ = A′η, in which case the variance-covariance matrix of ε cannot be diagonal and of full rank. In factor analysis, the variance-covariance matrix of ε must be diagonal and of full rank. Alternative methods result in different sample estimates of A, the pattern matrix. For a sample of size N, the observed data can be represented as an N × p matrix Y. When Y is expressed in deviation score form, a p × p sample covariance matrix C can be represented as

C = Y′Y/(N − 1).    (2)

Component analysis will be defined as any eigen decomposition of a covariance matrix. The most widely employed version is principal component analysis, where all observed variables are transformed to standard score form and the covariance matrix is now the p × p matrix of correlations, R. Principal component analysis can be expressed as

R ≈ L_C D_C² L_C′,    (3)

where D_C² is an m × m diagonal matrix containing the largest m eigen values and L_C is the p × m matrix containing the corresponding eigen vectors. This approach is sometimes referred to as truncated principal component analysis, indicating that for some applications all p components may be retained. Alternatives to principal component analysis are described by Bartholomew (1984, 1985), Meredith and Millsap (1985), and Schonemann and Steiger (1976). A variation of principal component analysis that has received considerable attention is image component analysis (Guttman, 1953; Harris, 1962), which can be expressed as

S⁻¹ R S⁻¹ = L_I D_I² L_I′,    (4)

where L_I and D_I² are the matrices of eigen vectors and values, respectively, and

S² = [diag(R⁻¹)]⁻¹.    (5)

The principal component pattern can be expressed as

A_C = L_C D_C,    (6)


and the image component pattern can be expressed as

A_I = S L_I D_I.    (7)
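Equations (3) through (7) can be carried out directly with a standard eigendecomposition routine. The following sketch is purely illustrative; the 3 × 3 correlation matrix and the choice m = 1 are hypothetical, not taken from the article:

```python
import numpy as np

# Hypothetical 3-variable correlation matrix; retain m = 1 component
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
m = 1

def top_eigenpairs(M, m):
    """Largest m eigenvalues and eigenvectors of a symmetric matrix M."""
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:m]
    return evals[order], evecs[:, order]

# Principal component pattern, Equations (3) and (6): A_C = L_C D_C
d2_C, L_C = top_eigenpairs(R, m)
A_C = L_C * np.sqrt(d2_C)
A_C *= np.sign(A_C[0, 0])        # eigenvector signs are arbitrary; fix them

# Image component pattern, Equations (4), (5), and (7): A_I = S L_I D_I
S = np.sqrt(np.diag(1.0 / np.diag(np.linalg.inv(R))))   # S² = [diag(R⁻¹)]⁻¹
S_inv = np.linalg.inv(S)
d2_I, L_I = top_eigenpairs(S_inv @ R @ S_inv, m)
A_I = S @ (L_I * np.sqrt(d2_I))
A_I *= np.sign(A_I[0, 0])
```

Consistent with the similarity results reviewed later in the article, the two patterns computed this way agree closely for a well-defined component.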


Factor analysis, as we have seen, can be viewed as a linear model relating manifest and (using the term loosely) latent variables with an important constraint on the variance-covariance structure of the latent variables. However, this version of the factor model fits a given η if and only if the variance-covariance matrix of the η variables may be written in the form

C_ηη = AA′ + U².    (8)

Consequently, an alternative view of the factor analysis model is available. Specifically, the factor model fits η if and only if there exists a positive definite, diagonal matrix U² which, when subtracted from C_ηη, leaves a residual that is Gramian and of rank m. Regardless of whether or not the factor model fits C_ηη in the population for a small number of factors, it is a virtual certainty that it will not fit a sample variance-covariance matrix C perfectly. Hence, when working with a sample variance-covariance matrix, we fit the model

C = AA′ + U² + E,    (9)

where E is to be kept as small as possible by choosing A and U² to minimize a loss function. Different loss functions yield different solutions, which often are subsumed under the generic name common factor analysis. Maximum Likelihood Factor Analysis (MLFA), which results when A and U² are chosen to minimize the maximum likelihood loss function, will be considered as the exemplar of this approach. Alternative derivations from different rationales have produced the same results (Howe, 1955; Lawley, 1940, 1941; Rao, 1955), and comparative studies (Browne, 1968) support this preference. If the diagonality constraint (A′U⁻²A diagonal) is employed for MLFA, the result can be expressed as an eigen decomposition

U⁻¹ R U⁻¹ = L_F D_F² L_F′,    (10)

where L_F and D_F² are the matrices containing, respectively, the eigen vectors and eigen values, and the corresponding factor pattern can be represented as

A_F = U L_F (D_F² − I)^1/2.    (11)


An examination of the two methods of analysis will demonstrate that they differ only to the extent that factor analysis involves a reduction of the variance or diagonal elements of the covariance matrix. Component analysis permits no operation on the diagonal elements that will not also affect the covariance or off-diagonal elements. It should be noted that the values of U² are typically unknown and must be estimated from the data. Estimation in factor analysis always employs an iterative procedure where U² is estimated by a numerical minimization algorithm, while A is estimated in closed form at each value of U² found by the algorithm. The procedure continues until convergence occurs. The presence of the U² matrix is also the source of the indeterminacy problem (Mulaik & McDonald, 1978; Steiger & Schonemann, 1978). The extent to which the algebraic differences in the two approaches result in practical differences in the solutions will be discussed in the next section.
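The closed-form step described above is easy to make concrete. In the sketch below, a hypothetical one-factor population matrix is built from Equation (8) with known uniquenesses, and the pattern is recovered through Equations (10) and (11); in practice, of course, U² is unknown and must be found by numerical minimization:

```python
import numpy as np

# Hypothetical population: six variables, one factor, all loadings .80
A_true = np.full((6, 1), 0.8)
U2 = np.diag(np.full(6, 1.0 - 0.8**2))       # uniquenesses on the diagonal of U²
R = A_true @ A_true.T + U2                   # Equation (8)

# Given U, the factor pattern follows in closed form (Equations 10 and 11)
U = np.sqrt(U2)
U_inv = np.linalg.inv(U)
evals, evecs = np.linalg.eigh(U_inv @ R @ U_inv)
order = np.argsort(evals)[::-1][:1]          # retain m = 1 factor
L_F = evecs[:, order]
d2_F = evals[order]
A_F = U @ L_F * np.sqrt(d2_F - 1.0)          # Equation (11)
A_F *= np.sign(A_F[0, 0])                    # fix the arbitrary eigenvector sign
```

With the exact population R and the true U², A_F reproduces the generating loadings of .80; with a sample R and an estimated U², it would only approximate them.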

Similarity of Solutions

An examination of the algebraic representations of the two methods of analysis has served to highlight the differences between them. However, when the same number of components or factors are extracted, the results from different types of component or factor analysis procedures typically yield highly similar results. Discrepancies are rarely, if ever, of any practical importance in subsequent interpretations. The comparisons have either focused on the similarity between the derived scores produced by the two approaches or the similarity between the patterns produced by the two methods. Studies have involved both real data examples and artificial data sets. Velicer (1976b) calculated the correlation between image component scores, principal component scores, and three alternative estimates of factor scores. Data from nine well known studies were employed. The correlations between the different types of scores were typically close to unity. More recently, Fava and Velicer (in press) confirmed these results in an extensive simulation study. The use of simulated data permitted a systematic variation of the degree of saturation, the sample size, and the population structure underlying the sample correlation matrix. When the components/factors are well defined, that is, an adequate number of variables possessing at least a moderate loading (.60), the correlation between the alternative types of scores typically exceeded .99. The only exceptions occurred in the poorly defined cases, a combination of low loadings and few variables per component, and even in these cases the correlations typically exceeded .90. Velicer (1977) employed the same nine data sets in a comparison of image component, principal component, and factor patterns. The patterns were
compared after a varimax rotation, after a promax rotation, and after a procrustes rotation to maximum similarity. Two types of comparisons were performed: a direct loading-by-loading comparison of the patterns and a summary statistic defined on the matrix of differences between the patterns. Velicer concluded that the ". . . patterns produced by each of the three methods are remarkably similar. Rotation position has little effect on the degree of similarity" (p. 20). Velicer, Peacock, and Jackson (1982) performed an extensive comparison of component, image, and factor patterns using simulated data sets. Under all conditions, the patterns produced by all three methods were highly similar, typically with differences only in the second decimal of the loadings. The degree of similarity increased as sample size and saturation increased. In other words, improvements in the quality of the data increased the degree of similarity. This similarity is easily illustrated. Following the procedure described by Velicer et al. (1982), a target pattern was used to construct a population correlation matrix. A sample correlation matrix (N = 108), generated by the Montanelli (1975) program, was analyzed by principal component analysis, image component analysis, and maximum likelihood factor analysis. Table 1 presents the target pattern and the pattern produced by each of the three methods. An examination of Table 1 will illustrate the high degree of similarity between the three patterns. Although empirical studies generally show that image and principal component analysis are slightly more similar to each other than to factor analysis, and Velicer et al. (1982) found that factor analysis fits better to the factor target and component analysis fits better to the component target, these differences are so small as to be of no practical importance. A somewhat different conclusion was reached by Snook and Gorsuch (1989).
On the basis of a simulation study, they concluded that component and factor analysis give discrepant solutions, particularly when the number of variables (p) is small. The component analysis loadings are described as systematically inflated. We view their conclusions as incorrect for two reasons. First, the differences in the numeric size of the loadings were what would be expected on the basis of the algebraic differences described in the previous section. In effect, the equivalent of the unique variance of one variable must be included in forming the linear composite for a component analysis. A numeric example from Velicer et al. (1982) will serve to illustrate this. "Consider the case of six variables which all have loadings of .80 on a factor. The total common variance for the set is 3.84 and the unique variance is 2.16. The unique variance of one variable (.36) is added to the common variance for component analysis, for a total of 4.20. Divided among the six variables, this results in a loading for each of the six variables in component analysis of .837." (p. 387) This is the source of the differences reported by Snook and Gorsuch and the differences are, as expected, greater for small variable problems. However,


Table 1
Target Pattern and Sample Patterns from Three Methods: Target Pattern, Varimax Rotated MLFA Pattern, Varimax Rotated ICA Pattern, and Varimax Rotated PCA Pattern


there is no reason to believe that differences of the magnitude reported have any effect in practice on conclusions regarding which factor loadings are salient. Second, the use of the word inflated is pejorative and misleading. It involves the implicit assumption that the factor loading is the correct value. It would be equally appropriate but similarly misleading to describe the factor loadings as deflated. A more appropriate procedure for assessing the stability of results under small variable conditions would be to compare each sample pattern to its corresponding population pattern, a procedure followed by Velicer and Fava (1987, 1990) which resulted in a very different set of conclusions. In any case, it does not seem appropriate to select a procedure on the basis of the very small differences in the numeric values of the loadings. There is no reason to believe that the differences reported will result in differences in interpretation for the patterns. Arguments that the slightly larger loadings are better or worse are not likely to convince anyone. None of the studies provide support for the persistent belief that the methods produce highly discrepant solutions. One explanation for the possible source of discrepant solutions may lie in the area of overestimation of the number of components, a topic that is discussed in the next section.

Number of Components (Factors) Retained

An area that has received too little attention is the determination of the number of components or factors to retain. Rigorous and accurate procedures for determining the number of factors are now available, but inaccurate procedures are still widely employed. If factor analysis is the procedure of choice, at least one of the following three situations is assumed to exist: (a) The number of factors is known a priori, (b) the asymptotic chi-square statistic will accurately determine how many factors to retain, or (c) the problem is trivial and of no interest.
For component analysis a diversity of alternative procedures have been proposed, including Bartlett's (1950, 1951) test of the equality of the last p − m eigen values, Cattell's (1966) scree procedure, Horn's (1965) parallel analysis procedure, Jackson and Morf's (1973) approach to the reliability of components, Kaiser's (1960) eigen value greater than unity rule, and Velicer's (1976a) minimum average partial (MAP) correlation procedure. The most widely employed of these is the Kaiser criterion. Indeed, some critics (Comrey, 1978) of component analysis assume that principal component analysis implicitly involves the use of the Kaiser rule. The Kaiser rule has been criticized for retaining too many components (Browne, 1968; Cattell & Jaspers, 1967; Linn, 1968) and, by one investigator, for retaining too few (Humphreys, 1964). Cliff (1988) has questioned the justification for this rule. Recent investigations of the accuracy of alternative
methods for determining the number of components (Hakstian, Rogers, & Cattell, 1982; Zwick & Velicer, 1982, 1986) found that the Kaiser rule was the least accurate of the procedures studied. Typically, the number of components retained equalled one third of the number of original variables as determined by the Kaiser rule, irrespective of the actual number of components underlying the sample (Revelle & Rocklin, 1979; Zwick & Velicer, 1982, 1986). Because the Kaiser rule is the default value in most computer programs, overextraction has been a typical problem in many published studies. However, the problem of overextraction has received little attention. Possibly the problem of overextraction has received little attention because of the advice given by textbooks in the area, ranging from the early work of Thurstone (1947) four decades ago to more recent references such as that of Cattell (1978), that overextraction is not a problem, only underextraction. Comrey (1978) recently took issue with this viewpoint, pointing out that overextraction followed by a varimax rotation will result in the last retained factors being inappropriately inflated at the expense of the earlier major factors, thus distorting the interpretation. In essence, unreplicable factors are created by degrading well-defined factors. The number of components question may be related to the conflict over the similarity-dissimilarity of factor analysis and component analysis solutions. Dziuban and Harris (1973) found large differences between the patterns produced by alternative methods. The Kaiser rule was employed and three components were retained. Velicer (1977) reanalyzed the same data, retaining only one component as indicated by alternative rules, and found the resulting patterns were highly similar. Lee and Comrey (1979) also employed the Kaiser rule and overextracted. They likewise reported clear differences between the patterns produced by factor analysis and component analysis.
The overextraction was intentional and was identified by the authors as the likely source of the differences. Overextraction, a very common problem with the Kaiser rule, may be the cause of observed differences between factor analysis and component analysis. Retention of the correct number of components would lead to very similar solutions. In order to illustrate this hypothesis, a target pattern was constructed for 36 variables and six components, with six variables loading on each of the six components (a_ij = .50 or 0.0). This pattern was used to construct a population correlation matrix. The Montanelli (1975) program was then employed to generate a sample correlation matrix (N = 144), following the design of the Velicer et al. (1982) study. The sample correlation matrix was then analyzed by maximum likelihood factor analysis, principal component analysis, and image component analysis. The resulting patterns were then compared. A summary statistic, g, was calculated, where g is the average squared difference between


Table 2
Value of g for Five Alternative Solutions (procrustes- and varimax-rotated patterns; F-C, F-I, and I-C comparisons)
Note. m = 14 suggested by Kaiser rule; m = 6 is the correct and the MAP solution. * MLFA failed to converge; 150-iteration solution employed. ** Boundary attained; variable 29 partialed out. *** Boundary attained; variable 27 partialed out. **** No one-to-one match of patterns was possible.

the loadings of the two patterns compared. The square root of g is the well known root mean square statistic. An examination of Table 2 indicates that all three patterns were very similar when six components were retained. However, the solutions became increasingly dissimilar for solutions involving 7, 8, 9, and 14 components retained. The last case, the one indicated by the Kaiser rule, resulted in patterns that could not be successfully matched in the varimax position. The results indicate that overextraction is a probable reason for cases where dissimilarities have been found between component analysis and factor analysis. The effect of overextraction is a largely unrecognized problem that warrants further investigation. Overextraction can also be a problem with factor analysis when sample sizes are large and the maximum likelihood significance test is employed. The approximate chi-square test associated with the MLFA approach is a test of the equality of the remaining p − m eigen values after m common factors have been fitted. Therefore, the logic and development of this test statistic is the same as that employed for the development of the Bartlett (1950, 1951) test for principal component analysis. Horn and Engstrom (1979) discuss the comparability of the two tests of significance. However, neither the maximum likelihood nor the
Bartlett test may be testing the hypothesis of interest for the applied researcher. Consider a set of p variables with m common factors (m < p). Augment this initial set with a new set of q variables that are uncorrelated with the original p variables and with each other. The logic underlying both tests would indicate the retention of m + q factors or components. An investigator will typically only be interested in retaining and interpreting the m common factors or components. Tucker and Lewis (1973) found that the MLFA extracted more factors than a practitioner would wish to identify. Hakstian, Rogers, and Cattell (1982) and M. B. Richman (personal communication) using simulated data found that the MLFA test consistently overestimated the number of factors known to be present. Gorsuch (1973) and Zwick and Velicer (1982, 1986) have reported similar findings with the Bartlett test. Statistical significance may be viewed as a necessary, but not sufficient, condition for factor interpretability (Jackson & Chan, 1980). In addition, the MLFA and Bartlett significance tests may be criticized on other grounds. First, both tests assume a multivariate normal distribution, an assumption that is rarely tested and difficult to defend for most real world examples, particularly if p is large. The test results in inflated values when this assumption is violated (Boomsma, 1982, 1983; Harlow, 1985). Second, the logic of the tests makes each a badness-of-fit test, contrary to the usual logic of a statistical test that would require positive evidence to retain a factor or component. Third, each test is an approximate test, with no means for evaluating the accuracy of the approximation. Approaches such as the Minimum Average Partial correlation test (MAP; Velicer, 1976a), the evaluation of the reliability of component scores (Jackson & Morf, 1973), or the scree test (Cattell,
1966) have better empirical support and, from some perspectives, a stronger theoretical basis than either the Kaiser rule (Cliff, 1988) or significance tests like the maximum likelihood test. Zwick and Velicer (1986) provide an extensive review of this area. The reported similarity of the two approaches in controlled studies and the overextraction problems that occur when the Kaiser rule is employed for component analysis or the maximum likelihood test is employed for factor analysis suggest some tentative conclusions. The problem of overextraction needs to be studied empirically for both factor analysis and component analysis. It potentially has different effects for each of the two methods of analysis. At a minimum, if data sets, analyzed by the two methods, produce very different results, the investigators should reevaluate the number of components (factors) retained. Alternative, more accurate procedures should replace the common but problematic ones.
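The contrast among retention rules is easy to reproduce on simulated data. The sketch below uses simplified, illustrative implementations (not the cited programs) of the Kaiser rule, Horn's parallel analysis, and Velicer's MAP test; the two-factor, eighteen-variable design is hypothetical rather than the 36-variable example above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: two factors, nine variables each, loadings .60, N = 150
p, N, a = 18, 150, 0.6
A = np.zeros((p, 2))
A[:9, 0] = a
A[9:, 1] = a
Y = rng.standard_normal((N, 2)) @ A.T + rng.standard_normal((N, p)) * np.sqrt(1 - a**2)
R = np.corrcoef(Y, rowvar=False)
evals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser rule: count eigenvalues greater than unity
kaiser = int(np.sum(evals > 1.0))

# Horn's parallel analysis: compare against mean eigenvalues of random data
rand_evals = np.mean([np.sort(np.linalg.eigvalsh(
    np.corrcoef(rng.standard_normal((N, p)), rowvar=False)))[::-1]
    for _ in range(50)], axis=0)
parallel = int(np.argmax(evals <= rand_evals))   # first eigenvalue not beating chance

def map_test(R):
    """Bare-bones sketch of Velicer's (1976a) minimum average partial test."""
    p = R.shape[0]
    ev, V = np.linalg.eigh(R)
    order = np.argsort(ev)[::-1]
    pattern = V[:, order] * np.sqrt(ev[order])       # full component pattern
    f = []
    for m in range(p - 1):
        C = R - pattern[:, :m] @ pattern[:, :m].T    # residual after m components
        d = np.sqrt(np.diag(C))
        partials = C / np.outer(d, d)
        f.append(np.mean((partials - np.diag(np.diag(partials))) ** 2))
    return int(np.argmin(f))                         # m with smallest average squared partial

map_m = map_test(A @ A.T + (1 - a**2) * np.eye(p))   # population matrix
```

For data with this much structure, the Kaiser count typically exceeds the parallel analysis count, illustrating its tendency to overextract, while MAP recovers the correct two components from the population matrix.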


Problems with Improper Solutions

Factor theorists have been aware for many years (Thurstone, 1947) of the occurrence of solutions to the factor problem which, although numerically satisfying certain conditions of estimation, are uninterpretable in terms of the common factor model. Such a situation might arise, for example, when the diagonal matrix of uniqueness estimates contains one or more elements that are equal to zero or negative. The idea that an observed variable would contain no unique (and hence no error) variance is simply not consistent with expectations based on measurement theory and the common factor model. Therefore, such a uniqueness estimate is uninterpretable classically as a legitimate partitioning of test variance. But such solutions, in spite of their status as curiosities, occur with such frequency (Joreskog, 1967; Velicer & Fava, 1987, 1990) in numerical solutions to the factor analysis problem (such as MLFA) that they constitute an embarrassment to the use of such solutions. It has been customary to constrain numerical solutions in such a way as to prevent the offending estimates from becoming smaller than a prescribed positive quantity. Thus, the uniqueness is set to some arbitrarily small positive number (Joreskog, 1967) or to zero (Harman & Fukuda, 1966). Such a practice is unsatisfactory for a number of reasons. First, the solution is arbitrary and not unique, and actually represents an abandonment of the common factor model for a mixed factor-components model (Bentler, 1976). Second, this practice has the character of dealing with the problem without necessarily understanding its cause, of sweeping the problem under the rug without determining which one of several reasons for it (including the appropriateness of the common factor model) is responsible and which remedial act is most appropriate. Third, maximum likelihood factor analysis has not been proved to be valid in boundary minima (Driel, 1978).
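The ease with which such estimates arise can be seen in the classic one-factor "triad" arithmetic: if r_jk = a_j a_k for all pairs of variables, then a_1² = r_12 r_13 / r_23. The correlations below are hypothetical values chosen to produce the effect, not data from the article:

```python
# One-factor model implies r_jk = a_j * a_k, hence a1^2 = (r12 * r13) / r23
r12, r13, r23 = 0.9, 0.9, 0.3   # hypothetical, perfectly legitimate correlations

a1_sq = r12 * r13 / r23         # implied squared loading of variable 1: 2.7
u1_sq = 1.0 - a1_sq             # implied uniqueness: -1.7, an improper value
```

A squared loading above 1.0, and hence a negative uniqueness, is exactly the kind of classically uninterpretable (Heywood) estimate described above.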
An appreciation of the distinct causes for improper solutions in common factor analysis may be obtained, following Driel (1978), from a consideration of the different types of conditions leading to improper solutions. Driel distinguishes between admissible solutions (those obtainable when the classical constraints are dropped, such as positive semi-definiteness of the matrix based on the major product of the factor matrix and positive definiteness of the diagonal matrix of uniquenesses) and interpretable solutions within the set of admissible solutions. If the observations on which a particular common factor solution is based are viewed as samples from a population, the elements of a factor solution also are considered estimates of population parameters. When population parameter values are close to a boundary (Case A), sampling fluctuations may lead to estimates at or beyond the boundary. Alternatively, population values of columns of a factor matrix may be so discrepant in size as to cause a solution based on a particular sample to yield one or more estimated
columns that vanish, resulting in an improper solution. This problem may be exacerbated by numerical algorithms seeking to minimize some distance function based on faulty or unreliable sample data. Alternatively, no sample solutions may be interpretable (Case B). Such a condition might arise when one or more of the assumptions of the common factor model, such as normality, linearity, or linear independence of random variables, do not hold for the data, or when errors have been made in transcribing the data. A third situation involves a kind of indefiniteness of solution (Case C), in which a number of values are admissible and will give almost the same solution. Numerically, the choice among these solutions may be quite arbitrary, but some may be improper in terms of the classical model. According to Driel (1978), this may occur, for example, when Anderson and Rubin's (1956) identifiability theorem is violated (cf. Thurstone, 1947) and there are fewer than three non-zero elements in some column of the factor matrix, making a unique solution to the factor equation impossible. This kind of unidentifiability may also occur due to sampling fluctuations in which population data approach values that would constitute a violation of the Anderson and Rubin theorem. Velicer and Fava (1987, 1990) illustrated both Case A and Case C empirically. In a study of variable sampling, when the average variables-per-factor ratio approached the minimum for identifiability, some factors were undersampled and boundary cases occurred with high frequency. Boundary cases also occurred with high frequency when the average variable-to-factor ratio was higher but the saturation was low. Boundary cases also occurred more frequently with small sample sizes (Velicer & Fava, 1990). The combination of low saturation variables, small sample size, and a low variable-to-factor ratio almost always produced a boundary case.
The occurrence of boundary conditions and improper solutions, therefore, presents a problem for factor analysis if they are treated as they typically are in current practice, that is, if one of a variety of statistical manipulations is employed to correct the problem. In addition to the previously mentioned approach of setting the offending estimate equal to zero, proposed alternatives include reparameterizing the model (Bentler, 1976), placing constraints on the unique variances (Rindskopf, 1983), or limiting the number of iterations (Snook & Gorsuch, 1989). Such corrections suffer from questionable logic and/or do not perform well in practice (Dillon, Kumar, & Mulani, 1987). However, improper solutions could be transformed from an area of concern to an advantage for factor analysis if their occurrence were treated as a diagnostic. In such a case, the investigator could consider changing the model, that is, reducing the number of factors retained to exclude uninterpretable factors, or rejecting the data set as inadequate to investigate the question of interest. The second case would include situations in which some factors of interest are not measured by an adequate number of
variables or the number of subjects sampled is inadequate (Guadagnoli & Velicer, 1988; Velicer & Fava, 1990). Because improper solutions do not occur with component analysis, their occurrence with factor analysis can be an advantage, if treated as a diagnostic, or a disadvantage, if treated as a minor annoyance to be fixed and ignored.
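The diagnostic use of improper solutions described above can be sketched in code. The following is a minimal illustration in Python with NumPy (the function, its starting values, and its tolerances are our own choices, not a published algorithm): an iterative principal-axis factoring loop that flags a uniqueness estimate at or below zero instead of constraining it away.

```python
import numpy as np

def principal_axis_factor(R, m, max_iter=200, tol=1e-6):
    """Iterative principal-axis factoring of a correlation matrix R,
    retaining m factors. Returns loadings, uniquenesses, and a flag
    marking an improper (Heywood) solution."""
    p = R.shape[0]
    # Squared multiple correlations as starting communalities.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                      # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        top = np.argsort(vals)[::-1][:m]
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.sum(L ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    uniqueness = 1.0 - h2
    heywood = bool(np.any(uniqueness <= 1e-8))        # report, do not "fix"
    return L, uniqueness, heywood
```

Treating the returned flag as a diagnostic, rather than resetting the offending uniqueness, is the use recommended here: the investigator can then reconsider the number of factors retained or the adequacy of the data.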


Computational Issues

A critical issue for many investigators in their selection of component analysis is computational speed. In general, component analysis will involve less computer processing time. In addition, component analysis can be performed on larger variable sets. The exact relation between the two approaches depends on a number of aspects, including the number of variables and the actual solution. For problems with a small number of variables the differences are minor, while for large variable problems the differences in speed are striking. Component analysis, which requires less computer space, has been used to analyze data sets that exceed the capacity of factor analysis programs on most mainframe computers. In recent years, variable sets ranging from 200 to 400 variables or more have been successfully analyzed with component analysis procedures. Beyond the number of variables, the nature of the correlation matrix (and the corresponding pattern matrix) will affect the speed of the analysis. Factor analysis is necessarily an iterative procedure. Convergence is generally fast for well defined problems. Incorrectly specifying the number of factors or including poorly defined factors, that is, factors defined by only a few variables with low loadings, will result in very slow convergence and, in some cases, a failure to converge for most computer programs (Velicer & Fava, 1990). The problems of speed and space associated with factor analysis procedures have occasionally been minimized as representing a transitory state of affairs. Computers are rapidly improving with respect to speed and size. What one generation of researchers considered time consuming will be near instantaneous for the next generation. Thus, according to this argument, advances in computer hardware can be expected to minimize any existing distinctions. However, two different trends have shown this cavalier dismissal of the issue to be incorrect. First, the difference can always be extended.
The same computers that make speed differences negligible for small variable problems permit the extension of component analysis to large variable problems and the investigation of a class of previously prohibited problems. Second, most analyses in the future will probably take place on large personal computers rather than mainframes. This trend has served to make the issue of speed and size very salient again. Some alternative procedures have been developed within the factor analysis framework (Bentler, 1982; Hagglund, 1982) that provide faster computational speeds, but the adequacy of these approaches has not been extensively evaluated.
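The point that factor analysis necessarily repeats the work of component analysis at every iteration can be made concrete with a rough timing sketch (Python with NumPy; the 50 iterations and 10 retained components are arbitrary illustration values, not recommendations):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
p, n = 300, 1000
X = rng.standard_normal((n, p))
R = np.corrcoef(X, rowvar=False)

t0 = time.perf_counter()
np.linalg.eigh(R)                      # component analysis: one decomposition
t_components = time.perf_counter() - t0

t0 = time.perf_counter()
h2 = np.full(p, 0.5)                   # crude starting communalities
for _ in range(50):                    # each factoring iteration repeats the decomposition
    Rr = R.copy()
    np.fill_diagonal(Rr, h2)
    vals, vecs = np.linalg.eigh(Rr)
    L = vecs[:, -10:] * np.sqrt(np.clip(vals[-10:], 0.0, None))
    h2 = np.sum(L ** 2, axis=1)
t_factors = time.perf_counter() - t0

print(f"one decomposition: {t_components:.4f}s; 50 iterations: {t_factors:.4f}s")
```

On any hardware, the iterative procedure costs roughly as many decompositions as it takes iterations, which is why poorly defined problems, with their slow convergence, are disproportionately expensive.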


Factor Indeterminacy Problem

An issue that has received a great deal of attention in the last two decades is the factor indeterminacy problem in the common factor model (Green, 1976; Heerman, 1964, 1966; McDonald, 1974, 1977; McDonald & Mulaik, 1979; Mulaik, 1972, 1976; Mulaik & McDonald, 1978; Schonemann, 1971; Schonemann & Steiger, 1976, 1978; Schonemann & Wang, 1972; Steiger, 1979a, 1979b; Steiger & Schonemann, 1978). Component analysis can be viewed as a transformation of p observed scores into a set of m (m < p) uncorrelated new scores. Factor analysis involves p observed variables and p + m new variables. "Factor indeterminacy is the inability to determine uniquely the common and unique factor variables of the common factor model from the uniquely defined observed variables because the number of observed variables is smaller than the number of common and unique factors" (Mulaik & McDonald, 1978, p. 177). This means that, for a given individual, two different factor scores could be calculated, both of which fit the factor model perfectly. However, these two scores can be radically different. The present section will provide a brief review of the issue. More extensive reviews are provided by Steiger (1979a), Steiger and Schonemann (1978), and McDonald and Mulaik (1979). A number of alternative approaches have been developed to demonstrate the factor indeterminacy problem. Piaggio (1931, 1933, 1935) introduced and Guttman (1955) later generalized the construction approach. This approach demonstrated that different sets of common factor scores can be constructed for any data set by means of the factor pattern and a vector of random numbers. These factor scores, unlike the more usual factor score estimates, fit the factor model exactly, but may be quite distinct from each other. A second approach is the transformation approach (Ledermann, 1938; Schonemann, 1971; Thomson, 1935).
This approach demonstrates that a transformation matrix exists that will transform any set of factor scores into a different set of factor scores that also satisfy the model. For either approach, one may calculate the minimum correlation between alternative factor scores (Guttman, 1955; Schonemann, 1971; Schonemann & Wang, 1972) as a measure of the extent to which alternative sets of factor scores are correlated. For both common and unique factors, the minimum average correlation given by Schonemann (1971) is

a value that is not dependent on the actual data set. For only the common factors in an MLFA solution, the degree of indeterminacy of any factor can be determined
from the associated eigenvalue (Schonemann & Wang, 1972). For real data examples, Schonemann and Wang demonstrated that the minimum correlation can be zero or negative. These arguments have served to demonstrate that factor indeterminacy does exist. The factor pattern, factor structure, and factor correlations are all determined satisfactorily. (Rotational indeterminacy, a very different issue from factor indeterminacy, is a separate problem.) However, psychologists have seldom been interested in the actual factor scores. The interest is rather in the latent variables underlying the observed variables that the researcher hopes to discover. Mulaik and McDonald (1978; McDonald & Mulaik, 1979) have demonstrated that the indeterminacy problem can also affect the interpretation of the latent variables. Two investigators, starting with the same core set, could build two extended batteries of infinite length that are very different from each other. That is, the factor indeterminacy problem may lead to radically different conceptualizations of the latent variable. Problems have also been discussed with respect to the external validity of factor scores (Schonemann, 1981; Schonemann & Steiger, 1978; Steiger, 1979b). The implications of the factor indeterminacy issue are still being debated. At one extreme, Steiger and Schonemann (1978) view the problem as severe and recommend the replacement of factor analysis with component analysis. At the other extreme, McDonald and Mulaik (1979) view the problem as only potentially troublesome for exploratory factor analysis. The high degree of similarity between component and factor solutions described previously indicates that the issue will have at most minor practical consequences for well defined data sets. Most writers continue simply to ignore the issue.
However, a number of problems with factor analysis are related to the indeterminacy problem in either direct or subtle ways, including the divergence of factor and component solutions under poor conditions, determining the number of factors to retain, the problems involving improper solutions, and the awkwardness of defining factor score estimates. Whatever the final resolution of this issue, we can conclude that, whatever advantageous theoretical properties are attributed to the factor analysis model, there also exist some very severe theoretical problems.
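Guttman's minimum-correlation result can be illustrated numerically for the single-factor case (a sketch with NumPy; the function name and the equal-loading examples are ours). For one common factor, the minimum correlation between two sets of factor scores that both satisfy the model is 2ρ² − 1, where ρ² is the squared multiple correlation of the factor with the observed variables:

```python
import numpy as np

def min_factor_score_correlation(loadings):
    """Minimum correlation between alternative sets of factor scores
    for a one-factor model with the given loadings (Guttman's 2*rho^2 - 1)."""
    lam = np.asarray(loadings, dtype=float)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)                 # R = lam lam' + Psi, unit variances
    rho2 = lam @ np.linalg.solve(R, lam)     # squared multiple correlation
    return 2.0 * rho2 - 1.0

print(min_factor_score_correlation([0.5] * 6))   # moderate saturation: about 0.33
print(min_factor_score_correlation([0.3] * 6))   # low saturation: negative
```

With six variables loading .3, two perfectly model-consistent sets of factor scores can be negatively correlated with each other, which is the kind of result Schonemann and Wang reported for real data.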

Latent versus Manifest Variables

Some investigators distinguish between factor analysis and component analysis by referring to the former as a latent variable procedure and the latter as a manifest variable procedure. A latent variable is an unobserved, underlying, error-free variable that, with the addition of sampling error, accounts for the observed or manifest variables. Component analysis can be described as a weighted linear composite of the p observed variables and is, therefore,
described as a manifest variable procedure. Factor analysis involves the reduced correlation matrix (see Equation 10), where the p x p diagonal matrix of uniquenesses is removed from the correlation matrix. This reduction is an attempt to remove measurement error. Measurement error can be part of the variances but, under the assumption of independence, cannot be part of the covariances. This argument is presented in a number of ways. Component analysis is sometimes described as limited to the space of the observed variables. Other writers describe component analysis as assuming that all the variables are perfectly reliable. The major assumed advantage of a latent variable model is the perceived added degree of generalization to other, unsampled variables. Component analysis is described as unstable under sampling of variables. Factor analysis is assumed to yield the same factors, even if new variables augment or replace some of the original observed variables. This issue is directly related to the indeterminacy issue. The reduction of the correlation matrix involves the estimation of p independent variables in addition to the m variables that both procedures require. Because p + m variables are estimated from p observed variables, an indeterminacy problem occurs. Heerman (1964) ascribes the generalization property attributed to factor analysis to the indeterminacy problem: "Determinate factor solutions permit unique calculation of factor scores, but the factors are always contained in the test space and hence cannot be expected to represent anything which goes beyond the original measures" (Heerman, 1964, p. 380). Mulaik (1972, 1985) views the choice between factor analysis and component analysis as a choice between a general but indeterminate solution and a determinate but restricted solution. However, consider the following.
If latent variables are thought to possess desirable properties, that is, to be error free and stable across selection of variables, then individual scores for the N subjects on those latent variables might be a clear goal of the analysis. Commonly employed factor score estimates (McDonald & Burr, 1967; Harris, 1967) would be desirable. However, it is difficult to discern any substantial differences between factor score estimates and component scores (Velicer, 1976; Fava & Velicer, in press). Factor score estimates are in the space of the variables and calculated using formulas similar to those for component scores. Because factor score estimates are nearly identical to component scores, and are themselves determinate, it is difficult to understand how factor scores can be better and more generalizable. One of the approaches to the indeterminacy problem discussed in the previous section was to construct factor scores instead of estimating factor scores (Guttman, 1955; Schonemann & Wang, 1972). These constructed scores consist of a weighted linear composite of the p observed variables plus an arbitrarily chosen random component. The
addition of a random component results in infinitely many sets of scores that fit the factor analysis model. It is difficult to see how such an addition of a random component would result in an improvement, that is, increased generalizability, for the latent variable approach. In fact, the addition of a random component to factor scores has never been seriously recommended. The issue of differential generalizability can be evaluated empirically. Only a limited number of variable sampling studies exist. Joreskog (1969) compared two types of factor analysis, image factor analysis and maximum likelihood factor analysis, and concluded that image factor analysis was slightly more stable. Velicer (1972) adapted the same procedure to a comparison of two types of component analysis and maximum likelihood factor analysis. An initial analysis is performed on a matrix of p variables. Then a subsample of the p variables is randomly selected and the same type of analysis performed. The pattern from the subsample of variables is compared to the corresponding rows of the original pattern matrix. If the latent variable approach results in greater generalization under sampling, the factor analyses on the subsamples should be closer to the population than the component analyses. Velicer (1972), using several existing well known data sets, found no observable difference. More recently, Velicer and Fava (1987) employed the same procedure to evaluate the effects of variable sampling in an extensive simulation study. The manipulation of two factors, the number of variables sampled and the degree of saturation, resulted in strong effects. Dramatic differential effects on boundary cases and convergence problems were also found. Pattern reproduction was very poor if saturation was low or if the number of variables per factor was small. The combination of these two conditions produced extremely poor results.
These conditions also resulted in a high rate of nonconvergence problems and boundary conditions. The differences between component analysis and factor analysis were minimal and inconsistent. Velicer and Fava (1990) replicated and extended these results. Subject sampling was also found to have a large effect, with low subject sample size leading to poor pattern reproduction, especially if the saturation was low and/or the number of variables per factor was small. Only a minimal difference was found between factor and component analysis, with principal component analysis demonstrating slightly superior performance under variable sampling. The existing empirical evidence does not support the claimed advantages of factor analysis under variable sampling. On the contrary, what slight differences have been found favor component analysis (Velicer & Fava, 1990). Until such empirically demonstrable differences are provided, the applied researcher should be skeptical of the claims made for the superiority of latent variables. The more general concept of a construct (Cronbach & Meehl, 1955) might profitably replace the latent variable as a guide to test development and generalization.
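The variable-sampling procedure described above (analyze all p variables, reanalyze a random subsample, compare corresponding rows of the pattern) can be sketched for the component case. This is a one-component toy example with arbitrary constants, not a reproduction of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 12
f = rng.standard_normal(n)
X = 0.7 * f[:, None] + 0.6 * rng.standard_normal((n, p))   # one strong component
R = np.corrcoef(X, rowvar=False)

def first_component_loadings(R):
    vals, vecs = np.linalg.eigh(R)
    v = vecs[:, -1] * np.sqrt(vals[-1])
    return v * np.sign(v.sum())                 # fix arbitrary reflection

full = first_component_loadings(R)
keep = rng.choice(p, size=6, replace=False)     # random subsample of variables
sub = first_component_loadings(R[np.ix_(keep, keep)])

# Congruence between the subsample pattern and the corresponding
# rows of the full-set pattern.
congruence = (full[keep] @ sub) / (np.linalg.norm(full[keep]) * np.linalg.norm(sub))
print(round(congruence, 3))
```

With well-saturated variables the congruence is close to 1.0; under low saturation or few variables per component it degrades, mirroring the simulation results summarized above.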


Exploratory versus Confirmatory Analysis

The issue of exploratory versus confirmatory analysis relates only peripherally to the purpose of this article. Some authors have made the distinction between component analysis as an exploratory procedure and factor analysis as a confirmatory procedure. The basis for this distinction is unclear, because almost identical operations are employed when either technique is employed in an exploratory manner. As previously mentioned, structural modeling procedures based on the factor analysis model are better developed and more extensively used at the present time than those based on a component approach. However, confirmatory approaches based on either model are clearly possible. A number of writers have asserted that exploratory procedures should not be used at all. The argument is that theory-testing is the fundamental purpose of science. Confirmatory (structural modeling) procedures are specifically designed for theory testing and should, therefore, completely supplant the use of exploratory techniques. We disagree with this viewpoint for a number of reasons. First, most research problems fall somewhere on the continuum between the two extreme positions. Very few studies can be found that represent a pure example of an exploratory or a confirmatory study. Most studies represent a mix: partially based on theory, which is either formally stated or implicit in the formulation of the research question, and partially exploratory, involving novel situations, stimuli, or areas where almost no previous data or theory exists. When theories do exist, they are often vague, poorly formulated, and imprecise. Second, the view that exploratory and confirmatory approaches are competing is unfortunate. It would be better to view them as complementary.
From this viewpoint, a study is conceived as only one in a sequence of studies, whether as the research program of a single individual or laboratory or the combined contributions of different researchers. In such a sequence, some studies may be exploratory and some confirmatory in orientation. Exploratory studies would be more appropriate in the initial stages or after rejection of a theory. Recently, articles which combine both exploratory and confirmatory procedures in a sequence of studies have appeared (e.g., Prochaska, Velicer, DiClemente, & Fava, 1988; Velicer, Huckel, & Hansen, 1989). Gorsuch (1987) argues that exploratory procedures should be employed as conservative confirmatory procedures. Confirmatory procedures would be best employed in attempts to disconfirm an established theory (Popper, 1959). Third, the problems of a theory-confirming approach as a basic research paradigm need to be recognized. Greenwald, Pratkanis, Leippe, and Baumgardner
(1986) distinguished between a theory-oriented research strategy and a result-centered research strategy. These two strategies correspond to confirmatory and exploratory approaches to data analysis. In their criticism of the theory-centered strategy, Greenwald et al. warn of the dangers of a theory-confirming strategy. Researchers are prejudiced against the null hypothesis and tend to persevere by modifying procedures and/or failing to report results that do not support the theory. When results do agree with the theory, researchers tend to over-generalize, that is, to generalize independent of conditions and procedure. This reflects a phenomenon that has been labeled a confirmation bias. A confirmation bias has been demonstrated to result in a wide variety of effects, including a perseverance of belief in discredited hypotheses (Snyder & Uranowitz, 1978). A review of some of the recent articles employing LISREL will provide the reader with examples of initial rejection of the hypothesis followed by changes indicated by the modification indices, such as extensive correlated error terms for non-longitudinal studies, to get the model to fit the data. Exploratory analytic approaches (or result-centered research strategies) should be preferred except in those cases where a well-defined theory exists. Exploratory approaches avoid a confirmation bias, do not force a theory-oriented approach prematurely, and represent a conservative strategy. Confirmatory approaches should be employed in a more limited role. If a well developed theory exists, a well designed study which attempts to disconfirm the theory is desirable. Alternatively, when two (or more) legitimate competing theories exist, a confirmatory strategy can be employed to determine which provides a better fit to the data. A key phrase here is legitimate, because of the tendency to create weak theories that are not seriously entertained as competitors.
Further, if several theories provide reasonably similar degrees of fit to the data, the theory that involves the fewest parameters should be selected. Theories involving more parameters will generally fit the data better than simpler models, but simpler models will generally cross-validate better (Collyer, 1985). The reader is reminded that structural modeling is also applicable to a wide variety of problems (e.g., panel analysis, hierarchical designs) other than the confirmatory factor analysis problem to which these comments are addressed.

Discussion

The major conclusion of this article is that there is little basis to prefer either component analysis or factor analysis. For practical purposes, the choice of method is not a decision that will greatly affect empirical results or substantive conclusions. Other decisions, such as the number of factors/components to retain or the method of rotation, are much more critical. This conclusion is based on the very high degree of similarity between alternative solutions that has been
demonstrated empirically both in comparisons of patterns (Velicer, 1977; Velicer et al., 1982) and comparisons of derived scores (Velicer, 1976b; Fava & Velicer, in press). There is reason to believe that this similarity of solutions is even stronger for well designed data sets and when an accurate decision about the number of factors/components to retain is made. Although the solutions become dissimilar under poor conditions, there is no basis to assume that either method is more accurate. This lack of clear empirical differences between the two approaches has implications for each of the other issues discussed. The problem of overextraction is frequently ignored but potentially critical. The methods are similar for the correct number of factors but diverge markedly when overestimation occurs. The widely employed eigenvalue-greater-than-unity rule typically leads to overextraction. The chi square test associated with maximum likelihood factor or component analysis likewise leads to overextraction, particularly when the sample size is large. The existence of an alternative procedure such as the MAP correlation procedure (Velicer, 1976a), which provides accurate guidance on the number of components to retain, represents an advantage in selecting component analysis. The problem of improper solutions is a much more complex matter. These solutions are uninterpretable in terms of the model and occur with a high frequency, particularly when the saturation is low, the sample size is small, or there are not enough variables to identify the factor. It has been customary to constrain numerical solutions in such a way as to prevent the offending estimates from becoming smaller than a prescribed non-negative quantity. This approach tends to ignore the cause of the problem and is inconsistent with the factor analysis model. A more reasonable alternative would be to treat the occurrence of an improper solution as a diagnostic indicator of a problem.
Corrections would result from an assessment of the cause of the improper solution. Such corrections could range from changing the number of factors retained to gathering a new data set with modifications in the subject or variable sampling. If employed in this way, improper solutions would represent an advantage in selecting factor analysis. However, current practices, which generally involve defining the problem out of existence, make component analysis the preferred method. Component analysis will typically handle more variables and provide more rapid solutions. The increasing capacity and speed of computers will not alter this situation. Component procedures will always be able to handle proportionately more variables and be proportionately faster computationally. With the increased use of multi-category response items, item level analysis has become the rule rather than the exception. Problems involving 100 to 300 variable correlation matrices are now common. The increased use of personal computers makes speed an important consideration for many problems. The computational efficiency issue represents an advantage in selecting component analysis.
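The MAP procedure cited above admits a compact implementation. The following is our sketch of the squared-partial-correlation version (Velicer, 1976a), written from the published description and not verified against the original code:

```python
import numpy as np

def map_test(R):
    """Velicer's Minimum Average Partial (MAP) test: retain the m that
    minimizes the average squared partial correlation remaining after
    the first m principal components are partialled out of R."""
    p = R.shape[0]
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    A = vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))  # loadings
    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]              # m = 0 baseline
    for m in range(1, p - 1):
        C = R - A[:, :m] @ A[:, :m].T            # partial covariances
        d = np.sqrt(np.diag(C))
        P = C / np.outer(d, d)                   # partial correlations
        avg_sq.append(np.mean(P[off] ** 2))
    return int(np.argmin(avg_sq)), avg_sq
```

The average squared partial correlation typically falls until the systematic components are removed and then rises as further extraction begins to partial out unique variance; the location of the minimum is the number of components to retain.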


The factor indeterminacy issue is the result of solving for p + m new variables from a set of p observed variables. This lack of a unique solution is the basis for the factor indeterminacy problem. The practical implications of this problem are not clear, particularly in light of the high degree of similarity between the alternative solutions. This issue, although of theoretical importance, does not represent a practical basis for selecting between the two methods of analysis. The association sometimes made between factor analysis as a confirmatory procedure and component analysis as an exploratory procedure is incorrect. Both exploratory and confirmatory procedures can be based on either of the two models. Exploratory factor or component analysis should be viewed as an unrestricted procedure, that is, one in which the bias of the researcher cannot affect the end result. Confirmatory approaches are an attempt to fit the data to a preconceived model, which may result in a confirmatory bias as a consequence of the failure of the investigator to specify the best model. Unrestricted procedures can be viewed as a more conservative confirmatory test. Exploratory and confirmatory procedures should be viewed as complementary rather than competing. This issue does not represent in principle a basis for selecting between the two methods of analysis, although it is recognized that more work needs to be done to develop confirmatory component analysis approaches that parallel the sophistication and flexibility of packages such as LISREL. One result of such development might be the availability of a confirmatory approach that will permit the evaluation of data sets with a larger number of variables than is now possible. Latent variable procedures (i.e., factor analysis) are asserted to have greater generalizability to the set of unsampled variables, which would represent an advantage for factor analysis.
However, empirical studies (Velicer, 1974; Velicer & Fava, 1987, 1990) have generally found no difference between the methods under conditions of variable sampling. This issue does not represent a basis for selecting either method of analysis. The high degree of similarity between the results of either a component analysis or a factor analysis is the basis of the major conclusion that the choice of method is unlikely to result in any empirical or substantive differences. From one viewpoint, component analysis can be viewed as a computationally efficient approximation to factor analysis. However, it is equally correct to assert, as Steiger and Schonemann (1978) do, that factor analysis is ". . . a computationally difficult, theoretically problematic, approximation to component analysis" (p. 174). The advantages of selecting the component approach include the availability of accurate methods of determining the number of components to retain and, particularly in large variable data sets, computational efficiency. The occurrence of improper solutions can be either a problem or a positive characteristic of factor analysis if properly employed as a diagnostic indicator. The other
issues considered provide little basis for selecting between the two procedures. Perhaps the principle of parsimony, applied to parsimony procedures, provides the strongest argument for preferring component analysis over factor analysis.


References

Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5, 111-150.
Bartholomew, D. J. (1984). The foundations of factor analysis. Biometrika, 71, 221-232.
Bartholomew, D. J. (1985). Foundations of factor analysis: Some practical implications. British Journal of Mathematical and Statistical Psychology, 38, 1-10.
Bartlett, M. S. (1950). Tests of significance in factor analysis. The British Journal of Psychology, 3, 77-85.
Bartlett, M. S. (1951). A further note on tests of significance in factor analysis. The British Journal of Psychology, 4, 1-2.
Bentler, P. M. (1976). Multistructure statistical model applied to factor analysis. Multivariate Behavioral Research, 11, 3-25.
Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual Review of Psychology (pp. 419-454). Stanford, CA: Annual Reviews, Inc.
Bentler, P. M. (1982). Confirmatory factor analysis via noniterative estimation: A fast, inexpensive method. Journal of Marketing Research, 19, 417-424.
Bentler, P. M. (1985). Theory and implementation of EQS: A structural equations program [Computer program]. Los Angeles: BMD Statistical Software.
Boomsma, A. (1982). The robustness of LISREL against small sample sizes in factor analysis models. In K. G. Joreskog & H. Wold (Eds.), Systems under indirect observation (Part I, pp. 149-173). Amsterdam: North Holland.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood) estimates against small sample sizes and non-normality. Unpublished doctoral dissertation, Rijksuniversiteit Groningen, the Netherlands.
Browne, M. W. (1968). A comparison of factor analysis techniques. Psychometrika, 33, 267-333.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.
Cattell, R. B. (1978). The scientific use of factor analysis in behavioral and life sciences. New York: Plenum Press.
Cattell, R. B., & Jaspers, J. (1967). A general plasmode for factor analytic exercises and research. Multivariate Behavioral Research Monographs, 3, 1-212.
Cliff, N. (1988). The eigenvalues-greater-than-one rule and the reliability of components. Psychological Bulletin, 103, 276-279.
Collyer, C. E. (1985). Comparing strong and weak models by fitting them to computer-generated data. Perception and Psychophysics, 38, 476-481.
Comrey, A. L. (1978). Common methodological problems in factor analytic studies. Journal of Consulting and Clinical Psychology, 46, 648-659.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

MULTIVARIATE BEHAVIORAL RESEARCH


Dillon, W. R., Kumar, A., & Mulani, N. (1987). Offending estimates in covariance structure analysis: Comments on the causes of and solutions to Heywood cases. Psychological Bulletin, 101, 126-135.
Driel, O. P. van. (1978). On various causes of improper solutions in maximum likelihood factor analysis. Psychometrika, 43, 225-243.
Dziuban, C. D., & Harris, C. W. (1973). On the extraction of components and the applicability of factor analytic techniques. American Educational Research Journal, 10, 93-99.
Fava, J. L., & Velicer, W. F. (in press). An empirical comparison of factor, image, component, and scale scores. Multivariate Behavioral Research.
Fornell, C., & Bookstein, F. L. (1982). A comparative analysis of two structural equation models: LISREL and PLS applied to market data. In C. Fornell (Ed.), A second generation of multivariate analysis (Vol. 1, pp. 289-324). New York: Praeger.
Glass, G., & Taylor, P. A. (1966). Factor analytic methodology. Review of Educational Research, 36, 566-587.
Gorsuch, R. L. (1973). Using Bartlett's significance test to determine the number of factors to retain. Educational and Psychological Measurement, 33, 361-364.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Gorsuch, R. L. (1987). Exploratory factor analysis. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate psychology (2nd ed., pp. 231-258). New York: Plenum Press.
Green, B. F. (1976). On the factor score controversy. Psychometrika, 41, 263-266.
Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93, 216-229.
Guadagnoli, E., & Velicer, W. F. (1988). The relationship of sample size to the stability of component patterns: A simulation study. Psychological Bulletin, 103, 265-275.
Guttman, L. (1953). Image theory for the structure of quantitative variates. Psychometrika, 18, 277-296.
Guttman, L. (1955). The determinacy of factor score matrices with implications for five basic problems of common-factor theory. British Journal of Statistical Psychology, 8, 65-81.
Hagglund, G. (1982). Factor analysis by instrumental variables methods. Psychometrika, 47, 209-222.
Hakstian, A. R., Rogers, W. T., & Cattell, R. B. (1982). The behavior of number of factors rules with simulated data. Multivariate Behavioral Research, 17, 193-219.
Harlow, L. L. (1985). Behavior of some elliptical theory estimators with nonnormal data in a covariance structures framework: A Monte Carlo study. Unpublished doctoral dissertation, University of California, Los Angeles.
Harman, H. H., & Fukuda, Y. (1966). Resolution of the Heywood case in the minres solution. Psychometrika, 31, 563-571.
Harris, C. W. (1962). Some Rao-Guttman relationships. Psychometrika, 27, 247-263.
Harris, C. W. (1967). On factors and factor scores. Psychometrika, 32, 363-379.
Heerman, E. F. (1964). The geometry of factor indeterminacy. Psychometrika, 29, 371-381.
Heerman, E. F. (1966). The algebra of factor indeterminacy. Psychometrika, 31, 539-543.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.
Horn, J. L., & Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14, 283-300.
Howe, W. G. (1955). Some contributions to factor analysis. Oak Ridge, TN: Oak Ridge National Laboratory.

JANUARY 1990



Humphreys, L. G. (1964). Number of cases and number of factors: An example where N is very large. Educational and Psychological Measurement, 24, 457.
Jackson, D. N., & Chan, D. W. (1980). Maximum-likelihood estimation in common factor analysis: A cautionary note. Psychological Bulletin, 88, 502-508.
Jackson, D. N., & Morf, M. E. (1973). An empirical evaluation of factor reliability. Multivariate Behavioral Research, 8, 439-459.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.
Jöreskog, K. G. (1969). Efficient estimation in image factor analysis. Psychometrika, 34, 51-75.
Jöreskog, K. G. (1970). A general method for the analysis of covariance structures. Biometrika, 57, 239-251.
Jöreskog, K. G. (1978). Structural analysis of covariance and correlation matrices. Psychometrika, 43, 443-477.
Jöreskog, K. G. (1981). Analysis of covariance structures. Scandinavian Journal of Statistics, 8, 65-83.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141-151.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64-82.
Lawley, D. N. (1941). Further investigation in factor estimation. Proceedings of the Royal Society of Edinburgh, 61, 176-185.
Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworths.
Ledermann, W. (1938). The orthogonal transformation of a factorial matrix into itself. Psychometrika, 3, 181-187.
Lee, H. B., & Comrey, A. L. (1979). Distortions in a commonly used factor analytic procedure. Multivariate Behavioral Research, 14, 301-321.
Linn, R. L. (1968). A Monte Carlo approach to the number of factors problem. Psychometrika, 33, 37-71.
McDonald, R. P. (1974). The measurement of factor indeterminacy. Psychometrika, 39, 203-222.
McDonald, R. P. (1977). The indeterminacy of components and the definition of common factors. British Journal of Mathematical and Statistical Psychology, 30, 165-176.
McDonald, R. P. (1978). A simple comprehensive model for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 31, 59-72.
McDonald, R. P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161-183.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Erlbaum.
McDonald, R. P., & Burr, E. J. (1967). A comparison of four methods of constructing factor scores. Psychometrika, 32, 381-401.
McDonald, R. P., & Mulaik, S. A. (1979). Determinacy of common factors: A nontechnical review. Psychological Bulletin, 86, 297-306.
Meredith, W., & Millsap, R. E. (1985). On component analysis. Psychometrika, 50, 495-507.
Montanelli, R. G., Jr. (1975). A computer program to generate sample correlation and covariance matrices. Educational and Psychological Measurement, 35, 195-197.
Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.




Mulaik, S. A. (1976). Comments on "The measurement of factor indeterminacy". Psychometrika, 41, 249-262.
Mulaik, S. A. (1985, June). A brief history of the philosophical foundations of factor analysis from Aristotle to Quine. Paper presented at the Annual Meeting of the Psychometric Society, Nashville, TN.
Mulaik, S. A., & McDonald, R. P. (1978). The effect of additional variables on factor indeterminacy in models with a single common factor. Psychometrika, 43, 177-192.
Piaggio, H. T. H. (1931). The general factor in Spearman's theory of intelligence. Nature, 127, 56-57.
Piaggio, H. T. H. (1933). Three sets of conditions necessary for the existence of a g that is real and unique except in sign. British Journal of Psychology, 24, 88-105.
Piaggio, H. T. H. (1935). Approximate general and specific factors without indeterminate parts. British Journal of Psychology, 25, 485-489.
Popper, K. R. (1959). The logic of scientific discovery. New York: Basic Books. (Original work published 1934)
Prochaska, J. O., Velicer, W. F., DiClemente, C. C., & Fava, J. L. (1988). Measuring the processes of change: Applications to the cessation of smoking. Journal of Consulting and Clinical Psychology, 56, 520-528.
Pruzek, R. M., & Rabinowitz, S. N. (1981). A simple method for exploratory structural analysis. American Educational Research Journal, 18, 173-189.
Rao, C. R. (1955). Estimation and tests of significance in factor analysis. Psychometrika, 20, 93-111.
Revelle, W., & Rocklin, T. (1979). Very simple structure: An alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14, 403-414.
Rindskopf, D. (1983). Parameterizing inequality constraints on unique variances in linear structural models. Psychometrika, 48, 73-83.
Schönemann, P. H. (1971). The minimum average correlation between equivalent sets of uncorrelated factors. Psychometrika, 36, 21-30.
Schönemann, P. H. (1981). Factorial definitions of intelligence: Dubious legacy of dogma in data analysis. In I. Borg (Ed.), Multidimensional data representation: When and why (pp. 325-374). Ann Arbor, MI: Mathesis Press.
Schönemann, P. H., & Steiger, J. H. (1976). Regression component analysis. British Journal of Mathematical and Statistical Psychology, 29, 175-189.
Schönemann, P. H., & Steiger, J. H. (1978). On the validity of indeterminant factor scores. Bulletin of the Psychonomic Society, 12, 287-290.
Schönemann, P. H., & Wang, M. (1972). Some new results on factor indeterminacy. Psychometrika, 37, 61-91.
Snook, S. C., & Gorsuch, R. L. (1989). Principal component analysis versus common factor analysis: A Monte Carlo study. Psychological Bulletin, 106, 148-154.
Snyder, M., & Uranowitz, S. W. (1978). Reconstructing the past: Some cognitive consequences of person perception. Journal of Personality and Social Psychology, 36, 941-950.
Steiger, J. H. (1979a). Factor indeterminacy in the 1930s and the 1970s: Some interesting parallels. Psychometrika, 44, 157-167.
Steiger, J. H. (1979b). The relationship between external variables and common factors. Psychometrika, 44, 93-97.
Steiger, J. H., & Schönemann, P. H. (1978). A history of factor indeterminacy. In S. Shye (Ed.), Theory construction and data analysis in the behavioral sciences (pp. 136-178). San Francisco: Jossey-Bass.




Thomson, G. H. (1935). The definition and measurement of g (general intelligence). Journal of Educational Psychology, 26, 241-262.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
Velicer, W. F. (1974). A comparison of the stability of factor analysis, principal component analysis, and rescaled image analysis. Educational and Psychological Measurement, 34, 563-572.
Velicer, W. F. (1976a). Determining the number of components from the matrix of partial correlations. Psychometrika, 41, 321-327.
Velicer, W. F. (1976b). The relation between factor score estimates, image scores, and principal component scores. Educational and Psychological Measurement, 36, 149-159.
Velicer, W. F. (1977). An empirical comparison of the similarity of principal component, image, and factor patterns. Multivariate Behavioral Research, 12, 3-22.
Velicer, W. F., & Fava, J. L. (1987). An evaluation of the effects of variable sampling on component, image, and factor analysis. Multivariate Behavioral Research, 22, 193-210.
Velicer, W. F., & Fava, J. L. (1990). The effects of variable and subject sampling on factor pattern recovery. Manuscript under review.
Velicer, W. F., Huckel, L., & Hansen, C. E. (1989). A measurement model for measuring attitudes towards violence. Personality and Social Psychology Bulletin, 15, 349-364.
Velicer, W. F., Peacock, A. C., & Jackson, D. N. (1982). A comparison of component and factor patterns: A Monte Carlo approach. Multivariate Behavioral Research, 17, 371-388.
Wold, H. (1966). Nonlinear estimation by iterative least squares procedures. In F. N. David (Ed.), Research papers in statistics: Festschrift for J. Neyman. New York: John Wiley.
Wold, H. (1982). Systems under indirect observation using PLS. In C. Fornell (Ed.), A second generation of multivariate analysis (Vol. 1, pp. 325-347). New York: Praeger.
Zwick, W. R., & Velicer, W. F. (1982). Factors influencing four rules for determining the number of components to retain. Multivariate Behavioral Research, 17, 253-269.
Zwick, W. R., & Velicer, W. F. (1986). A comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.


Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure

Should one do a component analysis or a factor analysis? The choice is not obvious, because the two broad classes of procedures serve a similar purpos...
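The choice the abstract poses can be made concrete with a small numerical sketch. The snippet below is not from the article: the six loadings, the one-factor correlation matrix built from them, and the one-pass principal-axis extraction with squared-multiple-correlation communality estimates are all illustrative assumptions. It contrasts component loadings (eigendecomposition of the full correlation matrix) with common factor loadings (eigendecomposition of the reduced matrix), using NumPy.

```python
import numpy as np

# Hypothetical population loadings for a one-factor model (illustrative only).
load = np.array([0.80, 0.75, 0.70, 0.65, 0.60, 0.55])

# Build the implied correlation matrix: off-diagonals l_i * l_j, unit diagonal.
R = np.outer(load, load)
np.fill_diagonal(R, 1.0)

def first_pc_loadings(R):
    """Component loadings: first eigenvector of R scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    return np.abs(vecs[:, -1]) * np.sqrt(vals[-1])

def first_paf_loadings(R):
    """One-pass principal-axis factoring: replace the unit diagonal with
    squared multiple correlations (communality estimates), then take the
    first eigenvector of the reduced matrix."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    Rh = R.copy()
    np.fill_diagonal(Rh, smc)
    vals, vecs = np.linalg.eigh(Rh)
    return np.abs(vecs[:, -1]) * np.sqrt(max(vals[-1], 0.0))

pc = first_pc_loadings(R)
paf = first_paf_loadings(R)
print(np.round(pc, 3))
print(np.round(paf, 3))
```

The component loadings come out uniformly somewhat larger (the unit diagonal feeds unique variance into the first eigenvalue), while the two loading patterns are nearly proportional, which is the kind of empirical similarity at issue in the debate.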