Journal of Sport & Exercise Psychology, 2014, 36, 179-188 http://dx.doi.org/10.1123/jsep.2013-0070 © 2014 Human Kinetics, Inc.

Testing Measurement Invariance Across Spanish and English Versions of the Physical Self-Description Questionnaire: An Application of Exploratory Structural Equation Modeling

Inés Tomás,1 Herbert W. Marsh,2,3,4 Vicente González-Romá,1 Víctor Valls,1 and Benjamin Nagengast5

1University of Valencia; 2Australian Catholic University; 3King Saud University; 4University of Oxford; 5University of Tübingen

Inés Tomás is with the Department of Methodology of Behavioral Sciences, University of Valencia, Valencia, Spain. Herbert W. Marsh is with the Institute for Positive Psychology and Education, Australian Catholic University, Strathfield, NSW, Australia; King Saud University, Riyadh, Saudi Arabia; and the Department of Education, University of Oxford, Oxford, United Kingdom. Vicente González-Romá is with the Department of Social Psychology, University of Valencia, Valencia, Spain. Víctor Valls is with IDOCAL, University of Valencia, Valencia, Spain. Benjamin Nagengast is with the Center for Educational Science and Psychology, University of Tübingen, Tübingen, Germany.

Testing measurement invariance across translated versions of a questionnaire is a critical prerequisite to comparing scores on the different versions. In this study, we used exploratory structural equation modeling (ESEM) as an alternative approach to evaluate the measurement invariance of the Spanish version of the Physical Self-Description Questionnaire (PSDQ). The two versions were administered to large samples of Australian and Spanish adolescents. First, we compared the CFA and ESEM approaches and showed that ESEM fitted the data much better and resulted in substantially more differentiated factors. We then tested measurement invariance with a 13-model ESEM taxonomy. Results justified using the Spanish version of the PSDQ to carry out cross-cultural comparisons in sport and exercise psychology research. Overall, the study can stimulate research on physical self-concept across countries and foster better cross-cultural comparisons.

Keywords: test translation and adaptation, confirmatory factor analysis, exploratory structural equation modeling (ESEM), physical self-concept, measurement invariance, Physical Self-Description Questionnaire (PSDQ)

Physical self-concept is a key construct in sport and exercise psychology. The present investigation aims to advance theoretical understanding of this topic by stimulating research on physical self-concept across countries and fostering better cross-cultural comparisons. To this end, we used exploratory structural equation modeling (ESEM) to investigate measurement invariance across English and Spanish versions of the Physical Self-Description Questionnaire (PSDQ; Marsh, Richards, Johnson, Roche, & Tremayne, 1994), one of the most widely used multidimensional physical self-concept instruments.

Test translation and adaptation is a major concern in psychometric research and a leading topic in the cross-cultural research context (Byrne & Watkins, 2003). In this field of study, even after ensuring the fidelity of test translations, cross-cultural researchers should empirically confirm the psychometric equivalence of the translated questionnaire (Sireci, Yang, Harter, & Ehrlich, 2006). Therefore, it is necessary to carry out analyses to detect whether the different versions of the questionnaire are measuring the same latent factors, and it is essential to evaluate whether item parameters are invariant across the different versions of a questionnaire. Typically, studies in cross-cultural research have carried out this kind of measurement invariance analysis with confirmatory factor analysis (CFA) (e.g., Alessandri, Vecchione, Caprara, & Letzring, 2012; Balducci, Fraccaroli, & Schaufeli, 2010; Christopher, Norris, D'Souza, & Tiernan, 2012; Guerin, Marsh, & Famose, 2004; Guo, Aveyard, & Dai, 2009; Li, Harmer, Chi, & Vongjaturapat, 1996; Yoon, Schmidt, & Ilies, 2002). Nevertheless, recent research has pointed to overly restrictive CFA assumptions as problematic for research in different fields, such as personality (Marsh, Lüdtke, Muthén, Asparouhov, Morin, Trautwein, & Nagengast, 2010a) and sport and exercise psychology (Myers, Chase, Pierce, & Martin, 2011), as



well as for research with psychological instruments in general (Marsh, Muthén, Asparouhov, Lüdtke, Robitzsch, Morin, & Trautwein, 2009). Thus, it is reasonable to look for methods that overcome CFA's shortcomings. One of these methods is exploratory structural equation modeling (ESEM; Asparouhov & Muthén, 2009), an integration of CFA and exploratory factor analysis (EFA). As previously stated, in this study we used ESEM as an alternative approach to evaluate the measurement invariance of the Spanish version of the PSDQ (Marsh et al., 1994). Rather than arguing that the ESEM approach should replace the corresponding CFA approach, we propose ESEM as a suitable alternative for testing the psychometric equivalence of translated questionnaires.

From a practical perspective, the development of a Spanish version of the PSDQ will provide the Spanish-speaking research community with a strong instrument for measuring physical self-concept. Moreover, testing measurement invariance across Spanish and English versions of the PSDQ will ensure that researchers make appropriate comparisons when studying similarities or differences across groups that do not share the same language (Spanish or English). From a theoretical perspective, testing the invariance of the Spanish version of the PSDQ will contribute to the construct validation and cross-national generalizability of the instrument. Moreover, addressing this issue using the ESEM approach will provide insight into the application of this methodology to other multidimensional constructs and other fields of study. Therefore, the objective of the current study was to test for measurement invariance across the Spanish and English versions of the PSDQ using the ESEM approach.

Testing Factorial and Measurement Invariance in Translated Questionnaires

Meredith (1993) has provided a comprehensive conceptual framework on the issue of factorial invariance. According to Meredith, three types of factorial invariance can be tested: configural invariance, strong factorial invariance, and strict factorial invariance. In this section, we evaluate the implications of these three types of factorial invariance from a test translation perspective, and finally address how strong and strict factorial invariance provide the basis for meaningful comparisons across groups when using different versions of an instrument.

The test for configural invariance investigates whether the same pattern of parameter estimates is supported across groups. Constraints of parameter invariance across groups (i.e., requiring that parameters take on the same value in different groups) are not imposed. A second step in testing for the factorial invariance of translated questionnaires focuses on the analysis of factor loading invariance across groups. However, the invariance of factor loadings does not guarantee the appropriateness of direct comparisons of latent factors across groups. For such comparisons, invariance of both factor loadings and intercepts is required.

According to Meredith (1993), under appropriate conditions, a particular item would be strongly factorial invariant when both the factor loading and the intercept are invariant across the subpopulations derived by selection on a grouping variable. If strong factorial invariance holds, between-group differences in average score of an item will reflect between-group differences in latent means. Therefore, when strong factorial invariance is supported, average item and scale scores are comparable across groups. A more restrictive model, however, should be applied to avoid item bias (Meredith, 1993). When assessing strong factorial invariance, differences in measurement errors or uniquenesses across groups could be present. Between-group differences in uniquenesses for a particular item indicate that the item has differential precision across groups. These differences can still be the cause of the observed differences across groups in the diagonal values of the covariance matrices for the measured variables (e.g., Ferrando, 1996). Thus, Meredith claims that testing for uniqueness invariance is also relevant. According to Meredith, a particular item would be strictly factorial invariant when the factor loading, the intercept, and the uniqueness are all invariant across subpopulations derived by selection on a grouping variable. Hence, strict factorial invariance implies that the differences in the mean values of the measured variables and the observed differences in the covariance matrices among measured variables are not caused by differences in any item parameter, but only by the differences in factor means, and factor variances and covariances, respectively. If strict factorial invariance holds, between-group differences in item variances reflect between-group differences in latent variances. 
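In standard common-factor notation (a generic formulation, not tied to the PSDQ's parameter estimates), these invariance levels correspond to increasingly strict cross-group equality constraints on the measurement model for group g:

```latex
% Measurement model and implied covariance structure in group g
x_g = \tau_g + \Lambda_g \xi_g + \delta_g ,
\qquad
\Sigma_g = \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g
```

Configural invariance requires only the same pattern of target/nontarget loadings in \Lambda_g; weak invariance adds \Lambda_1 = \Lambda_2; strong invariance additionally requires \tau_1 = \tau_2; and strict invariance further requires \Theta_1 = \Theta_2, leaving only the factor means and \Phi_g free to differ across groups.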
Thus, when strict factorial invariance is supported, not only average item and scale scores but also item and scale variances are comparable across groups (Meredith, 1993, 1995).

From a test translation-adaptation perspective, testing for strong and strict invariance implicitly involves a test for differential functioning of item parameters across the different versions (original and translated) of a particular questionnaire. Figure 1 represents two hypothetical examples of items showing uniform and nonuniform differential item functioning (DIF) in the context of a continuous CFA model (which transfers directly to the ESEM approach). These examples represent the regression line of a particular item in two different groups: the group that completed the original version of the item and the group that completed the translated version of the item. As shown in Figure 1A, uniform DIF exists when only the intercept (the difficulty parameter, τx) is not invariant across groups, whereas the slope in the regression equation (the discrimination parameter, λx) is invariant across groups (there is no interaction between ability level and group). This example shows how the expected item value for individuals at the latent trait value of zero is higher in the group that completed the original version of the item. Thus, the translated version of the item is “more difficult” for individuals at the same level


Figure 1 — Two hypothetical items showing uniform and nonuniform differential item functioning (DIF).

of the latent trait than the original version of the item. In other words, the “evocativeness” of the item regarding the situation to which it refers is higher for the original item. As shown in Figure 1B, nonuniform DIF occurs when the slope in the regression equation or discrimination parameter (λx) is not invariant across groups, even when the intercept or difficulty parameter (τx) is invariant. In that case, there is an interaction between trait level and group. This example shows how individuals with similar latent trait levels present a greater difference in the expected item value when responding to the original item than when responding to the translated version of the item (a > b). Thus, the translated version of the item is less discriminative than the original version of the item. It is clear that, from a test translation-adaptation perspective, testing for item parameter invariance allows researchers to determine whether the different versions of the questionnaire are psychometrically equivalent, and thus whether comparisons across groups are meaningful. In this regard, because cross-cultural research usually focuses on mean score comparisons, it is essential that there is support for the invariance of factor loadings and item intercepts, though not necessarily for the invariance of item uniquenesses. Nevertheless, researchers should be aware of the implications of a lack of support for the invariance of item uniquenesses (e.g., item and scale variances are not comparable across groups). Moreover, in the field of cross-cultural research, it is also interesting to test hypotheses of the invariance of factor parameters (factor correlation, factor variance, factor means). This is especially relevant when working with multidimensional models (see McDonald, 2000).
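The two DIF patterns illustrated in Figure 1 can be summarized through the item's regression on the latent trait (generic notation; the superscript indexes the group):

```latex
E\!\left(x \mid \xi, g\right) = \tau_x^{(g)} + \lambda_x^{(g)}\,\xi
```

Uniform DIF corresponds to \lambda_x^{(1)} = \lambda_x^{(2)} with \tau_x^{(1)} \neq \tau_x^{(2)} (parallel regression lines shifted vertically, as in Figure 1A); nonuniform DIF corresponds to \lambda_x^{(1)} \neq \lambda_x^{(2)} (lines with different slopes, i.e., a trait-by-group interaction, as in Figure 1B).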

The Development of a Spanish Version of the PSDQ

Self-esteem enhancement has long been a common objective of physical education and exercise programs. In fact, self-perception components, such as perceived competence, physical self-discrepancy, physical self-efficacy, or sport-confidence, remain central to

theories and models of sport and exercise behavior (e.g., Brunet, Sabiston, Castonguay, Ferguson, & Bessette, 2012; Jackson, Whipp, Chua, Dimmock, & Hagger, 2013; Scanlan, Russell, Magyar, & Scanlan, 2009; Taylor, Ntoumanis, Standage, & Spray, 2010; Weiss & Amorose, 2005; Wilson, Sullivan, Myers, & Feltz, 2004). Historically, self-concept researchers have emphasized a broad, global construct of self-esteem that did not differentiate among specific domains, and the result has been instrumentation of questionable validity (Fox & Corbin, 1989). However, support found for the multidimensionality of self-concept (Shavelson, Hubner, & Stanton, 1976) and its widespread acceptance (Fox & Corbin, 1989) have led researchers to develop instruments to measure a particular domain of self-concept, such as the academic domain (Marsh, 1990; Marsh, Byrne, & Shavelson, 1988) or, of particular relevance to the present investigation, the physical domain (Marsh, Martin, & Jackson, 2010b; Marsh et al., 1994; Sonstroem, Speliotis, & Fava, 1992). Indeed, the current study focuses on a multidimensional instrument developed to measure physical self-concept, the Physical Self-Description Questionnaire (PSDQ; Marsh et al., 1994). The PSDQ was developed for use with adolescent populations and includes nine specific components of physical self-concept and two global components (see Marsh, 1996, for a description of the PSDQ scales). The psychometric properties of the PSDQ have been tested in several studies (Marsh, 1996; Marsh, Hey, Roche, & Perry, 1997; Marsh et al., 1994). Results have demonstrated the appropriateness of the PSDQ and provided support for the reliability and construct validity of the questionnaire with English-speaking samples. More recently, Marsh et al. 
(2010b) developed a short version of the PSDQ (PSDQ-S) that balanced brevity and psychometric quality in relation to established guidelines for evaluating short forms (Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) and the construct validity approach that is the basis of PSDQ research. There is general agreement among sport/exercise psychologists and physical educators about the need to develop psychological instruments that focus


on the physical domain (Marsh et al., 1994). Thus, the development of a Spanish version of the PSDQ would provide the Spanish-speaking research community with a useful instrument for measuring physical self-concept. Some of the advantages of adapting and translating a well-established instrument would be, firstly, cost reduction and time saving; secondly, making available accumulated results from previous research that would not be available for a new questionnaire; and thirdly, the facilitation of cross-cultural comparison studies (Ben-Porath, Almagor, Hoffman-Chemi, & Tellegen, 1995; Hambleton & Kanjee, 1995).

In the current study, we used ESEM methods to investigate measurement invariance across English and Spanish versions of the PSDQ. Exploratory structural equation modeling is a more general approach that incorporates traditional CFA models and EFA as special cases (see Marsh, Liem, Martin, Morin, & Nagengast, 2011; Morin, Marsh, & Nagengast, 2013). Many psychological instruments have an apparently well-defined factor structure when using EFA methods but cannot be represented adequately within a CFA approach. Typically, this is the result of their factor structures not being consistent with the highly restrictive independent clusters model (ICM) typically used in CFA studies, in which each item is allowed to load on only one factor and all nontarget loadings are constrained to zero. Furthermore, the misspecification of zero factor loadings usually leads to distorted factors with overestimated factor correlations that might, in turn, distort structural relations. This is important because previous studies have shown that, within pure CFA models, the factor structures of some widely used instruments might be questionable (e.g., Marsh et al., 2009, 2010a; Marsh, Morin, Parker, & Kaur, in press).
Clearly, in these studies, the ESEM solution was superior to the CFA solution; moreover, within the CFA framework, the correlations among the factors were substantially inflated in a way that undermined their discriminant validity and usefulness as diagnostic feedback. Accordingly, it has been recommended (Marsh et al., 2010a, 2011) that the psychometric evaluation of psychological assessment instruments should routinely apply ESEM as a viable alternative and juxtapose the results with the traditionally used, corresponding CFA models. Essentially, all CFA models can be tested with ESEM, eliminating the need for the highly restrictive measurement assumptions of CFA, which are unlikely to be met by responses to any instrument in sport/exercise psychology or any other psychology discipline. Thus, in the current study we used ESEM methods to investigate the PSDQ's measurement invariance across two language versions and compared the results with those provided by CFA.
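The mechanism behind this inflation can be shown with a small numerical sketch (the loading values below are hypothetical illustrations, not PSDQ estimates): when items carry modest nontarget loadings, an ICM-CFA with cross-loadings fixed to zero can reproduce the cross-factor item covariances only by overestimating the factor correlation.

```python
import numpy as np

# Illustrative 6-item, 2-factor population model with small cross-loadings
# (all parameter values are hypothetical, not PSDQ estimates).
lam = np.array([[.7, .2], [.7, .2], [.7, .2],   # items targeting factor 1
                [.2, .7], [.2, .7], [.2, .7]])  # items targeting factor 2
phi = np.array([[1.0, .3],
                [.3, 1.0]])                     # true factor correlation = .30

common = lam @ phi @ lam.T                      # common-factor part of Sigma
theta = np.diag(1 - np.diag(common))            # uniquenesses giving unit item variances
sigma = common + theta                          # model-implied item covariance matrix

# Covariance between an item targeting factor 1 and an item targeting factor 2,
# as implied by the true model (cross-loadings included):
true_cov = sigma[0, 3]

# An ICM-CFA that fixes cross-loadings to zero can reproduce this covariance
# only through the factor correlation: cov = .7 * phi_hat * .7
phi_hat = true_cov / (.7 * .7)

print(f"implied item covariance: {true_cov:.3f}")          # 0.439
print(f"factor correlation an ICM-CFA needs: {phi_hat:.3f}")  # 0.896 (true value .30)
```

The same mechanism is consistent with the pattern reported below, where the CFA factor correlations were systematically larger than the ESEM factor correlations.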

Method

Participants

Physical Self-Description Questionnaire responses from Australian and Spanish high school students aged between 12 and 16 years were gathered for this study. In the Australian sample, 986 high school students (54% males, 46% females) with a mean age of 13.5 (SD = 1.11) completed the English version of the PSDQ. In the Spanish sample, 986 high school students (51% males, 49% females) with a mean age of 13.3 (SD = 1.07) completed the Spanish version of the questionnaire. Participation was voluntary, informed consent was obtained from each study participant, and the educational institution granted permission for the study.

Instruments and Test Translation

The PSDQ is a 70-item instrument designed to measure the following 11 scales: Health, Coordination, Activity, Body Fat, Sports Competence, Appearance, Strength, Flexibility, Endurance/Fitness, Global Physical Self-Concept, and Global Self-Esteem. Each scale is represented by six or eight items; each item is a simple declarative statement; and participants respond using a 6-point Likert-type scale format with choices ranging from “false” (1) to “true” (6)—see Marsh et al. (1994) for a full presentation of the instrument. In this study, the original English version of the PSDQ and a translation of the questionnaire into Spanish were used. The PSDQ was translated into Spanish following the back-translation procedure widely described in the literature (Hambleton & Kanjee, 1995). First, three native Spanish speakers independently translated the original English PSDQ version into Spanish. Translation discrepancies among the three translated forms were discussed to develop an initial Spanish version of the questionnaire. Next, a second bilingual translator whose native language was English translated the initial Spanish version of the test back into English. The original and back-translated versions of the tests were then compared. Finally, translation differences revealed by the back-translation procedure were corrected. After the back-translation procedure had concluded, a pilot study was carried out to test the adequacy of the questionnaire for use with Spanish teenagers. No problems were found, so this version was adopted as the final Spanish version of the PSDQ (the Spanish version of the questionnaire can be obtained from the first author).

Statistical Analysis

Analyses were conducted with Mplus (Version 6.12; Muthén & Muthén, 2010). Preliminary analyses consisted of a traditional CFA based on the Mplus robust maximum likelihood (MLR) estimator, with standard errors and tests of fit that are robust to non-normality and non-independence of observations (Muthén & Muthén, 2008). The main focus is on the application of ESEM to responses to the 70-item PSDQ. The ESEM approach differs from the typical CFA approach in that all factor loadings are estimated, subject to constraints so that the model can be identified (for further details, see Asparouhov & Muthén, 2009; Marsh et al., 2010a). Here we


used robust maximum likelihood (MLR) estimation and oblique geomin rotation with an epsilon value of .5. Browne (2001) noted that epsilon values of .01 work well for three- or four-factor solutions (as confirmed in Asparouhov and Muthén's (2009) small-scale simulation study) but may need to be increased for more factors. The PSDQ has 11 factors, so increasing the epsilon value seemed necessary. Thus, the ε = .50 value suggested by Marsh et al. (2009; also see Marsh et al., 2010a), based on large-scale empirical studies, was used.

Factorial and Measurement Invariance.  An important advantage of the ESEM approach is the ability to test full measurement invariance for an EFA solution in relation to multiple groups or occasions. In applying the ESEM approach to tests of multigroup invariance, there are two approaches (e.g., Marsh et al., 2009; Myers et al., 2011). The approach taken in this study follows the taxonomy of 13 ESEM models proposed by Marsh et al. (2009). This taxonomy integrates traditional CFA approaches to factor invariance of covariance structures (e.g., Jöreskog & Sörbom, 1993; Marsh, 1994, 2007; Marsh & Grayson, 1994) and item-response-theory approaches to measurement invariance with mean structures (e.g., Meredith, 1964, 1993; Meredith & Teresi, 2006; also see Millsap, 2011; Vandenberg & Lance, 2000)—both quite widely used traditions for testing invariance over multiple groups or occasions. The taxonomy includes some key models that test goodness of fit with no invariance constraints (MG_1, configural invariance); test the invariance of factor loadings alone (MG_2, weak measurement invariance); or test factor loading invariance in combination with the invariance of factor variances–covariances (MG_4, factor variance–covariance invariance), item intercepts (MG_5, strong measurement invariance), or item intercepts and measurement errors (MG_7, strict measurement invariance). Models MG_1, MG_2, MG_5,

and MG_7 correspond with the models considered in Myers and colleagues' (2011; Myers, 2013) approach, and include the key models for testing measurement invariance across groups. Within the 13-model ESEM approach (Marsh et al., 2009), additional models are also tested from a broader invariance perspective. Models range from the least restrictive model of configural invariance (MG_1) to a model of complete invariance (MG_13) that posits strict invariance as well as the invariance of the latent means and the factor variance–covariance matrix (see Table 1; for a more extended discussion of these issues, see also Marsh et al., 2009). As stated previously, for these tests to be interpretable, it is essential that there is support for the invariance of factor loadings and item intercepts, though not necessarily for the invariance of item uniquenesses or the factor variance–covariance matrix (e.g., Meredith, 1964, 1993).

Regarding the determination of the number of factors to retain, ESEM methods can be used from a more exploratory or a more confirmatory approach. Myers (2013) refers to these two alternative approaches as the comparing-models approach and the theory-model approach, respectively, and indicates his preference for the former (see also Myers, Feltz, Guillén, & Dithurbide, 2012; Myers et al., 2011). The comparing-models approach implies testing alternative factor solutions by manipulating the number of factors, while considering the interpretability of the solution with regard to a priori theory. Within the theory-model approach, the number of factors is fixed at a theory-based value, that is, with more strictly confirmatory purposes. The authors of the current study would recommend using the comparing-models approach when developing or providing initial validity evidence for new measurement instruments, and the theory-model approach when

Table 1  Taxonomy of Multiple Group Tests of Invariance Testable with ESEM

Model   Parameters Constrained To Be Invariant
MG_1    none (configural invariance)
MG_2    FL (weak factorial/measurement invariance)
MG_3    FL, Uniq
MG_4    FL, FVCV
MG_5    FL, Inter (strong factorial/measurement invariance)
MG_6    FL, Uniq, FVCV
MG_7    FL, Uniq, Inter (strict factorial/measurement invariance)
MG_8    FL, FVCV, Inter
MG_9    FL, Uniq, FVCV, Inter
MG_10   FL, Inter, FMn (latent mean invariance)
MG_11   FL, Uniq, Inter, FMn (manifest mean invariance)
MG_12   FL, FVCV, Inter, FMn
MG_13   FL, Uniq, FVCV, Inter, FMn (complete factorial invariance)

Note. FL = factor loadings; FVCV = factor variance–covariances; Inter = item intercepts; Uniq = item uniquenesses; FMn = factor means. Models with freely estimated latent factor means constrain intercepts to be invariant across groups, whereas models in which intercepts are free imply that mean differences are a function of intercept differences.


there is a strong a priori theory and there are previous empirical studies that offer support for this theory. Nevertheless, we agree with Myers (2013) that there are no “golden rules mandating one approach over the other” (p. 717). As the present paper focuses on a widely validated instrument, this study follows the theory-model approach.
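To illustrate, a multigroup ESEM model of this type can be sketched in Mplus input syntax roughly as follows (a hypothetical sketch: the file name, item names i1–i70, and grouping variable are placeholders, and option spellings should be checked against the Mplus User's Guide):

```
DATA:     FILE = psdq.dat;                  ! placeholder file name
VARIABLE: NAMES = country i1-i70;           ! placeholder variable names
          GROUPING = country (1 = australia  2 = spain);
ANALYSIS: ESTIMATOR = MLR;                  ! robust maximum likelihood
          ROTATION = GEOMIN (OBLIQUE .5);   ! oblique geomin, epsilon = .5
MODEL:    f1-f11 BY i1-i70 (*1);            ! ESEM set: all loadings estimated
```

The `(*1)` label defines the ESEM set of factors; the more restrictive models in the taxonomy are then obtained by imposing cross-group equality constraints in the group-specific MODEL commands.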

Goodness of Fit.  To be consistent with previous work, the root-mean-square error of approximation (RMSEA), the Tucker–Lewis index (TLI), and the comparative fit index (CFI) provided by Mplus (Muthén & Muthén, 2008) are reported. For both the TLI and CFI, values greater than .90 and .95, respectively, typically reflect acceptable and excellent fit to the data. For the RMSEA, values less than .05 and .08 reflect a close fit and a reasonable fit to the data, respectively (Marsh, Hau, & Wen, 2004). However, there is considerable evidence that, realistically, large factor structures (e.g., instruments with at least 50 items and at least five factors) are typically unable to satisfy even the minimally acceptable fit standards (Marsh, 2007; Marsh, Hau, & Grayson, 2005). Thus, considering the complexity of the model involved in our study (11 factors, and six or eight items per factor) and the large sample sizes (986 per group), it would seem reasonable to use less stringent cutoff values. Nevertheless, the change in CFI and RMSEA is an appropriate criterion for evaluating the relative fit of the taxonomy of nested measurement invariance models, because large sample sizes such as ours will almost always yield a statistically significant chi-square difference. Support for the more parsimonious model requires a change in CFI of less than .01 (Chen, 2007; Cheung & Rensvold, 2001) or a change in RMSEA of less than .015 (Chen, 2007). Thus, the changes in CFI and RMSEA were the primary criteria used to assess the measurement invariance of the PSDQ across the Spanish and English versions.
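This decision rule amounts to simple arithmetic on the fit indices; a minimal sketch in Python (the function name is ours, for illustration; the example values are taken from the MG_2/MG_5 and MG_5/MG_7 comparisons reported in Table 2 below):

```python
# Decision rule for nested-model comparisons: support the more constrained
# (parsimonious) model when the change in CFI is less than .01 (Cheung &
# Rensvold, 2001) or the change in RMSEA is less than .015 (Chen, 2007).
def supports_invariance(cfi_free, cfi_constrained, rmsea_free, rmsea_constrained):
    delta_cfi = cfi_free - cfi_constrained        # CFI drops as constraints are added
    delta_rmsea = rmsea_constrained - rmsea_free  # RMSEA rises as constraints are added
    return delta_cfi < .01 or delta_rmsea < .015

# MG_2 (weak) vs. MG_5 (strong): CFI .953 -> .942, RMSEA .028 -> .030
print(supports_invariance(.953, .942, .028, .030))  # True: strong invariance supported
# MG_5 (strong) vs. MG_7 (strict): CFI .942 -> .940, RMSEA .030 -> .031
print(supports_invariance(.942, .940, .030, .031))  # True: strict invariance supported
```

Note that in the first comparison the ΔCFI alone is borderline (.011), so support there rests on the ΔRMSEA criterion, mirroring the reasoning in the Results section.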

Results

PSDQ Factor Structure: ESEM Versus CFA

We began by testing single-group models with CFA and ESEM, which allowed us to compare the CFA and ESEM approaches regarding goodness of fit and parameter solutions. Although the ICM-CFA solution provided a marginally acceptable fit to the data for both the Australian group (CFI = .919, TLI = .915; see Australia CFA in Table 2) and the Spanish group (CFI = .901, TLI = .896; see Spain CFA in Table 2), the corresponding ESEM

Table 2  Summary of Goodness-of-Fit Statistics for All Models

Model            χ2 (df)          NFParm   CFI    TLI    RMSEA   SRMR   Description
Single Group Models
 Australia_CFA   4986.785 (2290)    265   .919   .915    .035    .053
 Spain_CFA       5358.483 (2290)    265   .901   .896    .037    .052
 Australia_ESEM  2678.441 (1700)    855   .971   .958    .024    .016
 Spain_ESEM      3344.682 (1700)    855   .947   .925    .031    .019
Multiple (Two) Group PSDQ Invariance (Also See Table 1)
 MG_1            6022.781 (3400)   1710   .959   .942    .028    .018   IN = none (FMn = 0)
 MG_2            7106.432 (4049)   1061   .953   .943    .028    .027   IN = FL (FMn = 0)
 MG_3            7326.698 (4119)    991   .950   .942    .028    .029   IN = FL, Uniq (FMn = 0)
 MG_4            7296.244 (4115)    995   .951   .942    .028    .052   IN = FL, FVCV (FMn = 0)
 MG_5            7846.950 (4108)   1002   .942   .932    .030    .029   IN = FL, Inter (FMn free)
 MG_6            7532.655 (4185)    925   .948   .940    .028    .053   IN = FL, Uniq, FVCV (FMn = 0)
 MG_7            8068.562 (4178)    932   .940   .930    .031    .032   IN = FL, Uniq, Inter (FMn free)
 MG_8            7999.935 (4174)    936   .941   .931    .030    .053   IN = FL, FVCV, Inter (FMn free)
 MG_9            8238.306 (4244)    866   .938   .929    .031    .054   IN = FL, FVCV, Inter, Uniq (FMn free)
 MG_10           8093.396 (4119)    991   .938   .928    .031    .032   IN = FL, Inter, FMn
 MG_11           8364.309 (4189)    921   .935   .925    .032    .035   IN = FL, Uniq, Inter, FMn
 MG_12           8268.344 (4185)    925   .937   .927    .031    .055   IN = FL, FVCV, Inter, FMn
 MG_13           8515.452 (4255)    855   .934   .925    .032    .055   IN = FL, FVCV, Inter, Uniq, FMn

Note. NFParm = number of free parameters; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual. All chi-square values were statistically significant (p < .001). For multiple group invariance models, “IN =” means the sets of parameters constrained to be invariant across the multiple groups: FL = factor loadings; FVCV = factor variance–covariances; Inter = item intercepts; Uniq = item uniquenesses; FMn = factor means.


solutions fitted the data much better (Australia ESEM: CFI = .971, TLI = .958; Spain ESEM: CFI = .947, TLI = .925; see Table 2). Furthermore, a detailed evaluation of the correlations among the PSDQ factors demonstrated a critical advantage of the ESEM approach over the ICM-CFA approach. Although the patterns of correlations were similar, the CFA factor correlations tended to be systematically larger than the ESEM factor correlations for both the Australian group and the Spanish group. In the ICM-CFA solution, the median factor correlation was .50 (range = .11–.83) for the Australian group and .39 (range = .01–.76) for the Spanish group. In contrast, in the ESEM solution, the median factor correlation was .32 (range = .09–.64) for the Australian group and .29 (range = –.05 to .60) for the Spanish group. Thus, the sizes of correlations among factors were substantially smaller for the ESEM solution in both the Australian and Spanish groups. In summary, the ESEM solution was superior to the CFA solution in terms of both fit and distinctiveness of the factors. Although for these data there was a well-defined CFA structure, the difference in results across the four models demonstrated how the ICM-CFA approach can inflate the size of relations among the factors by constraining all cross-loadings to zero.

Invariance Across Australian and Spanish Groups: Multiple Group ESEM

The model with no invariance constraints (MG_1 in Table 2) provided a good fit to the data (MG_1: CFI = .959, TLI = .942). These results supported the configural invariance of the PSDQ. That is, the ESEM model was able to fit data from the Spanish and English versions of the PSDQ when no additional invariance constraints were imposed.

Model MG_2 (Table 2) constrained factor loadings to be invariant across the two groups. It is perhaps the most important model from both factor analysis and measurement invariance perspectives. Because the number of freely estimated factor loadings was so substantial, MG_2 was much more parsimonious than MG_1. Nevertheless, MG_2 provided a very good fit to the data. Indeed, for the traditional fit indexes that control for parsimony, the fit of MG_2 was slightly better than the fit of MG_1 (i.e., TLI = .943 vs. .942; RMSEA = .028 for both models). Moreover, although the CFI cannot favor the more parsimonious of two nested models, even here the change in CFI (.953 vs. .959) was considerably less than the .01 value recommended to support interpretations of invariance (Cheung & Rensvold, 2001). Thus, the comparison of Models MG_1 and MG_2 provided clear support for the invariance of the factor loadings, sometimes referred to as weak measurement invariance.

Strong measurement invariance requires item intercepts—as well as factor loadings—to be invariant across groups. Therefore, the critical comparison was between Models MG_2 and MG_5 (Table 2). Based on traditional

fit indexes, MG_5 fitted the data slightly worse than MG_2 (TLI = .932 vs. .943; RMSEA = .030 vs. .028). However, for MG_5, both the TLI and CFI values were within an acceptable range, and the RMSEA value reflected a close fit. Thus, the fit of MG_5 was almost equivalent to the fit of MG_2. Moreover, the change in RMSEA between the more and less constrained models was considerably less than the .015 value recommended to support interpretations of invariance (Chen, 2007). Thus, the gain in parsimony was greater than the loss in fit. Therefore, there was reasonable support for strong measurement invariance, suggesting that observed means could appropriately be compared in the Australian and Spanish groups.

Strict measurement invariance requires item uniquenesses, item intercepts, and factor loadings to all be invariant across the groups. Here, the critical comparison was between Models MG_5 and MG_7. Based on traditional fit indexes that control for parsimony, the fit of MG_7 was almost equivalent to the fit of MG_5 (TLI = .930 vs. .932; RMSEA = .031 vs. .030). For MG_7, the TLI and CFI values were within an acceptable range, and the RMSEA value showed a close fit. Moreover, the changes in CFI and RMSEA between the more and less parsimonious models were considerably less than the recommended .01 and .015 values, respectively (Cheung & Rensvold, 2001; Chen, 2007). Again, the gain in parsimony was greater than the loss in fit. Therefore, there was good support for strict measurement invariance, suggesting that the observed factor means and the factor variances could appropriately be compared in the Australian and Spanish groups.

Invariance of the factor variance–covariance matrix is typically not a focus in studies of measurement invariance, because such studies usually involve tests of unidimensionality based on a single construct.
However, it is frequently an important focus in studies of the invariance of covariance structures, particularly studies of the discriminant validity of multidimensional constructs that might subsequently be extended to include relations with other constructs. Here, the most basic comparison was between Models MG_2 and MG_4. Based on fit indexes that control for parsimony, the fit of MG_4 was almost equivalent to the fit of MG_2 (TLI = .942 vs. .943; RMSEA = .028 for both models). For both MG_2 and MG_4, the TLI and CFI values were within an acceptable range, and the RMSEA value reflected a close fit. Thus, the fit of MG_4 was as good as the fit of MG_2. The changes in CFI and RMSEA were considerably less than the recommended .01 and .015 values, respectively (Cheung & Rensvold, 2001; Chen, 2007). Thus, there was good support for the invariance of the factor variance–covariance matrix across the two groups. Tests of the invariance of the latent factor variance–covariance matrix, as with the other comparisons, could be based on any pair of the six models in Table 2 that differ only in whether the factor variance–covariance matrix is constrained to invariance. In our study, support for the invariance of the factor variance–covariance matrix was consistent across each of these alternative comparisons.

Finally, we addressed the invariance of the factor means across the two groups. Again, several pairs of models could be used to make this comparison (MG_5 vs. MG_10, MG_7 vs. MG_11, MG_8 vs. MG_12, and MG_9 vs. MG_13), provided that they differed only in whether they assumed invariant factor means. We carried out all of these comparisons, and the outcomes supported the invariance of factor means across the two groups, with fit indexes that control for parsimony all showing that the more constrained models fitted the data as well as the less constrained models. In summary, there was good support for the invariance of factor means across the Australian and Spanish groups. It is important to note that the ICM-CFA solution offered similar results, although the ICM-CFA approach consistently fitted the data worse than the ESEM solution for all 13 tested models (the CFA solution can be obtained from the first author).
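The nested-model comparisons above all apply the same change-in-fit-index conventions. The decision rule can be sketched in Python as follows; the example uses only the fit values reported in the text for MG_1 and MG_2 (treating the rounded published values as exact).

```python
def supports_invariance(free, constrained, dcfi_cut=0.01, drmsea_cut=0.015):
    """Conventional change-in-fit rules for nested invariance models:
    the added equality constraints are retained when the CFI drops by no
    more than dcfi_cut (Cheung & Rensvold, 2001) and the RMSEA rises by
    no more than drmsea_cut (Chen, 2007)."""
    d_cfi = free["cfi"] - constrained["cfi"]
    d_rmsea = constrained["rmsea"] - free["rmsea"]
    return d_cfi <= dcfi_cut and d_rmsea <= drmsea_cut

# Fit values reported in the text for the configural model (MG_1) and the
# loading-invariant model (MG_2, weak invariance):
mg_1 = {"cfi": 0.959, "rmsea": 0.028}
mg_2 = {"cfi": 0.953, "rmsea": 0.028}

print(supports_invariance(mg_1, mg_2))  # True: factor-loading invariance retained
```

The same function applies to the strong, strict, variance–covariance, and factor-mean comparisons, given each pair's fit statistics from Table 2.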

Discussion

According to our results, and congruent with previous applications of ESEM methods in other research fields, the ESEM structure fitted the data better than the ICM-CFA approach. The median correlation among ESEM factors was only .32 and .29 for the Australian and Spanish groups, respectively, whereas among CFA factors based on the same data it was .50 and .39. Although such a large difference might seem surprising, it is consistent with the logic of the ESEM approach: when a large number of relatively small cross-loadings are constrained to zero, as in the ICM-CFA solution, the only way these cross-loadings can be represented is by inflating the size of the factor correlations.

Overall, the PSDQ exhibited good psychometric properties in the Spanish and Australian samples. Configural invariance was supported, demonstrating that the hypothesized 11-factor structure held for responses by both groups of Australian and Spanish adolescents. Strong factorial invariance (invariance of factor loadings and intercepts) was also supported. Moreover, we also found support for the invariance of uniquenesses; thus, strict factorial invariance across the two linguistic versions of the PSDQ was supported by the ESEM approach. Following Meredith (1993, 1995), this has the following overarching implications for cross-cultural research using the two studied versions of the PSDQ: the observed average item and scale scores, and the observed item and scale variances, can be meaningfully compared across the English and Spanish versions. An additional implication of the strict factorial invariance model is that, because of uniqueness invariance, the English and Spanish versions of the PSDQ items show equivalent precision across groups.
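The inflation mechanism can be illustrated with a rough simulation. All numbers below are hypothetical choices for illustration (not PSDQ estimates), and the sketch uses unit-weighted sum scores as a simplified analogue of factors that are forced to absorb omitted cross-loadings, rather than a fitted ICM-CFA.

```python
import random
random.seed(1)

# Two latent factors correlated at about .30; three items per factor,
# each with a main loading of .70 and a small cross-loading of .20.
n = 20000
rho, lam, cross = 0.30, 0.70, 0.20
f1 = [random.gauss(0, 1) for _ in range(n)]
f2 = [rho * a + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for a in f1]

def item(main, other):
    """One item: main loading, small cross-loading, plus unique noise."""
    return [lam * m + cross * o + random.gauss(0, 0.5) for m, o in zip(main, other)]

scale1 = [sum(t) for t in zip(*(item(f1, f2) for _ in range(3)))]
scale2 = [sum(t) for t in zip(*(item(f2, f1) for _ in range(3)))]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

print(round(corr(f1, f2), 2))          # close to the true latent correlation, .30
print(round(corr(scale1, scale2), 2))  # noticeably larger once cross-loadings are ignored
```

When the cross-loadings are not modeled, the shared variance they carry has nowhere to go but into the between-factor relation, which is the pattern observed in the ICM-CFA versus ESEM comparison above.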

Our findings have a number of theoretical and practical implications. First, the results have substantive importance for self-concept theory, as they provide evidence for the cross-cultural validation of the PSDQ, one of the most widely used multidimensional physical self-concept instruments. Second, this study strengthened the PSDQ’s potential usefulness in sport and exercise research and applied settings in Spanish-speaking countries. Third, the study provides further evidence of the broad applicability of ESEM methods and helps foster better cross-cultural research in sport/exercise psychology.

Despite the strengths of the present research in the construct validation approach, some shortcomings should be noted. First, although our results support the appropriateness of the PSDQ for Spanish high school students, they do not guarantee it for other settings and groups. Further research is needed in different Spanish settings and groups (such as sporting clubs, elite athlete institutions, sport and leisure associations, and university students) to contribute to the growing body of knowledge about self-concept validation. Second, in ESEM, as in EFA, there is a rotational indeterminacy such that different rotation strategies can result in different factor solutions that all have the same fit. In the current study, we used oblique geomin rotation with an epsilon value of .5, as suggested in previous empirical studies using scales with more than four factors (Marsh et al., 2009, 2010a). The rationale, as subsequently emphasized by Sass and Schmitt (2010), is that there is necessarily a balance between constraints on the sizes of cross-loadings and factor correlations.
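The geomin criterion behind this rotation choice can be written down directly: for each item, take the m-th root of the product of (squared loading + ε) across the m factors, and sum over items; rotation seeks the loading pattern minimizing this total (Browne, 2001). The sketch below uses hypothetical two-factor patterns purely to show how ε changes the criterion's behavior; the paper's analyses used oblique geomin in Mplus, not this code.

```python
def geomin_complexity(loadings, epsilon=0.5):
    """Geomin rotation criterion: sum over items of the geometric mean
    of (squared loading + epsilon) across the m factors."""
    m = len(loadings[0])  # number of factors
    total = 0.0
    for row in loadings:
        prod = 1.0
        for lam in row:
            prod *= lam ** 2 + epsilon
        total += prod ** (1.0 / m)
    return total

# Hypothetical patterns: pure simple structure vs. sizable cross-loadings.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]]
crossed = [[0.6, 0.4], [0.5, 0.4], [0.4, 0.6], [0.4, 0.5]]

# A small epsilon strongly favors simple structure; with epsilon = .5 the
# two patterns are penalized almost equally, i.e., cross-loadings are
# tolerated rather than forced into the factor correlations.
print(geomin_complexity(simple, 0.01) < geomin_complexity(crossed, 0.01))            # True
print(abs(geomin_complexity(simple, 0.5) - geomin_complexity(crossed, 0.5)) < 0.05)  # True
```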
By using a slightly larger value of epsilon, the resulting solution places less emphasis on minimizing the sizes of cross-loadings and more emphasis on minimizing the sizes of the factor correlations; we note that none of these differences had any effect on goodness of fit and that the a priori factor structure was clearly evident with ε = .5. Nevertheless, despite recent contributions on rotation criteria (e.g., Sass & Schmitt, 2010), issues about the most appropriate rotational strategy require more research and might be idiosyncratic to particular applications. Moreover, further research comparing solutions based on varying epsilon values is needed.

In summary, the current study provided evidence of the empirical generalizability of physical self-concept with Spanish adolescents, which contributed to the construct validation of interpretations of PSDQ responses and their cross-national generalizability. Moreover, ESEM proved to be a valuable approach for testing measurement invariance across different language versions of an instrument in the test translation-adaptation process.

Acknowledgments

This investigation was supported by Research Grant CONSOLIDER-C (SEJ2006-14086/PSIC) and PROMETEO (2012/048) from the Spanish Ministry of Education and Science.

Note. Requests for further information about this investigation should be sent to the first-listed author, Inés Tomás.


References

Alessandri, G., Vecchione, M., Caprara, G., & Letzring, T.D. (2012). The ego resiliency scale revised: A cross-cultural study in Italy, Spain and the United States. European Journal of Psychological Assessment, 28, 139–146. doi:10.1027/1015-5759/a000102
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438. doi:10.1080/10705510903008204
Balducci, C., Fraccaroli, F., & Schaufeli, W.B. (2010). Psychometric properties of the Italian version of the Utrecht Work Engagement Scale (UWES-9): A cross-cultural analysis. European Journal of Psychological Assessment, 26, 143–149. doi:10.1027/1015-5759/a000020
Ben-Porath, Y.S., Almagor, M., Hoffman-Chemi, A., & Tellegen, A. (1995). A cross-cultural study of personality with the multidimensional personality questionnaire. Journal of Cross-Cultural Psychology, 26, 360–373. doi:10.1177/0022022195264002
Browne, M.W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36, 111–150. doi:10.1207/S15327906MBR3601_05
Brunet, J., Sabiston, C., Castonguay, A., Ferguson, L., & Bessette, N. (2012). The association between physical self-discrepancies and women’s physical activity: The mediating role of motivation. Journal of Sport & Exercise Psychology, 34, 102–123.
Byrne, B.M., & Watkins, D. (2003). The issue of measurement invariance revisited. Journal of Cross-Cultural Psychology, 34, 155–175. doi:10.1177/0022022102250225
Chen, F.F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504. doi:10.1080/10705510701301834
Cheung, G.W., & Rensvold, R.B. (2001). The effects of model parsimony and sampling error on the fit of structural equation models. Organizational Research Methods, 4, 236–264. doi:10.1177/109442810143004
Christopher, M.S., Norris, P., D’Souza, J.B., & Tiernan, K.A. (2012). A test of the multidimensionality of the self-construal scale in Thailand and the United States. Journal of Cross-Cultural Psychology, 43, 758–773. doi:10.1177/0022022111406119
Ferrando, P.J. (1996). Calibration of invariant item parameters in a continuous item response model using the extended LISREL measurement submodel. Multivariate Behavioral Research, 31(4), 419–439. doi:10.1207/s15327906mbr3104_2
Fox, K.R., & Corbin, C.B. (1989). The physical self-perception profile: Development and preliminary validation. Journal of Sport & Exercise Psychology, 11, 408–430.
Guerin, F., Marsh, H.B., & Famose, J.P. (2004). Generalizability of the PSDQ and its relationship to physical fitness: The European French connection. Journal of Sport & Exercise Psychology, 26, 19–38.
Guo, B., Aveyard, P., & Dai, X. (2009). The Chinese intelligence scale for young children: Testing factor structure and measurement invariance using the framework of the Wechsler intelligence tests. Educational and Psychological Measurement, 69, 459–474. doi:10.1177/0013164409332209
Hambleton, R.K., & Kanjee, A. (1995). Increasing the validity of cross-cultural assessments: Use of improved methods for test adaptations. European Journal of Psychological Assessment, 11(3), 147–157. doi:10.1027/1015-5759.11.3.147
Jackson, B., Whipp, P.R., Chua, K.L.P., Dimmock, J.A., & Hagger, M.S. (2013). Students’ tripartite efficacy beliefs in high school physical education: Within- and cross-domain relations with motivational processes and leisure-time physical activity. Journal of Sport & Exercise Psychology, 35, 72–84.
Jöreskog, K.G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago, IL: Scientific Software International.
Li, F., Harmer, P., Chi, L., & Vongjaturapat, N. (1996). Cross-cultural validation of the task and ego orientation in sport questionnaire. Journal of Sport & Exercise Psychology, 18, 392–407.
Marsh, H.W. (1990). A multidimensional, hierarchical model of self-concept: Theoretical and empirical justification. Educational Psychology Review, 2, 77–171. doi:10.1007/BF01322177
Marsh, H.W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted approach. Structural Equation Modeling, 1, 5–34. doi:10.1080/10705519409539960
Marsh, H.W. (1996). Construct validity of Physical Self-Description Questionnaire responses: Relations to external criteria. Journal of Sport & Exercise Psychology, 18, 111–131.
Marsh, H.W. (2007). Application of confirmatory factor analysis and structural equation modeling in sport/exercise psychology. In G. Tenenbaum & R.C. Eklund (Eds.), Handbook of sport psychology (3rd ed., pp. 774–798). Hoboken, NJ: Wiley.
Marsh, H.W., Byrne, B.M., & Shavelson, R. (1988). A multifaceted academic self-concept: Its hierarchical structure and its relation to academic achievement. Journal of Educational Psychology, 80, 366–380. doi:10.1037/0022-0663.80.3.366
Marsh, H.W., Ellis, L., Parada, L., Richards, G., & Heubeck, B.G. (2005). A short version of the Self Description Questionnaire II: Operationalizing criteria for short-form evaluation with new applications of confirmatory factor analyses. Psychological Assessment, 17, 81–102. doi:10.1037/1040-3590.17.1.81
Marsh, H.W., & Grayson, D. (1994). Longitudinal stability of latent means and individual differences: A unified approach. Structural Equation Modeling, 1, 317–359. doi:10.1080/10705519409539984
Marsh, H.W., Hau, K.T., & Grayson, D. (2005). Goodness of fit evaluation in structural equation modeling. In A. Maydeu-Olivares & J. McArdle (Eds.), Psychometrics: A festschrift to Roderick P. McDonald (pp. 275–340). Mahwah, NJ: Lawrence Erlbaum.
Marsh, H.W., Hau, K.T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis testing approaches to setting cutoff values for fit indexes and dangers in overgeneralising Hu & Bentler’s (1999) findings. Structural Equation Modeling, 11, 320–341. doi:10.1207/s15328007sem1103_2
Marsh, H.W., Hey, J., Roche, L., & Perry, C. (1997). Structure of physical self-concept: Elite athletes and physical education students. Journal of Educational Psychology, 89, 369–380. doi:10.1037/0022-0663.89.2.369
Marsh, H.W., Liem, G.A.D., Martin, A.J., Morin, A.J.S., & Nagengast, B. (2011). Methodological measurement fruitfulness of exploratory structural equation modeling (ESEM): New approaches to key substantive issues in motivation and engagement. Journal of Psychoeducational Assessment, 29(4), 322–346. doi:10.1177/0734282911406657
Marsh, H.W., Lüdtke, O., Muthén, B., Asparouhov, T., Morin, A.J.S., Trautwein, U., & Nagengast, B. (2010a). A new look at the big five factor structure through exploratory structural equation modeling. Psychological Assessment, 22(3), 471–491. doi:10.1037/a0019227
Marsh, H.W., Martin, J.M., & Jackson, S. (2010b). Introducing a short version of the Physical Self Description Questionnaire: New strategies, short-form evaluative criteria, and applications of factor analyses. Journal of Sport & Exercise Psychology, 32, 438–482.
Marsh, H.W., Morin, A.J.S., Parker, P.D., & Kaur, G. (in press). Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annual Review of Clinical Psychology. doi:10.1146/annurev-clinpsy-032813-153700
Marsh, H.W., Muthén, B., Asparouhov, T., Lüdtke, O., Robitzsch, A., Morin, A.J.S., & Trautwein, U. (2009). Exploratory structural equation modeling, integrating CFA and EFA: Application to students’ evaluations of university teaching. Structural Equation Modeling, 16, 439–476. doi:10.1080/10705510903008220
Marsh, H.W., Richards, G.E., Johnson, S., Roche, L., & Tremayne, P. (1994). Physical Self-Description Questionnaire: Psychometric properties and a multitrait-multimethod analysis of relations to existing instruments. Journal of Sport & Exercise Psychology, 16, 270–305.
McDonald, R.P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99–114. doi:10.1177/01466210022031552
Meredith, W. (1964). Rotation to achieve factorial invariance. Psychometrika, 29, 187–206. doi:10.1007/BF02289700
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. doi:10.1007/BF02294825
Meredith, W. (1995). Two wrongs may not make a right. Multivariate Behavioral Research, 30, 89–94. doi:10.1207/s15327906mbr3001_8
Meredith, W., & Teresi, J. (2006). An essay on measurement and factorial invariance. Medical Care, 44(Suppl. 3), S69–S77. doi:10.1097/01.mlr.0000245438.73837.89
Millsap, R.E. (2011). Statistical approaches to measurement invariance. New York, NY: Routledge.
Morin, A.J.S., Marsh, H.W., & Nagengast, B. (2013). Exploratory structural equation modeling. In G.R. Hancock & R.O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 395–436). Charlotte, NC: Information Age Publishing.
Muthén, L.K., & Muthén, B. (2008). Mplus user’s guide. Los Angeles, CA: Muthén & Muthén.
Muthén, L.K., & Muthén, B.O. (2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Author.
Myers, N.D. (2013). Coaching competency and (exploratory) structural equation modeling: A substantive-methodological synergy. Psychology of Sport and Exercise, 14, 709–718. doi:10.1016/j.psychsport.2013.04.008
Myers, N.D., Chase, M.A., Pierce, S.W., & Martin, E. (2011). Coaching efficacy and exploratory structural equation modeling: A substantive-methodological synergy. Journal of Sport & Exercise Psychology, 33, 779–806.
Myers, N.D., Feltz, D.L., Guillén, F., & Dithurbide, L. (2012). Development of, and initial validity evidence for, the Referee Self-Efficacy Scale: A multi-study report. Journal of Sport & Exercise Psychology, 34, 737–765.
Sass, D.A., & Schmitt, T.A. (2010). A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research, 45, 73–103. doi:10.1080/00273170903504810
Scanlan, T.K., Russell, D.G., Magyar, T.M., & Scanlan, L.A. (2009). Project on elite athlete commitment (PEAK): III. An examination of the external validity across gender, and the expansion and clarification of the sport commitment model. Journal of Sport & Exercise Psychology, 31, 685–705.
Shavelson, R.J., Hubner, J.J., & Stanton, G.C. (1976). Self-concept: Validation of construct interpretations. Review of Educational Research, 46, 407–441. doi:10.3102/00346543046003407
Sireci, S.G., Yang, Y., Harter, J., & Ehrlich, E.J. (2006). Evaluating guidelines for test adaptations: A methodological analysis of translation quality. Journal of Cross-Cultural Psychology, 37, 557–567. doi:10.1177/0022022106290478
Smith, G.T., McCarthy, D.M., & Anderson, K.G. (2000). On the sins of short-form development. Psychological Assessment, 12, 102–111. doi:10.1037/1040-3590.12.1.102
Sonstroem, R.J., Speliotis, E.D., & Fava, J.L. (1992). Perceived physical competence in adults: An examination of the Physical Self-Perception Scale. Journal of Sport & Exercise Psychology, 10, 207–221.
Taylor, I.A., Ntoumanis, N., Standage, M., & Spray, C.M. (2010). Motivational predictors of physical education students’ effort, exercise intentions, and leisure-time physical activity: A multilevel linear growth analysis. Journal of Sport & Exercise Psychology, 32, 99–120.
Vandenberg, R.J., & Lance, C.E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70. doi:10.1177/109442810031002
Weiss, M.R., & Amorose, A.J. (2005). Children’s self-perceptions in the physical domain: Between- and within-age variability in level, accuracy, and sources of perceived competence. Journal of Sport & Exercise Psychology, 27, 226–244.
Wilson, R.C., Sullivan, P.J., Myers, N.D., & Feltz, D.L. (2004). Sources of sport confidence of master athletes. Journal of Sport & Exercise Psychology, 26, 369–384.
Yoon, K., Schmidt, F.L., & Ilies, R. (2002). Cross-cultural construct validity of the five-factor model of personality among Korean employees. Journal of Cross-Cultural Psychology, 33, 217–235. doi:10.1177/0022022102033003001

Manuscript submitted: April 1, 2013
Revision accepted: December 30, 2013

