Fam Proc 29:191-198, 1990

Measurement Issues in Family Assessment: The Case of the Family Environment Scale

MARK W. ROOSA, Ph.D. (a)
JANETTE BEALS, Ph.D. (b)

(a) Department of Family Resources and Human Development, Program for Prevention Research, Arizona State University, Tempe AZ 85287-2502.
(b) National Center for American Indian and Alaska Native Mental Health Research, University of Colorado Health Sciences Center, Denver CO.

The internal consistency reliabilities of 5 subscales of the Family Environment Scale were re-examined in a study of 311 stressed and 74 control families. The reliability coefficients generated were lower than those originally reported for this instrument; most coefficients generated were below the acceptable level for practical or research use, and there was considerable variation in the reliabilities across type of stressed family. An attempt to generate more reliable scales using the original items was unsuccessful, and questions about the validity of the subscales were raised. Our results illustrate the importance of examining reliabilities of instruments, even well-known and widely used instruments, for each sample studied.

In recent years there has been an increased interest in developing measures that describe family-level phenomena, especially systemic characteristics of families. One of the most commonly used of these relatively new measures is the Family Environment Scale (FES; 17, 23). The 90-item FES consists of 10 subscales that describe the social environment of the family as perceived by its members. The FES has been used in a wide variety of studies, including studies of families with alcoholic members (18, 19, 20, 21, 24) and families with psychiatric patients as members (3, 28, 29). In addition, the FES has been used in studies of family typologies (2, 22), "psychological father presence" (7, 8), adolescent personality (10), social support (1), family and individual therapy (3, 12, 23), and stress in police officers' families (16).

Generally speaking, there are two yardsticks for assessing the quality of measures: reliability and validity. Reliability can be thought of as the accuracy of a measure or the consistency with which it repeatedly measures the same phenomenon (9).
Although there are several types of reliability, this article will focus on internal consistency reliability, that is, the degree to which items in a given multiple-item scale intercorrelate or measure the same phenomenon. No measure is totally consistent, but the more consistent a measure is, or the less its scores reflect error, the more reliable it is considered to be. For research purposes, reliability coefficients above .70 are generally considered satisfactory for data at the group level; when data are to be used at the individual level (for example, to determine whether a given family needs therapy or what issues to focus on in therapy), higher levels of reliability are required. The lower the reliability of a measure, the more the scale score represents measurement error and not the construct being measured. Testing hypotheses with measures of low reliability is problematic because the results, whether significant or not, cannot be interpreted with confidence; it is impossible to determine whether the results are due to actual differences (or lack thereof) in the construct of interest or to error in the measure.

Validity is the relationship between a measure and the abstract concept it represents, that is, the degree to which the measure assesses what it purports to measure (9). If the validity of the measure is not clearly established, hypothesis testing with the measure is of little utility since one is not sure of the construct being measured. Therefore, it should be apparent that, without a clear demonstration that an instrument possesses an acceptable degree of both reliability and validity, researchers and practitioners have no basis for deciding whether an instrument is appropriate for their particular use.

Since the FES is a widely used measure, one might assume that its psychometric properties have been thoroughly tested and documented. Surprisingly, this does not seem to be the case.
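To make the internal-consistency coefficient concrete, the following is a minimal Python sketch of Cronbach's alpha computed from a respondents-by-items score matrix, compared against the .70 group-level cutoff mentioned above. The function name, the cutoff constant, and the tiny true/false data set are hypothetical illustrations, not the authors' analysis code. FES items are answered true/false; for such 0/1 items alpha is equivalent to the Kuder-Richardson formula 20 (KR-20).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used throughout, so the
    denominator choice cancels in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))  # transpose: one tuple of answers per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


ACCEPTABLE_GROUP_LEVEL = 0.70  # conventional cutoff for group-level research

# Hypothetical true/false (1/0) answers: 4 respondents x 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(answers)
print(f"alpha = {alpha:.2f}, acceptable: {alpha >= ACCEPTABLE_GROUP_LEVEL}")
```

Running the same function separately on each subsample (for example, each family type) is what "reporting reliabilities for each sample studied" amounts to in practice.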
In the early stages of the development of the instrument, Moos (17) did determine test-retest reliability and internal consistency for the 10 FES subscales. Moos reported internal consistency reliability coefficients (Cronbach's alpha) for the 10 subscales of the FES ranging from .61 to .78, with five of the reliabilities being above the preferred .70. Interestingly, most persons who have used the FES since then refer to the original internal-consistency estimates (see 2, 8, 10, 12, 16), while only a few studies reported these estimates for their samples (see 4, 25). This pattern could potentially be quite serious for the family research field since one of the major problems in environmental research is that different levels of any environment can be related to different levels of measurement sensitivity (26). That is, the reliability of a measure like the FES may be different for different family structures (for example, one- or two-parent families) or for families experiencing different levels of distress. In such cases, it is possible to have a complex interaction between the environment, measurement sensitivity, and the level of the variable being measured in the environment. It should be noted that different reliabilities for different samples on a given measure make meaningful comparisons of the measure on those samples impossible. This is particularly problematic for family scholars since such comparisons are among the most common uses of family-level measures. Thus, it is particularly important for researchers to report the reliability of such scales across samples.

Factor analysis is one method used to maximize internal consistency. One can perform factor analysis on a series of items and the resultant factor structure will identify those items that are intercorrelated with one another yet distinct from items loading on other factors. Theoretically, this factor structure should be identical to the a priori defined subscales. Unfortunately, no factor analytic studies of the FES at the item level were found. Interestingly, the factor analytic studies that have been done with the FES used subscale scores rather than the individual items and, thus, provided no information about the subscale structure of the FES (5, 11). The fact that studies of the factor structure at the item level of the individual subscales were not found could be a critical shortcoming since many researchers selectively use individual subscales (see 1) or groups of subscales (see 7) that often do not correspond to either the original dimensions of the FES or the dimensions derived from factor analysis.

The question of the reliability of the FES takes on added importance when one closely follows the steps taken to develop the measure. First, a 200-item version of the instrument was given to over 1,000 adults and adolescents from 285 families; that is, 3 or more people per family were included in the study (17, 23). Items were assigned to subscales on an unspecified, a priori basis. Then, using responses from the same 285 families, the poorly fitting items were deleted from the scales based on inter-item correlations. Finally, subscale reliabilities (internal consistencies) were calculated using the responses of the same sample.
The developers of the FES followed a relatively standard instrument-development process, although, to their credit, the sample used was larger than that commonly used for this purpose. This process poses two problems for anyone who uses the FES without reporting reliabilities. First, having multiple reporters from each family, although a useful technique for understanding family dynamics when using an established scale, may introduce bias into the instrument-development process due to the lack of independence of the data. Second, after dropping items with low inter-item correlations, one would expect to obtain acceptable reliabilities since items that would lower estimates have been dropped. Thus, the question remains as to whether the reported reliabilities were artifacts of the sample used. Ideally, one should confirm reliabilities on a different sample than the one used to select the items for the scales. Since this rather expensive final step in the instrument-development process was not carried out, it is doubly necessary for researchers using the instrument to determine the reliabilities for their samples before testing hypotheses using the FES. Unfortunately, no such option is available for the practitioner.

Because of the potential value of the FES as a tool for helping researchers understand family functioning and family impact on children, it is important that its psychometric properties be examined thoroughly and in a fashion that will be most relevant to the way researchers and practitioners are likely to use the measure. The purpose of this article is to test the properties of 5 subscales of the FES in order to provide information to help future researchers and clinicians make informed decisions regarding its use.

STUDY I: METHOD

Sample

As part of a study of 8- to 15-year-old children in stressed families, personal interviews were conducted with: 94 divorced families; 92 families with a recently deceased parent; 99 families with a child with chronic illness (asthma); 26 families with a parent seeking treatment for alcoholism; and 74 control families (families that did not meet the criteria for any of the other conditions). In all, 385 adults (one from each family) completed measures used in this study. The respondents were predominantly female (86%), white (87%), and middle class (most had at least some college education and incomes of $25,000 to $30,000). On average, respondents were 38 years old and had more than two children, with their oldest child between 11 and 12 years old.

Procedure

The stressed families were recruited through court records, death certificates, and physician or clinic referrals. Control families were recruited by first randomly selecting 25% of the stressed families and then finding control families (that is, families that had not experienced any of the stressors of interest) in the neighborhoods of the selected stressed families; in addition, families were selected as controls only if they matched the stressed families on SES and on child age and gender. After families agreed to participate in the study, two trained interviewers were sent to their homes to conduct the interviews. One person interviewed the participating parent while a second person interviewed the study child in a separate part of the home. Some of the interviews with alcoholic families took place in the agencies where the alcoholic was receiving treatment. The adult participants completed a large battery of instruments including the Family Environment Scale (17, 23). Five subscales of the FES were used: cohesion, expressiveness, conflict, organization, and control. The 5 "personal growth" subscales were excluded because they were unrelated to the central focus of the study.

Analyses

The analyses consisted of calculating estimates of internal consistency reliability (Cronbach's alpha) for the subscales of the FES, both for the whole sample and for each type of family. Additionally, confirmatory factor analysis (14) was used to test whether the proposed factor structure did, in fact, fit the data. Confirmatory factor analysis allows one to define the expected factor structure in advance (for example, those items that make up a given subscale) and test the null hypothesis that this subscale structure adequately fits the data. Confirmatory factor analysis provides a more rigorous and controlled test of factor structure than the more commonly used exploratory factor analysis, which is not designed for hypothesis testing.
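Defining the expected structure in advance means writing down which items belong to which subscale before the data are examined. The hypothetical Python helper below (not the LISREL specification the authors actually used) turns such an item-to-subscale assignment into a count of free parameters and model degrees of freedom. It assumes a common simple-structure setup: each item loads on exactly one factor, each item has its own error variance, factor variances are fixed at 1, all factors may correlate, and errors are uncorrelated. Item labels are placeholders.

```python
def cfa_degrees_of_freedom(assignment):
    """Degrees of freedom for a simple-structure confirmatory factor model.

    assignment maps each subscale name to the list of its item labels.
    Assumes one loading and one error variance per item, factor variances
    fixed to 1, and every pair of factors allowed to correlate.
    """
    items = [item for subscale in assignment.values() for item in subscale]
    p = len(items)        # observed items
    m = len(assignment)   # latent factors (subscales)
    if len(set(items)) != p:
        raise ValueError("each item must be assigned to exactly one subscale")
    observed_moments = p * (p + 1) // 2                # unique variances and covariances
    free_parameters = p + p + m * (m - 1) // 2         # loadings + errors + factor correlations
    return observed_moments - free_parameters


# Five a priori subscales of nine items each (placeholder item labels).
names = ["cohesion", "expressiveness", "conflict", "organization", "control"]
model = {name: [f"{name}_{i}" for i in range(1, 10)] for name in names}
print(cfa_degrees_of_freedom(model))
```

For 45 items in 5 subscales this gives 1,035 observed moments minus 100 free parameters, or 935: the same degrees of freedom reported for the 45-item model in the Results, which suggests (but does not confirm) that the authors' specification followed these conventions.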

RESULTS

As shown in Table 1, estimates of the internal consistency reliability of the FES subscales for the whole sample were generally much lower than those reported by Moos (17, 23). While the FES manual reports reliability estimates for the 5 scales that range from .67 to .78, the same subscales in the present sample attained internal consistencies ranging from .36 to .75. Furthermore, when examining the reliability coefficients across the different family types, one notices that there is tremendous variability in the estimates. None of the coefficients for expression or control fell within an acceptable range. Only for the conflict subscale was the majority of the reliability coefficient estimates within the acceptable range.

Table 1
Internal Consistency Reliability Coefficients (Cronbach's Alpha) of Five Unaltered Subscales of the Family Environment Scale across Different Types of Families†

                               Cohesion  Expression  Conflict  Organization  Control
Whole sample                     .62        .46        .71         .63         .47
Alcoholic families               .58        .52        .72         .74         .47
Asthma families                  .61        .49        .70         .55         .42
Bereaved families                .63        .53        .76         .62         .46
Divorced families                .53        .36        .61         .60         .59
Control families                 .63        .40        .74         .64         .47
Moos's reported reliabilities    .78        .69        .75         .76         .67

† N = 385 adults
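The two summary statements about Table 1 can be checked directly against the table. This short Python tally transcribes the six sample rows of Table 1 (Moos's reported values are left out) and counts, per subscale, how many coefficients reach the .70 level treated as acceptable. Counting a coefficient of exactly .70 as acceptable is an assumption of this sketch; using a strict "above .70" rule leaves the conclusions unchanged.

```python
# Alphas transcribed from Table 1, in the order:
# whole sample, alcoholic, asthma, bereaved, divorced, control families.
TABLE_1 = {
    "Cohesion":     [0.62, 0.58, 0.61, 0.63, 0.53, 0.63],
    "Expression":   [0.46, 0.52, 0.49, 0.53, 0.36, 0.40],
    "Conflict":     [0.71, 0.72, 0.70, 0.76, 0.61, 0.74],
    "Organization": [0.63, 0.74, 0.55, 0.62, 0.60, 0.64],
    "Control":      [0.47, 0.47, 0.42, 0.46, 0.59, 0.47],
}
CUTOFF = 0.70  # a coefficient of exactly .70 is counted as acceptable here


def count_acceptable(table, cutoff=CUTOFF):
    """Number of coefficients per subscale that reach the cutoff."""
    return {name: sum(alpha >= cutoff for alpha in alphas) for name, alphas in table.items()}


for name, n_ok in count_acceptable(TABLE_1).items():
    print(f"{name:<13} {n_ok} of {len(TABLE_1[name])} coefficients at or above {CUTOFF:.2f}")
```

Only conflict reaches the cutoff in most samples (5 of 6), while expression and control never do, consistent with the text.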

Confirmatory factor analysis allows one to test specific models that are proposed to fit a set of data. In this case, Moos's contention that the 45 items fit in the 5 subscales, as defined in the scoring manual, provided a testable model. However, according to a confirmatory factor analysis using the total sample of adults, this model provided a poor fit to the data (χ² [935, N = 368] = 1559.6, p < .001; Goodness-of-Fit Index [GFI] = .833). Because the chi-square statistic is statistically significant, the null hypothesis that the subscale structure adequately fits the data is rejected. However, since the chi-square statistic is dependent upon sample size, it is possible that this null hypothesis would be rejected for all models because of the relatively large sample used (15). In such cases, the GFI is a more appropriate indicator, and, for most purposes, it should be .900 or higher to indicate a good fit. That was not the case here. Because the complete FES as designed by Moos is a 90-item measure containing 10 subscales, a confirmatory factor analysis of a 45-item, 5-subscale model falls short of a test of the complete model. However, since the results of the confirmatory factor analysis were consistent with the results of the reliability analysis reported above, there is good support for the argument that the subscale structure for this half of the FES is at least suspect and may be inadequate for many research purposes in its current form.
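Why the chi-square test depends on sample size can be seen from the usual maximum-likelihood form of the statistic, chi-square = (N − 1) × F_min, where F_min is the minimized discrepancy between model and data (some programs multiply by N rather than N − 1). The Python sketch below backs out the discrepancy implied by the reported values and asks what chi-square the same discrepancy would give at other sample sizes. Holding F_min fixed while varying N is a simplification for illustration; in practice the discrepancy itself shifts somewhat with sample size.

```python
# Values reported for the 45-item, 5-subscale model.
CHI_SQUARE = 1559.6
DF = 935
N = 368
GFI = 0.833
GFI_CUTOFF = 0.900  # threshold for a good fit cited in the text

# Maximum likelihood: chi-square = (N - 1) * F_min. Solve for the discrepancy.
F_MIN = CHI_SQUARE / (N - 1)


def chi_square_for(n, f_min=F_MIN):
    """Chi-square that the same model-data discrepancy would yield with n cases."""
    return (n - 1) * f_min


for n in (100, 200, N, 1000):
    # If the model were correct, chi-square would be expected to be about df.
    print(f"N = {n:4d}: chi-square = {chi_square_for(n):7.1f} vs df = {DF}")

print("GFI meets .900 cutoff:", GFI >= GFI_CUTOFF)
```

The same misfit that falls below the model's degrees of freedom with 100 cases is far above them with 1,000, which is why the text turns to the GFI, an index that does not grow with N in this way.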

STUDY II: METHOD

An examination of the parameter estimates and their standard errors from the confirmatory factor analysis in Study I found indications that several items should be dropped from their specified subscales or loaded on other subscales to improve reliabilities. In fact, some other studies that have discovered reliability problems with the FES have found that most, but not all, FES subscales could achieve acceptable reliability coefficients by dropping one or two poorly fitting items from each subscale (Scott Christopher, personal communication, 1987). However, since the reliabilities for the current sample were so low and the process by which items were originally assigned to subscales was unknown, such a strategy could result in the development of reliable subscales of questionable validity. Therefore, the more conservative approach of using an expert panel to guide the scale development process was followed.

Procedure

A panel of 12 graduate students in psychology was asked to evaluate the subscale assignment of each item in the 5 FES subscales used. Each rater was given the title and definition of each subscale (17, 23) and asked to assign each FES item to the definition to which it belonged. If they were not certain of the appropriate placement of an item, or if an item fit equally well under more than one category, panelists were asked to place the item in a discard pile. The new subscale structure would be determined by using only those items on which at least 8 of the 12 (67%) panelists agreed. Once the new subscales were formed, internal consistency reliability estimates were recalculated and confirmatory factor analysis was used again to test the fit of the new structure.
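The text does not say how the panelists' placements were tallied, so the following Python sketch shows only one plausible reading of the 8-of-12 rule: discards are ignored, the subscale chosen most often is found for each item, and the item is kept in that subscale only if at least 8 panelists chose it. An item can therefore be kept in a different subscale from its original one, as happened with the expressiveness item described in the Results. All item labels and placements are hypothetical.

```python
from collections import Counter

DISCARD = "discard"   # placement used when a panelist is unsure or sees a tie
MIN_AGREEMENT = 8     # at least 8 of the 12 panelists (67%)


def retained_items(sorts, min_agreement=MIN_AGREEMENT):
    """Map each retained item to the subscale most panelists chose.

    sorts maps an item label to the list of placements made by the panelists;
    each placement is a subscale name or DISCARD. Because 8 is more than half
    of 12, at most one subscale can reach the threshold for a given item.
    """
    kept = {}
    for item, placements in sorts.items():
        votes = Counter(p for p in placements if p != DISCARD)
        if not votes:
            continue
        subscale, count = votes.most_common(1)[0]
        if count >= min_agreement:
            kept[item] = subscale
    return kept


def agreements_with_original(kept, original):
    """Items whose panel placement matches their original FES subscale."""
    return sorted(item for item, subscale in kept.items() if original[item] == subscale)


# Hypothetical placements by a 12-member panel for three hypothetical items.
sorts = {
    "item_a": ["cohesion"] * 9 + [DISCARD] * 3,
    "item_b": ["cohesion"] * 7 + ["conflict"] * 5,
    "item_c": ["organization"] * 10 + ["expressiveness"] * 2,
}
original = {"item_a": "cohesion", "item_b": "cohesion", "item_c": "expressiveness"}

kept = retained_items(sorts)
print(kept)
print(agreements_with_original(kept, original))
```

Here item_b is dropped (only 7 agreements), item_c is moved to organization, and only item_a counts as agreeing with the original structure.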

RESULTS

At the conclusion of the sorting task, the panelists agreed with each other and with the original subscale structure on only 4 of 9 items for cohesion, 6 of 9 for expression, 5 of 9 for conflict, 4 of 9 for organization, and 5 of 9 for control. In addition, 10 of the 12 panel members thought that one item from the expressiveness subscale was more appropriate for the organization subscale. Thus, the panel's sorting cast considerable doubt on the face validity of the original item assignments. As shown in Table 2, the whole-sample internal reliability coefficients (Cronbach's alpha) for the new subscale structure were, except for control, no higher than those of the original subscale structure. However, since the revised subscales are based on fewer items, one should expect lower estimates. Again, the reliabilities for the individual samples were quite varied. For example, the coefficient for the cohesion subscale for the control families was about twice that of the same coefficient for families with an asthmatic child. The results of the confirmatory factor analysis using the total sample indicated that the new structure provided a somewhat better fit to the data than the original (χ² [265, N = 368] = 456.7, p < .001; GFI = .911). Although, according to the chi-square statistic, the hypothesis that the subscale structure provided a good fit to the data must still be rejected, the GFI was greater than .900, indicating that the new subscale structure does provide a reasonable fit to the data. That is, the restructured subscales fit the data better than the original structure. However, at least in part because of the small number of items remaining in each subscale, it was not possible to achieve an acceptable level of internal consistency reliability (13) in the restructured subscales. In such cases, the only way to achieve acceptable levels of reliability is to generate additional items for each subscale of interest.
However, there remains considerable concern about the sizable variation in the reliability coefficients across samples; it would appear that people may have been interpreting the items differently depending upon the family's characteristics.

Table 2
Internal Consistency Reliability Coefficients (Cronbach's Alpha) for Five Subscales of the Family Environment Scale across Different Types of Families†

                               Cohesion (4)  Expression (6)  Conflict (5)  Organization (4)  Control (5)
Whole sample                       .57            .43            .64             .48             .56
Alcoholic families                 .40            .31            .57             .51             .71
Asthma families                    .34            .39            .62             .42             .54
Bereaved families                  .62            .53            .69             .39             .50
Divorced families                  .59            .49            .55             .54             .58
Control families                   .70            .31            .66             .48             .53
Moos's reported reliabilities      .78            .69            .75             .76             .67

Note: The number in parentheses indicates on how many of the 9 items in each subscale the panelists agreed with each other and with the original subscale structure.
† N = 385 adults
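The point that shortened subscales should show lower alphas, and that only added items can raise them, follows from the Spearman-Brown prophecy formula. The Python sketch below applies it under the formula's key assumption that added or removed items are parallel to the remaining ones, which real FES items need not be. Treating the revised cohesion subscale as having 4 items is an assumption taken from the parenthetical count in Table 2.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = new number of items / old number of items; the formula
    assumes the items added or removed are parallel to those that remain.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def items_needed(current_alpha, current_items, target=0.70):
    """Smallest number of parallel items predicted to reach the target reliability."""
    factor = target * (1 - current_alpha) / (current_alpha * (1 - target))
    return math.ceil(current_items * factor)


# Shortening the original 9-item cohesion subscale (whole-sample alpha .62) to 4 items:
print(f"{spearman_brown(0.62, 4 / 9):.2f}")
# Lengthening the revised 4-item cohesion subscale (whole-sample alpha .57) to reach .70:
print(items_needed(0.57, 4))
```

Under these assumptions, cutting cohesion from 9 to 4 items would be expected to drop alpha to about .42, and the revised subscale would need about 8 comparable items to reach .70.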

DISCUSSION

Although the FES has been widely used, reliability estimates generally have not been reported. The reliability estimates reported for the sample in the current study were considerably lower than those originally reported by Moos (17, 23), and few of these estimates were in the acceptable range. Further, reliability coefficients varied by type of stressed family, making comparisons across types impossible. In addition, ratings by a panel raised questions about the validity of the subscales, although more elaborate validity testing needs to be done. Finally, efforts to generate replacement subscales using the original 45 items failed, at least in part because of the reduced number of items assigned to each subscale.

The sample in this study was similar to Moos's original sample in that it included both stressed and nonstressed samples. A major sampling difference, however, was that Moos included multiple family members in his sample, whereas the current study is based on the report of one parent. The original estimates also were generated from the same sample used to select items for the scales (as was done in the second study above). The combination of a lack of independence of these data and the use of one sample both to select items and generate reliability estimates may have contributed to the higher reliability estimates originally reported; that is, the higher estimates may be artifacts of the methodological decisions made rather than representative of the FES.

Our results reaffirm the importance of performing basic psychometric analyses prior to using scales in more complex analyses. Given the fact that internal consistency is a limiting factor in the strength of the relationships one can find between measures (the lower the internal consistency of one's measures, the lower the correlation between measures [13]), it is to the researcher's benefit to assess the reliability of subscales prior to performing any major analyses.

These results raise serious concerns about the future use of the FES. Researchers should be cautious in choosing to use the FES until more is known about its reliability across samples and its applicability to their specific sample. If the FES is shown in other studies to be sensitive to varied interpretations by different samples, the utility of the FES for comparing types of families or for understanding family behavior and influences upon family members must be questioned. Finally, one must be particularly cautious in interpreting the results of those studies that have used the FES without reporting reliability estimates.
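The limiting effect of low internal consistency on correlations can be expressed with the classical attenuation formula: observed r = true r × √(reliability of X × reliability of Y). The Python sketch below assumes classical test theory (random errors, uncorrelated with each other and with true scores) and uses alpha as the reliability estimate. The whole-sample alphas for expressiveness and control come from Table 1; the true correlation of .50 is a hypothetical value.

```python
import math


def expected_observed_r(true_r, rel_x, rel_y):
    """Classical attenuation: observed r = true r * sqrt(rel_x * rel_y).

    Assumes measurement errors are random and uncorrelated with each other
    and with the true scores.
    """
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuate(observed_r, rel_x, rel_y):
    """Spearman's correction for attenuation (the inverse of the above)."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Whole-sample alphas from Table 1: expressiveness .46, control .47.
ceiling = expected_observed_r(1.0, 0.46, 0.47)     # even a perfect true correlation
moderate = expected_observed_r(0.50, 0.46, 0.47)   # a hypothetical true r of .50
print(f"ceiling {ceiling:.2f}, moderate {moderate:.2f}")
```

With alphas near .46, even perfectly related constructs could show an observed correlation of only about .46, and a true correlation of .50 would be expected to appear as roughly .23.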
The implications of this article reach far beyond the use and interpretation of the FES. There is no reason to believe that other family measures are immune to the same limitations as the FES unless researchers have carefully established psychometric properties across samples and have confirmed these results with the samples under consideration. In order to avoid the publication of misleading research findings, editors and reviewers of family journals should adopt a policy of requiring authors to provide at least the most basic psychometric information about their instruments and how they performed with the sample studied. It has been said that science develops and progresses as the quality of its measures improves (6, 27). If so, those in family research, especially those who serve as gatekeepers, need to take steps to ensure that family science continues to grow and improve by paying more attention to psychometric issues during the review process.

REFERENCES

1. Barrera, M., Sandler, I. N. and Ramsay, T. B., Preliminary development of a scale of social support: Studies on college students. American Journal of Community Psychology, 9, 435-447, 1981.
2. Billings, A. G. and Moos, R. H., Family environments and adaptations: A clinically applicable typology. American Journal of Family Therapy, 10(2), 26-38, 1982.
3. Billings, A. G. and Moos, R. H., Treatment experience of adults with unipolar depression: The influence of patient and life context factors. Journal of Consulting and Clinical Psychology, 52, 119-131, 1984.
4. Bloom, B. L., A factor analysis of self-report measures of family functioning. Family Process, 24, 225-239, 1985.
5. Boake, C. and Salmon, P. G., Demographic correlates and factor structure of the Family Environment Scale. Journal of Clinical Psychology, 39, 95-100, 1983.
6. Borg, W. R. and Gall, M. D., Educational research: An introduction (4th ed.). New York: Longman, 1983.
7. Boss, P., A clarification of the concept of psychological father presence in families experiencing ambiguity of boundary. Journal of Marriage and the Family, 39, 141-151, 1977.
8. Boss, P., The relationship of psychological father presence, wife's personal qualities and wife/family dysfunction in families of missing fathers. Journal of Marriage and the Family, 42, 541-549, 1980.
9. Carmines, E. G. and Zeller, R. A., Reliability and validity assessment. Beverly Hills CA: Sage Publications, 1979.
10. Forman, S. G. and Forman, B. D., Family environment and its relation to adolescent personality factors. Journal of Personality Assessment, 45, 163-167, 1981.
11. Fowler, P. C., Maximum likelihood factor structure of the Family Environment Scale. Journal of Clinical Psychology, 37, 160-164, 1981.
12. Fuhr, R. A., Moos, R. H. and Dishotsky, N. H., The use of family assessment and feedback in ongoing family therapy. American Journal of Family Therapy, 9, 24-36, 1981.
13. Ghiselli, E. E., Campbell, J. P. and Zedeck, S., Analysis of linear structural relationships by maximum likelihood and least squares methods. Chicago: National Educational Resources, 1985.
14. Jöreskog, K. G. and Sörbom, D., LISREL. Analysis of linear structural relationships by the method of maximum likelihood. Chicago: National Educational Resources, 1985.
15. Long, J. S., Confirmatory factor analysis. Beverly Hills CA: Sage Publications, 1983.
16. Maynard, P., Maynard, N., McCubbin, H. I. and Shao, D., Family life and the police profession: Coping patterns wives employ in managing job stress and the family environment. Family Relations, 29, 495-501, 1980.
17. Moos, R. H., The social climate scales: An overview. Palo Alto CA: Consulting Psychologists Press, 1974.
18. Moos, R. H. and Billings, A. G., Children of alcoholics during the recovery process: Alcoholic and matched control families. Addictive Behaviors, 7, 155-163, 1982.
19. Moos, R. H., Bromet, E., Tsu, V. and Moos, B., Family characteristics and the outcome of treatment for alcoholism. Journal of Studies on Alcohol, 40, 78-88, 1979.
20. Moos, R. H., Finney, J. W. and Chan, D. A., The process of recovery from alcoholism. Journal of Studies on Alcohol, 42, 383-402, 1981.
21. Moos, R. H., Finney, J. W. and Gamble, W., The process of recovery from alcoholism: II. Comparing spouses of alcoholic patients and matched community controls. Journal of Studies on Alcohol, 9, 888-909, 1982.
22. Moos, R. H. and Moos, B. S., A typology of family social environments. Family Process, 15, 357-371, 1976.
23. Moos, R. H. and Moos, B. S., Family Environment Scale manual. Palo Alto CA: Consulting Psychologists Press, 1981.
24. Moos, R. H. and Moos, B. S., The process of recovery from alcohol: III. Comparing functioning in families of alcoholics and matched controls. Journal of Studies on Alcohol, 45, 111-118, 1984.
25. Oliveri, M. E. and Reiss, D., Family concepts and their measurement: Things are seldom what they seem. Family Process, 23, 33-48, 1984.
26. Rankin, R., Methodology in environmental research. In R. W. Henderson (ed.), Parent-child interaction: Theory, research, and prospects. New York: Academic Press, 1981.
27. Torgerson, W. S., Theory and methods of scaling. New York: John Wiley & Sons, 1958.
28. Wetzel, J. W., Depression and dependence upon unsustaining environments. Clinical Social Work Journal, 6, 75-89, 1978.
29. Wetzel, J. W. and Redmond, F. C., A person-environment study of depression. Social Service Review, 54, 363-375, 1980.
