
Multivariate Behavioral Research, 28 (1), 25-40

Agreement of Personality Profiles Across Observers

Robert R. McCrae


Gerontology Research Center, National Institute on Aging, NIH

To assess cross-observer agreement on personality profiles, an Index of Profile Agreement and an associated coefficient, rpa, are proposed which take into account both the difference between the ratings and the extremeness of their mean. Using data from the Revised NEO Personality Inventory (NEO-PI-R), this coefficient is shown to be superior to Cattell's (1949) rp in identifying matched versus mismatched pairs of peer ratings/self-reports (N = 250) and spouse ratings/self-reports (N = 68). Suggestions are made for the interpretation and use of the two measures of profile agreement for group comparisons and for the interpretation of individual cases. Limitations of the coefficients are also discussed.

Author Note. Thanks are due to S. V. Paunonen, R. C. Gardner, and A. B. Zonderman for comments on early drafts of this manuscript. Address correspondence to Robert R. McCrae, Personality, Stress and Coping Section, Gerontology Research Center, 4940 Eastern Avenue, Baltimore, MD 21224.

Both self-reports and observer ratings yield valuable information on personality, and there is now considerable evidence documenting cross-observer correlations in the .4 to .6 range (Funder & Colvin, in press; McCrae & Costa, 1989a). At the level of the individual case, both personality psychologists (Pervin, in press) and clinicians (Mutén, 1991) have begun to interpret profiles that combine self-reports and ratings. A reasonable first step in such an interpretation would seem to be an assessment of the degree of agreement between the two sources of information: Marked disagreement might suggest invalidity of one or both sets of responses, or might indicate significant misperceptions of self or other that should themselves be a focus of interpretation.

There appears to be no generally recognized index of profile agreement across observers, although there are several indices intended to assess the similarity of personality profiles of different individuals (e.g., Butler, 1983; Paunonen, 1984; Skinner, 1978). As long ago as 1949, Cattell recognized that a simple Pearson correlation of profile elements, although sensitive to the shape of the profile, could give misleading results: The same shape might be seen in two individuals whose actual attributes were diametrically opposed, as the first panel of Figure 1 illustrates. Cattell suggested that profiles were similar to the extent that the individuals received similar scores, and proposed a coefficient of pattern similarity, rp, based on the sum of squared differences (Σd²) between standardized profile elements x and y:


\[
r_p = \frac{2k - \sum d^2}{2k + \sum d^2} \tag{1}
\]

where k "is the median for on a sample size n" (p. 292), a value slightly less than n itself. An empirical comparison of 13 profile similarity measures (Carroll & Field, 1974) showed that rpwas superior or equal to all other measures, so it would appear to be a useful starting point for the development of an index of cross-observer profile agreement. However, assessments of similarity in terms of rp do not always square with clinical judgments of similarity, and a new index is therefore proposed here. For purposes of exposition it is useful to have concrete examples of personality profiles. Data to be presented below utilize the Revised NEO Personality Inventory (NEO-PI-R; Costa, McCrae, & Dye, 199 1 ), which measures the five recurrent personality factors found in natural languages and in personality questionnaires (Digman, 1990). These factors are Neuroticism (N), Extraversion (E), Openness to Experience ( 0 ) , Agreeableness (A), and Conscientiousness (C). Figure 1 presents three hypothetical cases with selfreports (solid lines) and spouse ratings (dotted lines) on these five factors. Consider Case 2. Although the two profiles are not identical, it seems apparent that they are attempts to describe the same individual. Both suggest the individual is high on Nand 0 , and low on E, A, and C. Although the ratings on N differ by 1.5 SDs, the difference in percentiles is rather small: The individual considers himself in the top 1 percent of all cases, whereas his wife puts him in the top 16 percent. In either case, he is clearly prone to psychological distress. Note also in Case 2 the perfect agreement on a z-score of 2.0 for Openness; it is very unlikely that there would be agreement on such an extreme score by chance. By contrast, it is much less clear that there is agreement in Case 3. Whereas the individual describes herself as neurotic but conscientious, her husband sees her as well-adjusted and low in conscientiousness. There is closer agreement on E, 0,and A, but all these values are near average, and thus not remarkable. The perfect agreement on a z-score of zero for Openness may reflect accurate perceptions of average standing, or may simply represent neutral, "don't know" responses. These two cases were constructed to illustrate the insensitivity of r,, to score extremeness. Case 3 is derived from Case 2 by maintaining the same 26


[Figure 1: line plots of three hypothetical cases showing z-scores on the NEO-PI factors N, E, O, A, and C; self-reports as solid lines, spouse ratings as dotted lines.]

Figure 1. Hypothetical profiles for three cases on five personality factors. Self-reports are indicated with solid lines; spouse ratings with dotted lines. For Case 1, rp = -.68, rpa = -.95; for Case 2, rp = .20, rpa = .89; for Case 3, rp = .20, rpa = -.06.


These two cases were constructed to illustrate the insensitivity of rp to score extremeness. Case 3 is derived from Case 2 by maintaining the same z-score distance between self-reports and ratings, but centering them around the neutral point. Because the distances are identical, rp is identical for each. Although it was designed to take into account differences in elevation of profile elements, rp considers only relative elevation, not absolute elevation. A more sensitive index would discount disagreement among extreme scores, where the two ratings suggest a smaller difference in percentile standing than they would in the middle range. Similarly, a better index would give more credit for agreement on extreme values than for agreement on neutral values. Because score elevation is considered in absolute terms, the indices to be developed make sense only in terms of some normative distribution of scores, and, like rp, assume standard scores from normal distributions of traits. The indices are not invariant over changes in scale metric, although they are invariant over scale reflection.

Indices of Profile Agreement

These considerations suggest that measures of profile agreement should take into account both the differences between ratings on profile elements and the extremeness of the traits. Many formulas might be considered; one is developed here. Following Cattell (1949), the square of the difference between standardized ratings (d²) can be used to measure disagreement; it is less sensitive to small and probably less meaningful differences than the absolute value |d|. By analogy, the extremeness of the trait might be estimated by the square of the mean of the two ratings for each profile element (M²). (The extremeness of the trait should not be confused with the elevation of the profile as defined by Cronbach & Gleser, 1953.) For agreement on a single trait, the index might be based on some weighted combination of M² and d², increasing with the extremeness of the mean rating and decreasing with the difference between ratings.

The most obvious index of profile agreement might therefore be based on M² - d², but this expression gives too much weight to d². Because it is the squared mean of two standard variables, M² itself has an expected variance of 0.5 under the null assumption that the two ratings are independent; by contrast, d² has an expected variance of 8.0. M² - d² has an expected variance of 8.5; thus M² accounts for only 1/17 of the total variance in the expression, probably too little to make an adequate correction for extremeness. The alternative of 4M² - d², which equalizes the weights of the two components, is too extreme a correction for near-average traits. Indeed, when either rating is 0, the expression itself is 0, regardless of the other rating. Thus, perfect agreement on average standing would not be credited, and dramatic disagreement between average and extreme standing would not be penalized.
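
These variance figures follow directly from the assumptions already stated: if the two standardized ratings x and y are independent, then M = (x + y)/2 and d = x - y are uncorrelated normal variables with variances 1/2 and 2, and Var(W²) = 2σ⁴ for any W ~ N(0, σ²). A brief check of the algebra (not part of the original exposition):

\[
\begin{aligned}
E(M^2) &= \tfrac{1}{2}, & \operatorname{Var}(M^2) &= 2\left(\tfrac{1}{2}\right)^2 = 0.5,\\
E(d^2) &= 2, & \operatorname{Var}(d^2) &= 2\,(2)^2 = 8,\\
& & \operatorname{Var}(M^2 - d^2) &= 0.5 + 8 = 8.5,\\
E(2M^2 - d^2) &= -1, & \operatorname{Var}(2M^2 - d^2) &= 4(0.5) + 8 = 10.
\end{aligned}
\]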


An intermediate expression, 2M² - d², reduces both these problems.¹ This expression has an expected mean of -1 and variance of 10. When a profile of k elements is considered, the sum will have a mean of -k and a variance of 10k, and a standardized Index of Profile Agreement (Ipa) can be written as

\[
I_{pa} = \frac{k + \sum (2M^2 - d^2)}{\sqrt{10k}} = \frac{k + 3\sum xy - \tfrac{1}{2}\sum (x^2 + y^2)}{\sqrt{10k}}, \tag{2}
\]

where the second form is an alternative computational formula based on the two sets of ratings, x and y. As Cattell (1949) noted, Σd² is distributed as 2χ² with k degrees of freedom; similarly, ΣM² is distributed as ½χ² with k degrees of freedom. As k increases, Ipa will tend toward a normal distribution, but where k is small (as in the case of 5 variables), it will be slightly negatively skewed.

For some purposes it will be desirable to express profile agreement in terms of a coefficient that resembles the Pearson correlation coefficient. One way to do this is by developing a coefficient of profile agreement, rpa, which will have approximately the same statistical significance as a Pearson r of the same magnitude based on k cases. Because Ipa approximates a normal distribution as k increases, its significance can be determined directly from a table of values for the z distribution. The significance of a Pearson r for a large number of cases is determined by

\[
z = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{3}
\]

(cf. Edwards, 1954). By substituting Ipa for z, rpa for r, and k for n, and solving for rpa, this yields the formula

\[
r_{pa} = \frac{I_{pa}}{\sqrt{I_{pa}^2 + k - 2}}. \tag{4}
\]
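
As a computational illustration of Equations 2 and 4, the following sketch computes Ipa and rpa for two standardized profiles. It is not from the original article; the profiles and the function name are hypothetical, and NumPy is assumed.

import numpy as np

def profile_agreement(x, y):
    # Index of Profile Agreement (Equation 2) and coefficient rpa (Equation 4)
    # for two profiles of k standardized, approximately orthogonal elements.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = len(x)
    d2 = (x - y) ** 2                 # squared differences between ratings
    m2 = ((x + y) / 2.0) ** 2         # squared means of the paired ratings
    i_pa = (k + np.sum(2.0 * m2 - d2)) / np.sqrt(10.0 * k)
    r_pa = i_pa / np.sqrt(i_pa ** 2 + k - 2.0)
    return i_pa, r_pa

# Hypothetical example: two raters agree fairly closely on an extreme profile.
self_report   = [2.0, -1.5, 2.0, -1.0, -1.5]   # z-scores on N, E, O, A, C
spouse_rating = [1.0, -1.0, 2.0, -0.5, -1.0]
print(profile_agreement(self_report, spouse_rating))   # rpa near .9

# Perfect agreement on an all-average profile yields rpa of about .38,
# a point discussed in the Analyses and Results section below.
print(profile_agreement([0.0] * 5, [0.0] * 5))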

¹ An empirical approach to determining the best weighting of M² and d² would be to use the sums of both in a multiple regression predicting the two criterion variables described in Table 2. In this analysis, the ratios of the regression weights for ΣM² and -Σd² in predicting peer/self agreement on NEO-PI-R factors were 1.63 and 1.83, roughly consistent with the choice of 2 as a weight for M².


Where k is small, however, this interpretation underestimates the significance of positive values of rpa. To examine the distribution of rpa for k = 5, 10 random variables were generated for 3000 cases using standardized scores from the NORMAL function of SAS (SAS Institute, 1990), and rpa statistics were calculated across five pairs. Some characteristics of this distribution are reported in the first column of Table 1. The 95th percentile was .62 and the 99th percentile was .75; these values could be used as an approximate guide to the statistical significance of rpa values when k = 5. Note that rpa for Case 2 in Figure 1 is significant at p < .01, whereas rpa for Case 3 does not approach significance.
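
The random-agreement distribution is straightforward to approximate by simulation. The sketch below follows the procedure just described but uses NumPy in place of SAS; the values of k and the number of cases are taken from the text, and the exact percentiles will vary slightly with the random seed.

import numpy as np

rng = np.random.default_rng(1993)         # arbitrary seed
k, n_cases = 5, 3000
x = rng.standard_normal((n_cases, k))     # five standardized scores per case
y = rng.standard_normal((n_cases, k))     # five independent scores, paired at random

d2 = (x - y) ** 2
m2 = ((x + y) / 2.0) ** 2
i_pa = (k + np.sum(2.0 * m2 - d2, axis=1)) / np.sqrt(10.0 * k)
r_pa = i_pa / np.sqrt(i_pa ** 2 + k - 2.0)

# The 95th and 99th percentiles should fall near the .62 and .75 cutoffs cited above.
print(np.percentile(r_pa, [50, 95, 99]))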

Table 1
Distribution of rpa for Random Agreement and for Peer/Self and Spouse/Self Agreement on NEO-PI-R Factors and Domain Scales

                    Random Agreement         Empirical Agreement
                    ----------------    -------------------------------
                    rs = 0   rs = 1     Spouse/Self   Peer/Self
                                          Factors      Factors   Domains
M                                           0.51         0.41      0.37
SD                                          0.26         0.32      0.36
Lowest                                     -0.66        -0.88     -0.92
Highest                                     0.86         0.93      0.96
Percentiles
  1
  5
  7
  31
  50
  69
  93
  95                 .62
  99                 .75

Note. Random agreement shows the distribution of rpa with five profile elements for uncorrelated (rs = 0) and perfectly correlated (rs = 1) pairs of random variables. Empirical agreement is based on 68 spouse/self pairs and 250 peer/self pairs.


An alternative approach to the development of a coefficient of profile agreement that more closely resembles Cattell's (1949) rp would be

\[
r'_{pa} = \frac{k + 2\sum M^2 - \sum d^2}{k + 2\sum M^2 + \sum d^2}. \tag{5}
\]

However, when Σd² = 0, r'pa = 1, regardless of the extremeness of the trait. This is inconsistent with the initial rationale for the index.

Empirical Evaluation of rpa

One of the difficulties faced in the evaluation of measures of profile similarity across individuals is the lack of clear criteria against which to test them. These measures are often intended to locate clusters of individuals with similar profiles that might be considered personality types; independently established membership in different types would be the criterion of choice for validating profile similarity measures. Yet research over the past half-century has called into question the existence of such types (McCrae & Costa, 1989b; Mendelsohn, Weiss, & Feimer, 1982), at least among psychiatrically normal individuals.

By contrast, there is a clear empirical criterion for evaluating the validity of indices of profile agreement across raters: their familiarity with the target. Norman and Goldberg (1966) demonstrated that increasing familiarity with the target leads to increasing correlations between ratings and self-reports, and that there is essentially zero agreement when the targets are unknown to the raters. In the present study, I report data comparing self-reports with spouse ratings and with single peer ratings. These data can be contrasted with constructed data sets in which targets and raters are randomly paired to test the ability of rp and rpa to distinguish real matches from random pairings.

The data to be examined are taken from studies of the NEO-PI-R (Costa & McCrae, 1992a; Costa, McCrae, & Dye, 1991). This instrument consists of 240 items that measure six specific traits, or facets, for each of the five broad domains of N, E, O, A, and C. Domain scores, obtained by summing the facet scores, can be considered estimates of the underlying dimensions of the five-factor model (Digman, 1990). Factor scores can be calculated from the normative factor scoring coefficients given in the manual (Costa & McCrae, 1992a).

Agreement might be examined at the level of individual facet scales, global domain scales, or factors. There are two reasons for choosing the factors. First, the development of rpa assumes that the variables in the profile are independent.


The facet scales systematically violate this assumption, because they are intended to represent correlated aspects of one or another of the domains; the domain scales are also non-trivially correlated (Costa, McCrae, & Dye, 1991). Factor scores, based on principal components, are orthogonal. Second, although individual facet scales do contain valid specific variance not attributable to the five common factors (McCrae & Costa, 1992), specific variance accounts for only a small portion of the correlation between self-reports and ratings on each facet scale. The five factors themselves account for the bulk of common variance within methods, and for the bulk of redundant variance across methods.

The data are from studies of members of the Baltimore Longitudinal Study of Aging (BLSA; Shock et al., 1984) and their spouses and peers. Participants are generally white, healthy, well-educated, community-dwelling volunteers who range in age from about 20 to 90. Data from spouses were collected on two occasions. The NEO-PI was administered to spouses in 1986 (Costa & McCrae, 1988); a set of supplemental items from which the new facet scales of the NEO-PI-R are scored was administered in 1991. Because the stability of rated personality is very high, the time interval has little impact on results (Costa & McCrae, 1992b). Complete data were obtained from 37 men, aged 44 to 91 in 1991, and from 31 women, aged 38 to 91. Correlations between the spouse ratings and self-reports on the five factors were .61, .58, .67, .57, and .37 for N, E, O, A, and C, respectively, all N = 68, p < .001.

Details on the sample of targets in the peer ratings study are given elsewhere (Costa & McCrae, 1992b); in general, the peer raters were about the same age as the targets, and had typically known them for many years in a variety of settings. The correlations between single peer ratings and self-reports on the five factors were .33, .43, .55, .51, and .46 for N, E, O, A, and C, respectively, all N = 250, p < .001.

Analyses and Results

The distribution of rpa in the two samples is described in the third and fourth columns of Table 1. Clearly, rpa is much higher in both these samples than would be expected by chance. In comparison to the distribution from the random simulation (column 1), 87% of the peer ratings and 96% of the spouse ratings showed better than chance agreement, defined as the median of the random distribution. Further, 28% of the peers and 35% of the spouses show agreement significant at p < .05; 11% of the peers and 16% of the spouses show agreement significant at p < .01.
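
A sketch of how such percentages can be computed, assuming an array of observed coefficients (for example, the 250 peer/self rpa values) and the simulated random-pairing distribution from the earlier sketch; the function and variable names are illustrative:

import numpy as np

def agreement_summary(observed_rpa, random_rpa):
    # Proportions of observed coefficients exceeding chance benchmarks, where
    # "chance" is defined by the distribution of rpa for random pairings.
    observed_rpa = np.asarray(observed_rpa, dtype=float)
    median, p95, p99 = np.percentile(random_rpa, [50, 95, 99])
    return {
        "better than chance": float(np.mean(observed_rpa > median)),
        "p < .05": float(np.mean(observed_rpa > p95)),
        "p < .01": float(np.mean(observed_rpa > p99)),
    }

# e.g., agreement_summary(peer_rpa, r_pa), with r_pa simulated as above.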


In interpreting rpa values, it is essential to recall that the observed value is limited by the extremeness of the traits. An individual who is given z-scores of 0 on all five factors by both observers will have an rpa of only .38. For comparison, the second column of Table 1 shows the distribution of scores when five randomly generated sets of scores are paired with themselves, that is, show perfect agreement. Evaluated in comparison to the distribution of unrelated scores shown in the first column, all these values are well above the median, but 20% would not be considered significant at the .05 level, and 50% would not be significant at the .01 level, despite the perfect match of scores.

This peculiarity of the coefficient is not shared with rp. When two profiles show identical scores for all elements, rp equals 1, and in this sense it is a superior index of profile agreement. But perfect matches are not found in comparisons of personality ratings from different observers, and the question of which coefficient is superior in the range of agreement commonly found can be addressed empirically. To test this, the matched pairs of peer/self and spouse/self ratings are contrasted with systematically mismatched pairs to create a criterion variable. A value of 1 is assigned to this criterion variable when the peer or spouse rating is matched with the self-report of the individual who was rated; a value of 0 is assigned when the peer or spouse rating is randomly paired with the self-report of another individual. Two random pairings are sampled for each type of rater in order to allow replication of effects.

Correlations of rpa with the two criterion variables for spouse/self and peer/self agreement are given in the top of Table 2. In all four cases, rpa is larger than rp. Further, multiple regression analyses show that rpa is always a significant contributor to the prediction of the matching criterion even after rp is entered, whereas rp is never significant after rpa is entered. Thus, it appears that rpa contains all the relevant variance in rp as well as additional information not contained there.

Supplementary Analyses of Domain Scales

The primary analyses employed factor scores because they have somewhat higher validities than NEO-PI-R (Costa, McCrae, & Dye, 1991) domain scales and are designed to be orthogonal. However, factor scores cannot be obtained from hand-scoring; could domain scale scores be substituted to provide an approximation? Supplementary analyses were conducted on the peer sample, again using matched and mismatched pairs of ratings as the validity criterion.


Table 2
Correlations of Two Criterion Variables with rp and rpa

Criterion Variable             rp      rpa
NEO-PI-R Factors
  Spouse/Self
    First                     .54      .60
    Second                    .50      .54
  Peer/Self
    First
    Second
NEO-PI-R Domains
  Raw distance
    First                     .36      .42
    Second                    .34      .40
  Euclidean distance
    First                     .39      .46
    Second                    .36      .43
  Vector distance
    First                     .31      .36
    Second                    .30      .33

Note. Criterion variable is 1 if profiles are matched on target; 0 if they are mismatched. Correlations are based on 136 spouse/self and 500 peer/self pairings; NEO-PI-R Domain analyses are based on peer/self pairs.

There are at least three ways to obtain values for Σd² and ΣM² when analyzing profiles based on non-orthogonal scales. The raw scores can be treated as if they were orthogonal, yielding indices based on Raw distance; true Euclidean distances can be calculated (Overall, 1964); or scores can be summed as vectors to yield Vector distance magnitudes (Butler, 1983). If D is a row vector of profile differences and R is the intercorrelation matrix between profile elements (here based on the normative sample of self-reports), the sum of squared differences is DD' for the Raw indices, DR⁻¹D' for the Euclidean indices, and DRD' for the Vector indices. Corresponding values can also be generated for the sum of the squared means, and rp and rpa can be calculated from these results.

Results, given in the bottom of Table 2, were consistent across both comparisons of matched with mismatched pairs: In every case, rpa was larger than the corresponding rp, and for both indices, Euclidean distances yielded the highest and Vector distances the lowest correlations with the criterion. Raw score rpa values were intermediate, and all three domain scale-based indices showed somewhat lower correlations with the criteria than did the factor score-based indices.
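
The three definitions reduce to quadratic forms in the difference vector. A sketch, assuming D is the row vector of profile differences and R the intercorrelation matrix of the profile elements, as defined above; applying the same forms to the vector of rating means gives the corresponding sum-of-squared-means terms:

import numpy as np

def sum_of_squares(D, R):
    # Sum-of-squares term under the three treatments of correlated elements:
    # Raw (DD'), Euclidean (D R^-1 D'), and Vector (D R D').
    D = np.asarray(D, dtype=float)
    R = np.asarray(R, dtype=float)
    return {
        "raw": float(D @ D),
        "euclidean": float(D @ np.linalg.solve(R, D)),   # R^-1 D without forming the inverse
        "vector": float(D @ R @ D),
    }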


It appears that estimations of rpa based on standardized NEO-PI-R (Costa, McCrae, & Dye, 1991) domain scales may be useful approximations to the values that would be obtained from an analysis of factor scores. Calculating true Euclidean distances would improve the index a bit, but calculating vector distance magnitudes would not. Although a statistical treatment of the expected values for non-orthogonal profile elements is beyond the scope of this article, normative data provided in Table 1 may be useful in interpreting obtained values for the NEO-PI-R.

Interpretation and Use of rpa

Use as a Group Statistic

The coefficient of profile agreement, rpa, summarizes agreement on a number of orthogonal dimensions. High scores thus indicate greater familiarity with the target, or greater accuracy in personality judgment, and rpa might be considered an index of accuracy in person perception. However, it should be noted that in this index, scores from one observer are compared with scores from another observer, not with the true score. Thus an excellent rater might receive a low rpa value because the self-reports with which she was compared were themselves faulty. Further, even if both sets of ratings were accurate, a relatively low rpa would be obtained if the true scores were themselves near average for all traits. However, most of these problems are less relevant to group comparisons, where it is reasonable to assume that differences in target accuracy and extremeness will be randomly distributed.

We have argued before (Costa & McCrae, 1992b) that spouse ratings are more accurate than single peer ratings, and in the present study higher correlations were found between self-reports and spouse ratings than between self-reports and peer ratings for four of the five factors. Only the difference in N, however, is statistically significant. But a t-test on the mean rpa values shown in Table 1 is significant at p < .01; by combining information on five different elements, the superiority of spouse ratings can be demonstrated.

This design could be extended to test many hypotheses about accuracy in person perception. Consider a design in which a group of target individuals provide self-report data and are then rated by one psychologist and one layperson. Because the same individual is the target for both raters, the influences of trait extremeness and the accuracy of the self-reports are controlled, and a comparison of mean rpas would allow a test of the hypothesis that psychologists are better judges of personality. Self-reports need not be used as the standard of accuracy. Two psychologists and two laypersons might be
asked to rate each of a series of target individuals, and rpas between the psychologists could be compared to rpas between lay raters. Finally, Casselden and Hampson (1990) have recently suggested that personality may be perceived differently in different situations or relationships, and that multiple perspectives on the individual could be obtained by gathering data from groups of raters "each representing different aspects of the target's life (e.g., work, home, or hobby)" (p. 360). The hypothesis that individuals show consistently different profiles in different situations could be tested by comparing mean rpa within rating groups with mean rpa across rating groups.

Interpretation of Individual Cases

The initial incentive for developing rpa was to assist in the interpretation of individual cases in which two observers rate personality. As we have seen, however, many factors affect the magnitude of rpa, and although it appears to be somewhat superior to rp, it cannot be considered a very precise estimate of agreement. This does not, however, mean that it is useless.

In comparison with a random distribution, it is worth noting that rpa is sensitive to agreement, but not specific to it. That is, most legitimate ratings will produce positive values of rpa, but so will many random pairings. A positive value does not necessarily mean that the ratings are meaningful, but a negative value strongly suggests a problem. Clinicians who find negative rpas may wish to regard both sets of ratings with some suspicion.

It is probably not appropriate to evaluate individual rpas with regard to statistical significance in the usual sense. A sizable number of cases of perfect agreement would be considered nonsignificant by that criterion, and in any case the conventional levels of significance are not applicable here. Why should we expect that a given pair of ratings will show a higher agreement than 95% of random pairs of ratings? A more appropriate standard is a normative distribution of actual cases of agreement, such as those provided in the last three columns of Table 1. The convention for describing NEO-PI-R (Costa, McCrae, & Dye, 1991) scores is to regard those in the middle SD (T-scores 45 to 55) as average, those in the next SDs (T-scores 35 to 44 and 56 to 65) as low or high, and those beyond these limits as very low and very high. Similarly, the 7th, 31st, 69th, and 93rd percentiles in Table 1 could be used to demarcate very low, low, average, high, and very high agreement.

This interpretation would not work well when all or most of the traits were rated near average by both observers. However, this is relatively rare in normal samples, and probably rarer in clinical samples. When individuals have no
particularly salient personality traits, degree of agreement is not particularly important either.

Having established the overall agreement of two personality profiles, the interpreter then turns to specific traits. In general, where the two ratings are similar, the best estimate of the true score would be the mean, and this could be adopted as the trait score of the individual. Where the two ratings are very different, however, it suggests that further inquiry is needed to understand the differing perceptions. In effect, an adaptive testing procedure could be used, with values of profile agreement used to determine the need for further information gathering from one or both raters, or from some new source. As a rule of thumb, we might suggest that scores are sufficiently different to warrant special attention when Ipa for that factor is negative. (In the peer sample above, this would call into question about 18% of the factor score pairs; on average, about one factor on each profile.) Application of the quadratic formula to Equation 2 (using x as the standard score for one rating and y as the standard score of the other rating on the same trait) shows that Ipa is equal to zero when

\[
y = 3x \pm \sqrt{8x^2 + 2}.
\]
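
This single-factor rule can also be checked directly rather than graphically; a short sketch (the function name is illustrative), equivalent to asking whether the point (x, y) falls outside the shaded band of the nomograph described next:

def factor_disagreement(x, y):
    # Flags a pair of z-scores for follow-up when the single-factor Ipa is
    # negative, that is, when 1 + 2*M**2 - d**2 < 0; equivalently, agreement
    # holds only when y lies between 3x - (8x**2 + 2)**0.5 and 3x + (8x**2 + 2)**0.5.
    m2 = ((x + y) / 2.0) ** 2
    d2 = (x - y) ** 2
    return 1.0 + 2.0 * m2 - d2 < 0

# e.g., factor_disagreement(2.5, 1.0) -> False; factor_disagreement(1.5, -1.5) -> True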

Figure 2 provides a nomograph of the relationship. If the intersection of the scores for self-report and rating (or any two ratings) falls within the shaded area, the scores are considered to be in agreement; otherwise, they are considered in disagreement. Note that there can be factors on which the two raters disagree even when overall agreement is good. Note also that this nomograph can be used to evaluate ratings on any scale, although when applied repeatedly to correlated scales, such as NEO-PI-R (Costa, McCrae, & Dye, 1991) facets, this will lead to correlated errors of interpretation. Application of this rule to Cases 2 and 3 in Figure 1 shows that all of the ratings are in agreement for Case 2, but that ratings for Neuroticism and Conscientiousness for Case 3 are sufficiently different to merit further investigation.

Limitations of the Coefficient of Profile Agreement

The coefficient of profile agreement developed here was intended as an alternative to rp, to be used when comparing ratings from two observers. This is a rather specialized application, and rpa should not be assumed to be superior to other indices of profile similarity in all contexts.


[Figure 2: nomograph plotting the two z-scores against each other, with axes marked Very Low, Low, Average, High, and Very High and a shaded band indicating agreement.]

Figure 2. Nomograph for determining agreement between two ratings on any single factor. Ratings are considered to be in agreement when the intersection of the two z-scores falls within the shaded area.

In clinical samples, for example, extreme scores will be common, and rpa will probably be considerably higher than in the volunteer samples examined here, even though agreement between patients and their spouses may be comparable. Ipa and rpa assume that the profile elements are independent, and probably should not be applied to ipsatively scored instruments, such as Block's (1961) California Q-Set. Again, further research would be needed to determine whether rpa is a useful measure of agreement of individual profiles with a reference profile (e.g., the average personality profile of individuals with a specific psychiatric diagnosis). In this instance, it might be more appropriate to use the squared scores of the reference profile in place of M² as the index of trait extremeness. (Indeed, many alternative formulas which take into consideration both difference and extremeness might be investigated for use in different contexts.)

Ipa or rpa might also be used for determining the similarity of personality profiles for different individuals, the original use of rp itself. The present findings, however, suggest considerable caution in that application. Although better than rp, rpa was not particularly specific to profile agreement: Many
random combinations of profiles resulted in large values of rpa. When applied to the search for individuals with similar personality profiles, it seems likely that many large values would again occur simply by chance. Coupled with the absence of clear independent criteria for evaluating the results, this observation discourages the search for person types or factors based on rpa.


References

Block, J. (1961). The Q-sort method in personality assessment and psychiatric research. Springfield, IL: Charles C Thomas.

Butler, J. K., Jr. (1983). A vector model for describing and comparing profiles. Educational and Psychological Measurement, 43, 747-758.

Carroll, R. M., & Field, J. (1974). A comparison of the classification accuracy of profile similarity measures. Multivariate Behavioral Research, 9, 373-380.

Casselden, P. A., & Hampson, S. E. (1990). Forming impressions from incongruent traits. Journal of Personality and Social Psychology, 59, 353-362.

Cattell, R. B. (1949). rp and other coefficients of pattern similarity. Psychometrika, 14, 279-298.

Costa, P. T., Jr., & McCrae, R. R. (1988). Personality in adulthood: A six-year longitudinal study of self-reports and spouse ratings on the NEO Personality Inventory. Journal of Personality and Social Psychology, 54, 853-863.

Costa, P. T., Jr., & McCrae, R. R. (1992a). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources, Inc.

Costa, P. T., Jr., & McCrae, R. R. (1992b). Trait psychology comes of age. In T. B. Sonderegger (Ed.), Nebraska Symposium on Motivation: Psychology and aging (pp. 169-204). Lincoln, NE: University of Nebraska Press.

Costa, P. T., Jr., McCrae, R. R., & Dye, D. A. (1991). Facet scales for Agreeableness and Conscientiousness: A revision of the NEO Personality Inventory. Personality and Individual Differences, 12, 887-898.

Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity between profiles. Psychological Bulletin, 50, 456-473.

Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417-440.

Edwards, A. L. (1954). Statistical methods for the behavioral sciences. New York: Holt, Rinehart, & Winston.

Funder, D. C., & Colvin, C. R. (in press). Congruence of self and others' judgments of personality. In S. R. Briggs, R. Hogan, & W. H. Jones (Eds.), Handbook of personality psychology. New York: Academic Press.

McCrae, R. R., & Costa, P. T., Jr. (1989a). Different points of view: Self-reports and ratings in the assessment of personality. In J. P. Forgas & M. J. Innes (Eds.), Recent advances in social psychology: An international perspective (pp. 429-439). Amsterdam: Elsevier Science Publishers.

McCrae, R. R., & Costa, P. T., Jr. (1989b). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57, 17-40.

McCrae, R. R., & Costa, P. T., Jr. (1992). Discriminant validity of NEO-PI-R facet scales. Educational and Psychological Measurement, 52, 229-237.

Mendelsohn, G. A., Weiss, D. S., & Feimer, N. R. (1982). Conceptual and empirical analysis of the typological implications of patterns of socialization and femininity. Journal of Personality and Social Psychology, 42, 1157-1170.

Mutén, E. (1991). Self-reports, spouse ratings, and psychophysiological assessment in a behavioral medicine program: An application of the five-factor model. Journal of Personality Assessment, 57, 449-464.

Norman, W. T., & Goldberg, L. R. (1966). Raters, ratees, and randomness in personality structure. Journal of Personality and Social Psychology, 4, 681-691.

Overall, J. (1964). Note on multivariate methods for profile analysis. Psychological Bulletin, 61, 195-198.

Paunonen, S. V. (1984). A note on Cohen's Profile Similarity Coefficient rc. Journal of Classification, 1, 125-131.

Pervin, L. A. (in press). Personality: Theory and research (6th ed.). New York: Wiley.

SAS Institute, Inc. (1990). SAS language: Reference, version 6. Cary, NC: Author.

Shock, N. W., Greulich, R. C., Andres, R., Arenberg, D., Costa, P. T., Jr., Lakatta, E. G., & Tobin, J. D. (1984). Normal human aging: The Baltimore Longitudinal Study of Aging (NIH Publication No. 84-2450). Bethesda, MD: National Institutes of Health.

Skinner, H. A. (1978). Differentiating the contribution of elevation, scatter and shape in profile similarity. Educational and Psychological Measurement, 38, 297-308.

Accepted July, 1992

