Perceptual and Motor Skills, 1991, 72, 759-765.

© Perceptual and Motor Skills 1991

INTERSCORER RELIABILITY FOR THE HAND TEST ADMINISTERED TO CHILDREN¹

DONALD E. CARTER AND JOSEPH J. MORAN

State University College at Buffalo

Summary. The utility of the Hand Test as a quick, reliable measure of children's personality was assessed. The interscorer reliability of the Hand Test was estimated for 100 children by both intraclass correlations and the Kappa coefficient. Following training, satisfactory intraclass correlations were obtained for the Quantitative scores (20 of 22 above .70) and the Qualitative scores (12 of 27 above .70). Kappa coefficients were generally lower. Scorers' memory overload and low response frequency are discussed as possible bases for the low reliabilities of Qualitative scores. Although the reliability of the Hand Test's Quantitative scores is consistent with that of other projective tests, consideration should be given to modifying the directions of administration for young children and to clarifying the scoring rules.

Personality assessment of children continues to present problems for clinicians and school psychologists (Knoff, 1986). These professionals make judgments about the placement of children who appear to have emotional problems, and they need objective information about a child's level of emotional distress and potential for acting out aggressively. The Hand Test provides a viable measure of these characteristics. The interscorer reliability of the Hand Test has been established for adults for the Quantitative scoring categories (Wagner, 1983) and for some scores for children (Wagner, Rasch, & Marsico, 1990); the present study extends the analysis to all Quantitative and summary scores for children and includes the Qualitative categories that previous researchers neglected. The Hand Test has significant applications in the assessment of adult personality (see Wagner, 1983). Although Wagner (1983) states that "it can be used with any individual old enough to verbalize a response" (p. 1), limited data are available on the applicability of the test to children. In a study of Hand Test responses of kindergarten and second-grade children, Stoner and Spencer (1984) concluded that normal young children frequently give responses that would be considered indicators of pathology or disturbance in adults. Their findings emphasize the need for more research and normative information (e.g., Moran & Carter, in press) on the use of the Hand Test with children.

¹This research was funded by a grant from the Research Foundation of the State University of New York. We gratefully acknowledge the assistance and support of Dr. Crayton Buck and the faculty and children of the College Learning Laboratory at Buffalo State College. Address correspondence to D. E. Carter, Ed.D., Department of Educational Foundations, State University College at Buffalo, 1300 Elmwood Avenue, Buffalo, New York 14222.

The Hand Test is a diagnostic, projective measure of personality. Nine cards bear black-ink drawings of hands in various positions, and the tenth card is blank. The examinee indicates what the hand might be doing, and the responses are taken as an indication of what the individual is likely to do. Of particular interest, the Hand Test appears to be relatively culture-fair. It can be administered, scored, and interpreted in 30 minutes. It is free from third-party effects, and although it is a narrow-band rather than a global personality measure (Panek & Wagner, 1985), it provides separate scores specifically intended to assess emotional maladjustment and potential for acting out. These two scores are of particular interest in the personality assessment of children. Additional indicators assess such constructs as impulsivity, anxiety, self-esteem, and need for achievement; these scores are potentially helpful in planning educational and therapeutic programs for individual children.

The Hand Test scoring system has two parts, Quantitative and Qualitative. Fifteen Quantitative scores are based on a category system that is both mutually exclusive and exhaustive, so each response receives exactly one Quantitative category. Six summary scores, two ratios, two latency scores, and the number of scorable responses round out the Quantitative measures. The manual (Wagner, 1983) recommends 17 major Qualitative categories and 11 additional categories. All Qualitative categories are signs, which are neither mutually exclusive nor exhaustive. In addition to the 28 Qualitative categories, Wagner also recommends using the Quantitative scores as secondary Qualitative scores when they are present in a response but are not its primary score. This paper addresses the consistency of scoring and estimates of interscorer reliability. Both Kappa (K) and intraclass correlation coefficients (ICC) were used to evaluate the scores.
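The difference between the two parts of the scoring system (one Quantitative category per response versus any number of co-occurring Qualitative signs) can be made concrete with a small data model. The sketch below is ours, not the manual's: the `Response` class and the example protocol are hypothetical, the four summary groupings are inferred from the row order of Table 1 (each summary row follows its member categories), and the formulas AOS = (AFF + DEP + COM) - (DIR + AGG) and PATH = MAL + 2 x WITH are assumptions based on common descriptions of the manual that should be checked against Wagner (1983).

```python
from collections import Counter
from dataclasses import dataclass

# The 15 Quantitative categories, in the row order used in Table 1.
QUANT_CATEGORIES = (
    "AFF", "DEP", "COM", "EXH", "DIR", "AGG",  # interpersonal
    "ACQ", "ACT", "PAS",                       # environmental
    "TEN", "CRIP", "FEAR",                     # maladjustive
    "DES", "BIZ", "FAIL",                      # withdrawal
)

# Summary groupings inferred from Table 1 (assumption, see lead-in).
SUMMARY_GROUPS = {
    "INT": QUANT_CATEGORIES[0:6],
    "ENV": QUANT_CATEGORIES[6:9],
    "MAL": QUANT_CATEGORIES[9:12],
    "WITH": QUANT_CATEGORIES[12:15],
}


@dataclass(frozen=True)
class Response:
    """One scored response (hypothetical structure, not taken from the manual)."""
    card: int
    quantitative: str                     # exactly one category per response
    qualitative: frozenset = frozenset()  # zero or more signs that may co-occur

    def __post_init__(self):
        if self.quantitative not in QUANT_CATEGORIES:
            raise ValueError(f"unknown Quantitative category: {self.quantitative}")


def summary_scores(responses):
    """Tally primary categories, then derive summary scores (AOS/PATH assumed)."""
    n = Counter(r.quantitative for r in responses)
    scores = {name: sum(n[c] for c in cats) for name, cats in SUMMARY_GROUPS.items()}
    scores["AOS"] = (n["AFF"] + n["DEP"] + n["COM"]) - (n["DIR"] + n["AGG"])
    scores["PATH"] = scores["MAL"] + 2 * scores["WITH"]
    return scores


def qualitative_counts(responses):
    """Signs are not mutually exclusive, so one response can add to several counts."""
    return Counter(sign for r in responses for sign in r.qualitative)


# Hypothetical ten-card protocol, for illustration only.
protocol = [
    Response(1, "AFF"), Response(2, "AGG", frozenset({"IM"})), Response(3, "COM"),
    Response(4, "DIR"), Response(5, "ACT"), Response(6, "AGG"),
    Response(7, "FEAR"), Response(8, "DES", frozenset({"IM", "ORA"})),
    Response(9, "ACQ"), Response(10, "FAIL"),
]
print(summary_scores(protocol))
# -> {'INT': 5, 'ENV': 2, 'MAL': 1, 'WITH': 2, 'AOS': -1, 'PATH': 5}
print(qualitative_counts(protocol))
# -> Counter({'IM': 2, 'ORA': 1})
```

Because the Quantitative tally comes from a single field per response while the Qualitative tally iterates over a set, the model enforces the exclusive/non-exclusive distinction structurally rather than by convention.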
K is generally accepted as a reliability estimate that corrects for chance agreement, but has the drawback of giving credit, for fixed observed proportions of agreement, to raters who produce different marginal proportions (Zwick, 1988). Frick and Semmel (1978) recommend ICC as the best index of interscorer reliability. Cicchetti (1980) stressed the importance of utilizing both indexes because the same K value can "reflect quite different levels of observed agreement" (p. 301).
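The dependence of K on marginal proportions can be illustrated with two scorers who each mark a Qualitative sign as present (1) or absent (0) on 100 responses. The sketch below computes Cohen's kappa, (p_o - p_e) / (1 - p_e), with chance agreement p_e taken from each scorer's own marginal proportions. The 2 x 2 tables and the helper `from_table` are hypothetical, and we do not know exactly how the K values in this study were tabulated. Both tables show 90% observed agreement, yet K drops sharply when one code predominates, which is the marginal dependence underlying the cautions of Zwick and of Cicchetti.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' nominal codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement uses each rater's own marginal proportions.
    categories = marg_a.keys() | marg_b.keys()
    p_chance = sum(marg_a[c] * marg_b[c] for c in categories) / n ** 2
    if p_chance == 1:
        return p_obs, float("nan")  # both raters used one code only: kappa undefined
    return p_obs, (p_obs - p_chance) / (1 - p_chance)


def from_table(both_present, only_a, only_b, both_absent):
    """Expand a 2 x 2 agreement table into paired 1/0 (present/absent) codes."""
    a = [1] * (both_present + only_a) + [0] * (only_b + both_absent)
    b = [1] * both_present + [0] * only_a + [1] * only_b + [0] * both_absent
    return a, b


# Hypothetical tables: both contain 90 agreements out of 100 responses.
tables = {
    "balanced (sign present in 50% of responses)": (45, 5, 5, 45),
    "one code predominates (sign present in 90%)": (85, 5, 5, 5),
}
for label, cells in tables.items():
    p_obs, kappa = cohen_kappa(*from_table(*cells))
    print(f"{label}: observed agreement {p_obs:.2f}, kappa {kappa:.2f}")
# -> balanced table: observed agreement 0.90, kappa 0.80
# -> predominant table: observed agreement 0.90, kappa 0.44
```

In the skewed table the chance term rises from .50 to .82, so the same ten disagreements consume a much larger share of the remaining room above chance.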

Sample. The population consisted of all the children enrolled as full-time students in the College Learning Laboratory at Buffalo State College. A total of 679 students were enrolled in Kindergarten through Grade 8; over 100 of them received some type of special service, such as resource-room support or placement in a self-contained classroom. Most of the special-service students were either learning disabled or retarded. The school is unusual in that it is both a college laboratory school and part of the Buffalo Public Schools, and it appears to provide a reasonable representation of the population of urban children. A random sample of 10 children at each grade (1 through 8) and 20 children from self-contained special education classes was selected for the interscorer reliability estimates (N = 100).
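The sampling design amounts to a stratified random sample with fixed quotas: 10 children per grade for Grades 1 through 8 plus 20 from the self-contained classes. A minimal sketch follows. The roster, its IDs, and the pool sizes are hypothetical placeholders (the paper reports only the school total of 679), and the sketch assumes the grade pools and the self-contained pool do not overlap, which the paper does not state.

```python
import random


def stratified_sample(roster, quotas, seed=None):
    """Draw a simple random sample without replacement separately in each stratum."""
    rng = random.Random(seed)
    sample = {}
    for stratum, k in quotas.items():
        pool = roster[stratum]
        if k > len(pool):
            raise ValueError(f"stratum {stratum!r} has only {len(pool)} children")
        sample[stratum] = rng.sample(pool, k)
    return sample


# Hypothetical roster: IDs and pool sizes are placeholders, not the school's data.
roster = {f"Grade {g}": [f"G{g}-{i:03d}" for i in range(60)] for g in range(1, 9)}
roster["Self-contained"] = [f"SC-{i:03d}" for i in range(40)]

quotas = {f"Grade {g}": 10 for g in range(1, 9)}
quotas["Self-contained"] = 20

sample = stratified_sample(roster, quotas, seed=1991)
print(sum(len(ids) for ids in sample.values()))  # -> 100
```

Fixing the quota per stratum, rather than drawing 100 children from the whole school, guarantees that every grade and the special classes are represented equally regardless of enrollment differences.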

Scorers. Two white female graduate students administered and scored the Hand Test after 20 hours of training. They had no knowledge of how the test is interpreted. A licensed psychologist (one author) and an educational and school psychologist (the other author) rescored the protocols to provide the reliability data.

Procedure. The children were taken individually from their classrooms and administered the Hand Test according to the directions in the 1983 edition of the manual. Each child's responses were scored after the test was completed.

RESULTS

The ICCs for the summary scores (see Table 1) were all high: Interpersonal .93, Environmental .93, Maladjustive .85, Withdrawal .97, Pathological .97, and Acting Out Score .92. The ICC for the number of Responses was also high (.93), although this determination is variable because scorers must interpret the child's answers to probes. The ICCs of the individual Quantitative scores were more variable. Nine of the 15 were between .82 and .97, four were between .74 and .76, and the values for Acquisition and Bizarre were .69 and .49, respectively. All ICCs were statistically significant (p < .01), as the main effect for each score reached statistical significance (Lahey, Downey, & Saal, 1983). The K coefficients were uniformly lower than the ICCs except for Bizarre (K = .50; ICC = .49). They ranged from .39 for Dependence to .79 for Failure, with the summary scores varying around .70: Interpersonal .71, Environmental .69, Maladjustive .69, and Withdrawal .75.

TABLE 1
HAND TEST: RELIABILITY OF QUANTITATIVE SCORES (N = 100)

Score                          Children          Adults*
                               Kappa    ICC      Kappa
Affection (AFF)                 .66     .96       .92
Dependence (DEP)                .39     .75       .91
Communication (COM)             .71     .93       .45
Exhibition (EXH)                .59     .83      1.00
Direction (DIR)                 .53     .85       .78
Aggression (AGG)                .64     .90       .87
Interpersonal (INT)             .71     .93       .92
Acquisition (ACQ)               .54     .69       .53
Active (ACT)                    .67     .90       .61
Passive (PAS)                   .68     .76       .63
Environmental (ENV)             .69     .93       .75
Tension (TEN)                   .76     .89       .87
Crippled (CRIP)                 .61     .74       .84
Fear (FEAR)                     .50     .74       .50
Maladjustive (MAL)              .69     .85       .83
Description (DES)               .67     .94       .55
Bizarre (BIZ)                   .50     .49       .66
Failure (FAIL)                  .79     .97      1.00
Withdrawal (WITH)               .75     .97       .71
Responses (R)                   .60     .93
Acting Out Score (AOS)                  .92
Pathological (PATH)                     .97

*Wendler and Zachary (1983).

The Qualitative scores showed even more extreme coefficients (see Table 2). Some scoring categories were not observed (frequency = 0) in the reliability sample: Drugs, Emotion, Evasive, Evidence, and Symbolism. Two Quantitative scores (Aggression and Crippled) are included as secondary Qualitative scores. Three Qualitative scores yielded K coefficients of 0: Ambivalent, Sensual, and Sexual. Of the 27 K coefficients, 17 were above .50.

TABLE 2
HAND TEST: RELIABILITY OF QUALITATIVE SCORES (N = 100)

Score                          Kappa    ICC      f
Ambivalent (AMB)                .00     .00      1
Automatic Phrase (AUT)          .43     .48     25
Ball (BALL)                     .91     .97     19
Concreteness (CONC)             .67     .66      6
Cylindrical (CYL)               .53     .64     11
Denial (DEN)                    .54     .71      8
Drugs (DRUG)                                     0
Emotion (EMO)                                    0
Evasive (EVAS)                                   0
Evidence (EVI)                                   0
Feminine Explicit (FEM)         .36     .66      9
Food (FOOD)                     .54     .89     19
Gross (GRO)                     .50     .49      3
Hiding (HID)                    .89     .88      5
Homosexuality (HOMO)            .57     .85      5
Immature (IM)                   .62     .87     46
Impotent (IMP)                  .83     .93      7
Inanimate (INA)                 .50     .49      3
Masculine Explicit (MASC)       .40     .39      4
Money (MON)                    1.00    1.00      7
Movement (MOV)                  .66     .86     22
Oral (ORA)                      .18     .22     10
Original (O)
Perplexity (PER)
Personification (PERS)
Repetition (RPT)
Self (SELF)                             .09
Sensual (SEN)                   .00     .13
Sexual (SEX)                    .00     .00
Symbolism (SYM)                                  0
Aggression (Qualitative)
Crippled (Qualitative)

Note. f = frequency of the score in the reliability sample.

The ICCs were again higher than the K coefficients in all but six cases. Twelve ICCs were above .80, 5 were between .64 and .71, 6 were between .39 and .53, and 5 were below .23 (Ambivalent 0, Oral .22, Self .09, Sensual .13, and Sexual 0). All ICCs were statistically significant (p < .01) except the five lowest coefficients (Lahey, Downey, & Saal, 1983). As the frequencies in Table 2 show, many Qualitative responses are relatively rare; all five of the lowest ICCs have frequencies of 11 or less.
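The study does not state which form of the ICC it used. As one possibility, the sketch below computes the one-way random-effects ICC(1) from a children-by-scorers layout, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-children and within-children mean squares and k is the number of scorers, together with the F ratio MSB / MSW for the children main effect, the effect whose significance is cited above (Lahey, Downey, & Saal, 1983). The one-way model, the function name `icc_oneway`, and the score values are assumptions for illustration. Turning F into a p value would also require the F distribution (for example from SciPy), which is left out to keep the sketch dependency-free.

```python
from statistics import fmean


def icc_oneway(ratings):
    """One-way random-effects ICC(1), the F ratio for subjects, and its df.

    ratings: one row per child, each row holding the k scorers' values for the
    same score (for example, the number of AGG responses).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 children, >= 2 scorers, and complete rows")
    row_means = [fmean(row) for row in ratings]
    grand = fmean(x for row in ratings for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    icc = (ms_between - ms_within) / denom if denom > 0 else float("nan")
    f_ratio = ms_between / ms_within if ms_within > 0 else float("inf")
    return icc, f_ratio, (n - 1, n * (k - 1))


# Hypothetical counts for five children, each scored by two scorers.
scores = [(3, 3), (5, 4), (1, 1), (6, 6), (2, 3)]
icc, f_ratio, df = icc_oneway(scores)
print(f"ICC(1) = {icc:.3f}, F{df} = {f_ratio:.2f}")
```

A score that is rarely present behaves like the low-frequency rows of Table 2: most children share the same value, so MSB is small relative to any scorer disagreement, and both the ICC and the F ratio shrink.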

CONCLUSIONS

These results compare favorably with the interscorer reliabilities of .79 to .91 reported for children's Pathology scores (Wagner, Rasch, & Marsico, 1990) and with the interscorer reliabilities (K) reported for adults by Wendler and Zachary (1983), as cited in the Hand Test manual (Wagner, 1983); see Table 1. Fourteen of the 19 K coefficients reported in the manual were lower than the ICCs in this study. However, 12 of the 19 K coefficients reported by Wendler and Zachary exceeded the K reliabilities found in this study. Some Ks are similar to the ICCs, given the mathematical relationship between the two (Fleiss, 1975). The K coefficients are generally lower than the ICCs because K includes a correction for chance. Kappa seems unnecessarily stringent in attributing so much observer agreement to chance (Walter, 1984): if certain categories predominate, as they do in our data, good agreement can still produce a low K. It appears that the interscorer reliability of the Quantitative scores of the Hand Test administered to children is consistent with that attained by other projective instruments (Holtzman, Thorpe, Swartz, & Herron, 1961).

The low reliability of the Bizarre scale was primarily due to the scorers' reluctance to assign a response to the Bizarre category when the child described what the hand "looks like." This seems to be a phenomenon particular to young children (Grade 3 and younger). This finding supports the conclusion of Wagner, Rasch, and Marsico (1990) that scores "associated with psychopathology (BIZ, WITH, PATH) should not be interpreted too rigidly for very young children" (p. 804). Consideration should be given to changing the administration directions for young children. Although projective techniques generally involve some subjective scoring, we believe that clarifying some scoring categories would make the Hand Test a more reliable measure.

With regard to the Qualitative variables, the 17 scores with reliabilities above .64 are satisfactory. In general, Qualitative categories that occur more frequently have satisfactory reliabilities: eight of the 11 ICCs for variables with frequencies greater than 10 were above .64. The sheer number of Qualitative scores (43: 30 identified by the manual and 13 possible Quantitative variables used as secondary Qualitative scores) must be seen as a problem for interscorer reliability. We assume that much of the low reliability of the Qualitative scores is attributable to this number and to the resulting memory overload for the scorer. The memory-overload hypothesis is supported by the fact that much of the disagreement between the trained examiners and the authors involved the omission of some Qualitative scores. Despite these problems, the clinical significance of the Qualitative scores may warrant their continued use. It should be noted that the number of Qualitative scores is smaller on the Hand Test scoring sheet and has been reduced in the manual supplement for children and adolescents (Wagner, 1991), so this problem has been at least partially resolved. The incidence of the Qualitative scores is also noteworthy. The frequency of Immature (IM) responses suggests that this category occurs more often in children than in adults. The low frequency of Sex responses supports Wagner's (1983) finding of low incidence and his conclusion that two or more Sex responses should be considered pathological.
This low frequency also supports Rasch and Wagner's (1989) assertion that Sex responses are rare in children.

REFERENCES

Cicchetti, D. V. (1980) Reliability for reviews of the American Psychologist: a biostatistical assessment of the data. American Psychologist, 35, 300-303.
Frick, T., & Semmel, M. I. (1978) Observer agreement and reliabilities of classroom observational measures. Review of Educational Research, 48, 157-184.
Holtzman, W. H., Thorpe, J. S., Swartz, J. D., & Herron, E. W. (1961) Inkblot perception and personality. Austin, TX: Univer. of Texas Press.
Knoff, H. M. (1983) Personality assessment in the schools: issues and procedures for school psychologists. School Psychology Review, 12, 391-398.
Lahey, M. A., Downey, R. G., & Saal, F. E. (1983) Intraclass correlations: there's more there than meets the eye. Psychological Bulletin, 93, 586-595.
Moran, J. J., & Carter, D. E. (in press) Comparisons among children's responses to the Hand Test by grade, race, sex and social class. Journal of Clinical Psychology.
Panek, P. E., & Wagner, E. E. (1985) The use of the Hand Test with older adults. Springfield, IL: Thompson.
Rasch, M. A., & Wagner, E. E. (1989) Initial psychological effects of sexual abuse on female children as reflected in the Hand Test. Journal of Personality Assessment, 53, 761-769.
Stoner, S. B., & Spencer, B. (1984) Age and sex differences on the Hand Test with children. Journal of Clinical Psychology, 40, 598-602.
Wagner, E. E. (1983) The Hand Test: revised 1983. Los Angeles, CA: Western Psychological Services.
Wagner, E. E. (1991) Hand Test manual supplement: interpreting child and adolescent responses. Los Angeles, CA: Western Psychological Services.
Wagner, E. E., Rasch, M. A., & Marsico, D. S. (1990) Hand Test characteristics of severely behavior handicapped children. Journal of Personality Assessment, 54, 802-806.
Walter, S. D. (1984) Measuring the reliability of clinical data: the case for using three observers. Revue d'Épidémiologie et de Santé Publique, 206-211. [Cited by Hanley, J. A. (1987) Standard error of the Kappa statistic. Psychological Bulletin, 102, 315-321.]
Wendler, C. L. W., & Zachary, R. A. (1983) Reliability of scoring categories on a projective test. Paper presented at the meeting of the Western Psychological Association, San Francisco. [Cited in the manual by Wagner (1983).]
Zwick, R. (1988) Another look at interrater agreement. Psychological Bulletin, 103, 374-378.

Accepted April 4, 1991.
