Journal of Affective Disorders, 20 (1990) 143-149
Elsevier

JAD 00749

Quantifying depressive symptomatology: inter-rater reliability and inter-item correlations

A. Kørner, B.M. Nielsen, F. Eschen, S. Møller-Madsen, A. Stender, E.M. Christensen, H. Aggernæs, M. Kastrup and J.K. Larsen

Psychiatric Department, Frederiksborg General Hospital, DK-3400 Hillerød, Denmark

(Received 2 March 1990) (Revision received 30 May 1990) (Accepted 14 June 1990)

Summary

This study evaluated the inter-rater reliability of the Hamilton Depression Rating Scale, the Melancholia Scale and the Montgomery-Åsberg Depression Rating Scale. Furthermore, the inter-item correlations of these scales were investigated in relation to diagnostic ratings. The quantitative scales were found to have satisfactory inter-rater reliability. As expected, only low agreement was found between the quantitative and diagnostic scales, while there was high agreement between the quantitative scales. The relevance of introducing new scales and the definition of major depression are discussed.

Key words: Depressive disorder; Quantification; Reliability; Validity

Address for correspondence: A. Kørner, Psychiatric Department, Frederiksborg General Hospital, DK-3400 Hillerød, Denmark.

0165-0327/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

Introduction

Although the Hamilton Depression Rating Scale (HDS; Hamilton, 1960) is the most frequently used rating scale for quantifying depressive states, it has been criticized for insensitivity to change during treatment (Montgomery and Åsberg, 1979) and for low homogeneity and low transferability between different groups of patients (Bech, 1981). Furthermore, a recent study by Zitman et al. (1989) has shown that there is apparently no agreement in the scientific literature on the definition of the individual items of the scale. Hamilton published two versions (1960, 1967), of which the later defined the item agitation from 0 to 4, while in the 1960 version it ranged from 0 to 2. The most frequently used version (Zitman et al., 1989) is the American one (Guy, 1976), which includes more than the original 17 items and which according to Hamilton (1986) is a checklist rather than a rating scale. In his original version Hamilton defined some of the items (e.g., lack of interest and work) specifically for males and females. However, among the revised versions of his scale Hamilton

has accepted the Scandinavian version (Bech et al., 1986), which has similar item definitions for males and females. Concerning the number of items, Hamilton (1986) recommended the 17 items, but when the scale is used exclusively for measuring the severity of depressive states only six of these items have been recommended (Bech et al., 1975; Maier and Phillip, 1985; Maier et al., 1988). Among the items found to be less valid in measuring severity are early awakening and weight loss. However, these items have diagnostic validity when subdividing depressive illness into endogenous and reactive depression (Bech et al., 1988).

A number around six might be too low for quantifying depressive states, and two recently developed scales have around 10 items (Montgomery and Åsberg, 1979; Bech and Rafaelsen, 1980). While Bech and Rafaelsen developed their Melancholia Scale (MES) on the basis of the six Hamilton items, Montgomery and Åsberg (1979) developed their MADRS from the universe of items covered by the Comprehensive Psychopathological Rating Scale (Åsberg et al., 1978).

Very few studies have analyzed the inter-rater reliability of the HDS, MES and MADRS (Maier et al., 1988). This study was undertaken to analyze the inter-rater reliability of these scales when applied to the same group of patients in joint rating sessions. The inter-item correlation of these scales has further been investigated in relation to diagnostic ratings.

Material and methods

Patients

The patients were all rated only once, while they were hospitalized suffering from major depressive disorder or dysthymic disorder according to DSM-III. Thirty-one females and nine males participated (29 with major depressive disorder and 11 with dysthymic disorder). Their mean age was 60.0 years, with a range of 26-89 years. Patients with organic brain disease or schizophrenia were excluded.

Rating scales

Quantitative scales

(1) The Hamilton Depression Scale (HDS-17) was used in the 1986 version (Bech et al., 1986) with 17 items, each operationally defined on either a 3-point scale or a 5-point scale. We calculated the six-item subscore (HDSS-6) shown by Bech et al. (1981) to be a more correct indicator of severity of depression than the total score of HDS-17. The six items are marked 1-6 in Table 1.

(2) The Melancholia Scale (MES; Bech and Rafaelsen, 1980) with 11 items, each of which is operationally defined on a 5-point scale.

(3) The Montgomery-Åsberg Depression Rating Scale (MADRS; Montgomery and Åsberg, 1979), consisting of 10 items on a 7-point scale with a definition of points 0, 2, 4 and 6, while points 1, 3 and 5 have no definitions.

(4) MADRS index. The MADRS items are contained in the HDS/MES universe. Table 1 illustrates the relationship of the items of the quantitative scales, including the method of deriving the MADRS index scores. These scores are derived by multiplying the corresponding HDS/MES scores by a factor of 1.5 when the HDS/MES item is scored on a 5-point scale and by a factor of 3 when the HDS/MES item is scored on a 3-point scale, thus reaching a maximum score of 6 on each MADRS index item. However, MADRS has two items for depressed mood (reported and observed); the corresponding HDS/MES item is therefore multiplied by 3, reaching a maximum score of 12 for these items on the MADRS index.

(5) The Clinical Global Impression Scale (CGI; Guy, 1976) with a 7-point scale (1, not depressive, to 7, among the most extremely depressive patients).

(6) The Visual Analogue Scale (VAS; Bech et al., 1986), where the severity of depression is judged on a 10-cm scale labeled 'No depression' at the left-hand end and 'Among the most severely depressed patients' at the right.
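The conversion rule for the MADRS index can be illustrated with a short sketch. The function and argument names are illustrative assumptions and not part of the original rating material; only the factors (1.5 and 3) and the resulting maxima (6 per item, 12 for depressed mood) are taken from the description above.

```python
def madrs_index_item(score, scale_max, depressed_mood=False):
    """Convert one HDS/MES item score to a MADRS index item score.

    score          -- the HDS/MES item score
    scale_max      -- 4 for a 5-point item (0-4), 2 for a 3-point item (0-2)
    depressed_mood -- True for the depressed mood item, which stands in for
                      the two MADRS mood items (reported and observed)
    """
    if scale_max not in (2, 4):
        raise ValueError("scale_max must be 2 (3-point) or 4 (5-point)")
    if not 0 <= score <= scale_max:
        raise ValueError("score lies outside the item's point scale")

    if depressed_mood:
        factor = 3.0   # covers two MADRS items, so the maximum is 12
    elif scale_max == 4:
        factor = 1.5   # 5-point item: 4 * 1.5 = 6
    else:
        factor = 3.0   # 3-point item: 2 * 3 = 6
    return score * factor


if __name__ == "__main__":
    print(madrs_index_item(4, 4))                       # 5-point item at its maximum
    print(madrs_index_item(2, 2))                       # 3-point item at its maximum
    print(madrs_index_item(4, 4, depressed_mood=True))  # depressed mood at its maximum
```

Summing the converted item scores over the HDS/MES items that correspond to MADRS items would then give a MADRS index total; which items those are is defined by Table 1.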

Qualitative scales

(1) The Newcastle Scale II (N-II; Gurney, 1971). The scale consists of three positively weighted and seven negatively weighted items, each with a 3-point scale. The scale has been criticized for containing two subscales which by different weights are transferred to only one bimodal dimension of endogenous versus reactive depression (Eysenck, 1970).

TABLE 1

UNIVERSE OF ITEMS AND THEIR COMBINATIONS

Columns: Item (with HDS/MES point scale a); HDS; MES; MADRS; MADRS index (multiplication factor b).

[The cell marks showing which scale contains each item, and the factor shown against each item, are not recoverable from the source. The items and their point scales are listed below; the six HDSS-6 items are numbered 1-6.]

Depressed mood (0-4) 1
Guilt (0-4) 2
Suicide (0-4)
Initial insomnia (0-2)
Middle insomnia (0-2)
Late insomnia (0-2)
Work (0-4) 3
Retardation (motor) (0-4) 4
Agitation (motor) (0-4)
Anxiety (psychic) (0-4) 5
Anxiety (somatic) (0-4)
Gastrointestinal (0-2)
Somatic general (0-2) 6
Sex (0-2)
Hypochondriasis (0-4)
Loss of insight (0-2)
Weight (0-2)
Insomnia (general) (0-4)
Motor retardation (0-4)
Verbal retardation (0-4)
Intellectual retardation (0-4)
Emotional retardation (0-4)

(2) The Diagnostic Melancholia Scale (DMS; Bech and Rafaelsen, 1980), consisting of two axes with five items each, scored on a 3-point scale (0-2). In this study we calculated the DMS endogenous axis as follows: the items feeling worst in the morning, early awakening and persistence of clinical picture could be transformed directly from the corresponding items of the N-II; the item weight loss could be transformed directly from the mean score of item 17 on the HDS-17; the item quality of depression was omitted, as its relevance has been challenged lately and it has consequently been omitted from the DSM-IIIR (1987). Thus, the endogenous axis calculated in this study ranges from 0 to 8.

The DMS reactive axis was transformed directly from the N-II for the items psychosocial stressor and reactivity of symptoms, whereas the 4-point scale of the item duration was transferred to a 3-point scale, thus: 0 = 0, 1-2 = 1, 3-4 = 2. The item character neurosis was obtained from the discharge diagnosis. Any patient who had been given the additional diagnosis of a character neurosis at discharge was rated 2 on this item, otherwise 0. The DMS reactive axis calculated in this study thus ranges from 0 to 10.

Rating procedure

The rating took place once weekly between 8.15 and 9.15 a.m. The number of raters for each patient varied between three and seven, because of the unavoidable absence of one or more of the raters on certain days. The raters all took turns in interviewing the patients. Each rater took part in 24-34 of the sessions (mean 29.7). In Table 2 the raters are described in terms of age, sex, years of practice in psychiatry, treatment orientation, attitude to the use of rating scales and experience with rating scales.

In the joint rating sessions the interview lasted approximately 15 min. First, the VAS followed by the CGI was to be scored. Then the scoring of HDS-17, MES and MADRS was performed. These

Table 1 (continued): Tiredness and pains (0-4)

Notes to Table 1: a The point scales for the HDS/MES items. b The factor by which the HDS/MES score was multiplied to obtain a MADRS index score.

TABLE 2

EXPERIENCE AND ATTITUDES OF THE RATERS

Rater  Age  Sex  Years in     Experience with    Attitude to        Balance of preference
                 psychiatry   rating scales a    rating scales b    of treatment c
1      46   M    16           +                  +                  12
2      38   F    10           +                  ?                  62
3      43   F    9            +                  +                  48
4      43   F    8            +                  +                  83
5      41   M    I            +                  +                  80
6      38   F    9            +                  +                  71
7      37   F    7            +                  ?                  72

a Experience with rating scales: + = had experience with scales in research/clinic; - = no experience with scales.
b Attitude to rating scales: + = high value; - = little value; ? = doubtful.
c Balance of preference of treatment: 0 = psychotherapy only; 100 = pharmacotherapy only.

three scales had been printed in such a way that corresponding items were scored simultaneously. After the joint rating the rater who had chosen the patient and had performed the interview assessed the patient according to the N-II.
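The derivation of the DMS item scores described under Qualitative scales can be sketched as follows. This is a minimal illustration with hypothetical function names; it covers only the steps spelled out in the text (the recoding of the duration item, the character neurosis rule and the four-item endogenous axis) and makes no assumption about how the remaining reactive items were summed.

```python
def recode_duration(n2_duration):
    """Recode the N-II duration item to the DMS 3-point scale.

    Mapping taken from the text: 0 = 0, 1-2 = 1, 3-4 = 2.
    """
    if n2_duration not in (0, 1, 2, 3, 4):
        raise ValueError("duration must be an integer from 0 to 4")
    if n2_duration == 0:
        return 0
    return 1 if n2_duration <= 2 else 2


def character_neurosis_score(diagnosed_at_discharge):
    """Rate 2 if a character neurosis was diagnosed at discharge, otherwise 0."""
    return 2 if diagnosed_at_discharge else 0


def dms_endogenous_axis(worst_in_morning, early_awakening, persistence, weight_loss):
    """Sum the four endogenous items used in the study (each 0-2, total 0-8).

    The item 'quality of depression' is left out, as in the study.
    """
    items = (worst_in_morning, early_awakening, persistence, weight_loss)
    if any(not 0 <= item <= 2 for item in items):
        raise ValueError("each item must lie between 0 and 2")
    return sum(items)


if __name__ == "__main__":
    print([recode_duration(d) for d in range(5)])
    print(character_neurosis_score(True))
    print(dms_endogenous_axis(2, 1, 2, 0))
```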


Statistical analysis

The inter-rater reliability of the group of raters participating in this study was analyzed with intraclass coefficients (Bartko and Carpenter, 1976). As claimed by Bech and Clemmensen (1983), coefficients between 0.41 and 0.60 should be considered moderately good agreement, while coefficients between 0.61 and 0.80 show substantial agreement. Coefficients of 0.81 or higher are regarded as perfect agreement.
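As an illustration of the analysis, the sketch below computes a one-way random-effects intraclass coefficient and assigns it to the agreement bands quoted above. It is a simplified example under stated assumptions: the text does not say which form of the intraclass coefficient was used, and the function assumes the same number of raters for every patient, whereas in the study the number varied between three and seven. The function names and the example ratings are hypothetical.

```python
def icc_one_way(ratings):
    """One-way random-effects intraclass coefficient for a balanced design.

    ratings -- list of per-patient lists, each holding the scores given by
               the same number of raters (assumption: balanced data, and
               at least some variation in the scores)
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 patients and the same >= 2 raters for each")

    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    patient_means = [sum(r) / k for r in ratings]

    # Mean squares between patients and within patients (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, patient_means) for x in r
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def agreement_band(coefficient):
    """Label a coefficient using the bands quoted in the text."""
    if coefficient >= 0.81:
        return "perfect"
    if coefficient >= 0.61:
        return "substantial"
    if coefficient >= 0.41:
        return "moderately good"
    return "below the bands described in the text"


if __name__ == "__main__":
    example = [[1, 2], [3, 4], [5, 6]]  # hypothetical scores, 3 patients x 2 raters
    coefficient = icc_one_way(example)
    print(round(coefficient, 3), agreement_band(coefficient))
```

With an unbalanced design such as the one in this study, a different estimator (for example one using an adjusted average number of raters) would be needed; the sketch is only meant to show how the coefficient relates between-patient to within-patient variation.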

TABLE 3

INTRA-CLASS COEFFICIENTS

HDS-17        0.83
HDS-6         0.73
MES           0.81
MADRS         0.86
MADRS index   0.77
CGI           0.70
VAS           0.64

Results

Table 3 shows the intra-class coefficients of the various rating methods. The coefficients of all the scales are in the range of substantial agreement.

TABLE 4

INTER-CORRELATIONS OF SCALES

                 HDS-17  HDS-6  MES    MADRS  MADRS index  CGI    VAS    DMS endogenous
HDS-6            0.82    -      -      -      -            -      -      -
MES              0.88    0.92   -      -      -            -      -      -
MADRS            0.82    0.85   0.92   -      -            -      -      -
MADRS index      0.88    0.84   0.93   0.96   -            -      -      -
CGI              0.72    0.78   0.86   0.87   0.83         -      -      -
VAS              0.74    0.81   0.87   0.87   0.83         0.94   -      -
DMS endogenous   0.46    0.36   0.39   0.43   0.46         0.47   0.43   -
DMS reactive     0.21    0.25   0.10   0.10   0.06         0.03   0.13   -0.01

Spearman coefficients, n = 40. Coefficients of 0.31-0.40: P < 0.05; 0.41+: P < 0.01.
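The coefficients in Table 4 are Spearman rank correlations. A minimal sketch of how such a coefficient is obtained from two sets of total scores is given below; tied scores receive the mean of the ranks they span. The function names and the example scores are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    scale_a = [12, 18, 25, 9, 30, 18]  # hypothetical total scores on one scale
    scale_b = [15, 20, 31, 10, 33, 24]  # hypothetical total scores on another
    print(round(spearman(scale_a, scale_b), 2))
```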

The three rating scales HDS-17, MES and MADRS are, however, close to perfect.

Table 4 shows the intercorrelation between the various scales. No correlation was found between the two dimensions of the DMS. According to the DMS, 19 patients (47.5%) were found to be endogenously depressed and five patients (12.5%) reactively depressed, while two patients (5%) were both and 14 patients (35%) neither. In comparison, the distribution of the patients according to the N-II was that 29 (72.5%) were of the endogenous type and 11 (27.5%) of the reactive type. Of the 29 endogenously depressed patients, 24 were suffering from major depression and five from dysthymic disorder according to DSM-III, and of the 11 reactively depressed patients, five were suffering from major depression and six from dysthymic disorder. This suggests that the two dimensions endogenous/reactive cannot be reduced to a single continuum, as presumed by the N-II, without loss of information.

The reactive dimension of the DMS had no correlation with the severity scales. The endogenous dimension had a very weak positive correlation with the severity scales, lowest for the HDSS-6 and MES. The two global ratings of severity (CGI and VAS) correlated highly (0.94). In general, however, VAS correlated highest with the rating scales, lowest with HDS-17 (0.74) and highest with MES (0.87) and MADRS (0.87).

To see whether the raters' attitudes towards rating scales and preferences for treatment strategies had biased the observations, we divided the seven raters into two groups. One group consisted of four raters (nos. 1, 4, 5 and 6, Table 2) who all found rating scales to be of great value and tended to prefer pharmacotherapy as the primary treatment for depression. The other group consisted of three raters (nos. 2, 3 and 7, Table 2) who found rating scales of doubtful value or tended to prefer psychotherapy as the primary treatment for depression. The global ratings of these two groups were compared and no significant differences were found. The mean scores of CGI were 3.55 and 3.56, respectively (P = 0.95), and the mean scores of VAS were 39.45 and 37.99, respectively (P = 0.50), which indicates that the observations were unbiased.

As expected, the MADRS and the MADRS index correlated most highly (0.96). W. Maier (personal communication) has found a similar correlation, and both Maier and the present study have found that the MADRS index scores tend to be higher than the corresponding MADRS scores. Maier found the intercept of the regression line to be 1.6 with a slope of 0.9, whereas we found the intercept to be 5.2 with a slope of 1.01.

The MES, MADRS, HDS-17 and HDSS-6 correlated mutually. However, as shown in Table 5, there seem to be differences between the scales when defining major depression. It is customary to use a score of 18 or more on the HDS-17 to define major depression (Bech, 1988). Likewise, major depression on the HDSS-6 equals a score of 9 or more and on MES a score of 15 or more. Table 5 indicates that the HDS-17 is the most exclusive in this respect, while HDSS-6 is the most inclusive. The corresponding MADRS total scores range between 23.4 and 27.2 (the MADRS index between 28.7 and 32.6).
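The two regression lines can be compared with a small worked example. The sketch assumes that the MADRS index was regressed on the MADRS total score (the text gives only intercept and slope, not the direction of the regression), and it applies the line to the mean MADRS scores of Table 5 purely for illustration; the function name is hypothetical.

```python
def predict_madrs_index(madrs_total, intercept=5.2, slope=1.01):
    """Predict a MADRS index score from a MADRS total score.

    The defaults are the intercept and slope reported for the present
    study; Maier's values (1.6 and 0.9) can be passed in instead.
    Assumption: index = intercept + slope * MADRS total.
    """
    return intercept + slope * madrs_total


if __name__ == "__main__":
    # Mean MADRS and MADRS index scores from Table 5.
    table_5 = [(27.2, 32.6), (23.4, 28.7), (24.3, 29.6)]
    for madrs_mean, index_mean in table_5:
        present = predict_madrs_index(madrs_mean)
        maier = predict_madrs_index(madrs_mean, intercept=1.6, slope=0.9)
        print(f"MADRS {madrs_mean}: observed index {index_mean}, "
              f"present study line {present:.1f}, Maier line {maier:.1f}")
```

Under this assumption the line found in the present study reproduces the mean MADRS index scores of Table 5 to within about 0.2 points, which is consistent with a slope close to 1 and an offset of roughly 5 points between the two scores.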

TABLE 5

A CROSSWALK BETWEEN THE SCALES IN RELATION TO MAJOR DEPRESSION

Major depression      CGI          VAS            MADRS         MADRS index   Number of patients
HDS-17 18 or more     4.1 ± 0.8    48.2 ± 13.6    27.2 ± 6.0    32.6 ± 5.5    18
HDS-6 9 or more       3.7 ± 0.8    41.5 ± 14.4    23.4 ± 6.7    28.7 ± 6.4    32
MES 15 or more        3.8 ± 0.8    43.2 ± 13.9    24.3 ± 6.3    29.6 ± 6.0    30
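The cut-off scores behind Table 5 can be expressed as a simple classification rule. The sketch below is illustrative only: the dictionary and function names are assumptions, and the example patient is hypothetical, chosen to show how one set of scores can meet the HDSS-6 and MES criteria without meeting the HDS-17 criterion.

```python
# Cut-off scores for major depression as given in the text.
CUTOFFS = {"HDS-17": 18, "HDSS-6": 9, "MES": 15}


def major_depression_by_scale(scores):
    """Return, for each scale present in `scores`, whether the total score
    reaches that scale's cut-off for major depression."""
    return {
        scale: scores[scale] >= cutoff
        for scale, cutoff in CUTOFFS.items()
        if scale in scores
    }


if __name__ == "__main__":
    patient = {"HDS-17": 16, "HDSS-6": 10, "MES": 15}  # hypothetical scores
    print(major_depression_by_scale(patient))
```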

Discussion

The HDS-17, MES and MADRS all reached the highest band of inter-rater reliability (coefficients of 0.81 or higher, Table 3), while the HDSS-6 and the MADRS index were in the range of substantial agreement. The CGI and VAS had lower coefficients, but the level is still acceptable. It appears that if the number of items is less than 10 there is a tendency to less perfect coefficients.

These scales measure severity of depression, as reflected by the lack of correlation with the reactive dimension of the DMS. The dimension of endogenous depression had a weak positive correlation with the severity scales, less so for MES and HDSS-6, which, in contrast to the HDS-17 and MADRS, have no items in common with the DMS.

While there was low agreement between the quantitative and qualitative scales, the agreement between the quantitative scales was considerable. Very good agreement was found between the MADRS and the MADRS index. This brings into focus the question of the relevance of constructing new scales, inasmuch as they do not seem to assess new aspects of the clinical disorder, or at least to assess the same aspects considerably more accurately than the already existing scales. A certain tendency to create new scales either partly or totally based on already existing ones seems not to be uncommon (e.g., Nilsson and Axelssen, 1989), while in many ways it would seem more rational to improve the existing ones instead.

The definition of the severity of a depression for which treatment is needed in a research setting has often been equal to major depression (Bech, 1988). Recent studies by Paykel (1989) in general practice have shown that amitriptyline is superior to placebo in patients with a HDS-17 score of 13 or more, while no difference was seen between placebo and active treatment in patients with a score below 12. This study has shown that a HDS-17 score of 18 or more is considerably more exclusive than both the HDSS-6 score of 9 and the MES score of 15. This finding would seem to support 14 as the clinically more relevant and meaningful cut-off score of HDS-17 to define major depression in a research setting.

Acknowledgement

The statistical analysis of this study was supported by a grant from the Lundbeck-fonden.

References

American Psychiatric Association (1987) Diagnostic and Statistical Manual of Mental Disorders, 3rd edn., rev. (DSM-IIIR). American Psychiatric Association, Washington, DC.
Åsberg, M., Montgomery, S.A., Perris, C., Schalling, D. and Sedvall, G. (1978) A comprehensive psychopathological rating scale. Acta Psychiatr. Scand. Suppl. 271, 5-27.
Bartko, J.J. and Carpenter, W.T. (1976) On the methods and theory of reliability. J. Nerv. Ment. Dis. 163, 307-317.
Bech, P. (1981) Rating scales for affective disorders: their validity and consistency. Acta Psychiatr. Scand. 64, Suppl. 295, 1-101.
Bech, P. and Rafaelsen, O.J. (1980) The use of rating scales exemplified by a comparison of the Hamilton and the Bech Rafaelsen Melancholia Scale. Acta Psychiatr. Scand. 62, Suppl. 185, 128-132.
Bech, P. and Clemmensen, L. (1983) The diagnosis of depression: 20 years later. Acta Psychiatr. Scand. 68, Suppl. 310, 9-30.
Bech, P., Gram, L.F., Dein, E., Jacobsen, O., Vitger, J. and Bolwig, T.G. (1975) Quantitative rating of depressive states. Acta Psychiatr. Scand. 51, 161-170.
Bech, P., Kastrup, M. and Rafaelsen, O.J. (1986) Mini compendium of rating scales for states of anxiety, depression, mania, schizophrenia with corresponding DSM-III syndromes. Acta Psychiatr. Scand. 73, Suppl. 326, 1-37.
Bech, P., Allerup, P., Gram, L.F., Kragh-Sørensen, P., Rafaelsen, O.J., Reisby, N., Vestergaard, P. and the DUAG (1988) The Diagnostic Melancholia Scale (DMS): dimension of endogenous and reactive depression with relationship to the Newcastle Scales. J. Affect. Disord. 14, 161-170.
Eysenck, H.J. (1970) The classification of depressive illness. Br. J. Psychiatry 117, 241-250.
Gurney, C. (1971) Diagnostic scales for affective disorders. Proc. 5th World Conf. Psychiatry, Mexico City, p. 330.
Guy, W. (1976) ECDEU Assessment Manual for Psychopharmacology (Revised). National Institute of Mental Health, Psychopharmacology Research Branch, Rockville, MD.
Hamilton, M. (1960) A rating scale for depression. J. Neurol. Neurosurg. Psychiatry 23, 56-62.
Hamilton, M. (1967) Development of a rating scale for primary depressive illness. Br. J. Soc. Clin. Psychol. 6, 278-296.
Hamilton, M. (1986) The Hamilton Rating scales for depression. In: N. Sartorius and T. Ban (Eds.), Assessment of Depression. Springer Verlag, Berlin, pp. 143-152.
Maier, W. and Phillip, M. (1985) Comparative analysis of observer depression scales. Acta Psychiatr. Scand. 72, 230-245.
Maier, W., Buller, R., Phillip, M. and Heuser, I. (1988) The Hamilton Anxiety Scale. Reliability, validity and sensitivity to change in anxiety and depressive disorders. J. Affect. Disord. 14, 61-68.
Montgomery, S.A. and Åsberg, M. (1979) A new depression scale designed to be sensitive to change. Br. J. Psychiatry 134, 382-389.
Nilsson, A. and Axelssen, R. (1989) Psychopathology during long-term lithium treatment of patients with major affective disorders. Acta Psychiatr. Scand. 80, 375-388.
Paykel, E.S. (1989) The use of the Hamilton Scale in general practice. In: P. Bech and A. Coppen (Eds.), The Hamilton Depression Scale. Springer, Heidelberg (in press).
Zitman, F.G., Mennen, M.F.G., Griez, E. and Hooijer, C. (1989) The different versions of the Hamilton Depression scale. In: P. Bech and A. Coppen (Eds.), The Hamilton Scale. Springer, Heidelberg (in press).
