International Journal of Psychiatry in Clinical Practice

ISSN: 1365-1501 (Print) 1471-1788 (Online) Journal homepage: http://www.tandfonline.com/loi/ijpc20

DSM-IV Major Depressive Episode criteria: An evaluation of reliability and validity across three different rating methods

Mark J. Hilsenroth, Matthew R. Baity, Megan A. Mooney & Gregory J. Meyer

To cite this article: Mark J. Hilsenroth, Matthew R. Baity, Megan A. Mooney & Gregory J. Meyer (2004) DSM-IV Major Depressive Episode criteria: An evaluation of reliability and validity across three different rating methods, International Journal of Psychiatry in Clinical Practice, 8:1, 3-10

To link to this article: http://dx.doi.org/10.1080/13651500310004795

Published online: 13 Sep 2010.



© 2004 Taylor & Francis

International Journal of Psychiatry in Clinical Practice 2004; Volume 8: Pages 3–10

Downloaded By: [informa internal users] At: 15:11 7 September 2010 Downloaded by [University of Wisconsin Oshkosh] at 02:18 12 November 2015

DSM-IV Major Depressive Episode criteria: An evaluation of reliability and validity across three different rating methods

MARK J. HILSENROTH¹, MATTHEW R. BAITY², MEGAN A. MOONEY³ AND GREGORY J. MEYER⁴

¹The Derner Institute of Advanced Psychological Studies, Adelphi University, ²Department of Psychiatry, Massachusetts General Hospital & Harvard Medical School, ³Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine and ⁴Department of Psychology, University of Toledo, USA

Correspondence Address: Mark J. Hilsenroth, Ph.D., ABAP, The Derner Institute of Advanced Psychological Studies, Adelphi University, Garden City, NY 11530, USA. Tel: +(1) 516 877 4748; Fax: +(1) 516 877 4805; E-mail: [email protected]

This study examined the reliability and validity of the DSM-IV Major Depressive Episode (MDE) symptoms in a sample of outpatients based on clinician ratings of assessment interviews, videotapes of these interviews, and chart information. A group of 44 patients admitted to a university-based outpatient community clinic was rated by trained clinicians on the nine MDE symptom criteria (A1–A9). Patients also completed a self-report measure of depressive symptomatology. Inter-rater reliability was highest between the interview and videotape ratings. Conversely, when chart ratings were compared with those made from either interview or videotape, reliability for many of the individual criteria was in the poor range. Good internal consistency was found for both the interview and videotape ratings, while internal consistency was less adequate for chart-based ratings. However, the three MDE ratings converged on a single dimension in a similar manner regardless of which source (i.e. interview, videotape, or chart) provided the rating. Additional analyses revealed a consistent relationship between the three MDE ratings (interview, videotape, and chart) and measures of patient-reported depressive symptoms that exceeded the relationship between these ratings and global levels/severity of psychopathology. The effects of longer interviews, as well as of examining depressive symptoms within the context of interpersonal functioning, are discussed with regard to the reliable assessment of depressive symptomatology in the mild to moderate range of severity. (Int J Psych Clin Pract 2004; 8: 3–10)




Received 23 December 2002; accepted for publication 10 November 2003

Keywords: depression, assessment, DSM-IV, reliability

INTRODUCTION

Assessing depressive symptomatology is a task that has received much attention within psychiatric classification, as reflected in extensive clinical field trials both internationally and in North America.1–5 These studies indicated that the classification (presence/absence) of depressive symptomatology at the broader “group” (i.e. Mood Disorder) and “sub-group” (i.e. Major Depressive Disorder, Depressive Episode, Recurrent Depressive Disorder) levels exhibited good (κ = 0.6–0.74)6 to excellent (κ > 0.74) inter-rater reliability. The diagnosis of severe depressive symptomatology at the “category” level (i.e. Severe Depressive Episode, Recurrent Depressive Disorder/Severe) also exhibited good inter-rater reliability. However, these studies indicated that the assessment of less severe manifestations of depressive symptoms at the “category” level (i.e. Mild Depressive Episode, Moderate Depressive Episode, Recurrent Depressive Episode/Mild, Recurrent Depressive Episode/Moderate, Dysthymia) had poor (κ < 0.4) to fair (κ = 0.4–0.59) inter-rater reliability.

This impact of the number and severity of symptoms on the ability of clinicians to accurately evaluate manifestations of depression warrants further exploration for several reasons. First, the accurate assessment of depressive symptomatology in the mild to moderate ranges of severity is especially important in outpatient settings, where less extreme depressive reactions are more likely to be found than in more symptomatic inpatient populations. This is especially relevant to the assessment of “subthreshold” depression, which can have significant prognostic implications regarding the development of more severe and chronic conditions.7–11 Second, it remains unclear whether these criteria can distinguish across a range of severity, and assess depressive characteristics in the absence of Major Depressive Disorder, with enough sensitivity to be utilized in epidemiological studies and in relation to treatment outcome.12–15 Thus, the potential may exist to provide important information when evaluating a broader conception of depression characterized by a range of impairment.16,17

A third area where further evaluation of depressive symptoms is needed is at the individual item level. Utilization of individual items organized additively can provide a dimensional evaluation of depressive symptomatology that has been shown to be more reliable and valid than categorical classification. Dimensional approaches to assessing the construct of depression avoid the problems of imposing categorical cutoffs on diagnostic features that are largely continuous in nature.18 Information regarding item-level reliability and convergence (adjusted item-to-scale correlations) with the depressive syndrome may also prove useful in future revisions of depressive symptom criteria sets.

A fourth area that needs further exploration is the congruence between different methods of evaluating depressive symptoms. Most studies to date have utilized ratings of depressive symptoms by two (or more) clinicians who simultaneously rate the same interview [one as the interviewer and the other(s) as observer(s)], two (or more) clinicians conducting independent interviews within a short time frame (usually 1–2 weeks), or a second clinician rating an audio or videotape of the interview conducted by an index rater. Since chart material represents a very accessible resource for many in the field, it is important to determine how information gleaned from chart review resembles or differs from more common methods of rating based on actual interviews and recorded narrative.
There has yet to be a study that has examined the congruence across interview, audio/videotape, and chart ratings of depressive symptoms. Finally, investigating the relationship between clinician-rated depressive symptoms and patient self-report measures of depression would show the extent to which clinician and patient reports of depressive symptomatology are similar.

The present study represents an extension of earlier research investigating the reliability of depressive symptomatology. The primary goal of the study concerned psychiatric assessment, specifically the reliability of DSM-IV MDE criteria ratings made by clinicians in the assessment process. The severity of various depressed patient samples (i.e. inpatient or outpatient) provides a context in which populations whose depressive symptoms have been reliably diagnosed in the past (i.e. severe or inpatient samples) can be contrasted with populations where the reliability of assessing depressive symptoms has been problematic (i.e. mild to moderate severity or outpatient samples). Thus, a distinctive feature of the current study, and the need it addresses, is its focus on an outpatient sample of mild to moderate severity. Both the individual and total number of Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) Major Depressive Episode (MDE) symptoms (A1–A9, p. 327)19 will be evaluated in a sample of outpatients based on clinician ratings of assessment interviews, videotapes of these interviews, and chart information. The convergence and divergence across these three different methods will be examined. In addition, the psychometric characteristics of the MDE criteria set will be computed using methods of internal consistency. Finally, convergent validity of total MDE symptoms will be evaluated in relation to a self-report measure of depressive symptomatology.

METHOD

PARTICIPANTS

All participants were representative of those actually seeking outpatient treatment at a university-based community clinic. Cases were assigned to treatment teams and clinicians in an ecologically valid manner based on real-world considerations such as clinician availability and case load. Moreover, patients were accepted into treatment regardless of disorder or comorbidity. At any time during the 22-month period of data collection for this study, the number of treatment teams at this clinic ranged from three to five. The participants were 44 patients consecutively admitted to one of two treatment teams over this 22-month period. In this sample, 20 patients were male and 24 were female; 22 were single, 11 were married, and 11 were divorced. The mean age was 29.2 years (S.D. = 11.2). DSM-IV Axis I diagnoses included: Mood Disorder (n = 25; Major Depressive Disorder = 12, Depressive Disorder NOS = 6, Dysthymia = 7), Anxiety Disorder (n = 1), Substance-Related Disorder (n = 2), Adjustment Disorder (n = 4), and V-Code Relational Problem (n = 10). Fifteen individuals were diagnosed with a DSM-IV Personality Disorder and eight were identified as having Personality Disorder features or traits. Each participant provided written informed consent to be included in this research.

PROCEDURE

Each participant completed a videotaped semi-structured clinical interview that lasted approximately 2 h and an interpretive/feedback interview that lasted approximately 1 h. The clinical interview focused on a number of salient therapeutic topics such as presenting problems; past psychiatric history; past medical history; family history; developmental, social, educational, and work history; an exploration of both historic and current relational episodes; and a mental status exam that included an assessment of all DSM-IV symptom criteria for Schizophrenia, Major Depressive/Manic/Mixed episode, and Dysthymia, as well as many anxiety symptoms. Interviewers inquired about and assigned DSM-IV MDE ratings regardless of whether a mood disorder diagnosis was under consideration. Each feedback session, also videotaped, was organized according to a Therapeutic Model of Assessment.20,21 This approach focuses on collaboration, alliance building, exploration of factors maintaining life problems (often relational), identification of potential solutions, and therapist-patient interaction.

Ten advanced students (five men and five women) enrolled in an American Psychological Association (APA) approved Clinical Psychology Ph.D. program conducted the psychological assessments, the feedback sessions, and the ratings of DSM-IV MDE symptoms from the different perspectives. All clinicians had completed graduate course training in descriptive psychopathology and were supervised by a licensed Ph.D. clinical psychologist with several years of applied experience. Prior to rating the DSM-IV MDE symptoms for this study, the 10 clinicians participated in both individual and group training in which scoring guidelines were reviewed for each of the variables of interest. Each clinician also received a minimum of 3.5 h of supervision per week (1.5 h individually and 2 h in a group treatment team meeting) on the therapeutic assessment model/process, scoring/interpretation of assessment measures, clinical interventions, and presentation/organization of collaborative feedback.

All MDE symptoms were scored by the clinician conducting the assessment immediately at the end of the feedback session and were based on information gained during both the clinical interview and feedback sessions. As per DSM-IV guidelines, individual MDE symptoms were rated as present only if they represented a functional impairment in the patient's life. A subsample of raters from the larger set of trained clinicians conducting the assessment interviews (the advanced graduate students and, in some cases, the study supervisor) then independently rated all MDE symptoms for each participant immediately after viewing a videotape of both the clinical interview and feedback sessions. For all cases, the second rater's rating of the MDE symptoms was completed independently of the assessing clinician's ratings.

A subsample of raters from the larger set of trained clinicians conducting the assessment interviews (the advanced graduate students) also conducted all chart ratings of the DSM-IV MDE symptoms. These clinicians had completed graduate course training in descriptive psychopathology and, prior to rating the DSM-IV MDE symptoms for this study, had participated in both individual and group training in which scoring guidelines were reviewed for each of the variables of interest. Chart raters also completed over 20 h of training on practice chart materials prior to making ratings for this study. For all cases, the chart rater's rating of the MDE symptoms was based solely on the intake report, psychological evaluation report, and session notes from the evaluation interviews. These chart materials were written by the patient's clinician in the course of standard treatment procedures and planning. The chart ratings were completed independently of both the assessing clinician's and the external (videotape) rater's ratings of MDE symptoms.

Participants were asked to complete the Symptom Checklist-90-Revised (SCL-90-R; Ref. 22) during the assessment process, after the clinical interview and prior to the feedback session. The SCL-90-R has a scale that specifically assesses depressive symptomatology (SCL-DEP), as well as a Global Severity Index (SCL-GSI) that serves as a measure of overall level of symptom severity. Before administering the SCL-90-R, the clinician and patient discussed how completing the measure as openly as possible would aid in a better understanding of the patient's current life problems and facilitate the development of treatment goals. The patient completed the SCL-90-R independently; however, the clinician was available in the clinic during this time to answer any questions that might arise during the testing.

DATA ANALYSES

The data analyses proceeded in six steps. First, descriptive information [mean (M), standard deviation (SD), and endorsement frequencies (FQ)] for the MDE criteria is presented for each of the three different perspectives [interview (I), videotape (V), and chart (C)]. This descriptive information is provided for the nine individual MDE criteria (MDE-1 to MDE-9), the mean of the individual MDE criteria (MDE-Mean), and the number of MDE criteria present (MDE-Present).

Second, inter-rater reliability of the individual MDE criteria, MDE-Mean, and MDE-Present was examined by computing agreement across each of the three rating perspectives (I, V, and C). To assess reliability we calculated intraclass correlation coefficients (ICC) using a two-way random effects model (Ref. 23, model 2; see also Ref. 24). The ICC is a chance-corrected reliability coefficient suitable for continuous data and equivalent to κ under appropriate conditions.25,26 A two-way random effects model assumes that the same set of raters rated each case. While this is not strictly true for the present data set, because more than one rater contributed to each rating perspective, the three rating perspectives themselves remained fixed. Thus, because our interest was in evaluating the agreement between the interview, video, and chart perspectives, the two-way random effects model was most appropriate. (As a practical matter, our decision made little difference. Across all variables, the average difference between results from a one-way and a two-way random effects model was 0.011.)

Third, the internal reliability of the individual MDE criteria was evaluated in relation to the MDE-Present score from each of the three rated perspectives utilizing adjusted item-to-scale correlations. In computing the adjusted item-to-scale correlations, the criterion being evaluated was excluded from the MDE-Present score. The item-to-scale correlations provide an indication of the convergence of the individual items with the overall content of the criteria set. A criterion was considered to possess adequate convergence if its adjusted item-to-scale correlation was equal to or greater than 0.30.27

Fourth, coefficient α values were calculated on the MDE criteria set from each of the three rated perspectives. Coefficient α is a reliability statistic that provides an evaluation of the internal consistency of an item/criteria set. A coefficient α of 0.70 or greater is considered to represent adequate internal consistency for an item/criteria set.27

Fifth, the convergence of the MDE-Present scores between each of the three rated perspectives was examined utilizing a Pearson correlation (r). Finally, the convergent validity between patient self-reports of depression and global severity and each of the three rated perspectives was examined utilizing a Pearson correlation (r).
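As a minimal sketch of the third and fourth steps, the Python below computes adjusted item-to-scale correlations (each criterion correlated with the sum of the remaining criteria) and coefficient α for a small matrix of present/absent ratings. The ratings are hypothetical illustration data, not values from this study; the function names are illustrative, and sample variances (n − 1 denominator) are assumed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def adjusted_item_total(ratings, item):
    """Correlate one item with the sum of the other items (the item itself is excluded)."""
    item_scores = [row[item] for row in ratings]
    rest_scores = [sum(row) - row[item] for row in ratings]
    return pearson_r(item_scores, rest_scores)

def coefficient_alpha(ratings):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(ratings[0])
    item_vars = [sample_variance([row[j] for row in ratings]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: rows are patients, columns are criteria (1 = present, 0 = absent).
hypothetical_ratings = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]

alpha = coefficient_alpha(hypothetical_ratings)
item_total = [adjusted_item_total(hypothetical_ratings, j) for j in range(3)]
print(round(alpha, 2), [round(r, 2) for r in item_total])
```

With real data, each adjusted correlation would be compared against the 0.30 threshold and α against the 0.70 threshold described above.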

RESULTS

Table 1 provides the M, SD, and FQ for each of the nine individual MDE criteria, MDE-Mean, and MDE-Present for each of the three rated perspectives. Table 1 reveals that ratings from the chart review had the highest mean and frequency for each of the nine individual MDE criteria. The one exception was MDE criterion #7 (worthless/guilt), for which the video ratings were highest. While the MDE-Present score for chart ratings (Mean = 3.82, SD = 2.01) was almost one point higher than for either the interview (Mean = 2.93, SD = 2.28) or video (Mean = 2.91, SD = 2.25) perspectives, this was not a significant difference across the three groups (P > 0.05). Only one MDE criterion (#1, depressed mood) exhibited significant differences across the three perspectives (F = 7.24, P = 0.001). A post-hoc Scheffé test showed that the chart mean (Mean = 0.91) for this criterion was significantly higher (P < 0.05) than both the interview (Mean = 0.61) and video (Mean = 0.59) means.

Table 2 presents the reliability, ICC (2,1), of the nine individual MDE criteria, MDE-Mean, and MDE-Present. A consistent pattern of reliability was found for the nine individual MDE criteria, MDE-Mean, and MDE-Present. ICC reliability was highest between the interview and videotape ratings. In this comparison, only one MDE criterion (#3, weight loss) was lower than 0.60, and most of the others were in the excellent (> 0.74) range. Conversely, when chart ratings were compared with those made from either interview or videotape, reliability for many of the individual criteria was in the poor range (< 0.40). However, MDE-Present scores from chart ratings had a fair degree of correspondence with total scores from interview and video (0.40–0.59).

Table 3 provides the adjusted item-to-scale correlations of the individual MDE criteria as well as the coefficient α for each rated perspective. These results showed that two MDE criteria (#4 (sleep problems) and #5 (psychomotor)) from the interview ratings, three criteria (#3 (weight loss), #4 (sleep problems), and #5 (psychomotor)) from the video ratings, and four criteria (#3 (weight loss), #5 (psychomotor), #8 (concentrate), and #9 (death/suicide)) from the chart ratings exhibited adjusted item-to-scale correlations below 0.30. One criterion (#5 (psychomotor)) had an adjusted item-to-scale correlation below 0.30 for each of the three rated perspectives. Overall coefficient α for the nine-item MDE criteria set showed good internal consistency for both the interview (0.75) and videotape (0.75) ratings. However, the overall internal consistency was less adequate for chart-based ratings (0.63).

Table 4 provides correlations between the MDE-Present scores for each of the three rated perspectives. As implied by the high reliability found in Table 2, the correlation between MDE-Present scores from the interview and videotape perspectives was extremely high (r = 0.96, P < 0.0001). In addition, MDE-Present scores derived from chart ratings were moderately to highly correlated with the MDE-Present scores from both interview (r = 0.55, P < 0.001) and videotape (r = 0.61, P < 0.001). (We would like to remind the reader that the Pearson (r) correlations between the MDE-Present scores reported in Table 4 are conceptually different from the intraclass correlation coefficients (ICC) reported in Table 2. Pearson (r) correlations examine the degree of association between variables, whereas the ICC, an inter-rater reliability statistic, examines both the degree of association and the exact level of agreement.)

Table 1 Mean (M), standard deviation (SD), and endorsement frequencies (FQ) for the three rated perspectives

                              Interview         Video             Chart
Variable                      Mean  SD    FQ    Mean  SD    FQ    Mean  SD    FQ
MDE-1 (depressed mood)        0.61  0.49  27    0.59  0.50  26    0.91  0.29  40
MDE-2 (diminished interest)   0.36  0.49  16    0.39  0.49  17    0.41  0.50  18
MDE-3 (weight loss)           0.09  0.29   4    0.02  0.15   1    0.16  0.37   7
MDE-4 (sleep problems)        0.32  0.47  14    0.25  0.44  11    0.43  0.50  19
MDE-5 (psychomotor)           0.09  0.29   4    0.16  0.37   7    0.18  0.39   8
MDE-6 (fatigue)               0.41  0.50  18    0.36  0.49  16    0.41  0.50  18
MDE-7 (worthless/guilt)       0.45  0.50  20    0.59  0.50  26    0.48  0.51  21
MDE-8 (concentrate)           0.45  0.50  20    0.39  0.49  17    0.61  0.49  27
MDE-9 (death/suicide)         0.14  0.35   6    0.16  0.37   7    0.23  0.42  10
MDE-Mean                      0.33  0.43        0.32  0.42        0.42  0.44
MDE-Present                   2.93  2.28        2.91  2.25        3.82  2.01

N = 44. MDE, Major Depressive Episode symptoms (A1–A9, p. 327; APA, 1994); MDE-Mean, mean of the individual MDE criteria; MDE-Present, number of MDE criteria present.
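Because each criterion is rated as present or absent, the means in Table 1 follow directly from the endorsement frequencies: each item mean is FQ/N, MDE-Present is the sum of the frequencies divided by N, and MDE-Mean is MDE-Present divided by nine. The short Python check below illustrates this arithmetic using the frequencies from Table 1; the variable and function names are illustrative only.

```python
N_PATIENTS = 44

# Endorsement frequencies (FQ) for MDE-1 through MDE-9, copied from Table 1.
endorsement_counts = {
    "interview": [27, 16, 4, 14, 4, 18, 20, 20, 6],
    "video": [26, 17, 1, 11, 7, 16, 26, 17, 7],
    "chart": [40, 18, 7, 19, 8, 18, 21, 27, 10],
}

def summarize(counts, n=N_PATIENTS):
    """Return (item means, MDE-Present mean, MDE-Mean), each rounded to two decimals."""
    item_means = [round(c / n, 2) for c in counts]
    mde_present = sum(counts) / n          # mean number of criteria present per patient
    mde_mean = mde_present / len(counts)   # mean of the nine item means
    return item_means, round(mde_present, 2), round(mde_mean, 2)

summary = {source: summarize(counts) for source, counts in endorsement_counts.items()}
for source, (item_means, mde_present, mde_mean) in summary.items():
    print(source, item_means, mde_present, mde_mean)
```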


Table 2 Inter-rater reliability between the three rated perspectives, ICC (2,1)

Variable                      Interview & Videotape   Interview & Chart   Videotape & Chart
MDE-1 (depressed mood)        0.95                    0.28                0.26
MDE-2 (diminished interest)   0.86                    0.34                0.48
MDE-3 (weight loss)           0.38                    0.28               -0.04
MDE-4 (sleep problems)        0.84                    0.19                0.32
MDE-5 (psychomotor)           0.70                    0.44                0.44
MDE-6 (fatigue)               0.91                    0.44                0.43
MDE-7 (worthless/guilt)       0.74                    0.59                0.51
MDE-8 (concentrate)           0.86                    0.25                0.23
MDE-9 (death/suicide)         0.91                    0.55                0.50
MDE-Mean                      0.79                    0.37                0.35
MDE-Present                   0.96                    0.51                0.56

N = 44. ICC (2,1), intraclass correlation coefficient, two-way random effects model (Shrout & Fleiss, 1979, Model 2); MDE, Major Depressive Episode symptoms (A1–A9, p. 327; APA, 1994); MDE-Mean, mean of the individual MDE criteria; MDE-Present, number of MDE criteria present.
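For readers who wish to reproduce this kind of coefficient, the following is a minimal Python sketch of the ICC (2,1) formula from Shrout and Fleiss (1979), computed from two-way ANOVA mean squares. It is not the software used in this study. The first data set is the worked example published by Shrout and Fleiss (six targets, four judges); the second is a hypothetical pair of raters who differ by a constant offset, included to show that the ICC penalizes systematic disagreement even when the Pearson correlation between the two raters would be perfect.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater (Shrout & Fleiss, 1979, Model 2).

    `ratings` is a list of rows, one per rated case, each holding one score per rater.
    """
    n = len(ratings)       # number of cases (targets)
    k = len(ratings[0])    # number of raters or rating perspectives
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_cases = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_cases - ss_raters

    bms = ss_cases / (n - 1)                 # between-cases mean square
    jms = ss_raters / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Worked example data from Shrout & Fleiss (1979): 6 targets rated by 4 judges.
shrout_fleiss_example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_example = icc_2_1(shrout_fleiss_example)

# Hypothetical ratings: the second rater is always exactly one point higher.
offset_pairs = [[1, 2], [2, 3], [3, 4]]
icc_offset = icc_2_1(offset_pairs)

print(round(icc_example, 2), round(icc_offset, 2))
```

Applied to the present design, each row would hold one patient's score from two of the rating perspectives (for example, interview and chart).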

Table 3 Adjusted item-to-scale correlations and coefficient α values for the three rated perspectives

Variable                      Interview   Video       Chart
MDE-1 (depressed mood)        0.65        0.63        0.32
MDE-2 (diminished interest)   0.64        0.62        0.39
MDE-3 (weight loss)           0.44        0.22        0.28
MDE-4 (sleep problems)        0.22        0.19        0.33
MDE-5 (psychomotor)          -0.05        0.05        0.09
MDE-6 (fatigue)               0.56        0.55        0.42
MDE-7 (worthless/guilt)       0.53        0.63        0.42
MDE-8 (concentrate)           0.42        0.54        0.27
MDE-9 (death/suicide)         0.35        0.32        0.27
Coefficient α                 0.75        0.75        0.63

N = 44. MDE, Major Depressive Episode symptoms (A1–A9, p. 327; APA, 1994).

The relationship between the three MDE-Present scores was further examined utilizing a principal components factor analysis. Bartlett's test of sphericity indicated that the correlations among the three scores were sufficient for factoring (Bartlett χ² = 129.9, P < 0.0001). The number of factors to retain was determined by inspection of eigenvalues. The first factor had an eigenvalue of 2.4 and accounted for 81% of the total variance. (The eigenvalues and proportions of variance accounted for by the other two potential factors were 0.53 (18%) and 0.04 (1%), respectively.) As expected, all three MDE-Present scores had high positive loadings on this factor (Interview MDE-Present = 0.95, Video MDE-Present = 0.97, and Chart MDE-Present = 0.78).

Table 4 also provides the correlations between the MDE-Present scores for each of the three rated perspectives and patient self-report of depression and general level of severity. Each of the MDE-Present scores is more strongly related to the SCL-DEP scale than to the SCL-GSI, indicating a specificity for depression more so than for general or global levels of psychopathology. This is especially true of the Interview and Videotape MDE-Present scores, where the correlation with the SCL-DEP (both r = 0.60) was substantially higher than with the SCL-GSI (r = 0.40 and r = 0.42, respectively). While the Chart MDE-Present score was also more strongly related to the SCL-DEP (r = 0.55) than to the SCL-GSI (r = 0.51), this difference was much smaller.
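The single-factor result can be illustrated from the rounded correlations among the three MDE-Present scores reported in Table 4 (0.96, 0.55, and 0.61). The Python sketch below is an illustration only, not the software used in this study: it extracts the first principal component of that 3 × 3 correlation matrix by power iteration and computes Bartlett's sphericity statistic, −(n − 1 − (2p + 5)/6) ln|R|. Under these assumptions, the first component accounts for roughly 81% of the variance with loadings near those reported; because the inputs are rounded to two decimals, the Bartlett statistic comes out close to, but not identical with, the reported value.

```python
import math

N_PATIENTS = 44

# Pearson correlations between MDE-Present scores as reported in Table 4
# (order: interview, video, chart).
R = [
    [1.00, 0.96, 0.55],
    [0.96, 1.00, 0.61],
    [0.55, 0.61, 1.00],
]

def first_principal_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    size = len(matrix)
    vector = [1.0 / math.sqrt(size)] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vector = [p / eigenvalue for p in product]
    return eigenvalue, vector

def determinant_3x3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

eigenvalue, vector = first_principal_component(R)
# The eigenvalues of a correlation matrix sum to the number of variables.
proportion = eigenvalue / len(R)
# Component loadings are the eigenvector entries scaled by sqrt(eigenvalue).
loadings = [math.sqrt(eigenvalue) * v for v in vector]

p = len(R)
bartlett_chi2 = -(N_PATIENTS - 1 - (2 * p + 5) / 6) * math.log(determinant_3x3(R))

print(round(eigenvalue, 2), round(proportion, 2), [round(x, 2) for x in loadings])
print(round(bartlett_chi2, 1))
```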

Table 4 Concurrent and convergent validity for the three rated perspectives

                          MDE-Present score
Variable                  Interview   Video       Chart
Interview MDE-Present                 0.96***     0.55***
Video MDE-Present                                 0.61***
SCL-DEP#                  0.60***     0.60***     0.55***
SCL-GSI#                  0.40*       0.42**      0.51***

N = 44; #n = 36. MDE-Present, number of MDE criteria present; SCL-DEP, Symptom Checklist-90-Revised, Depression Subscale; SCL-GSI, Symptom Checklist-90-Revised, Global Severity Index. * P < 0.05; ** P < 0.01; *** P < 0.005.

DISCUSSION

This study sought to investigate the reliability and validity of the DSM-IV MDE criteria set from three different rating perspectives. The results indicated that the inter-rater reliability of interview and videotaped MDE ratings was substantially higher in this study than in previous research on the assessment of less severe manifestations of depressive symptoms at the “category” level (i.e. Mild Depressive Episode, Moderate Depressive Episode, Recurrent Depressive Episode/Mild, Recurrent Depressive Episode/Moderate, Dysthymia; Keller et al.1–5). Our analyses indicated a great deal of similarity between the MDE ratings made from the interview and from a videotape of these interview interactions.


Conversely, chart ratings of MDE criteria showed some divergence from the interview and videotaped ratings. It appeared that MDE symptoms were rated as present more frequently from chart material, which led to lower levels of inter-rater reliability when agreement between each set of ratings was examined. The exception to the poor reliability found across the chart-rated MDE criteria occurred when the total MDE chart score was utilized. Using this summary score led to a fair/moderate range of reliability with the interview (0.51) and videotaped (0.56) ratings. Finally, one potential limitation of the inter-rater reliability findings of the present study may relate to the broad focus of our interviews. Reliability rates may be higher when clinical evaluation is focused on only one or a few disorders.

The pattern observed in the inter-rater reliability analyses was also observed in the evaluation of the internal consistency/reliability of the DSM-IV MDE criteria sets. While the MDE criteria sets rated from the interview and videotaped perspectives demonstrated high levels of internal consistency (both 0.75), the value for the chart-rated MDE criteria set (0.63) was lower and did not surpass Nunnally and Bernstein's27 recommended lower bound for coefficient α of 0.70. Further improvement in the MDE criteria set for utilization in chart review would be desirable and might be aided by additional criteria that further evaluate depressed mood, diminished interest, social isolation/withdrawal, pleasurable activities, sleep problems, fatigue, energy, guilt, and feelings of inadequacy/worthlessness.

Another focus of this study was to assess the validity of the MDE-Present score from each of the three rated perspectives. First, the concurrent validity between the three MDE-Present scores was high (r = 0.55–0.96).
Results of a factor analysis revealed that the interview, videotape, and chart MDE-Present scores formed one factor, and this factor accounted for 81% of the variance in these three scores. This indicates that the three MDE ratings converge on a single dimension in a similar manner regardless of which source (i.e. interview, videotape, or chart) provided the rating. An additional analysis evaluated the relationship between the three MDE scores and a self-report measure of psychopathology. In these analyses we found a consistently larger relationship between the three MDE scores (interview, videotape, and chart) and measures of patient-reported depressive symptoms (r = 0.60, 0.60, and 0.55, respectively) than with global levels/severity of psychopathology (r = 0.40, 0.42, and 0.51, respectively). This supports the convergent and divergent validity of the MDE scores from any of the three rated perspectives as measures of depression specifically, and not simply of general impairment.

While it does appear that researchers may utilize a chart-rated MDE score as a global measure of depressive symptomatology in a reliable and valid manner, we would urge that they do so with caution given the less than satisfactory reliability findings for the individual MDE criteria as well as the less than optimal coefficient α. However, one potential limitation of the current study was the use of graduate students rather than experienced practitioners to review and rate the MDE chart information. While the use of graduate, or undergraduate, level research assistants is standard practice in a great deal of clinical research, future investigators may find higher rates of inter-rater reliability for chart-rated MDE criteria utilizing more experienced clinicians or graduate research assistants with more than 20 h of training.

Other limitations of the study include the exclusive focus on the DSM-IV MDE criteria, as these criteria represent only a limited subset of the possible phenomenology of depression. While the goal of this study was to focus specifically on the ability to accurately assess the DSM-IV MDE criteria in order to compare effectively with prior work, future research should examine the reliability of a broader range of dysphoric symptomatology than is currently represented in DSM-IV. Similarly, while DSM-IV provides severity specifiers for Mood Disorder diagnoses (i.e. mild, moderate, severe), it does not do so at the individual MDE item level. The inclusion of severity ratings at the individual item level may also be an important issue for future research. Given this exclusive focus on DSM-IV MDE criteria, some could argue that not utilizing the SCID was another limitation of this study. However, given the interview methodology described earlier for assessing these MDE criteria, the inquiry into these depressive symptoms was done in both a direct (i.e. similar to the SCID) and an indirect (i.e. relational context) manner.

Despite having a high degree of generalizability to other outpatient groups and representing a range of depressive symptomatology, the sample in the current investigation is somewhat limited in size. While the focus on a sample with mild to moderate levels of depression might seem to be a limitation of the study, upon broad consideration of the issues being examined here this focus becomes less problematic.
Since severe levels of depression have already been shown in the literature to be assessed with good to excellent reliability, whereas mild to moderate levels of depression exhibit poor or fair levels of reliability, the use of a mild-to-moderate range of depressive symptoms is an important area for further research and a distinctive feature of this study. In other words, we already know clinicians can reliably assess depressive symptoms in the severe range. What has been more problematic for clinicians is the assessment of depressive symptoms in the mild to moderate range. The current study may offer some insight into how and why this might be so, in that it specifically addresses an extant limitation of the research literature (i.e. the reliable assessment of DSM-IV MDE in the mild to moderate range of diagnostic severity).

Concerning this issue we offer two hypotheses for further research in the assessment of depressive symptomatology/disorders, especially in the mild to moderate range of diagnostic severity often found in outpatient settings. First, we would encourage future investigators to examine the differential impact of the time or length of the interview in relation to reliability. The length of interviews utilized in earlier reliability field trials has ranged from approximately 45 min [2] up to approximately 2 h for a SCID interview. The ratings of the present study were based on two sessions, together lasting approximately 3 h. The higher levels of inter-rater reliability that were found in the present work may be related to this additional time spent interacting with the patient. The implications of the time or length of interviews for reliability have rarely been discussed in the psychiatric literature, and given the current impingement of third-party payers and the reduced support for more thorough evaluations [28,29], this seems an especially important issue. Should clinicians be unduly limited in the time they spend on an assessment, it may lead to decreased reliability, misdiagnosis, and potential problems for treatment.

Second, beyond the additional time spent relative to some of the earlier field trials, the present study also focused parts of the interviews on key relational episodes in the patients' lives. The focus on patient narratives has been identified as figural in real-world clinical assessment of personality and patient functioning [30,31]. Also, the approach to interviewing in the current study is based on a therapeutic model of assessment [20,21] and may have contributed to higher reliability between the interview and videotape raters. Rather than focusing simply on the description of psychiatric symptomatology, patients were encouraged to describe and explore relational interactions (thoughts, feelings and fantasies) associated with the occurrence of symptoms. In this manner clinicians attempted to enlist the patients to help clarify and understand the impact of these experiences, both past and present, on their functioning. In addition, this relationally based exploration was focused on gaining a better understanding of the personal meaning of relational life experiences related to psychiatric symptoms, as well as on exploring prior successful and unsuccessful ways of coping with symptoms.

It is possible that the additional time or relational focus, or both, of the interviews employed in the present study aided clinicians in making more reliable assessments of depressive symptomatology in the mild to moderate range. Perhaps, when examining less extreme manifestations of depression, clinicians would be aided by spending additional time with these patients and/or examining depressive symptoms within the context of interpersonal functioning.

KEY POINTS
• An ecologically valid sample of outpatients was rated on DSM-IV Major Depressive Episode (MDE) symptoms
• Clinician ratings of MDE symptoms were made from three different perspectives (interview, videotape, and chart)
• Reliability and validity of the interview and videotape MDE ratings were found to be high
• Reliability and validity of the chart MDE ratings were found to be only fair
• However, the three MDE ratings converge on a single dimension in a similar manner regardless of source (i.e. interview, videotape, or chart)

REFERENCES
1. Keller M, Klein D, Hirschfeld R et al (1995) Results of the DSM-IV Mood Disorders Field Trial. Am J Psychiatry 152: 843–9.
2. Regier D, Kaelber C, Roper M et al (1994) The ICD-10 Clinical Field Trial for Mental and Behavioral Disorders: results in Canada and the United States. Am J Psychiatry 151: 1340–50.
3. Sartorius N, Kaelber C, Cooper J et al (1993) Progress toward achieving a common language in psychiatry: results of the Field Trial of the Clinical Guidelines Accompanying the WHO Classification of Mental and Behavioral Disorders in ICD-10. Arch Gen Psychiatry 50: 115–24.
4. Sartorius N, Ustun T, Korten A et al (1995) Progress toward achieving a common language in psychiatry. II. Results from the International Field Trials of the ICD-10 Diagnostic Criteria for Research for Mental and Behavioral Disorders. Am J Psychiatry 152: 1427–37.
5. Williams J, Gibbon M, First M et al (1992) The Structured Clinical Interview for DSM-III-R (SCID): multisite test-retest reliability. Arch Gen Psychiatry 49: 630–6.
6. Fleiss J (1981) Statistical methods for rates and proportions, 2nd edn. New York: Wiley.
7. Chen L, Eaton W, Gallo J et al (2000) Empirical examination of current depression categories in a population-based study: symptoms, course, and risk factors. Am J Psychiatry 157: 573–80.
8. Goldberg D (2000) Plato versus Aristotle: categorical and dimensional models for common mental disorders. Comp Psychiatry 41: 8–13.
9. Judd L, Akiskal H, Zeller P et al (2000) Psychosocial disability during the long-term course of unipolar major depressive disorder. Arch Gen Psychiatry 57: 375–80.
10. Judd L, Paulus M, Schettler P et al (2000) Does incomplete recovery from first lifetime major depressive episode herald a chronic course of illness? Am J Psychiatry 157: 1501–4.
11. Lewinsohn P, Solomon A, Seeley J et al (2000) Clinical implication of "subthreshold" depressive symptoms. J Abnorm Psychol 109: 345–51.
12. Barkham M, Shapiro D, Hardy G et al (1999) Psychotherapy in two-plus-one sessions: outcomes of a randomized controlled trial of cognitive-behavioral and psychodynamic-interpersonal therapy for subsyndromal depression. J Consult Clin Psychol 67: 201–11.
13. Helmchen H, Linden M (2000) Subthreshold disorders in psychiatry: clinical reality, methodological artifact, and the double-threshold problem. Comp Psychiatry 41: 1–7.
14. Judd L, Paulus M, Wells K et al (1996) Socioeconomic burden of subsyndromal depressive symptoms and major depression in a sample of the general population. Am J Psychiatry 153: 1411–7.
15. Magruder K, Calderone G (2000) Public health consequences of different thresholds for the diagnosis of mental disorders. Comp Psychiatry 41: 14–8.
16. Angst J, Selaro R, Merikangas K (2000) Depressive spectrum diagnoses. Comp Psychiatry 41: 39–47.
17. Mojtabai R (2001) Impairment in major depression: implications for diagnosis. Comp Psychiatry 42: 206–12.
18. Brown T, DiNardo P, Lehman C et al (2001) Reliability of DSM-IV anxiety and mood disorders: implications for the classification of emotional disorders. J Abnorm Psychol 110: 49–58.
19. American Psychiatric Association (1994) Diagnostic and statistical manual of mental disorders, 4th edn. Washington, DC: American Psychiatric Association.
20. Finn S, Tonsager M (1997) Information-gathering and therapeutic models of assessment: complementary paradigms. Psychol Assess 9: 374–85.
21. Fischer C (1994) Individualized psychological assessment. Hillsdale, NJ: Erlbaum.
22. Derogatis L (1994) Symptom Checklist-90-Revised: Administration, Scoring, and Procedures Manual, 3rd edn. Minneapolis, MN: National Computer Systems.
23. Shrout P, Fleiss J (1979) Intraclass correlations: uses in assessing rater reliability. Psychol Bull 86: 420–8.
24. McGraw K, Wong S (1996) Forming inferences about some intraclass correlation coefficients. Psychol Methods 1: 30–46.
25. Fleiss J, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measure 33: 613–9.
26. Shrout P, Spitzer R, Fleiss J (1987) Quantification of agreement in psychiatric diagnosis revisited. Arch Gen Psychiatry 44: 172–7.
27. Nunnally J, Bernstein I (1994) Psychometric theory, 3rd edn. New York: McGraw-Hill.
28. Eisman E, Dies R, Finn S et al (2000) Problems and limitations in the use of psychological assessment in contemporary healthcare delivery. Professional Psychol Res Pract 31: 131–40.
29. Piotrowski C (1999) Assessment practices in the era of managed care: current status and future directions. J Clin Psychol 55: 787–96.
30. Westen D (1997) Divergences between clinical and research methods for assessing personality disorders: implications for research and the evolution of Axis II. Am J Psychiatry 154: 895–903.
31. Westen D, Arkowitz-Westen L (1998) Limitations of Axis II in diagnosing personality pathology in clinical practice. Am J Psychiatry 155: 1767–71.
