Interpretive Reliability of Six Computer-Based Test Interpretation Programs for the Minnesota Multiphasic Personality Inventory–2

Assessment 1–12. © The Author(s) 2015. Reprints and permissions: sagepub.com/journalsPermissions.nav. DOI: 10.1177/1073191115584970. asm.sagepub.com

Mark A. Deskovitz1, Nathan C. Weed1, Joseph K. McLaughlan2, and John E. Williams3

Abstract

The reliability of six Minnesota Multiphasic Personality Inventory–Second edition (MMPI-2) computer-based test interpretation (CBTI) programs was evaluated across a set of 20 MMPI-2 profile codetypes commonly appearing in clinical settings. Evaluation of CBTI reliability comprised examination of (a) interrater reliability, the degree to which raters arrive at similar inferences based on the same CBTI profile, and (b) interprogram reliability, the level of agreement across different CBTI systems. Profile inferences drawn by four raters were operationalized using q-sort methodology. Results revealed no significant differences overall with regard to interrater and interprogram reliability. Some specific CBTI/profile combinations (e.g., the CBTI by Automated Assessment Associates on a within normal limits profile) and specific profiles (e.g., the 4/9 profile displayed greater interprogram reliability than the 2/4 profile) were interpreted with variable consensus (α range = .21-.95). In practice, users should consider that certain MMPI-2 profiles are interpreted more or less consensually and that some CBTIs show variable reliability depending on the profile.

Keywords

computer-based test interpretation, MMPI-2, q-sort, test interpretation, reliability

The definition of test reliability and the methods used to estimate it ultimately depend on the attribute being measured and the sources of inconsistency thought to be present in the measurement of that attribute. Therefore, reliability can be operationalized in a variety of ways. Minnesota Multiphasic Personality Inventory–Second edition (MMPI-2) reliability has been conceptualized within several contexts, such as test–retest reliability (Butcher, Graham, Ben-Porath, Tellegen, Dahlstrom, & Kaemmer, 2001), temporal stability (Graham, 2011a), and internal consistency (Butcher et al., 2001). These examples all represent methods of evaluating the reliability of scores on individual scales. In contrast, however, the MMPI-2 is typically not interpreted one scale at a time when applied clinically. Rather, information from scores on multiple scales is integrated into a descriptive account of the examinee. It would seem to follow that researchers should examine the reliability of this descriptive narrative rather than the reliability of individual scale scores. Unfortunately, this integrated descriptive account is rarely the subject of research inquiry. In the current study, we sought to examine the reliability of integrated MMPI-2 interpretations in the form of six commercially available computer-based test interpretation (CBTI) programs.

Graham (2011a) acknowledged that there is no single best approach when using MMPI-2 data to make inferences about the examinee. Rather, test users in the field interpret MMPI-2 profiles using a variety of interpretive strategies. This point is illustrated by the diversity of popular textbooks commonly used in teaching MMPI-2 interpretation (e.g., Butcher & Williams, 2000; Friedman, Lewak, Nichols, & Webb, 2001; Graham, 2011a; Greene, 2000). Theorists such as Dawes (1994) and Meehl (1956) have identified many fallacies and biases to which clinicians are subject when arriving at decisions, making predictions, and producing interpretations. Dawes (1994) argued that when a clinician is involved in integrating test information, clinical judgment is often based on a number of cognitive "heuristics" or rules of thumb.

1 Central Michigan University, Mt. Pleasant, MI, USA
2 Rehabilitation Institute of Michigan, Detroit, MI, USA
3 University of Northern Iowa, Cedar Falls, IA, USA

Corresponding Author: Joseph McLaughlan, Department of Rehabilitation Psychology and Neuropsychology, Rehabilitation Institute of Michigan, 261 Mack Boulevard, Suite 555, Detroit, MI 48201, USA. Email: [email protected]


One of these is termed the "availability heuristic," which involves searching one's memory (including memory of one's training) for instances similar to the one at hand. Unfortunately, availability can be quite biased by selective exposure and recall. Another heuristic is called the "representativeness heuristic," which involves matching cues or characteristics associated with a category. Meehl (1956) discussed how clinicians sometimes make judgments about patients by comparing them with previous patients, a prototypical patient, or even a relative on the basis of a single common characteristic. Although these heuristics do lead to some validity in making judgments (i.e., clinical judges generally do better than chance in making predictions), they do not perform as well as automated combinations of relevant factors determined by a statistical model (Dawes, 1994). In 1956, Meehl, cognizant of the potential for clinical error, endorsed use of a formula that a clinician can use to make decisions with perfect reliability in the form of a "cookbook" based on explicit rules. In collaboration with Meehl, Halbower (1955) demonstrated consistent superiority of this method over clinical judgment when he correlated blind MMPI profile ratings from experienced clinicians with therapist notes and other available psychometrics. The mean cookbook correlation was .78, as opposed to .48 for rule-of-thumb interpretation.

With the widespread availability of computers, there have been numerous efforts to automate the assessment process, including the interpretation phase. CBTI systems are designed to take advantage of technological advancements by providing users with detailed reports that are programmed to mimic what would be written by a clinician based on a combination of empirical correlates and clinical expertise (Conoley, Plake, & Kemmerer, 1991). Although computers have some definite advantages over clinicians, including greater efficiency and reliability, superior storage capacity, and cost-efficiency (Friedman et al., 2001; Graham, 2011a), some disadvantages have also been documented, such as excessive generality, algorithm limitations, potential for misuse, and the confusing abundance of packages (Friedman et al., 2001). Furthermore, the extent to which algorithms are based on quantitative research or clinical judgment, and the specific interpretive strategy followed, are often not disclosed (Williams & Weed, 2004a), forcing consumers to rely on individual experience, word of mouth, and advertising in making their CBTI program decisions. Before deciding whether to use CBTIs, clinicians should first examine the potential benefits and limitations and how they will integrate CBTI interpretations into their client conceptualizations.

Status of MMPI-2 CBTIs

The frequency with which CBTIs have been incorporated into assessments has been rising over the course of the past several decades (Butcher, Perry, & Atlis, 2000), a trend that is expected to continue with increasing advancements in and applications of technology (Garb, 2000). It is essential that CBTIs regularly receive updates if they are to keep pace with advancements in personality and psychopathology theory, assessment instruments, and technology. Previous studies have identified at least eight commercially available CBTI systems for the MMPI-2 in the United States alone (e.g., Williams & Weed, 2004a). Only personal computer (PC) interface systems were used within the current study, thereby excluding the two mail-in CBTI systems (The Caldwell Report and Behavior Data). The current standing of the six remaining CBTI programs for the MMPI-2 is reviewed below.

Automated Assessment Associates (AAA)

The AAA Adult Interpretive Report–Version 10 incorporates the MMPI-2 validity, clinical, and content scales, along with the substance abuse scales (MacAndrew Alcoholism Scale–Revised, Addiction Potential Scale, and Addiction Acknowledgment Scale), into an integrated report. Programming is described as being determined by MMPI and MMPI-2 research and clinical literature as well as the 70 years of combined clinical and research experience of the system's authors, Donald Strassberg, PhD, and Leslie Cooper, PhD. The report contains codetype and single-scale interpretations, therapeutic implications, and diagnostic considerations. At the time of data collection, the cost of AAA was $525 with unlimited usage on the first computer and $250 for installation on each additional computer. This program has since been removed from commercial availability, however (Automated Assessment Associates, 2015).

Pearson Assessments (PA), Minnesota Report

The Minnesota Report, authored by prominent MMPI-2 authority James N. Butcher, is available in three different versions based on clinical (used within this study), personnel, or forensic setting. This system was last revised in 2001. Important features of the PA report include narrative sections that provide setting-specific descriptions, interpretive statements that are relevant to mental health settings, and item response percentages to provide clinicians with accurate perceptions of the frequency of particular statements in each setting. The following narrative sections are included in each report: profile validity, symptomatic patterns, profile frequency, profile stability, interpersonal relations, diagnostic considerations, and treatment considerations. PA reports are sold individually with the price based on quantity, ranging from $44.60 each for fewer than five reports to $34.35 each if 2,500 or more reports are ordered (Pearson, 2015b).


Psychological Assessment Resources (PAR)

The MMPI-2 Adult Interpretive System Version 4 is authored by a well-known MMPI-2 researcher and clinician, Roger Greene (Williams & Weed, 2004a). This system provides clinicians with the option of receiving one of three types of reports: a Codetype Report, a Content-Based Report, or a Codetype/Content-Based Report that combines the features of the first two types. The combined Codetype/Content-Based Report was used within this study. The Codetype Report is described as an "empirically derived prototypic profile based on 'goodness-of-fit'" with the client's MMPI-2 scale scores (Greene, Brown, & PAR Staff, 2012). Interpretive statements are based on the client's best match with one of 381 prototypic profiles. In addition, the Codetype Report includes interpretive statements based on individual scale scores, including Validity scales, Clinical scales, Harris and Lingoes scales, Restructured Clinical scales, Content and Content Component scales, Supplementary scales, and PSY-5 scales. Finally, PAR reports also include descriptions of treatment prognosis and diagnostic considerations. The PAR system costs $725 for unlimited uses. Upgrading to the newest system costs $375 (Greene et al., 2012).

PsychScreen (PS)

The PS MMPI-2 Interpreter (1998) is a portion of a larger suite of psychological measures that are integrated into one complete report. The interpretive statements provided for the MMPI-2 are based on the Clinical, Content, Harris-Lingoes, Supplementary, and Subtle-Obvious scales (Williams & Weed, 2004a). The PS CBTI report sections include intellectual functioning, validity of test results, emotional functioning, alcohol and drug use, somatic functioning, interpersonal functioning, self-image, defenses, personality dynamics, diagnostic considerations, and treatment recommendations (PsychScreen, Inc., n.d.). However, when the MMPI-2 is administered in conjunction with other measures, it is not clear from which test each interpretive statement is derived or whether the MMPI-2 contributes to each report heading. The PS software costs a one-time fee of $250 for unlimited uses of the MMPI-2, excluding all other measures within the suite (PsychScreen, Inc., n.d.).

Psychometric Software, Inc. (PSI)

The PSI program (Rainwater, 1996) is an individual PC program that produces reports in both integrated and scale-by-scale format, with references to the literature from which specific interpretive statements were derived. Interpretive statements are grouped into categories including validity, emotion, personality/behavior, interpersonal relationships, bodily concerns, cognitive distortions, diagnosis, prognosis, and treatment. The author indicated the report is based mostly on MMPI rather than MMPI-2 literature (Williams & Weed, 2004a).

Western Psychological Services (WPS)

The CBTI program from WPS (2003) produces interpretive statements from the MMPI-2 clinical, content, and supplementary scales that are reportedly based on actuarial data. As opposed to an integrated report, WPS reports are organized in a sectional fashion, with content areas including profile validity, basic scales, supplementary scales, content scales, and diagnostic impressions. Unique to the WPS system, specific scale elevations prompt suggestions of possible medications for treatment.

CBTIs in Clinical Practice

Specific usage data have not been provided by CBTI publishers; however, McMinn, Ellens, and Soref (1999) attempted to determine the frequency and appropriate function of CBTI use in a survey of 364 members of the Society for Personality Assessment. Among respondents, the MMPI-2 was the most frequently used test (M = 4.8 uses per month, SD = 16.0, range = 0-250) and CBTI system (M = 3.3 uses per month, SD = 13.8, range = 0-250), though these averages were positively skewed. In 1994, over one third of survey respondents from the Society for Personality Assessment and the American Psychological Association, Clinical Psychology Division acknowledged using CBTI software (Ball, Archer, & Imhoff, 1994). In 1999, McMinn and colleagues found 85% of survey respondents had used CBTIs at least rarely for scoring purposes, although the extent to which survey response bias influenced this finding is unclear. Generally, CBTI use was deemed ethical for scoring and for supplementing standard clinical methods. Respondents typically viewed use of CBTI reports as unethical when they serve as the primary case formulation resource or are provided to the referring professional in lieu of a written report. Findings were mixed concerning the appropriateness of extracting components of CBTI reports to be pasted into clinician reports; 42% of respondents viewed this as never or rarely ethical, whereas 38% indicated it is unquestionably or under many circumstances ethical. Taken together, users seem to employ CBTI reports in a support function (i.e., to allow for refinement of interpretations based on integration with standard assessment practices) as opposed to a replacement function (i.e., to allow for elimination of an analogous portion of assessments).

Empirical evaluation of the psychometric properties of CBTI systems has largely been neglected in the extant literature. In particular, measurement of interpretive reliability has been incidental to validity in the studies that could be found. Previous research has emphasized two models of CBTI validity: customer satisfaction surveys and external criterion designs. Customer satisfaction surveys have used rating scales asking clinicians to indicate the quality of various CBTI programs in terms of variables such as clarity, organization, accuracy, provision of new and/or useless information, and usefulness both individually and in comparison with other reports. To overcome the uncertainty as to whether positive CBTI impressions result from Barnum effects (i.e., the tendency to evaluate statements as accurate due to their high base rate rather than their ability to discriminate among individuals), some authors have provided users either modal reports based on average endorsements within particular settings or authentic reports (Williams, 2002; Williams & Weed, 2004a). Superior ratings for authentic reports are then interpreted as evidence of validity. Based on the discrepancy between total rating scores of authentic and modal reports, Williams and Weed (2004a) found large effect sizes (d ≥ .80) for five of the six CBTI programs under consideration within the present article. The sixth program, PS, produced a medium effect size. AAA received the highest overall rating, with the largest positive effects in categories including confirmation of the user's opinion, accuracy, provision of new information, and clinical usefulness. The PA report was rated as clearest, and the WPS report as most concise. These findings are consistent with previous research in which the same systems were rated as more accurate than several others by clinical psychologists with access to case history materials (Eyde, Kowal, & Fishburne, 1991).

External criterion studies have indexed the validity of MMPI interpretations by comparing operationalized CBTI output with ratings provided by at least one individual who knows the examinee well (i.e., therapist, family member, close friend). Validity coefficients comparing CBTI ratings with therapist ratings have ranged from a median of r = .07 (Harrington, 1999) to an average of r = .36 (Moreland, 1983). The scarcity of published CBTI validity studies interferes with the ability to discern meaningful differences based on the relationship of the informant to the examinee, but it is noteworthy that Harrington (1999) found median correlations between CBTI ratings and ratings made by significant others were as low as −.08. At the individual program level, AAA reports were consistently associated with higher correlations with informant report, whereas PS reports were among the lowest when correlated with therapist or informant report. Because the validity of a system is limited by its interpretive reliability, the extent to which low validity coefficients are the product of limited interrater reliability remains largely unknown.

When available, interinterpreter reliabilities have typically been more satisfactory than validity coefficients. In a study of diagnostic reliability, Pant, McCabe, Deskovitz, Weed, and Williams (2014) compared ratings of the psychiatric diagnoses most likely to be appropriate for examinees based on CBTI reports from the same six programs under consideration within the present study. These authors found moderate interinterpreter diagnostic reliability overall (mean r = .56) with a range from .46 (2/4/7 profile) to .67 (1/2/3 profile). At the CBTI program level, overall interinterpreter reliabilities ranged from .51 (PS) to .61 (WPS). When comparing reliabilities between the different programs, the average correlation was .56 with a range from .34 (2/7 profile) to .71 (1/2/3 profile). Because the study emphasized diagnostic reliability, the degree to which raters agreed about specific examinee characteristics is unknown.

The majority of research examining the reliability of CBTIs for MMPI instruments has indexed agreement between interpreters by comparing ratings taken from large sets of behavioral descriptors that are commonly drawn from MMPI/MMPI-2 interpretation. Despite low validity coefficients, Harrington (1999) demonstrated adequate interinterpreter reliability (median r = .64) based on ratings provided by advanced clinical psychology graduate students. Generally, studies have found average interrater reliabilities ranging from r = .45 (Marshall, 2001) to r = .74 (McNeal, 1997). Substantial improvements in interrater reliability as a function of clinical experience have not been observed in the few studies that have been conducted. For example, Deskovitz (2003) found average correlations of .62 and .64 for MMPI-2 2/7 and 6/8 profiles, respectively, using a sample of 20 practicing clinicians who commonly use the MMPI-2. Interestingly, the same study measured interpretive reliability for the same 2/7 and 6/8 profiles as r = .41 and r = .36, respectively, based on clinicians' personal interpretations of the profiles, indicating that raters agree substantially more when interpreting a CBTI report than when freely interpreting MMPI-2 profiles. To date, little is known about the variables that may affect reliability between clinicians (e.g., length of report, clarity of statements, presence of contradictory statements, pattern of scale elevations). Furthermore, most of the available studies were conducted between the 1960s and 2004; numerous developments have since been made to CBTI programming based on technological, empirical, and clinical advancements.

Present Study

As discussed above, little is known about the reliability of CBTIs. Although it is clear that commercially available CBTIs provide different interpretations of the same profile, we do not know the extent to which they differ. The present study was designed to appraise the relative interpretive reliability of six commercially available MMPI-2 interpretive programs. This study addressed two major aspects of CBTI reliability: (a) interrater reliability, which concerns the degree to which the same CBTI reports are reliably interpreted across individuals and on what types of profiles the CBTIs are most reliably interpreted, and (b) interprogram reliability, which concerns how consensual the programs are with one another and on what types of MMPI-2 profiles the CBTIs are most consensual. For example, when clinicians view reports from different CBTI systems based on the same response set, to what extent do their inferences overlap? The study aimed to build on the understanding of CBTI use based on MMPI-2 profile pattern and CBTI program, thereby allowing practitioners a more refined approach to evaluating clinical inferences and decisions. Based on existing studies (e.g., Harrington, 1999; Williams & Weed, 2004a), it was hypothesized that AAA and WPS would evidence reliabilities superior to those of PS.

Method

Participants

Data in this study comprised CBTI q-sorts produced by four raters enrolled in an American Psychological Association–accredited clinical psychology doctoral program, based on 20 archival MMPI-2 profiles representing commonly occurring codetypes. Raters were not informed of whether there would be specific similarities or differences among the profiles or of the profile selection method. Each rater completed 120 q-sorts (20 MMPI-2 profiles × 6 CBTI systems). As part of their graduate training, each rater received extensive training in MMPI-2 interpretation, use of the Midwestern Q-sort, and psychopathology.

Instruments

MMPI-2. The CBTIs were based on the Minnesota Multiphasic Personality Inventory–Second edition (Hathaway, McKinley, & MMPI Restandardization Committee, 1989), a self-report inventory composed of 567 true or false self-referenced items. The MMPI-2 profiles chosen represent the most frequent codetypes in both inpatient and outpatient settings: 1/2, 1/3, 2/3, 2/4, 2/7, 2/8, 4/6, 4/7, 4/8, 4/9, 6/8, 6/9, 7/8, 8/9, 1/2/3, 2/4/7, 2/7/8, 6/7/8, over five scale elevations, and within normal limits (WNL; Archer, Griffin, & Aiduk, 1995; Graham, Ben-Porath, & McNulty, 1999; Greene, 2000; Lewandowski & Graham, 1972). The profiles were chosen with T-scores representing well-defined codetypes. The profiles were selected from a large archival data set consisting of 5,913 subjects assembled by Han, Weed, Calhoun, and Butcher (1995).

CBTIs. The included CBTIs are those that were identified as commercially available for PC use at the time of data collection. Additionally, they were selected to be consistent with previous studies (e.g., Harrington, 1999; Williams & Weed, 2004a). Examined CBTIs include AAA (Strassberg & Cooper, 1997); PA, formerly known as National Computer Systems: The Minnesota Report (Butcher, 1993); PAR (Greene, Brown, & Kovan, 1998); PSI (Rainwater, 1996); PS MMPI-2 Interpreter (PsychScreen, Inc., 1998); and WPS (2003). For a more thorough review of these programs, see Williams and Weed (2004a, 2004b).

Midwestern Q-Sort. The q-sort method has been used in many studies examining the reliability of personality assessment (e.g., Crumpton, 1975; Deskovitz, 2003; Harrington, 1999; Little & Schneidman, 1959; Moos, 1962; Pant et al., 2014). This procedure requires judges to sort a set of items or statements into ordered categories, ranging from extremely characteristic or salient to extremely uncharacteristic (Ozer, 1993). The categories into which items are sorted are given a numerical label that becomes the score of the items in that category. The number of items permitted in each category is fixed in advance, so the shape of the distribution is constant for all judges. This provides benefits such as controlling for halo effects and extreme ratings, easing computation of data, and ensuring statements are relevant, based on their link to clinical and personality concepts (Block, 1956). Usually, the items consist of a set of verbal statements that are likely to vary in terms of how descriptive they are of a specified target.

The instrument used to operationalize CBTI interpretation in this study is the Midwestern Q-sort, a collection of 100 statements selected to represent behavior patterns assessed by the MMPI-2. Initially developed as an aid to teaching MMPI-2 interpretation (Weed, 1997), the items were written based on an inspection of descriptive statements associated with scores on key scales of the MMPI-2; specifically, scale correlates proposed by Graham (2011b). Following revision of items deemed unsatisfactory based on lack of clarity, irrelevance, or redundancy, the 100 remaining items were subjected to an item-by-item examination of correlations with q-sort descriptions of individuals created by their therapists based on personal interactions with and knowledge of the individuals (McNeal, 1999). Correlations of less than .10 with external criteria were obtained for 33 items, which were then altered or replaced. In addition, items were rewritten to reflect observable behaviors rather than presumed internal processes or states, compound items were simplified, and the wording of the q-sort was simplified to allow for use by sorters who are not health care professionals. More information regarding the development of the Midwestern Q-sort can be found in Weed (1997) and McNeal (1999).

The Midwestern Q-sort is administered via a web-based computer program that presents a split screen with a list of interpretive statements on the right side and seven numbered bins on the left side. Each bin is programmed to contain a fixed number of items designed to approximate a normal distribution. Participants are instructed to select a statement and sort it into one of seven numbered bins, ranging from least descriptive to most descriptive, that are capable of holding 5, 10, 20, 30, 20, 10, and 5 items, respectively. As each statement is selected and placed in a bin, the statements on the right side disappear until all items have been sorted. The program alerts the participant if a bin is not completely filled or if too many items have been placed in a bin.
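To make the forced-distribution constraint concrete, the following minimal sketch (in Python, which is an assumption; the names BIN_CAPACITY and score_sort are hypothetical and not part of the original web program) validates a completed sort against the fixed 5-10-20-30-20-10-5 bin capacities and returns the item scores.

```python
from collections import Counter

# Fixed bin capacities of the Midwestern Q-sort, from bin 1 (least
# descriptive) to bin 7 (most descriptive); 100 items in total.
BIN_CAPACITY = {1: 5, 2: 10, 3: 20, 4: 30, 5: 20, 6: 10, 7: 5}

def score_sort(assignments):
    """Validate a completed sort (item_id -> bin number) and return item scores.

    Raises ValueError when a bin is over- or under-filled, mirroring the
    alert the web program gives the participant.
    """
    if len(assignments) != sum(BIN_CAPACITY.values()):
        raise ValueError("all 100 statements must be sorted")
    counts = Counter(assignments.values())
    for bin_no, capacity in BIN_CAPACITY.items():
        if counts.get(bin_no, 0) != capacity:
            raise ValueError(f"bin {bin_no} holds {counts.get(bin_no, 0)} items; expected {capacity}")
    # The bin's numerical label becomes the score of each item placed in it.
    return dict(assignments)
```

Because the distribution is identical for every judge, two completed sorts differ only in which statements occupy which bins, which is what makes the correlational analyses below directly comparable across raters.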

Procedures

Twenty MMPI-2 profiles, representing the common codetypes described above, were interpreted by each of the six commercially available CBTI programs. Printouts of the interpretive reports were made available to the four raters, who then described each CBTI interpretation using the Midwestern Q-sort. Sorts were then imported into SPSS for analysis. The reports were all edited to look similar (all information identifying the CBTI was removed, tables were removed, and the font was standardized) so that raters would be unable to tell which CBTI was being rated. The reports were put in random order and presented to all raters in the same order.

Analyses

The analyses for the present study were selected based on their ability to delineate the extent to which (a) the raters arrived at the same or similar interpretations after examining the same CBTI report and (b) the raters interpreted the same profiles produced by the six different CBTI programs consensually (e.g., how similarly the AAA 1/2 report and the PA 1/2 report were interpreted).

The first question was conceptualized as a measurement of interrater reliability. Coefficient alphas, although often used as indicators of internal consistency, were selected to index interrater reliability because they provide a single value summarizing agreement across multiple raters. This process of averaging across raters allows for reduced variability and, consequently, reduced error variance (Osborne, 2008). To conduct these analyses, each rater who completed q-sorts was treated as an item. Alphas were calculated for each profile within the six CBTI programs, resulting in 120 alphas in total.

The second question, interprogram reliability, was indexed by corrected item-total correlations (CITCs) due to the strength of this metric in examining the degree to which a single value (i.e., a single CBTI system) is consistent or inconsistent with the average of the other values under consideration (i.e., the five other CBTI systems). Each CITC represents the Pearson product–moment correlation of a single CBTI report with the total of the five remaining CBTI reports for a given profile. A CITC was produced for each of the 20 profiles within each CBTI program. Analyses are described in greater detail below.

Results

Coefficient alphas were calculated for each profile report to determine the interrater reliability of the commercially available CBTIs. The inclusion of six CBTI systems and 20 profiles for each resulted in a total of 120 alphas (see Table 1). To obtain each alpha, the raters' responses were placed into a 100 (number of q-sort items) × 4 (number of raters) matrix. Every alpha provided in Table 1 represents the output from one of these 100 × 4 matrices.

Only three values were substantially less consensual than the grand mean: the 1/3 profile for PS, the 7/8 for PAR, and the WNL profile for AAA, each of which was two standard deviations below the mean of its respective profile. Overall, interrater reliability ranged from α = .82 to α = .88 with an average alpha of M = .85 across the six CBTI systems. Consequently, no CBTI system was found to consistently produce better or worse agreement among raters.
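As an illustration of this computation, a minimal sketch follows, assuming the four raters' sorts for one CBTI/profile combination are available as a 100 × 4 NumPy array; the function name is hypothetical and the study's actual analyses were run in SPSS.

```python
import numpy as np

def interrater_alpha(sorts):
    """Coefficient alpha for one CBTI/profile combination.

    `sorts` is a 100 x 4 array: rows are the Midwestern Q-sort items,
    columns are the four raters. Each rater is treated as an "item" in
    the classical alpha formula, so alpha indexes agreement across raters.
    """
    sorts = np.asarray(sorts, dtype=float)
    k = sorts.shape[1]                           # number of raters
    rater_vars = sorts.var(axis=0, ddof=1)       # each rater's variance over the 100 items
    total_var = sorts.sum(axis=1).var(ddof=1)    # variance of the item totals
    return (k / (k - 1)) * (1.0 - rater_vars.sum() / total_var)
```

Each of the 120 alphas in Table 1 would correspond to one call of this function on one 100 × 4 matrix.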

Interprogram Reliability

To determine interprogram reliability (see Table 2), the mean sort of the four ratings for each profile was placed in a matrix beside the mean ratings for the same profile from each other CBTI program. This resulted in a 6 (number of CBTI programs) × 100 (number of q-sort items) matrix for each profile; 20 such matrices were produced (one for each of the MMPI-2 profiles). Following this step, a CITC could be computed for each profile within each system. These were inserted into a single table for ease of comparison between CBTI programs. As an example of this process, the mean item ratings from the AAA 1/2 report were entered alongside those of the other five CBTI programs for the same profile, and this matrix was used to compute the CITCs found in the top row of Table 2 (.95, .92, .76, .94, .94, .93).

Represented in Table 2 are the CITCs associated with each of the 6 CBTIs and each of the 20 MMPI-2 profiles. The values ranged from .24 to .97 with an overall mean of .89. Several CBTI reports were far less consensual than the grand mean, most notably PAR 7/8, AAA 2/4, and PAR 2/8, which were approximately two standard deviations below the mean within their respective profiles. These correlated the least with the other programs interpreting the same profile. The 4/9 reports for AAA, PA, and PAR were interpreted most similarly to the other programs interpreting the same MMPI-2 profile. Overall, the 4/9 (.95) and 1/2/3 (.94) profiles were interpreted most similarly. The least consensual overall interpretations were found for the 2/4 (.78) and 7/8 (.79) profiles; however, these means were greatly reduced by one outlying CITC within each profile rather than by consistently lower agreement among CBTI programs.
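The following sketch illustrates the CITC computation under the same assumptions as the previous example (NumPy array of rater-averaged sorts; hypothetical function name): each program's 100 mean ratings are correlated with the item-wise total of the other five programs.

```python
import numpy as np

def interprogram_citcs(mean_sorts):
    """Corrected item-total correlations (CITCs) for one MMPI-2 profile.

    `mean_sorts` is a 6 x 100 array: each row holds one CBTI program's
    q-sort ratings for the profile, averaged over the four raters. The
    correction excludes each program from the total it is compared with.
    """
    mean_sorts = np.asarray(mean_sorts, dtype=float)
    citcs = []
    for i in range(mean_sorts.shape[0]):
        # Item-wise total of the five remaining programs.
        others = np.delete(mean_sorts, i, axis=0).sum(axis=0)
        # Pearson product-moment correlation with the excluded program.
        citcs.append(float(np.corrcoef(mean_sorts[i], others)[0, 1]))
    return citcs
```

Running this on the 20 profile matrices would yield the 120 CITCs reported in Table 2, one column of six values per profile.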


Table 1. Interrater Alpha Reliabilities by CBTI Program and MMPI-2 Profile.

Profile    AAA   PA    PAR   PS    PSI   WPS   M     SD    95% CI
1/2        .88   .91   .81   .86   .89   .89   .87   .04   [0.84, 0.91]
1/3        .84   .85   .76   .44   .80   .87   .76   .16   [0.59, 0.93]
2/3        .88   .92   .88   .84   .87   .85   .87   .03   [0.84, 0.90]
2/4        .82   .83   .83   .79   .84   .87   .83   .03   [0.80, 0.86]
2/7        .86   .89   .83   .85   .87   .89   .87   .02   [0.84, 0.89]
2/8        .75   .85   .84   .83   .92   .86   .84   .05   [0.78, 0.90]
4/6        .80   .87   .87   .87   .86   .87   .86   .03   [0.83, 0.89]
4/7        .85   .79   .87   .83   .85   .77   .83   .04   [0.79, 0.87]
4/8        .88   .87   .89   .82   .86   .88   .87   .03   [0.84, 0.89]
4/9        .91   .90   .89   .83   .89   .86   .88   .03   [0.85, 0.91]
6/8        .89   .91   .83   .85   .87   .89   .87   .03   [0.84, 0.90]
6/9        .84   .86   .85   .84   .82   .85   .84   .01   [0.83, 0.86]
7/8        .87   .85   .38   .84   .83   .79   .76   .19   [0.56, 0.96]
8/9        .84   .88   .86   .87   .87   .87   .87   .01   [0.85, 0.88]
1/2/3      .87   .91   .84   .88   .90   .87   .88   .02   [0.85, 0.90]
2/4/7      .88   .91   .82   .77   .82   .84   .84   .05   [0.79, 0.89]
2/7/8      .89   .91   .89   .86   .87   .86   .88   .02   [0.86, 0.90]
6/7/8      .91   .89   .82   .89   .87   .90   .88   .03   [0.85, 0.91]
Over 5     .88   .91   .86   .86   .88   .89   .88   .02   [0.86, 0.90]
WNL        .21   .88   .89   .82   .86   .84   .75   .27   [0.47, 1.03]
Gr M       .83   .88   .83   .82   .86   .86   .85   .02   [0.79, 0.90]
Gr SD      .15   .03   .11   .09   .03   .03   .04
Gr 95% CI  [0.76, 0.90]  [0.86, 0.90]  [0.77, 0.88]  [0.78, 0.87]  [0.85, 0.88]  [0.85, 0.88]  [0.83, 0.87]

Note. CBTI = computer-based test interpretation; MMPI-2 = Minnesota Multiphasic Personality Inventory–Second edition; AAA = Automated Assessment Associates; PA = Pearson Assessments; PAR = Psychological Assessment Resources; PS = PsychScreen, Inc.; PSI = Psychometric Software, Inc.; WPS = Western Psychological Services; M = mean of the six MMPI-2 CBTI systems; SD = standard deviation; 95% CI = 95% confidence interval; Gr = grand statistic across all 20 profiles; WNL = within normal limits. Each alpha coefficient is based on 400 data points (100 q-sort items completed by each of the four raters).

PAR was found to possess the poorest interprogram reliability overall (CITC = .81); however, it is worth noting that this is somewhat attributable to greater variability among the profiles (SD = .17) when compared with AAA, PA, PS, PSI, and WPS (SD = .11, .05, .06, .04, and .03, respectively). Therefore, the lower mean CITC that was found for the PAR program largely reflects a few profiles with poor reliabilities (e.g., 7/8, 2/8, 1/3) attenuating the mean.

Discussion

This study sought to examine the reliability of six commercially available CBTI systems for the MMPI-2. Specifically, the authors analyzed interrater reliability (i.e., the extent to which different clinicians arrive at the same inferences when reviewing the same CBTI report) and interprogram reliability (i.e., the extent to which clinicians' inferences about an examinee overlap based on review of CBTI reports produced by different CBTI systems from the same response set). Based on previous literature (e.g., Harrington, 1999; Williams & Weed, 2004a), AAA and WPS profiles were predicted to evidence superior reliability, particularly relative to PS.

Interrater Reliability Findings

Although there was considerable variability across CBTI/profile combinations, the overall interrater reliability was found to be .87. The fact that ratings aggregated across raters were more reliable suggests that a substantial portion of the interrater unreliability reflected rater idiosyncrasy or lack of clarity of the q-sort items. There were no significant differences overall between the CBTIs (bottom margin of Table 1), indicating that all CBTIs were understood at a similar level of reliability. Also, there were no significant interrater differences overall between individual profiles (right margin of Table 1), indicating that none of the profiles were notable in terms of their ability to produce consensually understood CBTI reports.


Table 2. Corrected Item-Total Correlations for Interprogram Reliability.

Profile    AAA   PA    PAR   PS    PSI   WPS   M     SD    95% CI
1/2        .95   .92   .76   .94   .94   .93   .91   .07   [0.83, 0.98]
1/3        .88   .90   .69   .70   .90   .91   .83   .11   [0.72, 0.94]
2/3        .94   .95   .90   .92   .92   .91   .92   .02   [0.90, 0.94]
2/4        .45   .71   .86   .85   .92   .89   .78   .18   [0.59, 0.97]
2/7        .95   .93   .81   .92   .93   .94   .91   .05   [0.86, 0.97]
2/8        .88   .95   .51   .87   .89   .92   .84   .16   [0.67, 1.01]
4/6        .84   .92   .92   .92   .90   .93   .90   .03   [0.87, 0.94]
4/7        .92   .93   .93   .84   .93   .90   .91   .04   [0.87, 0.95]
4/8        .95   .92   .90   .90   .95   .92   .92   .02   [0.90, 0.95]
4/9        .96   .97   .96   .91   .93   .94   .95   .02   [0.92, 0.97]
6/8        .94   .94   .78   .91   .95   .93   .91   .06   [0.84, 0.98]
6/9        .92   .93   .90   .86   .89   .85   .89   .03   [0.86, 0.93]
7/8        .94   .92   .24   .91   .87   .88   .79   .27   [0.51, 1.08]
8/9        .94   .93   .80   .92   .83   .91   .89   .06   [0.83, 0.95]
1/2/3      .95   .92   .94   .92   .95   .94   .94   .01   [0.92, 0.95]
2/4/7      .89   .88   .75   .78   .92   .85   .84   .07   [0.78, 0.91]
2/7/8      .95   .93   .96   .90   .93   .90   .93   .03   [0.90, 0.95]
6/7/8      .95   .96   .82   .93   .95   .93   .92   .05   [0.87, 0.98]
Over 5     .94   .94   .81   .91   .96   .95   .92   .06   [0.86, 0.98]
WNL        .81   .95   .92   .86   .85   .90   .88   .05   [0.83, 0.94]
Gr M       .90   .92   .81   .88   .92   .91   .89   .07   [0.82, 0.96]
Gr SD      .11   .05   .17   .06   .04   .03   .05
Gr 95% CI  [0.85, 0.95]  [0.90, 0.95]  [0.73, 0.89]  [0.86, 0.91]  [0.90, 0.93]  [0.90, 0.93]  [0.87, 0.91]

Note. AAA = Automated Assessment Associates; PA = Pearson Assessments; PAR = Psychological Assessment Resources; PS = PsychScreen, Inc.; PSI = Psychometric Software, Inc.; WPS = Western Psychological Services; M = mean CITC of the six MMPI-2 CBTI systems; SD = standard deviation; 95% CI = 95% confidence interval; Gr = grand statistic across all 20 profiles; WNL = within normal limits. A corrected item-total correlation (CITC) indexes consensus of a single CBTI of a given profile with the other CBTIs of the same profile.

There were significant differences in interrater reliability for particular CBTI/profile combinations, however. Three CBTI/profile combinations were interpreted with significantly less agreement (PS 1/3, PAR 7/8, and AAA WNL). Examination of these reports suggests some explanation for why they were interpreted with less consensus. The AAA WNL report was interpreted with the least amount of agreement (.21). Because the AAA report provides only a brief interpretive summary for profiles that fall WNL, including ruling out of several problems, warning of possible test insensitivity to the individual's specific concerns, and comments about likely social functioning, the raters were largely left to sort which attributes in the q-sort were most characteristic of the examinee without the guidance of specific statements. Comparatively, other CBTI programs tended to provide more statements suggesting the presence of positive attributes, such as life satisfaction, effective coping, and a calm disposition, which may have improved reliabilities within these programs based on their similarity to Midwestern Q-sort descriptors.

The report with the next-lowest agreement was the PAR 7/8 profile (.38). There are several likely explanations for this lack of consensus. The report has many inconsistent statements that would make it difficult to provide a reliable description of a client's mental health functioning. For instance, the report begins by indicating the client likely has "personal and emotional problems." In addition, the client is described as "moody, dissatisfied, worried, tense, socially withdrawn, having chronic and severe behavioral problems, and having a possible diagnosis of Schizophrenia." In another paragraph, the same client is described as "not experiencing any type of emotional distress and describes himself as being happy and not easily upset." Also, the client is described as "being confident in social situations, not self-conscious, and seeing himself as being well-adjusted." These contradictions in the report are likely to have contributed to why the four raters could not come to agreement on this report.

The final report that had significantly less agreement between raters was the PS 1/3 (.44). Although no formal feedback from raters was solicited, informally there were complaints from the raters regarding the PS reports in general due to their length (average length across the 20 reports was 16 pages). This 1/3 report was somewhat shorter than other PS reports (12 pages in length). However, it did possess some factors that may have caused disagreement among raters. First, in many places the report contains statements such as "this scale is within normal limits." Because the report addresses every scale on the MMPI-2 in turn, it often reports some information even when it may not be clinically relevant. Therefore, many pages of the report are filled with information that may confuse attempts to get to the heart of the clinical presentation. In addition, the information that is provided sometimes lacks clear behavioral referents. For example, there are statements such as "the client does not report subjectively experiencing cognitive efficiency," "the client prefers ambiguity," "the client's use of repression may lead to a lack of cognitive processing," and "the client is neither overly introverted nor extroverted." These descriptors may be difficult to translate into the behavioral statements represented in the items of the Midwestern Q-sort. Other reports produced by PS appear to have more examples of concrete behaviors that may be easier for raters to sort.

Interprogram Findings

There was considerably more variability observed regarding interprogram reliability. The overall mean interprogram reliability coefficient was .89, which suggests good agreement among the different programs interpreting the same profiles. Overall discrepancies between the CBTIs (Gr M row of Table 2) were small, indicating substantial consensus among their interpretations. Interprogram reliability for specific MMPI-2 profiles (M column of Table 2) was more variable, however. The 2/4 and 7/8 profiles were interpreted with less agreement overall, whereas the 4/9 and 1/2/3 profiles were interpreted with more agreement between the CBTIs.

The low agreement on the 7/8 profile was chiefly due to the low consensus of the PAR 7/8 report (CITC of .24 with the other 7/8 CBTI reports). As mentioned above with reference to interrater reliability, the PAR 7/8 report appeared inconsistent in a number of respects. The low agreement on the 2/4 profile appears to be due to major differences in interpretive statements based on codetype programming. Informal examination suggests that the AAA 2/4 report describes more anxiety than the other 2/4 reports. At one point in the report, the profile's codetype is identified as 2/7/4, indicating that Scale 7 elevations should play an important role in interpretation, although this scale was only at a T-score level of 59. The other CBTI reports did not emphasize high levels of anxiety, instead focusing on depressed mood and antisocial behavior.

There were also two codetypes associated with significantly superior interprogram agreement: the 4/9 (.95) and the 1/2/3 (.94) profiles. The 4/9 profile is one of the most common 2-point codetypes, and because it is often treated as a prototypical MMPI-2 profile, it is perhaps not surprising that all programs should show good consensus. The same is perhaps true of the 1/2/3 profile, which is the most common 3-point codetype.

There was more variability in interprogram reliability among the unique CBTI/profile combinations. The three reports demonstrating the least consensus were the PAR 7/8, AAA 2/4, and PAR 2/8 reports. Possible explanations for the poor reliability of the PAR 7/8 and AAA 2/4 reports are suggested above: they showed weaker interrater reliability and inconsistent codetype programming, respectively. The PAR 2/8 report also possessed inconsistencies, including some descriptors that were near opposites. For example, the client is described as "psychologically well-adjusted," "having very mild emotional distress," and "unlikely to abuse substances or have suicidal ideation." On the other hand, the client is also described as "feeling depressed, unhappy, sad, self-critical," and "suicidal ideation should be evaluated carefully." Likely diagnoses suggested for the client included dysthymic disorder and dependent personality disorder, diagnoses that contradict other descriptive statements in the report.

In addition to the report/CBTI combinations with lower agreement, there were several with significantly high consensus across CBTIs. The highest were the PA 4/9 (.97), AAA 4/9 (.96), and PAR 4/9 (.96) reports. These reports were extremely consistent in their interpretation and behavioral descriptors of the client across and between reports. In addition, all these reports were interpreting the 4/9 profile, identified above as one of the most common 2-point codetypes about which clinicians seem to have very clear ideas.

An important point is that there are no consistent differences among the six CBTIs that were investigated based on interrater or interprogram reliability. This differs somewhat from the validity research on CBTIs, in which some programs clearly excelled and others performed poorly. One potential explanation for this finding is that as CBTI systems have been revised since earlier studies, improvements based on new empirical literature and user feedback may have resulted in improved programming and fewer discrepancies between systems. In addition, the use of aggregated data to calculate coefficient alphas and CITCs for each CBTI may have inflated reliabilities relative to other studies and obscured system differences. There are four individual sorts for each of the 20 profiles; these are averaged into item means, and each program's 100 item means are then compared with those of the other five programs to yield the CITC. When the data are aggregated to this degree, high reliability estimates are to be expected because idiosyncrasies are averaged out.

Based on patterns seen within profiles in which less consensus was found, it appears CBTI programs would benefit from continued attention to details such as elimination of redundancy, use of clear statements with behavioral referents, and efficient organization. Furthermore, the addition of interpretive statements highlighting client strengths may be useful in reaching conclusions for reports in which little pathology is endorsed. The finding that no system was apparently superior or inferior across profiles suggests users are likely to arrive at similar conclusions regardless of their chosen CBTI system, allowing them to select based on personal preference (e.g., longer or shorter report, inclusion of integrated summaries), with the caveat that reports occasionally provide discrepant or difficult-to-interpret information. This highlights the aforementioned need for CBTI users to refrain from relying too heavily on reports for their MMPI-2 interpretations and, instead, to use them in a support function.

Limitations and Future Directions

There are a number of limitations to the study that affect the generalizability of interpretations. First, the methodology called for CBTIs to be interpreted in isolation, without the extratest data that ought to be included in clinical practice. As discussed above, there is evidence that statistical models work more effectively than clinical models in test interpretation, but most CBTIs are not based on strict statistical models and are actually automated clinical models. Each program tested is authored by a clinician based in large part on his or her interpretation of the research literature and clinical experience, although there may be some stricter statistical basis for some inferences (Graham, 2011a). Matarazzo (1986, p. 19) argued that CBTIs are "shakily based on still relatively unvalidated, scarce, published actuarial data." He went on to argue that the interpretation of self-report inventories is a highly subjective art that necessitates evaluation in the context of clinical history. The interpretive reliability values obtained here may underestimate the field reliability of clinicians' CBTI-based inferences because clinicians may use supporting examinee information (i.e., demographics, background information, collateral information, behavioral observations, and other test data) to test the hypotheses generated by the CBTI and rule out unfounded inferences, especially in reports where there is conflicting information. In fact, several of the reports disclaim that they merely contain hypotheses about an examinee that should be evaluated against extratest information. On the other hand, it is quite possible that when extratest information is synthesized with CBTI interpretive reports, reliability decreases due to idiosyncrasies of clinical judgment. This question requires further evaluation. It is also unclear to what extent these q-correlations adequately operationalize the interpretive reliability of inferences from CBTI programs.

Furthermore, generalizability of these findings may be compromised by the use of clinical psychology doctoral students as raters. Potentially, fully licensed clinicians with significant experience interpreting MMPI-2 CBTI profiles may have more success recognizing nonreconciled statements from CBTI systems that employ a scale-by-scale interpretation method for what they are, rendering them less susceptible to confusion regarding contradictory statements. Alternatively, more error variance may have been introduced into the study had more experienced clinicians been recruited as raters rather than graduate students who had recently undergone the same training in psychopathology and MMPI interpretation; for example, there would likely be greater reliance on clinical experience as opposed to the information provided within each CBTI report. Although CBTI systems should not be relied on blindly in a clinical context, the introduction of idiosyncratic interpretation practices in a research setting may cloud the interpretation of findings. Additionally, it has not been determined whether individuals with the expertise to reconcile conflicting or unclear statements within CBTI reports are representative of typical users (Moreland, 1985). Future studies would benefit from comparing and contrasting CBTI interpretations as a function of clinical experience and experience with CBTI use. When examining the reliability of q-sorts based on blind MMPI-2 interpretations with clinical psychology doctoral student raters, Robinson (2004) found acceptable sort–resort and pairwise reliabilities (.67 and .38 overall, respectively) with a q-sorting system that served as a precursor to the Midwestern Q-sort. Moreover, reliabilities have been shown to increase following the aggregation of ratings (Marshall, 2001; Pant et al., 2014), as was done in this study. The issue of generalizability would be of greater concern had the raters been asked to interpret MMPI-2 profiles rather than to rate the extent to which provided CBTI statements are characteristic of an individual. Whereas training in psychopathology is essential to ensure raters understand terms similarly for consensual CBTI interpretation (e.g., the word "hallucination" should carry a common meaning across levels of experience, assuming education in psychopathology), experience in MMPI-2 interpretation appears less crucial for the purposes of the present study.

A final possible limitation involves the q-sort used in the study. Although carefully developed to correspond to common MMPI-2 inferences, research on the q-sort itself, including properties such as factor structure and sort–resort reliability, is still in progress. Current results should be interpreted within the context of the reliability "ceiling" suggested by sort–resort studies, keeping in mind the impact of aggregation on ratings. In addition, the q-sort was developed to capture prototypical descriptors from an MMPI-2 perspective that may match the perspectives of CBTI developers to varying degrees. In other words, it is possible that the q-sort does not capture the interpretive repertoire of each CBTI equally well.

A potential enhancement to this project would be to include unexamined systems: (a) the two mail-in CBTI systems (The Caldwell Report and Behavior Data), (b) setting-specific reports offered by some developers, which are among the least studied CBTIs in existence, and (c) two CBTI programs that were released for the MMPI-2-Restructured Form (MMPI-2-RF) following the collection of data for this study. The MMPI-2-RF systems include the Interpretive Report (Pearson, 2015b) and a system available through PAR (Greene, Brown, & PAR Staff, 2012). It is also noteworthy that the AAA program is no longer available.

Acknowledgments

The authors would like to acknowledge Andrew Cox, Sangil Kwon, Felix Smith, and Kevin Young for their contributions to this study.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Archer, R., Griffin, R., & Aiduk, R. (1995). MMPI-2 clinical correlates for ten common codes. Journal of Personality Assessment, 65, 391-407.
Automated Assessment Associates. (2015). Retrieved from http://www.psychsoftware.net/mmpi2report.html
Ball, J. D., Archer, R. P., & Imhoff, E. A. (1994). Time requirements of psychological testing: A survey of practitioners. Journal of Personality Assessment, 63, 239-249.
Block, W. (1956). A comparison of forced and unforced Q-sorting procedures. Educational and Psychological Measurement, 16, 481-493.
Butcher, J. N. (1993). User's guide for the Minnesota report: Adult clinical system-revised (4th ed.) [Computer software manual]. Minneapolis: The Regents of the University of Minnesota.
Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). MMPI-2: Manual for administration, scoring, and interpretation (Rev. ed.). Minneapolis: University of Minnesota Press.
Butcher, J. N., Perry, J. N., & Atlis, M. M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, 12, 6-18.
Butcher, J. N., & Williams, C. L. (2000). Essentials of MMPI-2 and MMPI-A interpretation (2nd ed.). Minneapolis: University of Minnesota Press.
Conoley, C. W., Plake, B. S., & Kemmerer, B. E. (1991). Issues in computer-based test interpretation systems. Computers in Human Behavior, 7, 97-101.
Crumpton, C. A. (1975). An evaluation and comparison of three automated MMPI interpretive reports (Doctoral dissertation, University of Texas). Dissertation Abstracts International, 35, 6090.
Dawes, R. M. (1994). House of cards. New York, NY: Free Press.
Deskovitz, M. A. (2003). The interpretive reliability of two common MMPI-2 codetypes (Unpublished master's thesis). Central Michigan University, Mt. Pleasant, MI.
Eyde, L. D., Kowal, D. M., & Fishburne, F. J., Jr. (1991). The validity of computer-based test interpretations of the MMPI. In T. B. Gutkin & S. L. Wise (Eds.), The computer and the decision-making process (pp. 75-123). Hillsdale, NJ: Lawrence Erlbaum.
Friedman, A. F., Lewak, R., Nichols, D. S., & Webb, J. T. (2001). Psychological assessment with the MMPI-2. Mahwah, NJ: Lawrence Erlbaum.
Garb, H. N. (2000). Computers will become increasingly important for psychological assessment: Not that there's anything wrong with that! Psychological Assessment, 12, 31-39.
Graham, J. R. (2011a). MMPI-2: Assessing personality and psychopathology (2nd ed.). New York, NY: Oxford University Press.
Graham, J. R. (2011b). MMPI-2: Assessing personality and psychopathology (5th ed.). New York, NY: Oxford University Press.
Graham, J. R., Ben-Porath, Y. S., & McNulty, J. L. (1999). MMPI-2: Correlates for outpatient community mental health settings. Minneapolis: University of Minnesota Press.
Greene, R. L. (2000). The MMPI-2/MMPI: An interpretive manual (2nd ed.). Needham Heights, MA: Allyn & Bacon.
Greene, R. L., Brown, R. C., & PAR Staff. (2012). MMPI-2 Adult Interpretive System Version 4 (MMPI-2 AIS V4). Retrieved from http://www4.parinc.com/Products/Product.aspx?ProductID=MMPI2_AIS_V4
Halbower, C. C. (1955). A comparison of actuarial versus clinical prediction to classes discriminated by the MMPI (Unpublished doctoral dissertation). University of Minnesota, Minneapolis.
Han, K., Weed, N. C., Calhoun, R. F., & Butcher, J. N. (1995). Psychometric characteristics of the MMPI-2 Cook-Medley Hostility Scale. Journal of Personality Assessment, 65, 567-585.
Harrington, A. (1999). The relative validity of computer-based interpretations of the MMPI-2 (Doctoral dissertation, The University of Mississippi). Dissertation Abstracts International, 60, 6364.
Hathaway, S. R., McKinley, J. C., & MMPI Restandardization Committee. (1989). MMPI-2: Minnesota Multiphasic Personality Inventory-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.
Lewandowski, D., & Graham, J. (1972). Empirical correlates of frequently occurring two-point MMPI code types: A replicated study. Journal of Consulting and Clinical Psychology, 39, 467-472.
Little, K. B., & Schneidman, E. S. (1959). The relative contribution of four kinds of data to accuracy in personality assessment. Journal of Consulting Psychology, 23, 483-492.
Marshall, E. D. (2001). Aggregated relative validity of MMPI-2 profiles of African American and Caucasian college students (Unpublished doctoral dissertation). University of Mississippi, Oxford.
Matarazzo, J. D. (1986). Computerized clinical psychological test interpretations. American Psychologist, 41, 14-21.
McMinn, M. R., Ellens, B. M., & Soref, E. (1999). Ethical perspectives and practice behaviors involving computer-based test interpretation. Assessment, 9, 71-77.
McNeal, T. P. (1997). Examining the validity of computer based test interpretation with interpretive q-sorts (Unpublished master's thesis). University of Mississippi, Oxford.
McNeal, T. P. (1999). The longitudinal impact of personality assessment on understanding clients during the course of psychotherapy (Unpublished doctoral dissertation). University of Mississippi, Oxford.
Meehl, P. E. (1956). Wanted—A good cookbook. American Psychologist, 11, 263-272.
Moos, R. H. (1962). Effects of training on students' test interpretations. Journal of Projective Techniques and Personality Assessment, 26, 310-317.
Moreland, K. L. (1983). Diagnostic validity of the MMPI and two short forms. Journal of Personality Assessment, 47, 492-493.
Moreland, K. L. (1985). Validation of computer-based test interpretations: Problems and prospects. Journal of Consulting and Clinical Psychology, 53, 816-825.
Osborne, J. W. (Ed.). (2008). Best practices in quantitative methods. Thousand Oaks, CA: Sage.
Ozer, D. J. (1993). The Q-sort method and the study of personality development. In D. C. Funder, R. D. Parke, C. Tomlinson-Keasey, & K. Widaman (Eds.), Studying lives through time: Personality and development (pp. 147-168). Washington, DC: American Psychological Association.
Pant, H., McCabe, B. J., Deskovitz, M. A., Weed, N. C., & Williams, J. E. (2014). Diagnostic reliability of MMPI-2 computer-based test interpretations. Psychological Assessment.
Pearson. (2015a). Retrieved from http://www.pearsonclinical.com/psychology/products/100000718/minnesota-multiphasic-personality-inventory-2-adult-clinical-system-revised-4th-edition-interpretive-reports.html
Pearson. (2015b). Retrieved from http://www.pearsonclinical.com/psychology/products/100000631/minnesota-multiphasic-personality-inventory-2-rf-mmpi-2-rf.html#tabpricing
PsychScreen, Inc. (1998). MMPI-2 Interpreter [Computer software]. Milton, WI: Author.
PsychScreen, Inc. (n.d.). Retrieved from http://www.psychscreen.com
Rainwater, G. D. (1996). MMPI-2 Report (Version 4.0) [Computer software]. Melbourne, FL: Psychometric Software.
Robinson, B. C. (2004). The reliability and internal structure of the Mississippi interpretive Q-sort (Unpublished doctoral dissertation). University of Mississippi, Oxford.
Strassberg, D. S., & Cooper, L. M. (1997). MMPI/MMPI-2 Adult Interpretive Report (Version 6.0P) [Computer software]. Salt Lake City, UT: Automated Assessment Associates.
Weed, N. C. (1997). Using the interpretive Q-sort technique to teach MMPI interpretation (Unpublished manuscript). University of Mississippi, Oxford.
Western Psychological Services. (2003). Retrieved from http://www.wpspublish.com/PDF/catalog2003wps/wps03017.pdf
Williams, J. E. (2002). The relative validity of eight computer-based interpretations for the MMPI-2 (Unpublished doctoral dissertation). University of Mississippi, Oxford.
Williams, J. E., & Weed, N. C. (2004a). Relative user ratings of MMPI-2 computer-based test interpretations. Assessment, 11, 316-329.
Williams, J. E., & Weed, N. C. (2004b). Review of computer-based test interpretation software for the MMPI-2. Journal of Personality Assessment, 83, 78-83.
