Archives of Clinical Neuropsychology 29 (2014) 224–235

Using Likelihood Ratios to Detect Invalid Performance with Performance Validity Measures

John E. Meyers 1,*, Ronald M. Miller 2, Lisa M. Thompson 2, Adam M. Scalese 2, Bonnie C. Allred 2, Zachary W. Rupp 2, Zacharias P. Dupaix 2, Amy Junghyun Lee 2

1 Meyers Neuropsychological Services, Mililani, HI, USA
2 Department of Psychology, Brigham Young University-Hawaii, HI, USA

Accepted 1 January 2014

Abstract

Larrabee (2008) applied chained likelihood ratios to selected performance validity measures (PVMs) to identify non-valid performance on neuropsychological tests. He presented a method of combining different PVMs with different sensitivities and specificities into an overall probability of non-validity. We applied his methodology to a set of 11 PVMs using a sample of 255 subjects. The results show that, for various combinations of two or three PVMs, invalid performance can be identified with high probability using the chained likelihood ratio method. This study advances the ability of clinicians to chain various PVMs together and calculate the probability that a set of data is invalid.

Keywords: Assessment; Malingering/symptom validity testing; Mild cognitive impairment; Test construction

Introduction

The purpose of internal and add-on performance validity measures (PVMs) is to objectively evaluate diminished effort during neuropsychological assessments (Larrabee, 2003, 2008; Meyers & Volbrecht, 2003). In fact, the National Academy of Neuropsychology presented a position paper on the use of PVMs (Bush et al., 2005), which specifically recommends administration of at least two PVMs during an evaluation. Although some authors consider the term Symptom Validity Test (SVT; Bush et al., 2005; Larrabee, 2008) interchangeable with PVM, we will delineate the two terms by using SVT to indicate validity measures related to self-report and PVM to indicate validity measures related to performance on neuropsychological testing (Larrabee, 2012).

It is important that conclusions extrapolated from a neuropsychological assessment are based on valid neuropsychological performance (Slick, Sherman, & Iverson, 1999; Vickery, Berry, Inman, Harris, & Orey, 2001). The use of PVMs is the standard of practice (Bush et al., 2005; Larrabee, 2008). PVMs are used to detect invalid test performance, but should not be considered "malingering tests": they assess whether the data are valid rather than identify those who are purposefully exaggerating their symptoms. The conclusion/diagnosis of malingering is based on a number of factors, only some of which involve the use of PVMs (Meyers, 2007; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011; Slick et al., 1999). When multiple PVMs are failed in the context of external incentive, with no viable developmental, psychiatric, or neurologic explanation, the data could be consistent with malingering. The usefulness of a PVM is determined by its sensitivity, specificity, positive predictive power (PPP), and negative predictive power (NPP) (Bianchini, Mathias, & Greve, 2001).
Sensitivity is defined as the true positive rate: the percentage of invalid data sets correctly identified as non-valid. Poor sensitivity reflects an increase in false negative errors, meaning that truly invalid data sets go undetected. Specificity refers to the true negative rate: the percentage of valid data sets correctly identified as valid (Etcoff & Kampfer, 1996). Poor specificity, conversely, indicates false positive errors, in which valid cases are incorrectly classified as invalid (Greve & Bianchini, 2004).

© The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]. doi:10.1093/arclin/acu001. Advance Access publication on 4 February 2014.

Downloaded from http://acn.oxfordjournals.org/ at UQ Library on June 17, 2015

*Corresponding author at: Meyers Neuropsychological Services, Mililani, HI 96857, USA. Tel.: 712 251 7545. E-mail address: [email protected] (J.E. Meyers).


Method

The data set was obtained from a general neuropsychology practice located in the Midwest. The presence or absence of external incentives was not a variable used in group formation. To be included, participants had to have completed all the tests necessary for the PVMs; individuals missing any of the tests involved in this battery were excluded from the analysis. Two hundred and fifty-five patients met the criteria for inclusion. Participants had a mean age of 34.55 (SD 12.12) with a range of 17–73. Education ranged from 5 to 18 years, with a mean of 12.55 (SD 1.84). In the group, 231 were right-handed and 24 were left-handed. There were 190 Caucasian participants, 22 African American participants, 18 Hispanic participants, and 25 of other races. The sample consisted of 208 men and 47 women. Of these individuals, 95 were diagnosed with mental health issues, 24 had suffered a mild Traumatic Brain Injury, and 49 had multiple diagnoses or did not meet the criteria for one of the other diagnostic groups. The remaining 87 individuals had a combination of cognitive complaints and a behavioral health diagnosis. In all, 179 were not in litigation and 76 were in litigation. All subjects were functioning independently; none was diagnosed with dementia.

The initial identification of pass or fail on the Word Memory Test (WMT) was done using two methods. First, a pass was coded if the Immediate, Delayed, and Consistency scores were all 82.5% or above. If any of the three scores was below 82.5%, the Genuine Memory Impairment Profile (GMIP; Green, Flaro, & Courtney, 2009) was calculated per the instructions in Green and colleagues (2009); if the difference between the easy and hard items was ≥30, a pass was coded (even if the Immediate, Delayed, or Consistency score was below 82.5%).
This method resulted in 186 (72.9%) passing the WMT and 69 (27.1%) failing it. Notably, none of the 255 individuals would have qualified as moderately or severely impaired in accordance with Green and colleagues (2009), and so none qualified for use of the GMIP. Next, the WMT pass and fail rate was calculated based only on the Immediate, Delayed, and Consistency scores: if any of the three scores fell below 82.5%, a failure was coded; if all three were at or above 82.5%, a pass was coded. Using this method alone, without the GMIP, 157 (61.6%) passed the WMT and 98 (38.4%) failed. Interestingly, the 38.4% failure rate on the WMT is very similar to the 0.40 base rate reported by Larrabee (2003) and Larrabee, Millis, and Meyers (2009). We chose to use only the second method (cutoff below 82.5%) to establish pass and fail, for two reasons: (a) the population in this sample was not moderately to severely cognitively impaired and all were functioning independently at the time of assessment, and (b) applying the GMIP to this sample would change the fail status of 29 individuals, or 11.3% (29/255), which could introduce a substantial false negative rate into the classification of pass and fail on the WMT. Using only the regular cutoff score of 82.5%, of the 76 individuals in litigation, 31 (40.8%) passed the WMT and 45 (59.2%) failed; of the 179 non-litigants, 126 (70.4%) passed and 53 (29.6%) failed.
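The cutoff logic described above can be sketched as follows; this is a minimal illustration of the 82.5% rule, not the published WMT scoring software, and the function and variable names are our own.

```python
# Sketch of the WMT pass/fail coding described above (assumption: the
# function and variable names are illustrative, not from the WMT manual).

CUTOFF = 82.5  # percent; applied to Immediate, Delayed, and Consistency

def wmt_pass(immediate: float, delayed: float, consistency: float) -> bool:
    """Return True (pass) only if all three scores are at or above 82.5%."""
    return all(score >= CUTOFF for score in (immediate, delayed, consistency))

# Example: a single score below the cutoff codes the record as a failure.
print(wmt_pass(95.0, 90.0, 88.0))  # all at/above cutoff -> True
print(wmt_pass(95.0, 80.0, 88.0))  # Delayed below cutoff -> False
```

A real scoring routine would also implement the GMIP branch that the authors chose not to use with this sample.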

Downloaded from http://acn.oxfordjournals.org/ at UQ Library on June 17, 2015

Bianchini and colleagues (2001) stated that positive and negative predictive powers (PPP and NPP) refer to the validity of the individual factors of a test and depend on the base rate of the particular disorder being diagnosed. PPP is the probability that the patient has a given diagnosis when he or she tests positive for it; it is calculated by dividing the number of true positives by the number of true positives plus false positives (Baldessarini, Finklestein, & Arana, 1983). Conversely, NPP is the probability that the patient does not have a given diagnosis when he or she tests negative for it; it is calculated by dividing the number of true negatives by the number of true negatives plus false negatives (Baldessarini et al., 1983). Sensitivity and specificity can vary for each PVM and will, in turn, yield different PPPs and NPPs. After completing these calculations, it is possible to assess the accuracy of any given test. Because a single measure is rarely completely accurate, the use of multiple validity measures is recommended (Larrabee, 2003; Meyers & Volbrecht, 2003; Slick et al., 1999; Victor, Boone, Serpa, & Buehler, 2009). Multiple indicators are required to accurately determine the validity of a participant's performance (Bianchini et al., 2001), because multiple indicators can improve poor sensitivity (Greve & Bianchini, 2004) and discriminatory functions (Victor, Boone, Serpa, Buehler, & Ziegler, 2009). Diagnostic criteria also state that for results to be considered possibly invalid, the patient must fail multiple indicators within a set of tests; simply failing a single test is insufficient (Slick et al., 1999). To detect probable invalid data, diagnostic criteria mandate that multiple indicators of purposeful exaggeration of symptoms and poor effort be measured (Vickery et al., 2001).
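The four statistics above follow directly from a 2×2 classification table; the sketch below shows the standard formulas, with "positive" meaning invalid performance. The counts are made-up numbers for illustration, not data from the study.

```python
# Illustrative computation of sensitivity, specificity, PPP, and NPP from a
# 2x2 classification table ("positive" = invalid performance). The counts
# below are hypothetical, not values from this study.

def diagnostic_stats(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "PPP": tp / (tp + fp),          # positive predictive power
        "NPP": tn / (tn + fn),          # negative predictive power
    }

stats = diagnostic_stats(tp=80, fp=5, tn=150, fn=20)
print({k: round(v, 3) for k, v in stats.items()})
# -> {'sensitivity': 0.8, 'specificity': 0.968, 'PPP': 0.941, 'NPP': 0.882}
```

Note that sensitivity and specificity are properties of the test alone, whereas PPP and NPP shift with the base rate, which is why the chaining method below works in odds rather than predictive values.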
Detection methods have typically relied on the use of PPP and false positives; however, Larrabee (2008) applied chained likelihood ratios to show the probability of invalidity when one, two, or three PVMs are failed. By chaining likelihood ratios, Larrabee (2008) showed that patients who fail three PVMs in a setting with a base rate of non-validity of 0.40 have, on average, a .99 probability of providing invalid data. For more information on likelihood ratios, see Grimes and Schulz (2005). These high probabilities are maintained even when calculated using a low base rate of invalid data. Based on Larrabee's (2008) findings, the current study was designed to examine whether such high probabilities would also be found when chained likelihood ratios were applied to other PVMs.
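The chaining logic can be sketched in a few lines: convert the base rate of invalid performance to odds, multiply by the likelihood ratio of each failed PVM, and convert back to a probability. The likelihood ratios below are hypothetical round numbers, not values from any particular PVM.

```python
# Sketch of the chained likelihood ratio method (Larrabee, 2008).
# Assumption: the function name and the LR values are illustrative.

def chained_posterior(base_rate: float, likelihood_ratios: list) -> float:
    odds = base_rate / (1 - base_rate)   # pre-test odds
    for lr in likelihood_ratios:         # one multiplication per failed PVM
        odds *= lr
    return odds / (odds + 1)             # post-test probability

# Three failed PVMs, each with a hypothetical LR of 10, at the 0.40 base rate:
p = chained_posterior(0.40, [10, 10, 10])
print(round(p, 4))  # -> 0.9985
```

Even modest likelihood ratios compound quickly, which is why three failures push the posterior probability near 1 regardless of the starting base rate.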


To be considered in the data set, participants had to have completed the Meyers Neuropsychological Battery (MNB; Meyers & Rohling, 2004; Rohling, Meyers, & Millis, 2003). From this battery, the 10 internal PVMs were used: Forced Choice (FC; Meyers & Volbrecht, 2003), Dichotic Listening (DL; Meyers, Roberts, Bayless, Volkert, & Evitts, 2002), Sentence Repetition (SR; Strauss, Sherman, & Spreen, 2006), Judgment of Line Orientation (JL; Benton, Hamsher, Varney, & Spreen, 1983), Token Test (TT; Strauss et al., 2006), AVLT-Recognition (AV; Strauss et al., 2006), Finger Tapping-Dominant Hand (FTD; Reitan & Wolfson, 1985), Memory Error Pattern (MEP), Reliable Digit Span (RDS; Meyers & Volbrecht, 2003), and WMTMNB, together with the Meyers Index (MI; Meyers et al., 2013; Meyers, Millis, & Volkert, 2002). All of the PVMs included in the MNB use very conservative cutoff scores; this is a purposeful choice by the author. The cutoff scores are set at a zero false positive rate for independently functioning persons with a loss of consciousness (LOC) of less than 8 days (Meyers & Volbrecht, 2003). Other authors have taken a less stringent approach to establishing cutoff scores (Boone et al., 2000). It is the authors' intention to use the conservative cutoff scores of these PVMs, as the cutoffs are already published and have been in use for many years. The FC test (Brandt, Rubinsky, & Larson, 1985) is a forced-choice recognition task made up of 20 items; the participant must choose between two recognition choices. A failing score of 10 or lower is at or below chance (Meyers, Galinsky, & Volbrecht, 1999; Meyers, Morrison, & Miller, 2001; Meyers & Volbrecht, 2003) and was the original cutoff used. A cutoff score of 13 has zero false positives for individuals with LOC of less than 8 days and is consistent with the method used to set the cutoffs for the other internal PVMs in the MNB. In the DL task, the subject responds to different stimuli presented simultaneously to both ears.
The score is based on accuracy; to pass this PVM, a score of 10 or higher is needed (Meyers & Volbrecht, 2003). Before the DL is administered, a preliminary test checks that the subject's hearing is within normal parameters; if hearing is found to be inadequate, the DL is not used (Hamberger & Tamny, 1999). For the SR test (Strauss et al., 2006), the participant repeats sentences and is scored on the sentences repeated correctly; a score of 10 or higher is a pass (Meyers & Volbrecht, 2003). On the JL, participants are presented with pictures and asked to match full or partial lines; a raw uncorrected number correct of 12 or lower is a failure on the PVM (Meyers et al., 1999; Meyers & Volbrecht, 2003). The TT (Strauss et al., 2006) uses manipulation of shapes and colors to test subtle receptive language dysfunction; a score of 150 or less is a failing score (Meyers & Volbrecht, 2003). To pass the AV test (Strauss et al., 2006), a measure of auditory and visual memory performance, a score of 10 or higher is needed (Meyers et al., 2001; Meyers & Volbrecht, 2003). On the FTD test, the mean finger tapping score for the dominant hand is compared with a calculated expected finger tapping score. The formula used is: estimated FT = (Block Design scale score × 0.361) + (Digit Symbol scale score × 0.491) + (raw CFT Copy score × 0.185) + 31.34. Difference = estimated FT − actual mean finger tapping score for the dominant hand. If the difference is >10, this is a failure on the PVM (Meyers & Volbrecht, 2003). MEP is the association of the immediate, delayed, and recognition scores on the Complex Figure Test (CFT; Meyers & Meyers, 1995; Meyers et al., 1996).
The profiles for this PVM, based on these relationships, are delineated as Attention, Encoding, Storage, Retrieval, Consolidation, Peak, and Other (Meyers & Volbrecht, 1999, 2003). Production of an Attention, Encoding, or Storage MEP indicates failure on this PVM; these three MEPs are expected only in individuals who are institutionalized due to cognitive impairment (Meyers & Volbrecht, 1999, 2003). An Attention MEP is classically identified when the CFT Immediate, Delayed, and Recognition scores are all below 20 on a T-score scale (Meyers & Meyers, 1995). An Encoding MEP occurs when the CFT Immediate and Delayed scores are below 20 on a T-score scale and Recognition is higher (usually above 20; Meyers & Meyers, 1995). The Storage MEP slope is calculated using the CFT Immediate, CFT Delayed, and Recognition scores (Meyers & Meyers, 1995). RDS is calculated by adding the longest span of Digit Span forward on which both trials are passed to the longest span of Digit Span backward on which both trials are passed. Several studies have examined the use of Reliable Digits as a PVM. Meyers and Volbrecht (1998, 2003) proposed a cutoff of 6 or below for a zero false positive rate; Greiffenstein, Baker, and Gola (1994) proposed a cutoff of 7; and Greve and colleagues (2009) likewise found that a cutoff of 6 or less separated valid from invalid test performance. For the MNB validity measure, a score of 6 or below is considered a failure on the RDS (Meyers & Volbrecht, 1998, 1999, 2003). For purposes of this study, we used the WAIS-III version of the Digit Span test (Wechsler, 1997). A new PVM has been added to the MNB since the publication of the original nine PVMs. This new PVM was developed using a sample of 264 consecutive referrals (not overlapping with the current study data) who were referred for a neuropsychological assessment and were given the MNB and the WMT (Green, 2003, 2007).
The mean age was 35.49 (SD = 12.57) years with 12.62 (SD = 1.88) years of education. The group comprised 229 right-handed and 26 left-handed persons; 17 were African American, 200 Caucasian, 3 Asian, 2 Native American, 18 Hispanic, and 14 Pacific Islander; 52 were women and 203 were men. The discriminant function equation was: (AVLT Delayed Raw × 0.226) + (Finger Localization Dominant Hand Raw × 0.187) − 6.935. It is important to note that this calculation does not include any of the WMT scores; it only attempts to predict whether the WMT would be passed, based on scores below 82.5% and the GMIP (Green et al., 2009), and is therefore distinct from the WMT. The cutoff score used was less than or equal to −0.5, which is consistent with the zero false positive rates used with the other PVMs in this study. The discriminant function correctly classified 74.6% of the total group. The data in this first sample of 264 data sets have no overlap with the data used in the current study. Details of this PVM are presented in the MNB electronic manual (Meyers, 2013). The 11th validity measure is the MI (from the MMPI-2) and the MI-r (from the MMPI-2-RF). The MI and MI-r are validity measures of self-report (SVT). They are calculated similarly, using a weighting of the validity scales for each test, as presented in Tables 1 and 2. Meyers and colleagues (2013) showed that the MI and the MI-r are equivalent SVT measures. A score of 5 or more is considered not valid.

Statistical Procedures

Table 1. Weightings of scores on validity scales for the original MI using the MMPI-2 (Meyers, 2002)

Scale       Scores     Weight    Scores    Weight
F-K         1–9        1         10+       2
FT          75–89      1         90+       2
FBS (Raw)   25–29      1         30+       2
F(p)        75–89      1         90+       2
Ds-r        75–89      1         90+       2
Es          21–30      1         ≤20       2
O-S         100–149    1         150+      2

Notes: F-K = F-K raw score; FT = F scale T score; FBS = Fake Bad Scale raw score; F(p) = Infrequency-pathology scale T score; Ds-r = Dissimulation Scale-Revised T score; Es = Ego Strength T score; O-S = Obvious-Subtle; MI = Meyers Index for the MMPI-2.

Table 2. Weightings of scores on the validity scales of the MMPI-2-RF used to calculate the MI-r (Meyers et al., 2013)

Scale    Score    Weight    Score     Weight    Score    Weight
F-r      ≤74      0         75–89     1         90+      2
Fp-r     ≤74      0         75–89     1         90+      2
Fs       ≤74      0         75–89     1         90+      2
FBS-r    ≤74      0         75–89     1         90+      2
RBS      ≤74      0         75–89     1         90+      2

Notes: F-r = Infrequent Responses; Fp-r = Infrequent Psychopathology Responses; Fs = Infrequent Somatic Responses; FBS-r = Symptom Validity; RBS = Response Bias Scale.
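As a worked illustration of the weighting scheme in Table 2, the MI-r can be scored as below. The function names and the example T scores are our own; the ≥5 invalid cutoff is taken from the text.

```python
# Sketch of MI-r scoring per Table 2 (assumption: function names and the
# example T scores are illustrative). Each MMPI-2-RF validity scale T score
# maps to a weight: <=74 -> 0, 75-89 -> 1, 90+ -> 2. A total of 5 or more
# is considered not valid.

def mi_r_weight(t_score: int) -> int:
    if t_score <= 74:
        return 0
    if t_score <= 89:
        return 1
    return 2

def mi_r_score(t_scores: dict) -> int:
    """Sum the weights across the five validity scales."""
    return sum(mi_r_weight(t) for t in t_scores.values())

scores = {"F-r": 90, "Fp-r": 80, "Fs": 60, "FBS-r": 92, "RBS": 85}
total = mi_r_score(scores)
print(total, "not valid" if total >= 5 else "valid")  # -> 6 not valid
```

The original MI (Table 1) follows the same additive logic but uses scale-specific score ranges rather than a single T-score banding.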


For the sample utilized, the base rate of invalid data was taken as the WMT failure rate, which was found to be 38.4%; from this, we determined the base rate odds with the formula base rate/(1 − base rate). Sensitivity for each test was then determined as the percentage of people who failed both the WMT and the individual PVM, whereas specificity was determined as the percentage of people who passed both the WMT and the PVM (Table 3). The current sample of 255 cases was used to set the sensitivity and specificity for each PVM in this study; these values were not based on the Meyers and Volbrecht (2003) data sample. Once these figures were calculated, the likelihood ratio for each test was calculated by dividing the sensitivity by (1 − specificity), as done by Larrabee (2008). The pre-test probability of failing each PVM was calculated from the failure rates in the sample group, by dividing the number of failures on a given PVM by the total number of passes and failures for that particular measure. Subsequently, the pre-test odds were calculated by dividing the pre-test probability of each test by (1 − pre-test probability). Post-test odds for each test were calculated by multiplying the pre-test odds by the likelihood ratio. Using these calculations, post-test probabilities for each test were calculated by dividing the post-test odds by (post-test odds + 1) (Table 3). To calculate the post-test probabilities of invalid data for a pair of test failures (Table 4), one multiplies the post-test odds of the first test in the pair by the likelihood ratio of the second test, using the values calculated in Table 3. Post-test probabilities are then calculated, as before, by dividing the post-test odds by (post-test odds + 1).
In order to calculate the post-test probabilities of invalid data for a set of three test failures (Table 5), one would multiply the post-test odds of the combination of the first two tests, as calculated in Table 4, by the likelihood ratio of the third test in the set, as calculated in Table 3.
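The procedure above can be sketched end to end. The numbers below use the published FC and DL sensitivities and specificities from Table 3 and the 0.384 WMT base rate, and should reproduce the tabled values to rounding; the function names are our own, and this is an illustration rather than the authors' code.

```python
# End-to-end sketch of the chaining procedure described above, using the
# FC and DL rows of Table 3 (function names are illustrative).

def likelihood_ratio(sensitivity: float, specificity: float) -> float:
    return sensitivity / (1 - specificity)

def chain(base_rate: float, lrs: list) -> float:
    """Multiply pre-test odds by each failed test's LR; return probability."""
    odds = base_rate / (1 - base_rate)
    for lr in lrs:
        odds *= lr
    return odds / (odds + 1)

lr_fc = likelihood_ratio(0.720, 0.995)  # Forced Choice
lr_dl = likelihood_ratio(0.664, 0.971)  # Dichotic Listening

p_fc = chain(0.384, [lr_fc])            # one failure
p_fc_dl = chain(0.384, [lr_fc, lr_dl])  # two failures
print(round(lr_fc, 1), round(p_fc, 3), round(p_fc_dl, 4))
# -> 144.0 0.989 0.9995
```

To rounding, these reproduce the FC row of Table 3 (LR 144.000, post-test probability .989) and the FC/DL pair of Table 4 (.999).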


Table 3. Diagnostic statistics and probabilities of invalid performance for individual performance validity measures

Test      Sensitivity   Specificity   Likelihood ratio   Pre-test odds(a)   Post-test odds   Post-test probability
FC        0.720         0.995         144.000            0.624              89.856           .989
DL        0.664         0.971         22.897             0.624              14.288           .935
SR        0.634         0.994         105.667            0.624              65.936           .985
JL        0.642         0.994         107.000            0.624              66.768           .985
TT        0.618         0.994         103.000            0.624              64.272           .985
AV        0.736         0.974         28.308             0.624              17.973           .947
FTD       0.665         0.959         16.220             0.624              10.121           .910
MEP       0.677         0.994         112.833            0.624              70.408           .986
RDS       0.659         0.947         12.434             0.624              7.759            .876
WMTMNB    0.817         0.906         8.691              0.624              5.423            .844
MI        1.00          0.631         2.710              0.624              1.691            .628

Notes: FC = Forced Choice; DL = Dichotic Listening; SR = Sentence Repetition; JL = Judgment of Line Orientation; TT = Token Test; AV = AVLT-Recognition; FTD = Finger Tapping-Dominant Hand; MEP = Memory Error Pattern; RDS = Reliable Digit Span; WMTMNB = Word Memory Test-Meyers Neurological Battery; MI = Meyers Index (from MMPI-2 or MMPI-2-RF).
(a) Base rate of WMT failure = 0.384; odds are 0.384/(1 − 0.384) = 0.624.

Results

Table 3 displays the sensitivities and specificities for the sample of 255 subjects, calculated from the failure rates of the tests administered. The sensitivities and specificities were used to create likelihood ratios, pre-test odds (base rate/(1 − base rate); the base rate was determined by the pass/fail rate of the WMT, not using the GMIP), post-test odds, and post-test probabilities of having invalid data for each of the individual PVMs, as based on Larrabee (2008). Table 4 displays the odds and probabilities of invalid results for each possible pair of tests. The post-test probabilities found in Tables 4 and 5 were calculated following the methodology used by Larrabee (2008). The pre-test odds in Table 4 are the post-test odds of the first test in the pair, taken from Table 3, and the likelihood ratio is that of the second test in the pair, also from Table 3. Similarly, Table 5 takes the pre-test odds for each combination of three tests from the post-test odds in Table 4 for the first two tests in the combination, and the likelihood ratio for each combination is that of the third test, extracted from Table 3.

The data in Tables 3–5 are derived under the assumption that the 10 PVM indicators and the MI are independent of one another. This assumption of independence, required when using chained likelihood ratios, was evaluated by computing the PVM intercorrelations for the 255 subjects who had complete data on all 10 PVM indicators and the MI. PVM correlations ranged from −.041 (SR/MEP) to .478 (AV/WMTMNB); the average correlation was .123, which was not statistically significantly different from zero, indicating that the PVM indicators can be considered independent of one another. Table 6 portrays both the post-test odds and post-test probabilities for failure of any one, two, or three of the PVMs.
Within this table, sensitivity is set at 0.80 and specificity at 0.90, based on the average sensitivity (0.77) and specificity (0.93) of the individual measures listed in Table 3; it is also assumed that the tests are independent of one another. We also used the same base rate assumption of 40% used by Larrabee (2008) and identified by Larrabee and colleagues (2009) and Mittenberg, Patton, Canyock, and Condit (2002). As we were attempting to replicate Larrabee's (2008) method, we chose to use this already identified rate (40%) in our study. Table 6 shows a wide range of post-test probabilities when only one PVM is failed, but as an increasing number of measures are failed, the posterior probabilities narrow in range, becoming high and rather unaffected by the base rate by the time three PVMs are failed. See Table 7 for details on the intercorrelations of the PVMs. The selected PVMs used in this study are sufficiently independent of each other to allow this chaining method to be used, as we have demonstrated in this article.

Discussion

As shown in Tables 3–5, the post-test probability of having invalid data increases from a range of .579–.982 with the failure of one test, to a range of .916–.999 with the failure of two tests, and to .989–.999 for the failure of three tests. The probability of


Post-test probabilities are then calculated by dividing the post-test odds by (post-test odds + 1). If desired, post-test odds for a set of n failures may be calculated by multiplying the post-test odds of the first (n − 1) tests by the likelihood ratio of the last test; post-test probabilities would then be calculated by dividing the post-test odds by (post-test odds + 1).


Table 4. Diagnostic statistics and probabilities of malingering for pairs of performance validity measure failure

Test combination   Pre-test odds(a)   Likelihood ratio(b)   Post-test odds   Post-test probability
FC/DL              89.856             22.897                2,057.433        .999
FC/SR              89.856             105.667               9,494.814        .999
FC/JL              89.856             107.000               9,614.592        .999
FC/TT              89.856             103.000               9,255.168        .999
FC/AV              89.856             28.308                2,543.644        .999
FC/FTD             89.856             16.220                1,457.464        .999
FC/MEP             89.856             112.833               10,138.722       .999
FC/RDS             89.856             12.434                1,117.270        .999
FC/WMTMNB          89.856             8.691                 780.938          .999
FC/MI              89.856             2.710                 243.510          .996
DL/SR              14.288             105.667               1,509.770        .999
DL/JL              14.288             107.000               1,528.816        .999
DL/TT              14.288             103.000               1,471.664        .999
DL/AV              14.288             28.308                404.465          .998
DL/FTD             14.288             16.220                231.751          .996
DL/MEP             14.288             112.833               1,600.875        .999
DL/RDS             14.288             12.434                177.657          .994
DL/WMTMNB          14.288             8.691                 124.177          .992
DL/MI              14.288             2.710                 38.720           .975
SR/JL              65.936             107.000               7,055.152        .999
SR/TT              65.936             103.000               6,791.408        .999
SR/AV              65.936             28.308                1,866.516        .999
SR/FTD             65.936             16.220                1,096.482        .999
SR/MEP             65.936             112.833               7,439.757        .999
SR/RDS             65.936             12.434                819.848          .999
SR/WMTMNB          65.936             8.691                 573.050          .998
SR/MI              65.936             2.710                 178.687          .994
JL/TT              66.768             103.000               6,877.104        .999
JL/AV              66.768             28.308                1,890.069        .999
JL/FTD             66.768             16.220                1,082.977        .999
JL/MEP             66.768             112.833               7,533.634        .999
JL/RDS             66.768             12.434                830.193          .999
JL/WMTMNB          66.768             8.691                 580.281          .999
JL/MI              66.768             2.710                 180.941          .995
TT/AV              64.272             28.308                1,819.412        .999
TT/FTD             64.272             16.220                1,042.492        .999
TT/MEP             64.272             112.833               7,252.003        .999
TT/RDS             64.272             12.434                799.158          .999
TT/WMTMNB          64.272             8.691                 558.588          .998
TT/MI              64.272             2.710                 174.177          .994
AV/FTD             17.973             16.220                291.522          .997
AV/MEP             17.973             112.833               2,027.948        .999
AV/RDS             17.973             12.434                221.238          .996
AV/WMTMNB          17.973             8.691                 156.203          .994
AV/MI              17.973             2.710                 48.707           .980
FTD/MEP            10.121             112.833               1,141.983        .999
FTD/RDS            10.121             12.434                125.845          .992
FTD/WMTMNB         10.121             8.691                 87.962           .989
FTD/MI             10.121             2.710                 27.428           .965
MEP/RDS            70.408             12.434                875.453          .999
MEP/WMTMNB         70.408             8.691                 611.916          .998
MEP/MI             70.408             2.710                 190.806          .995
RDS/WMTMNB         7.759              8.691                 67.433           .985
RDS/MI             7.759              2.710                 21.027           .955
WMTMNB/MI          5.423              2.710                 14.209           .934

Notes: FC = Forced Choice; DL = Dichotic Listening; SR = Sentence Repetition; JL = Judgment of Line Orientation; TT = Token Test; AV = AVLT-Recognition; FTD = Finger Tapping-Dominant Hand; MEP = Memory Error Pattern; RDS = Reliable Digit Span; WMTMNB = Word Memory Test-Meyers Neurological Battery; MI = Meyers Index (from MMPI-2 or MMPI-2-RF).
(a) Pre-test odds are the post-test odds from Table 3 for the first test in the pair.
(b) Likelihood ratio (LR) is the LR for the second test in each pair, taken from Table 3.


Table 5. Diagnostic statistics and probabilities for failure of three performance validity measures

Test combination   Pre-test odds(a)   Likelihood ratio(b)   Post-test odds   Post-test probability
FC/DL/SR           2,057.433          105.667               217,402.773      .999
FC/DL/JL           2,057.433          107.000               220,145.331      .999
FC/DL/TT           2,057.433          103.000               211,915.599      .999
FC/DL/AV           2,057.433          28.308                58,241.813       .999
FC/DL/FTD          2,057.433          16.220                33,371.563       .999
FC/DL/MEP          2,057.433          112.833               232,146.338      .999
FC/DL/RDS          2,057.433          12.434                25,582.122       .999
FC/DL/WMTMNB       2,057.433          8.691                 17,881.150       .999
FC/DL/MI           2,057.433          2.710                 5,575.643        .999
FC/SR/JL           9,494.814          107.000               1,015,945.100    .999
FC/SR/TT           9,494.814          103.000               977,965.842      .999
FC/SR/AV           9,494.814          28.308                268,779.195      .999
FC/SR/FTD          9,494.814          16.220                154,005.883      .999
FC/SR/MEP          9,494.814          112.833               1,071,328.350    .999
FC/SR/RDS          9,494.814          12.434                118,058.517      .999
FC/SR/WMTMNB       9,494.814          8.691                 82,519.429       .999
FC/SR/MI           9,494.814          2.710                 25,730.950       .999
FC/JL/TT           9,614.592          103.000               990,302.976      .999
FC/JL/AV           9,614.592          28.308                272,169.870      .999
FC/JL/FTD          9,614.592          16.220                155,948.682      .999
FC/JL/MEP          9,614.592          112.833               1,084,843.260    .999
FC/JL/RDS          9,614.592          12.434                119,547.837      .999
FC/JL/WMTMNB       9,614.592          8.691                 83,560.419       .999
FC/JL/MI           9,614.592          2.710                 26,055.544       .999
FC/TT/AV           9,255.168          28.308                261,995.296      .999
FC/TT/FTD          9,255.168          16.220                150,118.825      .999
FC/TT/MEP          9,255.168          112.833               1,044,288.370    .999
FC/TT/RDS          9,255.168          12.434                115,078.759      .999
FC/TT/WMTMNB       9,255.168          8.691                 80,436.665       .999
FC/TT/MI           9,255.168          2.710                 25,081.505       .999
FC/AV/FTD          2,543.644          16.220                41,257.906       .999
FC/AV/MEP          2,543.644          112.833               287,006.983      .999
FC/AV/RDS          2,543.644          12.434                31,627.670       .999
FC/AV/WMTMNB       2,543.644          8.691                 22,106.810       .999
FC/AV/MI           2,543.644          2.710                 6,893.275        .999
FC/FTD/MEP         1,457.464          112.833               164,450.036      .999
FC/FTD/RDS         1,457.464          12.434                18,122.107       .999
FC/FTD/WMTMNB      1,457.464          8.691                 12,666.820       .999
FC/FTD/MI          1,457.464          2.710                 3,949.727        .999
FC/MEP/RDS         10,138.722         12.434                126,064.869      .999
FC/MEP/WMTMNB      10,138.722         8.691                 88,115.633       .999
FC/MEP/MI          10,138.722         2.710                 27,475.937       .999
FC/RDS/WMTMNB      1,117.270          8.691                 9,710.194        .999
FC/RDS/MI          1,117.270          2.710                 3,027.802        .999
FC/WMTMNB/MI       780.938            2.710                 2,116.342        .999
DL/SR/JL           1,509.770          107.000               161,545.390      .999
DL/SR/TT           1,509.770          103.000               155,506.310      .999
DL/SR/AV           1,509.770          28.308                42,738.569       .999
DL/SR/FTD          1,509.770          16.220                24,488.469       .999
DL/SR/MEP          1,509.770          112.833               170,351.878      .999
DL/SR/RDS          1,509.770          12.434                18,772.480       .999
DL/SR/WMTMNB       1,509.770          8.691                 13,121.411       .999
DL/SR/MI           1,509.770          2.710                 4,091.477        .999
DL/JL/TT           1,528.816          103.000               157,468.048      .999
DL/JL/AV           1,528.816          28.308                43,277.723       .999
DL/JL/FTD          1,528.816          16.220                24,797.400       .999
DL/JL/MEP          1,528.816          112.833               172,500.896      .999
DL/JL/RDS          1,528.816          12.434                19,009.298       .999
DL/JL/WMTMNB       1,528.816          8.691                 13,286.940       .999
(continued on next page)

J.E. Meyers et al. / Archives of Clinical Neuropsychology 29 (2014) 224–235

231

Table 5. (continued) Test combination

1,528.816 1,471.664 1,471.664 1,471.664 1,471.664 1,471.664 1,471.664 404.465 404.465 404.465 404.465 404.465 231.751 231.751 231.751 231.751 1,600.875 1,600.875 1,600.875 177.657 177.657 124.177 7,055.152 7,055.152 7,055.152 7,055.152 7,055.152 7,055.152 7,055.152 6,791.408 6,791.408 6,791.408 6,791.408 6,791.408 6,791.408 1,866.516 1,866.516 1,866.516 1,866.516 1,866.516 1,096.482 1,096.482 1,096.482 1,096.482 7,439.757 7,439.757 7,439.757 819.848 819.848 573.050 6,877.104 6,877.104 6,877.104 6,877.104 6,877.104 6,877.104 1,890.069 1,890.069

Likelihood ratiob 2.710 28.308 16.220 112.833 12.434 8.691 2.710 16.220 112.833 12.434 8.691 2.710 112.833 12.434 8.691 2.710 12.434 8.691 2.710 8.691 2.710 2.710 103.000 28.308 16.220 112.833 12.434 8.691 2.710 28.308 16.220 112.833 12.434 8.691 2.710 16.220 112.833 12.434 8.691 2.710 112.833 12.434 8.691 2.710 12.434 8.691 2.710 8.691 2.710 2.710 28.308 16.220 112.833 12.434 8.691 2.710 16.220 112.833

Post-test odds 4,143.091 41,659.865 23,870.390 166,052.264 18,298.670 12,790.232 3,988.209 6,560.422 45,636.999 5,029.118 3,515.205 1,096.100 26,149.161 2,881.592 2,014.148 628.045 19,905.280 13,913.205 4,338.371 1,544.017 481.450 336.520 726,680.656 199,717.243 114,434.565 796,053.966 87,723.760 61,316.326 19,119.462 192,251.178 110,156.638 766,294.939 84,444.367 59,024.127 18,404.716 30,274.890 210,604.600 23,208.260 16,221.891 5,058.258 123,719.354 13,633.6572 9,529.525 2,971.466 92,505.939 64,658.928 20,161.742 7,125.299 2,221.788 1,522.966 194,677.060 111,546.627 775,964.276 85,509.911 59,768.911 18,636.952 30,656.919 213,262.155

Post-test probability .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .998 .999 .999 .999 .999 .998 .997 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 (continued on next page)

Downloaded from http://acn.oxfordjournals.org/ at UQ Library on June 17, 2015

DL/JL/MI DL/TT/AV DL/TT/FTDif DL/TT/MEP DL/TT/RDS DL/TT/WMTMNB DL/TT/MI DL/AV/FTDif DL/AV/MEP DL/AV/RDS DL/AV/WMTMNB DL/AV/MI DL/FTDif/MEP DL/FTDif/RDS DL/FTDif/WMTMNB DL/FTDif/MI DL/MEP/RDS DL/MEP/WMTMNB DL/MEP/MI DL/RDS/WMTMNB DL/RDS/MI DL/WMTMNB/MI SR/JL/TT SR/JL/AV SR/JL/FTDif SR/JL/MEP SR/JL/RDS SR/JL/WMTMNB SR/JL/MI SR/TT/AV SR/TT/FTDif SR/TT/MEP SR/TT/RDS SR/TT/WMTMNB SR/TT/MI SR/AV/FTDif SR/AV/MEP SR/AV/RDS SR/AV/WMTMNB SR/AV/MI SR/FTDif/MEP SR/FTDif/RDS SR/FTDif/WMTMNB SR/FTDif/MI SR/MEP/RDS SR/MEP/WMTMNB SR/MEP/MI SR/RDS/WMTMNB SR/RDS/MI SR/WMTMNB/MI JL/TT/AV JL/TT/FTDif JL/TT/MEP JL/TT/RDS JL/TT/WMTMNB JL/TT/MI JL/AV/FTDif JL/AV/MEP

Pre-test oddsa

232

J.E. Meyers et al. / Archives of Clinical Neuropsychology 29 (2014) 224–235

Table 5. (continued) Test combination

1,890.069 1,890.069 1,890.069 1,082.977 1,082.977 1,082.977 1,082.977 7,533.634 7,533.634 7,533.634 830.193 830.193 580.281 1,819.412 1,819.412 1,819.412 1,819.412 1,819.412 1,042.492 1,042.492 1,042.492 1,042.492 7,252.003 7,252.003 7,252.003 799.158 799.158 558.588 291.522 291.522 291.522 291.522 2,027.948 2,027.948 2,027.948 221.238 221.238 156.203 1,141.983 1,141.983 1,141.983 125.845 125.845 87.962 875.453 875.453 611.916 67.433

Likelihood ratiob 12.434 8.691 2.710 112.833 12.434 8.691 2.710 12.434 8.691 2.710 8.691 2.710 2.710 16.220 112.833 12.434 8.691 2.710 112.833 12.434 8.691 2.710 12.434 8.691 2.710 8.691 2.710 2.710 112.833 12.434 8.691 2.710 12.434 8.691 2.710 8.691 2.710 2.710 12.434 8.691 2.710 8.691 2.710 2.710 8.691 2.710 2.710 2.710

Post-test odds 23,501.118 16,426.590 5,122.087 122,195.544 13,465.736 9,412.153 2,934.868 93,673.205 65,474.813 20,416.148 7,215.207 2,249.823 1,572.562 29,510.863 205,289.714 22,622.569 15,812.510 4,930.607 117,627.500 12,962.346 9,060.298 2,825.153 90,171.405 63,027.158 19,652.928 6,945.482 2,165.718 1,513.733 32,894.543 3,624.785 2,533.618 790.025 25,215.505 17,624.896 5,495.739 1,922.779 599.555 423.310 14,199.417 9,924.974 3,094.774 1,093.719 341.040 238.377 7,608.562 2,372.478 1,658.292 182.743

Post-test probability .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999 .998 .998 .999 .999 .999 .998 .997 .996 .999 .999 .999 .995

Notes: FC ¼ Forced Choice; DL ¼ Dichotic Listening; SR ¼ Sentence Repetition; JL ¼ Judgment of Line Orientation; TT ¼ Token Test; AV ¼ AVLT-Recognition; FTD ¼ Finger Tapping-Dominant Hand; MEP ¼ Memory Error Pattern; RDS ¼ Reliable Digit Span; WMTMNB ¼ Word Memory Test-Meyers Neurological Battery; MI ¼ Meyers Index (from MMPI-2 or MMPI-2-RF). a Pre-test odds are the post-test odds from Table 4 for the first two tests in the triplet. b Likelihood ratio (LR) is the LR from Table 3 for the third test in the triplet.

having invalid data can be quickly calculated using Table 1, by multiplying the post-test odds of the first test in a combination by the likelihood ratios of the subsequent tests in combination. As evidenced by the range of post-test probabilities in Table 5, calculating the post-test probability of having invalid data for four-test combinations is redundant and likely to result in similar post-test probabilities for test combinations that are greater than 3.

Downloaded from http://acn.oxfordjournals.org/ at UQ Library on June 17, 2015

JL/AV/RDS JL/AV/WMTMNB JL/AV/MI JL/FTDif/MEP JL/FTDif/RDS JL/FTDif/WMTMNB JL/FTDif/MI JL/MEP/RDS JL/MEP/WMTMNB JL/MEP/MI JL/RDS/WMTMNB JL/RDS/MI JL/WMTMNB/MI TT/AV/FTDif TT/AV/MEP TT/AV/RDS TT/AV/WMTMNB TT/AV/MI TT/FTDif/MEP TT/FTDif/RDS TT/FTDif/WMTMNB TT/FTDif/MI TT/MEP/RDS TT/MEP/WMTMNB TT/MEP/MI TT/RDS/WMTMNB TT/RDS/MI TT/WMTMNB/MI AV/FTDif/MEP AV/FTDif/RDS AV/FTDif/WMTMNB AV/FTDif/MI AV/MEP/RDS AV/MEP/WMTMNB AV/MEP/MI AV/RDS/WMTMNB AV/RDS/MI AV/WMTMNB/MI FTDif/MEP/RDS FTDif/MEP/WMTMNB FTDif/MEP/MI FTDif/RDS/WMTMNB FTDif/RDS/MI FTDif/WMTMNB/MI MEP/RDS/WMTMNB MEP/RDS/MI MEP/WMTMNB/MI RDS/WMTMNB/MI

Pre-test oddsa

J.E. Meyers et al. / Archives of Clinical Neuropsychology 29 (2014) 224–235

233
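The chaining arithmetic behind Table 5 can be sketched in a few lines of Python (an illustrative sketch, not the authors' software; the function names are ours, and the numeric inputs are taken from Tables 4 and 5):

```python
def chain_odds(pretest_odds, likelihood_ratios):
    """Multiply pre-test odds by the likelihood ratio of each additional failed PVM."""
    odds = pretest_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

# DL/JL/MI row of Table 5: the DL/JL post-test odds from Table 4 chained
# with the MI likelihood ratio from Table 3 (post-test odds ~ 4,143.091).
post = chain_odds(1528.816, [2.710])
print(post, odds_to_probability(post))
```

The same call reproduces any Table 5 row, e.g. `chain_odds(2057.433, [105.667])` recovers the FC/DL/SR post-test odds.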

Table 6. Probability of malingering utilizing independent tests, each with sensitivity of 0.80 and specificity of 0.90, at different base rates of malingering

Base rate of malingering   Pre-test odds   Likelihood ratio   Post-test odds failing one test (probability)   Post-test odds failing two tests (probability)   Post-test odds failing three tests (probability)

0.10   0.111   8.0   0.889 (.471)   7.112 (.877)   56.896 (.983)
0.20   0.250   8.0   2.000 (.667)   16.000 (.941)   128.000 (.992)
0.30   0.429   8.0   3.432 (.774)   27.456 (.965)   219.648 (.995)
0.40   0.667   8.0   5.336 (.842)   42.688 (.978)   341.344 (.997)
0.50   1.000   8.0   8.000 (.889)   64.000 (.985)   512.000 (.998)
0.60   1.500   8.0   12.000 (.923)   96.000 (.990)   768.000 (.999)
0.70   2.333   8.0   18.664 (.949)   149.312 (.993)   1,194.496 (.999)
0.80   4.000   8.0   32.000 (.970)   256.000 (.996)   2,048.000 (.999)
0.90   9.000   8.0   72.000 (.986)   576.000 (.998)   4,608.000 (.999)

Table 7. Performance validity measure intercorrelations (range: −0.042 to 0.418; average: 0.151, SD: 0.118)

Test combination   Correlation

FC/DL   .336
FC/SR   .094
FC/JL   −.002
FC/TT   −.034
FC/AV   .339
FC/FTDif   .097
FC/MEP   .178
FC/RDS   .164
FC/WMTMNB   .318
FC/MI   −.031
DL/SR   .153
DL/JL   .035
DL/TT   .144
DL/AV   .350
DL/FTDif   .241
DL/MEP   .164
DL/RDS   .276
DL/WMTMNB   .214
DL/MI   .078
SR/JL   −.028
SR/TT   .280
SR/AV   .083
SR/FTDif   .048
SR/MEP   −.042
SR/RDS   .132
SR/WMTMNB   .114
SR/MI   .210
JL/TT   .239
JL/AV   .245
JL/FTDif   .192
JL/MEP   .403
JL/RDS   .179
JL/WMTMNB   .192
JL/MI   .066
TT/AV   .216
TT/FTDif   .135
TT/MEP   −.024
TT/RDS   −.028
TT/WMTMNB   .095
TT/MI   .190
AV/FTDif   .241
AV/MEP   .292
AV/RDS   .259
AV/WMTMNB   .418
AV/MI   .066
FTDif/MEP   .149
FTDif/RDS   .055
FTDif/WMTMNB   .206
FTDif/MI   .001
MEP/RDS   .190
MEP/WMTMNB   .223
MEP/MI   .015
RDS/WMTMNB   .117
RDS/MI   −.005
WMTMNB/MI   .089

Notes: FC = Forced Choice; DL = Dichotic Listening; SR = Sentence Repetition; JL = Judgment of Line Orientation; TT = Token Test; AV = AVLT-Recognition; FTDif = Finger Tapping-Dominant Hand; MEP = Memory Error Pattern; RDS = Reliable Digit Span; WMTMNB = Word Memory Test-Meyers Neurological Battery; MI = Meyers Index (from MMPI-2 or MMPI-2-RF).
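As a quick check on the caption statistics of Table 7, the 55 pairwise intercorrelations can be summarized directly (a sketch; the values below are transcribed from Table 7):

```python
# The 55 pairwise PVM intercorrelations reported in Table 7.
r = [0.336, 0.094, -0.002, -0.034, 0.339, 0.097, 0.178, 0.164, 0.318, -0.031,
     0.153, 0.035, 0.144, 0.350, 0.241, 0.164, 0.276, 0.214, 0.078, -0.028,
     0.280, 0.083, 0.048, -0.042, 0.132, 0.114, 0.210, 0.239, 0.245, 0.192,
     0.403, 0.179, 0.192, 0.066, 0.216, 0.135, -0.024, -0.028, 0.095, 0.190,
     0.241, 0.292, 0.259, 0.418, 0.066, 0.149, 0.055, 0.206, 0.001, 0.190,
     0.223, 0.015, 0.117, -0.005, 0.089]

mean_r = sum(r) / len(r)
# Range and average should match the table caption: -0.042 to 0.418, average 0.151.
print(len(r), min(r), max(r), round(mean_r, 3))
```

The low average intercorrelation is what justifies treating the PVMs as approximately independent when chaining their likelihood ratios.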

The aforementioned ranges of post-test probabilities for the various test combinations serve two main purposes. First, they support the requirement that multiple indicators be failed before results are judged non-valid. When only one test is failed, it can be said with only 50%–90% certainty that the data were invalid; as more tests are failed, however, the post-test probability rises swiftly, and the failure of any three tests indicates a near certainty that the participant provided invalid data. Second, aggregation via the chaining of likelihood ratios further undermines the argument for vote-counting approaches. In a vote-counting approach, a person must fail a majority of the given indicators in order to be suspected of malingering; using likelihood ratios, however, it becomes evident that failing only three of the ten PVMs is sufficient, as shown in Table 5, rather than the six of ten indicators a vote-counting rule would demand (Larrabee, 2008).


Table 6 shows that, while the range of post-test probabilities across base rates from .10 to .90 is quite wide for one failed PVM, it narrows greatly for two failed PVMs and even more so for three, when sensitivity (0.80) and specificity (0.90) are held constant and each PVM is treated as an independent test. Chained likelihood ratios are therefore applicable for detecting invalid data even when base rates of failure are low, and failing at least three PVMs still indicates a high probability that the patient is producing invalid data. As Table 6 also indicates, as the base rate of invalid performance increases, the number of failed PVMs needed to detect invalid performance decreases. For example, at an invalid-performance base rate of 0.10, three PVM failures would be required for the probability of invalid data to exceed .90, but at a base rate of 0.60 or higher, a single PVM failure would suffice, using a test with a sensitivity of 0.70 and a specificity of 0.90.

One difference between the Larrabee (2008) study and the current study is the criterion measure: Larrabee used the Portland Digit Recognition Test (PDRT; Binder, 1993) as his standard for identifying pass and fail in his group of subjects. He employed a known-groups design, with a sample of litigating individuals who performed significantly worse than chance on the PDRT and met the Slick and colleagues (1999) criteria for malingered neurocognitive deficit. The current study did not use a known-groups design to evaluate detection of malingering, but instead used the WMT as the standard for detecting invalid performance, and it did not require worse-than-chance performance on the WMT. Larrabee also used different PVMs, with specificities close to 0.90 (lower than most of the specificities in the current study) and sensitivities closer to 0.50 than the 0.77 of the current study. Comparing Table 6 of the present paper with Table 6 in Larrabee (2008), both use the same pre-test base rate (0.40) and pre-test false-positive rate of 0.10, but Larrabee used a pre-test sensitivity of 0.50 compared with the 0.70 used in the current study. We believe the method used in the current study yields results that are more useful across various clinical settings than the previous approach (Larrabee, 2008). Although Larrabee (2008) used a known-groups design, our findings show that his chaining approach can also be used when individual group membership is not known; it is applicable not only in forensic settings but in regular clinical settings as well.

Although the use of chained likelihood ratios to determine the validity of PVM data is worthy of further study on its own merits, it could also be argued that the probability of purposefully providing invalid data could be calculated using likelihood ratios for any set of tests that meets the criterion of low intercorrelations. This study, in conjunction with previous studies by Larrabee (2003, 2008), suggests that it would be reasonable to apply chained likelihood ratios to a broader range of measures used in clinical settings. This article builds on the work of Larrabee (2008) and presents more PVMs that the clinician can use. Even if the MNB is not used, the tests that comprise these PVMs, when administered, can additionally serve as embedded PVMs. One caution concerns the MI, which is an SVT rather than a PVM like the other measures; the difference between its sensitivity and specificity and those of the other tests presented in Table 3 may be due to the nature of an SVT as a self-report rather than a performance-based task. These results indicate that self-report may not be well correlated with PVM test performance. One of the strengths of this study is that the data obtained remain useful even if a clinician does not use all of these PVMs. Chaining of likelihood ratios is a methodology that is portable across different PVM and SVT measures, as one need only estimate the base rate of invalid performance/malingering and know the specific sensitivity and specificity of the measures being used to compute a post-test probability from the administration of multiple PVMs and SVTs.

Conflict of Interest

J.E.M. is one of the authors of the Rey Complex Figure Test and Recognition Trial, and the author of the Meyers Neuropsychological Battery software.

References

Baldessarini, R. J., Finklestein, S., & Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry, 40, 569–573.
Benton, A., Hamsher, K., Varney, N., & Spreen, O. (1983). Contributions to neuropsychological assessment: A clinical manual. New York: Oxford University Press.
Bianchini, K. J., Mathias, C. W., & Greve, K. W. (2001). Symptom validity testing: A critical review. The Clinical Neuropsychologist, 15(1), 19–45.
Binder, L. M. (1993). Assessment of malingering after mild head trauma with the Portland Digit Recognition Test. Journal of Clinical and Experimental Neuropsychology, 15, 170–182.
Boone, K. B., Lu, P., Sherman, D., Palmer, B., Back, C., Shamieh, E., et al. (2000). Validation of a new technique to detect malingering of cognitive symptoms: The b Test. Archives of Clinical Neuropsychology, 15(3), 227–241.
Brandt, J., Rubinsky, E., & Larson, G. (1985). Uncovering malingered amnesia. Annals of the New York Academy of Science, 44, 502–503.
Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., et al. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.
Etcoff, L. M., & Kampfer, K. M. (1996). Practical guidelines in the use of symptom validity and other psychological tests to measure malingering and symptom exaggeration in traumatic brain injury cases. Neuropsychology Review, 6(4), 171–201.
Green, P. (2003). Word Memory Test. Edmonton, Alberta: Green Publishing.
Green, P. (2007). The pervasive influence of effort on neuropsychological tests. Physical Medicine and Rehabilitation Clinics of North America, 18(1), 43–68.
Green, P., Flaro, L., & Courtney, J. (2009). Examining false positives on the Word Memory Test in adults with mild traumatic brain injury. Brain Injury, 23, 741–750.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224.
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cut-offs on psychometric indicators of negative response bias: A methodological commentary with recommendations. Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Etherton, J. L., Meyers, J. E., Curtis, K. L., & Ord, J. S. (2009). The Reliable Digit Span Test in chronic pain: Classification accuracy in detecting malingered pain-related disability. The Clinical Neuropsychologist, 24, 137–152.
Grimes, D. A., & Schulz, K. F. (2005). Epidemiology 3. Refining clinical diagnosis with likelihood ratios. Lancet, 365, 1500–1505.
Hamberger, M. J., & Tamny, T. R. (1999). Auditory naming and temporal lobe epilepsy. Epilepsy Research, 35, 229–243.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22, 666–679.
Larrabee, G. J. (2012). Performance validity and symptom validity in neuropsychological assessment. Journal of the International Neuropsychological Society, 18(4), 625.
Larrabee, G. J., Millis, S. R., & Meyers, J. E. (2009). 40 plus or minus 10, a new magical number: Reply to Russell. The Clinical Neuropsychologist, 23(5), 841–849.
Meyers, J. E. (2007). Malingering mild traumatic brain injury: Behavioral approaches used by both malingering actors and probable malingerers. In K. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.
Meyers, J. E. (2013). Electronic manual for the Meyers Neuropsychological Battery/System. www.meyersneuropsychological.com.
Meyers, J. E., Bayless, J., & Meyers, K. R. (1996). The Rey Complex Figure: Memory error patterns and functional abilities. Applied Neuropsychology, 3, 89–92.
Meyers, J. E., Galinsky, A., & Volbrecht, M. (1999). Malingering and mild brain injury: How low is too low. Applied Neuropsychology, 6, 208–216.
Meyers, J. E., & Meyers, K. R. (1995). Rey Complex Figure Test and Recognition Trial: Professional manual. Odessa, FL: Psychological Assessment Resources.
Meyers, J. E., Miller, R. M., Haws, N. A., Murphy-Tafiti, J. L., Curtis, T. D., Rupp, Z. W., et al. (2013). An adaptation of the MMPI-2 Meyers Index for the MMPI-2-RF. Applied Neuropsychology: Adult. doi:10.1080/09084282.2013.780173.
Meyers, J. E., Millis, S. R., & Volkert, K. (2002). A validity index for the MMPI-2. Archives of Clinical Neuropsychology, 17(2), 157–169.
Meyers, J. E., Morrison, A. L., & Miller, J. C. (2001). How low is too low, revisited: Sentence Repetition and AVLT-Recognition in the detection of malingering. Applied Neuropsychology, 8(4), 234–241.
Meyers, J. E., Roberts, R. J., Bayless, J. D., Volkert, K. T., & Evitts, P. E. (2002). Dichotic listening: Expanded norms and clinical application. Archives of Clinical Neuropsychology, 17, 79–90.
Meyers, J. E., & Rohling, M. L. (2004). Validation of the Meyers Short Battery on mild TBI patients. Archives of Clinical Neuropsychology, 19, 637–651.
Meyers, J. E., & Volbrecht, M. (1998). Validation of reliable digits for detection of malingering. Assessment, 5, 301–305.
Meyers, J. E., & Volbrecht, M. (1999). Detection of malingerers using the Rey Complex Figure and Recognition Trial. Applied Neuropsychology, 6, 201–207.
Meyers, J. E., Volbrecht, M., Axelrod, B. N., & Reinsch-Boothby, L. (2011). Embedded symptom validity tests and overall neuropsychological test performance. Archives of Clinical Neuropsychology, 26, 8–15.
Meyers, J. E., & Volbrecht, M. E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Mittenberg, W., Patton, C., Canyock, E. M., & Condit, D. C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.
Reitan, R., & Wolfson, D. (1985). The Halstead-Reitan Neuropsychological Test Battery: Theory and interpretation. Tucson: Neuropsychology Press.
Rohling, M. L., Meyers, J. E., & Millis, S. R. (2003). Neuropsychological impairment following traumatic brain injury: A dose-response analysis. The Clinical Neuropsychologist, 17, 289–302.
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561.
Strauss, E., Sherman, E. M., & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary (3rd ed.). New York: Oxford University Press.
Vickery, C. D., Berry, D. T. R., Inman, T. H., Harris, M. J., & Orey, S. A. (2001). Detection of inadequate effort on neuropsychological testing: A meta-analytic review of selected procedures. Archives of Clinical Neuropsychology, 16, 45–73.
Victor, T. L., Boone, K. B., Serpa, J., Buehler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical Neuropsychologist, 23(2), 297–313.
Wechsler, D. (1997). WAIS-III: Administration and scoring manual. San Antonio, TX: The Psychological Corporation.
