EDITORIAL

Reliability is Necessary but Far From Sufficient: How Might the Validity of Pain Ratings be Improved?

Robert H. Dworkin, PhD,* Laurie B. Burke, RPh, MPH,† Jennifer S. Gewandter, PhD, MPH,* and Shannon M. Smith, PhD*

(Clin J Pain 2015;31(7):599–602)

With few exceptions, studies of pain in human participants include ratings of pain caused either by a clinical condition or by an experimental stimulus. These pain ratings have been a fundamental feature of pain research for many years, and their reliability and validity have been considered well established.1 However, recent failures of randomized clinical trials (RCTs) to demonstrate the benefit of medications with known analgesic efficacy have prompted consideration of whether such apparently false-negative results can be explained by limitations of pain ratings when used as outcome measures. These considerations have not typically involved possible differences between numerical rating scales (NRS) and visual analog scales (VAS), given that these 2 broad types of pain measures have generally comparable reliability, validity, and ability to detect change in an individual and to demonstrate group differences in analgesic effects. It has been suggested, however, that inadequate or inconsistent clinical trial assay sensitivity might be explained, at least in part, by limitations of these pain measures and the processes by which they are administered to participants.2–5

From the *Department of Anesthesiology, University of Rochester, Rochester, NY; and †Lora Group LLC, Royal Oak, MD. All authors are associated with the Analgesic, Anesthetic, and Addiction Clinical Trial Translations, Innovations, Opportunities, and Networks (ACTTION) public-private partnership with the US Food and Drug Administration (FDA), Silver Spring, MD, which has provided financial support for the pain rating training system discussed in this article. ACTTION has received research contracts and grants from the FDA, multiple pharmaceutical and device companies, and other sources. The authors declare no conflict of interest. Reprints: Robert H. Dworkin, PhD, Department of Anesthesiology, University of Rochester Medical Center, 601 Elmwood Ave., Box 604, Rochester, NY 14642 (e-mail: robert_dworkin@urmc.rochester.edu). Copyright © 2014 Wolters Kluwer Health, Inc. All rights reserved. DOI: 10.1097/AJP.0000000000000175

THE RELIABILITY OF PAIN RATINGS

It is within this context that 3 recent articles examined the reliability, and the ability to detect treatment effects, of pain ratings conducted using different approaches.6–8 Jensen et al6 evaluated the reliability and assay sensitivity of pain intensity ratings in a placebo-controlled RCT of an opioid analgesic for chronic low back pain. As expected, when the number of pain ratings included in various composite outcomes was increased from 2 to 9, the internal consistency reliability (Cronbach α) also generally increased,

although negligibly. However, the assay sensitivity of these composites in detecting the analgesic effect of the opioid versus placebo did not increase: the standardized effect size (Cohen d) was 0.57 for the composites consisting of both 2 and 9 pain ratings (vs. a Cohen d of 0.52 for a single pain rating). It was concluded that these results raise the possibility that future RCTs of pain treatments might be able to use a single-item rating of pain intensity as the primary outcome measure.6 Compared with daily or more frequent pain ratings, such an approach would greatly reduce participant burden and increase clinical trial efficiency.

As Jensen et al6 acknowledged and as Stone et al8 then emphasized, this failure to find an increase in assay sensitivity with increasing reliability was based on a single RCT; replication is therefore necessary before revising existing approaches to rating pain in clinical trials. Stone et al8 found in pain diary data from 3 observational studies and 1 clinical trial that both reliability and treatment effect estimates increased as the number of ratings in composite measures increased from 2 to 7 daily ratings. They concluded that, if reliability is appreciably lower than observed by Jensen et al,6 RCTs using single pain ratings could be underpowered and have an increased likelihood of false-negative results. Jensen et al,7 in attempting to account for the apparent discrepancies between their data and those of Stone et al,8 suggested that differences in study type (eg, RCT vs. observational), participant characteristics, assessment setting (eg, clinic vs. home), and staff and patient training could account for differences in reliability, and they emphasized that reliability and validity depend on a measure's specific purpose and context of use. Both of these groups of investigators call for more research on the reliability of pain ratings,7,8 a conclusion with which it is difficult to disagree.
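The two statistics at issue in these studies, the internal consistency (Cronbach α) of a composite of daily ratings and the standardized effect size (Cohen d) for a treatment contrast, can be computed directly from rating data. The following is a minimal sketch; the function names are ours, and the formulas are the standard textbook definitions rather than anything specific to the trials discussed:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Internal consistency of a composite; ratings is (n_subjects, k_items)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = ratings.sum(axis=1).var(ddof=1)     # variance of the composite
    return k / (k - 1) * (1 - item_vars / total_var)

def cohens_d(treated, control):
    """Standardized mean difference between two groups, using the pooled SD."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = len(treated), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treated.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    return (treated.mean() - control.mean()) / pooled_sd
```

Averaging more daily ratings generally raises α by averaging out day-to-day noise, but, as the Jensen et al6 results illustrate, a larger α for the composite outcome need not translate into a larger d for the group contrast.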
Indeed, one example of the need for additional research goes back 20 years, to the hypothesis that ratings of usual or average pain over time intervals such as a week might be as reliable and valid as the mean of daily pain ratings.9,10 Although the results of a recent observational study suggest that a 1-week recall rating might not be as valid as the mean of 7 daily ratings,11,12 such results must be examined for different contexts of use, perhaps especially clinical trials versus other types of clinical research. Nevertheless, it is possible that the internal consistency reliabilities found by Jensen et al6 and Stone et al8—which were generally in the range of 0.85 to 0.95—are about as high as they can be for pain ratings. Because of this possibility, efforts should be devoted to identifying and investigating methods to increase the validity of pain ratings and thereby potentially improve the assay sensitivity of analgesic clinical trials. As emphasized over 40 years ago, internal consistency is "a necessary but far from sufficient condition for a one-to-one correspondence to obtain between verbal report and internal states."13(p385) If the reliability of pain ratings is as good as it gets, or nearly so, is it still possible that these ratings do not optimally measure the internal state of interest? In other words, what might be done to increase validity?

THE VALIDITY OF PAIN RATINGS

Several types of validity have been described, including content, construct, and criterion validity.14–16 In addition, convergent and discriminant validity can be considered aspects of construct validity,17 as can the ability of a measure to detect change over time or differences between treatment groups (both of which have also been termed "responsiveness"). Criterion validity is generally not relevant to ratings of pain intensity because there is currently no criterion or "gold standard" with which the patient's pain intensity rating can be compared; it is generally accepted that pain is a subjective experience, and pain is therefore what the person says it is.

Content validity refers to the extent to which the scores generated by a measure adequately represent the internal state or other attribute (eg, physical activity) being assessed, with item content that is relevant and comprehensive; for example, does an intelligence test include a sample of items that measure verbal and quantitative abilities, memory, reasoning, and the other important components of intelligence? It could be argued that the content validity of measures of pain intensity is acceptable given that they are intended to assess the individual's subjective evaluation of their own pain and they provide relatively straightforward approaches for accomplishing this. However, the content validity of pain intensity measures may be limited in a specific population with pain because the experience of pain can be variable or multidimensional, leaving respondents to use judgment and make assumptions about the aspect of their pain experience that is of interest to the investigator. Rarely do pain measures include systematic information that clarifies what should and should not be considered when rating pain, nor do they include standardized education or training for the research participants and patients who are asked to provide pain ratings.
For example, patients with spontaneous ongoing pain may not know whether any stimulus-evoked pain they also experience (eg, in response to light touch) should be considered when rating their pain intensity.

Construct validity refers to the extent to which data generated by a measure are consistent with preexisting hypotheses concerning relationships of those data to data collected with other measures that assess the same or a related construct (and, in evaluations of discriminant validity, measures that assess unrelated constructs). When there is no criterion or gold standard, an evaluation of a measure's construct validity involves examining the pattern of relationships between the measure and other variables (eg, measures of the same or similar constructs; predicted group differences in the construct the measure is intended to assess) within the context in which it will be used. In their landmark discussion of construct validity, Cronbach and Meehl14 emphasized that construct validity should be examined with respect to a "nomological network" of predictions about what is—and is not—associated with the construct. The results of studies in which these predictions are tested provide the evidence base for assessing a measure's construct validity.16 However, evaluations of construct validity alone do not adequately reveal whether the most relevant aspects of pain experience are being assessed in the specific population tested—that is, whether the measure has acceptable content validity.

In attempting to increase the construct validity of measures of pain intensity, it is important to consider our current understanding of the experience of pain and the basis it provides for a network of predictions about pain intensity. The pain experience can differ greatly among people and depends on, for example, the sensory, affective, and evaluative components of their pain,18 the duration of their pain, the degree to which their pain interferes with physical and emotional functioning, and the individual's coping skills and social support.19–21 The primary objective of pain assessment in many clinical trials of pain treatments, and in much other clinical research on pain, is an evaluation of the intensity of the sensory component of the participant's pain experience. Predictions of which specific variables are associated with a measure of sensory pain intensity (convergent validity) and which are not (discriminant validity) can involve variables such as other measures of pain, interference with physical and emotional functioning, sleep quality, coping efforts, personality, and global improvement with treatment. Pain intensity can be predicted to be most highly correlated with other measures of pain intensity; for example, pain ratings on a valid 0 to 10 NRS should be highly correlated with contemporaneous ratings on valid VAS measures and verbal rating scales (eg, none, mild, moderate, and severe pain). Associations between pain intensity and measures of the extent to which pain interferes with daily activities should also be substantial, but not as high as associations between different measures of pain intensity.
It might also be predicted that pain intensity would be associated with personality traits like extroversion but that this association would not be nearly as high as associations between measures of pain intensity and pain interference. Such predicted associations between the construct of pain intensity and other constructs within the pain nomological network can be used to assess the construct validity of measures of pain intensity. Studies of these hypothesized relationships can test whether the direction and magnitude of associations between a measure of pain intensity and other measures are as predicted, thereby providing evidence of construct validity. If an association fails to confirm the prediction, it is possible that either the measure of pain intensity lacks adequate construct validity or that the predicted association (ie, the specific link in the pain nomological network) was not correct. Within the context of RCTs of pain treatments, it can be predicted that measures of pain intensity with greater content and construct validity would have a better ability to detect change in patients and would also be better able to demonstrate the analgesic effects of a truly efficacious treatment versus placebo (ie, standardized effect sizes would be larger with more valid measures).
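The predicted ordering of associations can be checked empirically with ordinary correlations. The following is a minimal sketch; the variable names and the simple "convergent > related-construct > discriminant" comparison are illustrative assumptions, not a standard procedure from the validity literature:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def check_validity_pattern(nrs, vas, interference, extroversion):
    """True if the absolute correlations follow the predicted ordering:
    convergent (NRS vs VAS) > related construct (NRS vs interference)
    > discriminant (NRS vs extroversion)."""
    r_convergent = abs(pearson_r(nrs, vas))
    r_related = abs(pearson_r(nrs, interference))
    r_discriminant = abs(pearson_r(nrs, extroversion))
    return r_convergent > r_related > r_discriminant
```

A confirmation of the pattern is evidence for construct validity; a failure could reflect either an invalid measure or an incorrect link in the nomological network, exactly the ambiguity noted above.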

CAN PAIN RATING VALIDITY BE IMPROVED?

Unquestionably, there is interindividual variation in the experience of pain based on distinct life circumstances, personal characteristics, and functional impact. These types of variability can be addressed to some extent with careful attention to clinical trial entry criteria. However, it may be possible to improve the intraindividual validity of pain intensity ratings within the context of RCTs and other types of clinical research by implementing efforts to standardize pain intensity ratings for each participant. Such efforts have the potential to improve both the content and the construct validity of pain ratings.

Pain Reporting Training

One example of a systematic attempt to incorporate training into ratings of pain intensity is provided by the Pain-Related Outcomes Training and Evaluation for Conducting Clinical Trials (PROTECCT) working group of the Analgesic, Anesthetic, and Addiction Clinical Trial Translations, Innovations, Opportunities, and Networks (ACTTION) public-private partnership with the US Food and Drug Administration. The PROTECCT working group has developed a training program for participants in clinical research on chronic pain that has been designed for daily ratings of pain intensity using a 0 to 10 NRS. Key features of the training include: (1) identification by participants of personal anchors corresponding to very mild and worst pain intensity to increase the consistency of their pain ratings; (2) education about the meaning of "average" pain intensity and its relation to least and worst pain; (3) instructions that pain ratings should be for a specific pain condition, rather than considering all of the kinds of pain that participants might experience in a given day; (4) informing participants about the critical importance of rating their pain intensity as distinctly as possible from stress, mood, fatigue, or limitations in daily activities; and (5) discussing with participants that they are partners in the study and that it is critically important they rate their pain intensity as conscientiously as possible, which will help ensure that the results of the study will be meaningful.

This training has the potential to improve the content validity of pain intensity ratings because of the attention it devotes to clarifying the concepts and parameters that will help participants rate their pain more carefully and consistently.
Training participants to identify personal anchors, educating them about average, least, and worst pain intensity, and clearly specifying the type of pain, the time frame, and the need to exclude nonpain experiences when rating pain can clarify for participants the specific subjective experiences that should—and should not—be considered when they are providing ratings, crucial elements of a measure's content validity. Training programs for pain intensity ratings can be examined using qualitative and quantitative research to determine whether the training has the intended impact on the content validity of patient responses.

It can be predicted that a consequence of increasing the content validity of a pain intensity NRS will be an increase in its construct validity. This can be tested by examining the associations between pain intensity ratings made by trained and untrained patients and measures of other variables within the pain intensity nomological network. If training increases construct validity, then convergent validity associations with variables predicted to be associated with pain intensity should increase; any associations between pain intensity ratings and variables that are not predicted to be associated with pain intensity (ie, discriminant validity) should decrease. In addition, standardized training of participants in clinical trials is expected to increase the ability of pain intensity measures to detect change as a result of increased validity. Educating participants about how to rate their daily pain has the potential to reduce variability in pain ratings—for example, by removing any effects of nonpain experiences from the ratings, by explicitly excluding consideration of extraneous pain conditions, and by reducing uncertainty about the meaning of terms like average, worst, and least pain intensity. Pain ratings provided by trained participants are expected to be more consistent and to have greater sensitivity to any changes in the experience of pain. If training has such consequences, the implementation of participant training as a component of pain intensity assessment in clinical trials of pain treatments would become very important to consider because pain intensity ratings are typically the primary endpoint in these trials.

Other Methods to Improve Pain Rating Validity

A variety of other approaches may increase the validity of pain intensity ratings. In many cases, for example, participants may enter analgesic RCTs with the expectation that their pain intensity will decrease, or they may feel an obligation to trial staff to report decreases in pain intensity, both of which may lead to artificial improvement in their pain intensity ratings.22 Providing education about the likelihood of receiving placebo and the clinical equipoise surrounding the investigational treatment, and informing participants that study staff want them to report their pain as accurately as possible even if it stays the same or worsens, may provide further approaches to increasing the content and the construct validity of pain ratings.22

It is also possible that the validity of pain rating scales can be further increased by providing considerably more detail for the levels of pain intensity being assessed. Well-defined response options with more explanation than typically provided by NRS and VAS measures of pain intensity may help respondents to more validly rate the intensity of their pain. One recent example of such an approach is provided by the Defense and Veterans Pain Rating Scale,23 a major expansion of a 0 to 10 NRS. Among other features, this measure includes an anchor for each of the 11 points on the 0 to 10 scale, groups ranges of ratings into severity categories with color codes (ie, 1 to 4 = "mild," presented as green; 5 to 6 = "moderate," presented as yellow; and 7 to 10 = "severe," presented as red), and depicts a faces pain rating scale alongside the NRS.
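The severity banding just described is a simple deterministic mapping from a 0 to 10 rating to a category and color. A minimal sketch (the DVPRS itself is a printed scale, not software; the function name and the treatment of a 0 rating as "no pain" are our assumptions for illustration):

```python
def dvprs_band(rating):
    """Map a 0-10 pain rating to a DVPRS-style severity category and color.

    Bands follow the published groupings: 1-4 mild/green, 5-6 moderate/yellow,
    7-10 severe/red; a rating of 0 is treated here as "no pain" (assumption).
    """
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating == 0:
        return ("no pain", "green")
    if rating <= 4:
        return ("mild", "green")
    if rating <= 6:
        return ("moderate", "yellow")
    return ("severe", "red")
```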
The Defense and Veterans Pain Rating Scale uses a novel approach to guide individuals who are rating their pain intensity, although one potential drawback is that many of the anchors combine pain intensity, pain interference, and affective aspects of pain (eg, the anchor for a rating of 8 is "awful, hard to do anything"). This suggests that the measure may have limited suitability for assessing sensory pain intensity independently from the affective component of pain and the functional consequences of pain.23 Additional research is therefore needed to optimally define the appropriate response options for pain intensity ratings for specific populations with pain.

Another approach to increasing the validity of pain ratings may be to develop composite measures that incorporate the patient's or research participant's subjective pain rating but that also include ratings by well-trained clinicians or interviewers who carefully observe key pain-related behaviors and affects (eg, wincing, favoring an area of the body). Such a measure would involve more aspects of the pain experience than just its sensory component. However, the widespread and enduring use of the Hamilton Depression Rating Scale24 in RCTs of treatments for major depression indicates that a clinician-based assessment of a subjective experience can be both valid and responsive to treatment effects (although false-negative outcomes have plagued RCTs of major depression as much as analgesic trials). Given the ubiquity of electronic devices, such a measure might even include electronic capture of facial displays of pain.25

Finally, there seem to be increasing threats to the integrity of clinical trial data resulting from various types of participant and investigator misconduct.26–28 We believe that blinding research participants and study site personnel to as much information as possible about pain rating inclusion and exclusion criteria is a straightforward method for increasing the validity of pain ratings because it has the potential to attenuate baseline score inflation, participant fabrication, and other sources of poor-quality data.5,29

CONCLUSIONS

We asked "how might the validity of pain ratings be improved?" in the subtitle of this article. Although this question assumes that the validity of pain ratings should be improved, at present there are no compelling data indicating it will be possible to do so. Nevertheless, the high failure rate of clinical trials of efficacious pain treatments demands that efforts be made to determine whether improved validity can also improve clinical trial assay sensitivity.5,30 Thousands of patients around the world enroll in clinical trials in the hope that their participation will contribute to the development of treatments with improved efficacy and safety, and we owe it to these research participants to design, execute, and analyze clinical trials using methods that ensure the data are as informative as possible.

REFERENCES

1. Jensen MP, Karoly P. Self-report scales and procedures for assessing pain in adults. In: Turk DC, Melzack R, eds. Handbook of Pain Assessment. 3rd ed. New York, NY: Guilford Press; 2011:19–44.
2. Katz N. Methodological issues in clinical trials of opioids for chronic pain. Neurology. 2005;65(suppl 4):S32–S49.
3. Max MB. Afterword: five steps to increase the payoff of chronic pain trials. Neurology. 2005;65(suppl 4):S74–S77.
4. Dworkin RH, Peirce-Sandner S, Turk DC, et al. Outcome measures in placebo-controlled trials of osteoarthritis: responsiveness to treatment effects in the REPORT database. Osteoarthritis Cartilage. 2011;19:483–492 (Corrigendum: Osteoarthritis Cartilage. 2011;19:919).
5. Dworkin RH, Turk DC, Peirce-Sandner S, et al. Considerations for improving assay sensitivity in chronic pain clinical trials: IMMPACT recommendations. Pain. 2012;153:1148–1158.
6. Jensen MP, Hu X, Potts SL, et al. Single vs composite measures of pain intensity: relative sensitivity for detecting treatment effects. Pain. 2013;154:534–538.
7. Jensen MP, Hu X, Potts SL, et al. Measuring outcomes in pain clinical trials: the importance of empirical support for measure selection. Clin J Pain. 2014;30:744–748.
8. Stone AA, Schneider S, Broderick JE, et al. Single-day pain assessments as clinical outcomes: not so fast. Clin J Pain. 2014;30:739–743.
9. Dworkin RH, Siegfried RN. Are all those pain ratings necessary? Pain. 1994;58:279.
10. Jensen MP. Reply to Drs. Dworkin and Siegfried: Hopefully, all those ratings are not necessary. Pain. 1994;58:279–280.
11. Broderick JE, Schwartz JE, Schneider S, et al. Can end-of-day reports replace momentary assessment of pain and fatigue? J Pain. 2009;10:274–281.
12. Broderick JE, Schwartz JE, Vikingstad G, et al. The accuracy of pain and fatigue items across different reporting periods. Pain. 2008;139:146–157.
13. Wiggins JS. Personality and Prediction: Principles of Personality Assessment. Reading, MA: Addison-Wesley Publishing Co; 1973.
14. Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychol Bull. 1955;52:281–302.
15. Loevinger J. Objective tests as instruments of psychological theory. Psychol Rep. 1957;3:635–694.
16. Burke LB, Kennedy DL, Miskala PH, et al. The use of patient-reported outcome measures in the evaluation of medical products for regulatory approval. Clin Pharmacol Ther. 2008;84:281–283.
17. Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull. 1959;56:81–105.
18. Melzack R, Casey KL. Sensory, motivational, and central control determinants of pain: a new conceptual model. In: Kenshalo D, ed. The Skin Senses. Springfield, IL: CC Thomas; 1968:423–439.
19. Dworkin SF, Von Korff MR, LeResche L. Epidemiologic studies of chronic pain: a dynamic-ecologic perspective. Ann Behav Med. 1992;14:3–11.
20. Price DD. Psychological Mechanisms of Pain and Analgesia. Seattle, WA: IASP Press; 1999.
21. Flor H, Turk DC. Chronic Pain: An Integrated Biobehavioral Approach. Seattle, WA: IASP Press; 2011.
22. Zimbroff DL. Patient and rater education of expectations in clinical trials (PREECT). J Clin Psychopharmacol. 2001;21:251–252.
23. Buckenmaier CC, Galloway KT, Polomano RC, et al. Preliminary validation of the Defense and Veterans Pain Rating Scale (DVPRS) in a military population. Pain Med. 2013;14:110–123.
24. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62.
25. Lucey P, Cohn JF, Matthews I, et al. Automatically detecting pain in video through facial action units. IEEE Trans Syst Man Cybern B Cybern. 2011;41:664–674.
26. Devine EG, Waters ME, Putnam M, et al. Concealment and fabrication by experienced research subjects. Clin Trials. 2013;10:935–948.
27. Dworkin RH, McDermott MP. Back to the future: the need to resolve analgesic efficacy in human immunodeficiency virus neuropathy. Pain. 2014;155:1913–1915.
28. Institute of Medicine (IOM). Transforming Clinical Research in the United States: Challenges and Opportunities. Washington, DC: National Academies Press; 2010.
29. Rice ASC, Dworkin RH, McCarthy TD, et al. EMA401, an orally administered highly selective angiotensin II type 2 receptor antagonist, as a novel treatment for postherpetic neuralgia: a randomised, double-blind, placebo-controlled phase 2 clinical trial. Lancet. 2014;383:1637–1647.
30. Finnerup NB, Sindrup SH, Jensen TS. The evidence for pharmacological treatment of neuropathic pain. Pain. 2010;150:573–581.

