Expert Review of Pharmacoeconomics & Outcomes Research

ISSN: 1473-7167 (Print) 1744-8379 (Online) Journal homepage: http://www.tandfonline.com/loi/ierp20

Patient-reported outcomes for US oncology labeling: review and discussion of score interpretation and analysis methods

Alan Shields, Cheryl Coon, Yanni Hao, Meaghan Krohe, Andrew Yaworsky, Iyar Mazar, Catherine Foley & Denise Globe

Expert Review of Pharmacoeconomics & Outcomes Research (2015) 15:6, 951-959. DOI: 10.1586/14737167.2015.1115348 (https://doi.org/10.1586/14737167.2015.1115348)

Published online: 23 Nov 2015.



Review

Patient-reported outcomes for US oncology labeling: review and discussion of score interpretation and analysis methods

Alan Shields*1, Cheryl Coon2, Yanni Hao3, Meaghan Krohe1, Andrew Yaworsky1, Iyar Mazar1, Catherine Foley1 and Denise Globe3

1 Endpoint Development & Outcomes Assessment, Adelphi Values, Boston, MA, USA
2 Outcometrix, Tucson, AZ, USA
3 US Health Economics and Outcomes Research, Novartis Pharmaceuticals Oncology, East Hanover, NJ, USA
*Author for correspondence: [email protected]

This paper describes ways to approach the conceptual and practical challenges associated with interpreting the clinical meaning of scores produced by patient-reported outcome (PRO) questionnaires, particularly when used to inform efficacy decisions for regulatory approval for oncology products. Score interpretation estimates are not inherent to PRO questionnaires per se; instead, they vary depending on sample and study design characteristics. Scores from PRO measures can be interpreted at the individual and group level, and each carries its own set of statistics for evaluating differences. Oncology researchers have a variety of methods and data-analytic strategies available to support their score interpretation needs, which should be considered in the context of their a priori knowledge of the target patient population, the hypothesized effects of treatment, the study design and assessment schedule, and the inferences and decisions to be made from the PRO data.

KEYWORDS: patient-reported outcomes ● oncology ● score interpretation ● clinical significance ● questionnaire development ● minimal important difference ● responder definition

As a direct measure of treatment outcome, patient-reported outcomes (PROs) have long been standard in medical practice and are now encouraged for use in evaluating treatment effects in regulated clinical trials.[1–3] Further, regulatory guidance and papers published by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) lend credibility to the use of PROs to evaluate treatment benefit in oncology trials.[4–7] For example, in addition to suggesting survival, progression, and tumor response outcomes, the FDA’s 2015 Clinical Trial Endpoints guidance for the approval of non-small cell lung cancer drugs and biologics suggests the assessment of tumor-related symptoms and functioning through well-defined and reliable patient reports as a means to evaluate treatment efficacy.[5] Consequently, the use of PROs to evaluate treatment benefit beyond conventional
oncology outcomes has increased.[8–11] Given that oncology products in the US are approved by the FDA based on an overall benefits-to-risks assessment, it makes sense that the agency would encourage the use of measures of treatment benefit that come directly from the patient (e.g. symptom reduction) in addition to clinical measures of the disease itself (e.g. progression-free survival or tumor response). Nevertheless, cancer researchers struggle to develop, adapt, implement, and/or document the use of PRO questionnaires in trials, and patient, industry, and academic groups have urged regulatory bodies, particularly the FDA, to improve the initiative so that more patient reports of disease symptoms and impacts appear in new product labeling.[11,12] The challenges in developing patient-centric end points in oncology trials to support
regulatory approval are similar to those in other therapeutic areas and include clearly and simply defining a patient-centric measurement strategy, selecting concepts of measurement that are important to patients in the target patient population, and selecting and/or building questionnaires that a) measure those concepts in ways that patients understand and to which they can provide meaningful responses and b) produce trustworthy (i.e. reliable and construct-valid) and interpretable scores. Questionnaire development in regulated oncology treatment development programs is particularly challenged by the need to identify a sufficient number of patients to support development activities; by the difficulty of distinguishing between disease- and treatment-related symptoms and impacts; by the lack of acceptable, defensible, and available PRO questionnaires; and by study designs that lead to missing data, preclude blinding, and have no comparison group. Though the FDA does not build PRO questionnaires, it has promoted their use in clinical trials by authoring the PRO Guidance [4] and developing the Drug Development Tool (DDT) for Clinical Outcome Assessment (COA) program,[13] among other initiatives. As is true of any regulatory “guidance” (as opposed to “regulations,” which are mandatory), the information provided in the PRO Guidance and the feedback that a sponsor may receive during the DDT qualification process or through other interactions reflects the FDA’s opinion or current thinking on a topic, and therefore, deviations from suggested approaches are possible so long as they are reasonable, justified, and well-documented.
Indeed, very recently the FDA has acknowledged that, particularly in oncology, a degree of “regulatory flexibility” with respect to the PRO Guidance is warranted in at least some cases.[12] Despite the value of PRO data and regulatory encouragement for its appropriate and defensible use, knowledge gaps and practical challenges remain with respect to developing and implementing a PRO measurement strategy in clinical trials. In its 2009 PRO Guidance, the FDA defines a PRO as “A measurement based on a report that comes directly from the patient (i.e. study subject) about the status of a patient’s health condition without amendment or interpretation of the patient’s response by a clinician or anyone else”.[4] In reality, the patient “report,” typically a number or score the patient assigns to a measurement concept on a given scale, is amended, and interpreted, by others in a variety of ways before conclusions on what that number means are made and inferences are drawn from it. Outcomes researchers have struggled with this issue of interpretation of scores produced by a PRO questionnaire, and there are ongoing challenges in communicating the clinical meaning of scores produced by questionnaires,[14–16] particularly when the scores are used to inform efficacy decisions for regulatory approval.[17] Though there are a number of challenges that outcomes researchers must address when developing and implementing PRO measures (e.g. establishing content validity and adequate psychometric performance of scores produced upon their use), the focus of this article is on score interpretation. More specifically, this article is intended to foster a conceptual and practical understanding of the topic by presenting the applicable history and measurement nomenclature relevant to score interpretation, summarizing current US regulatory perspectives on the topic, describing several statistical approaches for evaluating PRO data, and discussing a successful case from the literature.

History and common challenges in understanding score interpretation

Researchers have developed a variety of ways to facilitate the interpretation of scores produced by PRO questionnaires. Introduced [18] and defined [19] almost 30 years ago, the term minimal clinically important difference (MCID) helped researchers to understand that “statistically significant” results can be misleading and may not have any meaning to patients or clinicians with respect to understanding treatment benefit (e.g. will this treatment make me feel better?) or making treatment decisions (e.g. should I change this patient’s drug therapy?). Since the introduction of the MCID, additional terms have been defined, including the commonly referenced term minimal important difference (MID) [20,21] as well as clinically important difference (CID) [22] and minimally detectable difference (MDD).[22] In principle, it is sensible and useful to define a target point of change in scores on a PRO at which a conclusion of, “yes, this is clinically important,” could be drawn. Nevertheless, MID has been described as “a little phrase with big appeal” primarily because, in a field grappling with the challenge of assigning semantic, literal or symbolic meaning to the numbers generated by PRO questionnaires and making high-stakes decisions dependent on those assigned meanings, it provides a simple, albeit misleading, answer in the form of a single threshold to the complex problem of interpreting scores on PROs.[23] Described below are several challenges inherent to the conceptual understanding of score interpretation. These challenges are also approached in subsequent sections in an effort to provide some practical guidelines as to how to manage them when encountered and to show how researchers have successfully addressed them.

Score interpretation vs measurement properties

Interpretation issues are often considered in the same context as measurement/psychometric properties. Therefore, it is not uncommon to see score interpretation results presented with score reliability and construct validity results. Indeed, in its draft PRO Guidance,[24] the FDA placed score interpretation under the heading of measurement properties, whereas its final guidance presents interpretation in its own section.[4] Given the quantitative nature of both, this is reasonable so long as the difference between the topics is recognized. Broadly, while both are germane to scores, measurement properties reflect the stability and reproducibility (i.e. reliability) of those scores and whether those scores are logically related to scores produced by other measures (i.e. construct validity). In particular, score interpretation is often confused with responsiveness (sometimes referred to as sensitivity to change, a special type of construct-related validity). As a measurement or psychometric property, responsiveness characterizes the extent to which scores produced by a PRO questionnaire change in concert with actual change in the concept of measurement. Thus, while responsiveness addresses the question of whether the questionnaire can detect change, score interpretation has to do with characterizing the meaning attributed to a given score or score change over time (e.g. a change of two points on scale X is to be considered “beneficial” to the patient). Importantly, inferences drawn from clinical studies (e.g. this is/is not an efficacious treatment) depend on the meaning of those scores.


Point of comparison

Another common problem when considering score interpretation is the point of comparison: the individual level or the group level. Scores can be interpreted based on an individual’s change from baseline to post-treatment (i.e. within-person), a group’s average change from baseline to post-treatment (i.e. within-group), or the difference in the magnitude of change scores from baseline to post-treatment for two groups (i.e. between-group), and each may have its own score interpretation guidelines. For within-person changes, the phrase “responder definition” is defined as the individual patient PRO score change over a predetermined time period that should be interpreted as a treatment benefit.[4] However, the term MID was first introduced by Jaeschke et al. (1989) as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management”,[14,19] and this difference could be interpreted at the individual level or the group level, both between and within. Thresholds called MIDs have been used for interpretation at all three levels of measurement, including for within-person changes, despite the 2009 FDA Guidance clearly linking this type of interpretation to the responder definition, causing persistent confusion within the field.[23]

Context of use considerations

As early as 2011, the FDA indicated that a PRO questionnaire’s “context of use” is critical to the regulatory conclusion that the questionnaire can (or cannot) be used to support drug treatment claims for labeling.[25] A questionnaire’s context of use reflects “what the COA is intended to be used for” [26] and includes things such as the target patient population (including patient sub-types), the sought-after labeling or promotional claim (i.e. what the sponsor wishes to say about their drug), clinical trial design, and end-point positioning.[27] For the present discussion, it is incumbent upon the sponsor to articulate their specific PRO questionnaire context of use in order to facilitate any significant discussion with regulators as to how to interpret the meaning associated with scores produced by that tool. For example, with respect to articulating the target patient population, which can be particularly challenging for oncology researchers, researchers must understand the level of health status from which the respondent starts and where we would hope they finish upon re-testing (e.g. is there an expectation of improvement, deterioration or stability in the target patient population?). This is to say that score interpretation guidelines can only be established once the clinical expectation of both the direction and magnitude of change in a target patient population is described.

The dynamic nature of score interpretation

In consideration of the several challenges that researchers often encounter with the conceptual understanding of score interpretation, one critical point to be made is that any score interpretation estimate is neither static nor a fixed property of the questionnaire itself, and it would be misleading to state that, for example, a score change of 2 on a PRO questionnaire is the point at which clinical meaningfulness can be inferred, without providing the context for that determination. Instead, a score interpretation estimate is a dynamic property, dependent on the data in hand, and, along with other sources of information, can be used to help inform conclusions and guide decisions. In other words, just as PRO questionnaires per se are neither valid nor invalid, neither are they inherently interpretable or uninterpretable. Instead, it is the scores acquired from a particular PRO questionnaire (i.e. the data at hand), for a particular purpose, with a particular population, and under a particular set of circumstances that must all be considered, in addition to pre-specified responder definitions or MIDs, when drawing conclusions about the meaning of those scores. Though not exclusively a subjective research activity, score interpretation does require the researcher to bring his or her own knowledge of the research context, and the unique perspectives of his or her research colleagues, to bear when evaluating subjects (as individuals or as groups) as meaningfully “changed” or not.

FDA perspective on the interpretation of PRO data

The FDA took a position on the score interpretation topic by stating, “Regardless of whether the primary end point for the clinical trial is based on individual responses to treatment or the group response, it is usually useful to display individual responses, often using an a priori responder definition” [4](p. 24). In other words, when evaluating efficacy claims to support product approval or labeling, the FDA is interested in seeing treatment effects described within subjects. The FDA further stated, “The responder definition is determined empirically and may vary by target population or other clinical trial design characteristics. Therefore, we will evaluate an instrument’s responder definition in the context of each specific clinical trial”.[4] The removal of the term MID and the FDA’s focus on individual-level change have been observed [28]; however, the practical implications regarding how the agency may evaluate “how” and “when” score interpretation tactics should be implemented are not well understood and are a source of frustration for researchers. For example, the PRO Guidance designates individual response to treatment as “often” (i.e. not always) evaluated based on “a priori” responder definitions that are determined
“empirically.” First, researchers have questioned whether “often” is meant to qualify the “a priori” or the “responder definition” (or both). In other words, researchers will ask under what conditions an a priori-stated responder definition may not be required. Second, researchers become confused as to what constitutes an empirically determined responder definition. Our experience suggests that this is not just semantics or researchers being nit-picky or critical of the FDA, as the answers to these questions can be material and alter the trajectory of a drug program. For example, researchers often do not have PRO questionnaires available to them early enough in their program to establish a responder definition prior to a pivotal trial, or do not have pre-pivotal trials built into their development program at all. Thus, the oncology researcher might enter the pivotal trial knowing that data from the trial must be used both to support the use of the instrument for assessing efficacy and to document evidence for efficacy itself. Our understanding of the PRO Guidance is that the recommendations are manageable and flexible. For example, the FDA indicates that score interpretation estimates, like any other quantitative estimate associated with PRO questionnaire scores [e.g. means, standard deviations (SDs) or reliability and validity estimates], can and will vary depending on the sample characteristics and study design and will be evaluated with those details in mind. Therefore, it would not be contrary to the PRO Guidance and, further, would be reasonable and practical to identify a logical and theory-driven responder definition prior to a pivotal trial and then empirically evaluate that responder definition in the context of that trial using the data in hand.
As with any plan, however, there are risks, which include the FDA not agreeing with the responder definition (which may be mitigated through communication and discussion) or the pivotal trial data not supporting the responder definition. These risks may be worth taking when the alternative is to conduct an unplanned study prior to the pivotal trial, which comes with its own risks, including the delay to market of a potentially efficacious treatment.

Challenges with responder definitions in oncology

Defining treatment responders (individual level) is an especially prominent issue in oncology. Traditionally, PROs have not been positioned as primary or key secondary end points for oncology drugs; therefore, a perceived high standard of defining responders a priori and interpreting data to meet both statistical significance and clinical meaningfulness can lead developers to forgo a PRO strategy prematurely. Although a decision not to integrate a PRO means that the work of designing and implementing a PRO measurement strategy can be avoided, developers may run a higher risk of receiving a negative review decision for their product.[29] Therefore, it is important for oncology drug developers to have a well-thought-out PRO measurement and interpretation approach that fulfills the FDA’s requirements and, at the same time, is feasible to implement.

Statistical approaches for evaluating PRO data in oncology

As shown in Table 1, scores from PRO measures can be interpreted at the individual level and at the group level (for both within-group changes and between-group differences). However, each level of interpretation carries its own set of statistics to be used for evaluating differences, as well as its own language that can be used in reaching conclusions regarding the differences.

Individual-level comparisons
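To make the individual-level workflow described in this section concrete, the sketch below classifies patients against a pre-specified responder definition, compares responder proportions between arms with a chi-square test, and computes the empirical CDF curve for each arm. All data, the 10-point threshold, and the sample sizes are hypothetical, and the example assumes NumPy and SciPy are available.

```python
# Hypothetical individual-level responder analysis; scale, threshold,
# and all change scores are invented for illustration.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Simulated change-from-baseline scores (higher = more improvement)
treatment = rng.normal(loc=12, scale=8, size=120)
control = rng.normal(loc=5, scale=8, size=120)

RESPONDER_THRESHOLD = 10  # pre-specified responder definition (hypothetical)

# Classify each patient as responder / non-responder
resp_trt = int(np.sum(treatment >= RESPONDER_THRESHOLD))
resp_ctl = int(np.sum(control >= RESPONDER_THRESHOLD))

# Chi-square test comparing responder proportions between arms
table = np.array([[resp_trt, len(treatment) - resp_trt],
                  [resp_ctl, len(control) - resp_ctl]])
chi2, p, dof, _ = chi2_contingency(table)
print(f"responders: {resp_trt}/{len(treatment)} vs {resp_ctl}/{len(control)}, p = {p:.4f}")

# Empirical CDF of change scores for each arm, one curve per group,
# so a reader can apply any threshold of interest
def ecdf(x):
    xs = np.sort(x)
    return xs, np.arange(1, len(xs) + 1) / len(xs)

xs_t, F_t = ecdf(treatment)
xs_c, F_c = ecdf(control)
```

Plotting `xs_t` against `F_t` (and likewise for the control arm) yields the pair of CDF curves the PRO Guidance encourages, letting the interpretation threshold remain "in the eye of the beholder."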

For interpretation at the individual level, the responder definition is used to identify patients who have experienced sufficient change to be classified as “treatment responders.” Once individuals are classified as responders (or non-responders), a chi-square analysis can compare the treatment group to the control group to determine whether significantly more subjects in the treatment group have responded to treatment than in the control group. This proportion of response can also be presented for all points along the PRO score metric using a cumulative distribution function (CDF), where separate curves are plotted for the experimental treatment group and the control group. Encouraged in the PRO Guidance and elsewhere,[14,28] a CDF illustrates the entire distribution of patient-reported responses, at a patient level, and can do so simultaneously for both the treatment and control groups. In other words, the use of the CDF allows the interpretation to be in the eye of the beholder. For example, if a clinician reviewing a drug label believes that a more or less stringent magnitude of change than that established as the responder definition would be meaningful to patients, then the clinician could examine the separation between the treatment arms at that level of change to make prescribing decisions. Another approach to examining individual-level response, particularly within oncology, is time to improvement. Using the same responder definition, the time along the course of treatment at which each individual’s change score exceeds the responder threshold is analyzed in the form of a survival analysis. Time to deterioration may also be considered when patients on the experimental treatment are expected to report stable PRO scores over time while those on the control treatment are expected to worsen on symptoms or side effects. A limitation of dichotomizing continuous variables and treating results as categorical (e.g.
treatment responders versus non-responders) is the loss of information and statistical power. For example, dichotomizing a variable at the median of its distribution reduces statistical power in the same way that eliminating nearly 40% of the data would, and the cost is even greater when the cut point deviates from the midpoint,[30] which is often the case in a responder analysis. Though dichotomization is common in medical research [31] and its weaknesses are well documented,[17,32,33] oncology researchers should pay particularly close attention to the loss of statistical power associated with it. In an environment where small sample sizes and considerable missing data are common,[34] oncology researchers need to retain as much data as possible; the purposeful disposal of data is potentially detrimental to the clear detection of an effect.

Table 1. Example statistical methods and corresponding interpretation for each level of comparison.

Within-subject
  Chi-square: There are significantly more subjects on the experimental treatment who have responded to treatment as compared with the control group (comparison of percent of responders between groups).(a)
  CDF: The proportion of subjects on the experimental treatment who experienced a change score of X exceeds the proportion of subjects on the control arm with the same amount of change (comparison of percent of responders between groups).(a)
  Survival analysis: The experimental treatment group responded to treatment significantly faster than the control group (comparison of median number of days to improvement between groups).

Within-group
  T-test or ANOVA; Mann–Whitney U or Kruskal–Wallis; linear or non-linear regression; mixed model: Overall, the experimental treatment group experienced a statistically significant and clinically meaningful change from baseline to follow-up (comparison of mean or median change score to 0).(b)

Between-group
  T-test or ANOVA; Mann–Whitney U or Kruskal–Wallis; linear or non-linear regression; mixed model: Overall, the experimental treatment group experienced a statistically significantly greater treatment benefit than the control group, and the difference between groups is clinically meaningful (comparison of mean or median change score between groups).(b)

(a) Assuming that subjects are classified as responders or non-responders based on a scientifically sound responder definition.
(b) Assumes the group difference is significant at the a priori alpha level and the magnitude exceeds the CID threshold.

Group-level comparisons
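As a minimal sketch of the group-level approach detailed below, the example evaluates a within-group change against zero and a between-group difference against a clinically important difference (CID) threshold. The change scores, sample sizes, and the CID value of 4 points are all invented, and SciPy's plain t-tests stand in for whatever model the study design actually calls for.

```python
# Hypothetical group-level interpretation: test statistical significance
# first, then compare the magnitude of change against a CID threshold.
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind

rng = np.random.default_rng(42)
chg_trt = rng.normal(loc=6.0, scale=10.0, size=150)  # change from baseline, treatment arm
chg_ctl = rng.normal(loc=1.0, scale=10.0, size=150)  # change from baseline, control arm

CID = 4.0  # hypothetical group-level importance threshold

# Within-group: is the mean change in the treatment arm different from 0,
# and does its magnitude exceed the CID threshold?
t_w, p_within = ttest_1samp(chg_trt, popmean=0.0)
within_meaningful = (p_within < 0.05) and (chg_trt.mean() >= CID)

# Between-group: is the treatment arm's change greater than control's,
# and does the difference in means exceed the CID threshold?
t_b, p_between = ttest_ind(chg_trt, chg_ctl)
diff = chg_trt.mean() - chg_ctl.mean()
between_meaningful = (p_between < 0.05) and (diff >= CID)

print(f"within: mean={chg_trt.mean():.1f}, p={p_within:.4f}, meaningful={within_meaningful}")
print(f"between: diff={diff:.1f}, p={p_between:.4f}, meaningful={between_meaningful}")
```

In a real trial, a mixed model or repeated-measures ANOVA adjusting for baseline, as discussed in this section, would typically replace the plain t-tests shown here.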

For interpretation at the group level, summary statistics must first be produced, and then the threshold that defines the CID on the PRO is applied in interpreting those summary statistics. Summary statistics are produced for each treatment arm using the statistical method that is most appropriate for the study design and the data in hand. Traditionally, repeated-measures analysis of variance (ANOVA) evaluates change scores within each treatment arm after adjusting for baseline. However, many other parametric and non-parametric methods are available to statisticians, and, particularly in oncology, methods such as mixed models that are robust to missing data under certain assumptions should be considered.[35] Regardless of the method for producing summary statistics, the method itself provides a first look into the significance of change scores at the group level. The analysis may determine that the experimental treatment group achieved a significant change from baseline on the PRO measure, or that the experimental treatment group achieved a significantly greater change from baseline on the PRO measure than did the control group. But, while demonstrating statistically significant effects is one level of efficacy evidence, determining whether these effects are clinically meaningful is a higher level of efficacy evidence. The CID threshold for defining importance at the group level may be used to evaluate the magnitude of both within-group and between-group differences. If the change from baseline for the experimental treatment group exceeds the CID threshold, then the group can be considered to have experienced, on average or overall, a meaningful treatment benefit. If the change from baseline for the experimental treatment group exceeds that of the control group by a margin that exceeds the CID threshold, then the experimental treatment group can be considered to have experienced, on average or overall, a greater and more meaningful treatment benefit than the control group. Using this approach, group-level changes from baseline can be evaluated for statistical significance as well as meaningfulness from the patient perspective.

General considerations
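The single-arm scenario discussed in this section can be sketched as a one-sided exact binomial test of the observed responder proportion against an a priori benchmark (here, the 50% figure used as the example in the text); the patient counts are hypothetical.

```python
# Hypothetical single-arm sketch: with no control group, compare the observed
# responder proportion against an a priori benchmark. Counts are invented.
from scipy.stats import binomtest

n_patients = 80
n_responders = 52   # patients meeting the pre-specified responder definition
BENCHMARK = 0.50    # a priori proportion required to infer a sufficient response

result = binomtest(n_responders, n_patients, p=BENCHMARK, alternative="greater")
observed = n_responders / n_patients
print(f"observed responder rate {observed:.0%}, p = {result.pvalue:.4f}")
```

A significant result here says only that more than half of subjects met the responder threshold; as the text notes, group-level change against a CID threshold remains the more natural interpretation in single-arm studies.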

Of note, it is difficult for the within-subject responder definition to be interpreted in single-arm studies unless an a priori threshold is established for the proportion of responders needed to infer that the experimental treatment produced a sufficient response (e.g. an effective treatment would result in at least 50% of subjects reaching the responder threshold for change on the PRO). However, the group-level interpretation is appropriate in single-arm studies, where it is desired that the change from baseline within the treatment arm be significant and exceed the a priori group-level importance threshold. Thus, despite the FDA Guidance’s focus on approaches that define individual-level change, group-level change is also relevant and potentially critical. At the group level, industry statisticians have recommended that statistical significance be the first approach to evaluating change and that thresholds for interpreting the magnitude of that change be used as supportive evidence, rather than picking one approach over the other.[17] Accordingly, while the FDA Guidance emphasizes individual-level change using responder definitions to interpret PRO scores, it would behoove researchers to present group-level summary statistics and interpretation thresholds as well, as that level of evaluation might offer further insight and greater power to detect true treatment benefits.

Discussion: A case study

Oncology researchers have the same methods available to them to interpret the meaning of PRO data that other researchers have and, as with any data analytic approach, are encouraged to consider what they would like to communicate with their data in their particular context before making a principled decision regarding the appropriate approach to employ. Because the true goal of interpreting scores on a PRO questionnaire is to illuminate the effect of a treatment on how a patient feels, functions, and survives, researchers are particularly encouraged to think about their PRO data with respect to hypothesized treatment outcomes. Researchers should also determine whether they require estimates to facilitate interpretation of change within-person, between groups, or both, in order to evaluate their hypothesized claims. A good example of how researchers have successfully integrated a sound measurement and score interpretation strategy into an oncology clinical program is that of ruxolitinib, approved for treatment of patients with intermediate- or high-risk myelofibrosis (MF) in the US. Ruxolitinib was the first cancer product to include patient-reported symptom data on its label since the draft PRO Guidance was released in 2006, thanks in part to a clearly specified measurement context and a simple measurement “story” that could be easily understood with respect to clinical meaningfulness. As described below, researchers had a clearly specified target patient population, a well-defined treatment outcome, pre-specified points of comparison with appropriate score interpretation evidence to guide interpretation, and a strong regulatory communication plan. Specifically, ruxolitinib developers hypothesized that, in addition to an overall reduction in spleen size, treated subjects would experience secondary, albeit important, reductions in MF-related symptoms as assessed by the MF Symptom Assessment Form (MF-SAF v2.0 diary) Total Symptom Score (TSS).[36] The primary and key secondary end points were comparisons of the proportion of treatment versus placebo subjects who achieved a ≥35% reduction in spleen volume and a ≥50% reduction in MF symptoms as assessed by the MF-SAF v2.0 diary TSS, respectively, from study baseline to week 24. From a regulatory perspective, the critical point to understand is not that the PRO data were used to support an efficacy claim for labeling but rather, as described by one director in the Office of Hematology Oncology Products, “It was a secondary end point, but in our mind this is why we gave the application full approval. One could quibble about the importance of reduction in spleen size, but with reduction in all the symptoms full approval was warranted” [37] (p. 3). In other words, while interpretable in its own right, the PRO data facilitated swift product approval when considered in the context of the entire program and the primary end point. As indicated above, the FDA is not able to provide a blanket guidance or “one size fits all” approach toward the development and use of PRO questionnaires, or their subsequent interpretation. Instead, the successful integration of patient-reported data into drug approval and labeling decisions is a program- and trial-specific issue that requires discussion between sponsors and regulators, and the ruxolitinib example provides a roadmap in that regard.[37] With respect to score interpretation, ruxolitinib researchers first showed statistically significant (p < 0.001) between-group (treatment versus placebo) differences across all items of the MF-SAF v2.0 diary from baseline to follow-up.[38] Though a between-group score interpretation estimate was not provided, the mean percent change in TSS from baseline was graphically displayed for both groups so that differences could be observed. Next, to show that the responder definition was empirically defensible, an anchor-based analysis was conducted, which categorized subjects as “responders” (≥50% reduction in TSS) or “non-responders” (<50% reduction in TSS).
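The responder comparison pattern described above for the TSS end point can be sketched as follows. Only the ≥50% reduction threshold comes from the text; the arm sizes, the simulated percent changes, and the use of a chi-square test are assumptions for illustration.

```python
# Hedged sketch of a TSS-style responder comparison (>= 50% reduction from
# baseline). All numbers are invented; only the 50% threshold is from the text.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(7)
N = 145  # hypothetical subjects per arm

# Percent change in TSS from baseline (negative = symptom reduction), simulated
pct_trt = rng.normal(loc=-55, scale=25, size=N)
pct_ctl = rng.normal(loc=-5, scale=25, size=N)

# A responder achieves at least a 50% reduction in TSS
responders_trt = int(np.sum(pct_trt <= -50))
responders_ctl = int(np.sum(pct_ctl <= -50))

table = np.array([[responders_trt, N - responders_trt],
                  [responders_ctl, N - responders_ctl]])
chi2, p, _, _ = chi2_contingency(table)
print(f"TSS responders: {responders_trt}/{N} vs {responders_ctl}/{N}, p = {p:.2e}")
```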
