J Clin Epidemiol Vol. 43, No. 6, pp. 589-595, 1990
Printed in Great Britain. All rights reserved

0895-4356/90 $3.00 + 0.00
Copyright © 1990 Pergamon Press plc

RELIABILITY OF ESTIMATES OF CHANGES IN MENTAL STATUS TEST PERFORMANCE IN SENILE DEMENTIA OF THE ALZHEIMER TYPE

GERALD VAN BELLE, RICHARD F. UHLMANN, JAMES P. HUGHES and ERIC B. LARSON

Department of Biostatistics, University of Washington; Division of Gerontology and Geriatric Medicine, Department of Medicine, Harborview Medical Center, University of Washington; Department of Medicine; and Alzheimer Disease Research Center, University of Washington, Seattle, WA 98195, U.S.A.

(Received in revised form 5 July 1989)

Abstract-The concept of the reliability of a measure can also be applied to its change over time. In this study we consider the growth curve approach to estimating the reliability of change, in the context of cognitive status as measured by the Mini-Mental State Examination (MMSE) and the Blessed and Tomlinson Dementia Rating Scale (DRS) in patients with senile dementia of the Alzheimer type (SDAT). The reliability of the estimates of change is shown to depend primarily upon the length of time of observation, not the number of observations made. The estimated reliability coefficient for the change in MMSE (or DRS) at 6 months is 0.16 (or 0.08); at 2 years it is 0.75 (or 0.57). The concept of signal-to-noise ratio is introduced to compare reliabilities in change scores.

Keywords: Reliability; Change score; Cognitive tests; Alzheimer's disease

INTRODUCTION

As clinicians follow patients they must interpret changes in clinical variables. Some synonyms for "change" are improvement, deterioration, decline, and loss of (or gain in) ability. All imply different levels of a measurable quantity at two or more time points. The differential diagnosis of such changes includes aging, the natural history of the disease, response to treatment and measurement, non-standard conditions, different observers, and statistical artifacts. Interpreting change is particularly challenging in chronic diseases in which the patient's baseline may be unstable. Such interpretations often affect prognostication and decisions regarding medical management and thus are a clinical component of clinical reasoning.

For a progressive disease such as senile dementia of the Alzheimer type (SDAT), assessment of change and its significance is a central problem for clinicians and researchers. Changes in scores on tests are judged against a background or population of changes. The central challenge for the clinician is to determine whether an observed change in a cognitive test score represents a true change in a subject's status or is "random variation". The "change score" is defined as the change per unit time interval (e.g. a year) of the results of a test. The determination is influenced, in part, by the reliability of the change score. The psychometric literature until recently has been critical of change scores [1-3]. In this paper we examine the reliability of the change scores for the Mini-Mental State Examination (MMSE) [4] and the Dementia Rating Scale (DRS) [5].

*All correspondence should be addressed to: Gerald van Belle, Ph.D., Department of Biostatistics SC-32, University of Washington, Seattle, WA 98195, U.S.A.

METHODS

The estimation of reliability

The concepts of the reliability of an observed test score and the reliability of the observed change in a test score over time are well developed and quantified [2, 3]. In this section we review these concepts and discuss ways of estimating the reliability of estimates from data in a patient registry. In addition, the concept of the signal-to-noise ratio of a change score is introduced.

The observed score ($Y$) of a patient on an instrument is considered to be made up of a true score ($\mu$) plus noise ($\epsilon$), which is assumed to be random. For patient $i$ write

$$Y_i = \mu_i + \epsilon_i.$$

The true score is assumed to vary from patient to patient (if it did not, the instrument would have no differential diagnostic value); the variance of the noise is assumed to be the same from patient to patient, and uncorrelated with $\mu$. The reliability of the instrument is then defined to be

$$\rho = \frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2_\epsilon},$$

where $\sigma^2_\mu$ is the variability in true scores (from patient to patient) and $\sigma^2_\epsilon$ is the variability of the noise. Thus,

$$\rho = \frac{\text{variability in true scores}}{\text{variability in observed scores}}.$$

Reliability is therefore a characteristic associated with a group of patients, not a single patient.

The reliability of an instrument is sometimes estimated by correlating paired observations; for an example see Ref. [6]. It can be shown that if there are no systematic changes from time 1 to time 2 and the correlation is positive, then the correlation coefficient is equivalent to the reliability as defined above. For comparison with [6] we calculated the correlation between the first and second observations of a score.

Change or deterioration in the performance of a subject on an instrument over time can be considered in a similar fashion. The observed change is made up of a true change plus noise. In addition, the true change varies from patient to patient. In this paper we will assume (as a first-order approximation) that there is a linear change over time for each patient. Specifically,

for patient $i$, the observed response, $Y$, can be modeled by

$$Y_{ij} = \alpha_i + \beta_i t_{ij} + \epsilon_{ij},$$

where $\alpha_i$ is the true score at time $t = 0$, $\beta_i$ (the slope) is the true change per unit time, $t_{i1}, t_{i2}, \ldots, t_{ik_i}$ are the $k_i$ times at which patient $i$ is observed, and $\epsilon_{ij}$ represents the random variability at time $t_{ij}$ and is, again, termed noise. The quantities $(\alpha_i, \beta_i)$ vary from subject to subject with variances $\sigma^2_\alpha$ and $\sigma^2_\beta$, respectively. Hence $\sigma^2_\beta$ is the variability of true changes in the population. The variance of the $\epsilon_{ij}$'s is denoted by $\sigma^2_{Y \cdot t}$, the usual residual variance from regression; it summarizes within-patient variability.

A typical data set of $n$ patients will have varying numbers and times of observations from patient to patient. These kinds of data can be analyzed in a variety of ways [2, 3, 7]. We will take an intuitive approach, using a growth curve model. Recent work [8-10] provides theoretical and empirical evidence that these alternative approaches yield comparable results. In the growth curve approach separate linear regressions are fitted to the data for each patient. For each patient we calculated a regression line:

$$Y_{ij} = a_i + b_i t_{ij} + e_{ij},$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, k_i$. Here $a_i$ is the score at time 0 and $b_i$, the slope, is the change per unit time, for patient $i$. The value of $a_i$ is the estimated score for the patient when first seen. The residual from regression for each patient provides an estimate of $\sigma^2_{Y \cdot t}$. The variance of the observed slopes is an estimate of $\sigma^2_b$ and is related to the variance of the true change by the equation

$$\sigma^2_b = \sigma^2_\beta + \frac{\sigma^2_{Y \cdot t}}{[t^2]},$$

where

$$[t^2] = \sum_{j=1}^{k} (t_j - \bar{t})^2$$

is the sum of squares of deviations from the mean, $\bar{t}$, of the times of the repeated observations within a patient. The quantity $\sigma^2_{Y \cdot t}/[t^2]$ is the variance of the slope within a patient [2]. Following [2], the reliability of the change score can then be written

$$\rho_c = \frac{\sigma^2_\beta}{\sigma^2_\beta + \sigma^2_{Y \cdot t}/[t^2]}.$$
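To make the estimation procedure concrete, the following is a minimal sketch in Python (not part of the original analysis): a separate least-squares line is fitted to each patient, and the between-patient variance of the fitted slopes is combined with the pooled within-patient residual variance through an average $[t^2]$. The function name, the pooling of residual variances and of $[t^2]$ by simple averaging, and the truncation of the estimated variance of true changes at zero are illustrative assumptions, not the authors' exact computation.

```python
import numpy as np

def change_score_reliability(times_by_patient, scores_by_patient):
    """Growth curve estimate of the reliability of the change score.

    Fits a separate least-squares line to each patient, then combines the
    between-patient variance of the fitted slopes with the pooled
    within-patient residual variance through the average [t^2].
    """
    slopes, resid_vars, t2_values = [], [], []
    for t, y in zip(times_by_patient, scores_by_patient):
        t, y = np.asarray(t, float), np.asarray(y, float)
        k = len(t)
        if k < 3:                    # need >= 3 points to estimate residual variance
            continue
        b, a = np.polyfit(t, y, 1)   # slope b_i and intercept a_i for patient i
        resid = y - (a + b * t)
        resid_vars.append(resid @ resid / (k - 2))     # estimate of sigma^2_{Y.t}
        slopes.append(b)
        t2_values.append(np.sum((t - t.mean()) ** 2))  # [t^2] for patient i
    var_b = np.var(slopes, ddof=1)   # variance of observed slopes, estimates sigma^2_b
    sigma2_res = np.mean(resid_vars) # pooled within-patient residual variance
    t2 = np.mean(t2_values)          # average [t^2]; assumes similar spacing across patients
    sigma2_beta = max(var_b - sigma2_res / t2, 0.0)        # variance of true changes
    rho_c = sigma2_beta / (sigma2_beta + sigma2_res / t2)  # reliability of the change score
    signal_to_noise = sigma2_beta / sigma2_res             # R = sigma^2_beta / sigma^2_{Y.t}
    return rho_c, signal_to_noise
```

Here `times_by_patient` and `scores_by_patient` would each hold one sequence of observation times (in years) and test scores per patient; patients with fewer than three observations are skipped, matching the inclusion criterion used in this paper.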


In this formulation we assume that the spacings of the observations are approximately the same for all patients. This reliability is a function of three quantities: $\sigma^2_\beta$, the variability of true changes among patients; $\sigma^2_{Y \cdot t}$, the inherent residual variability (noise) about the true change within a patient; and $[t^2]$, which is a function of the spacing of the observations for a patient and the number of observations. The relative effects of each on the reliability will be discussed.

A distinction can be made between the precision of an estimate of change and its reliability. The precision of an estimate of change is $\sigma^2_{Y \cdot t}/[t^2]$: a small value indicates that the change has been estimated precisely. However, if $\sigma^2_\beta$ is also small, the reliability may not be high. As pointed out in [2], the reliability of a change score is a measure of how well the individuals in a group can be ranked on the basis of the observed change scores.

The ratio $R = \sigma^2_\beta/\sigma^2_{Y \cdot t}$ is a measure of the variability of the true change among patients relative to the inherent, irreducible variability about the true change within a patient. This ratio, $R$, will be called the signal-to-noise ratio. It characterizes a test since it does not depend upon the length of time a patient is observed. Large values of $R$ indicate a potentially reliable measure of change. The inherent reliability of measures of change can be compared by means of these ratios, since in many cases $\sigma^2_\beta$ and $\sigma^2_{Y \cdot t}$ are inherent characteristics of the cognitive test. The relationship between $\rho_c$ and $R$ is:

$$\rho_c = \frac{R}{R + 1/[t^2]}.$$

From this formulation it can be seen that the reliability of change is a function of two key factors: the signal-to-noise ratio (a test characteristic) and the spacing and frequency of the measurements, $[t^2]$, which is under the control of the investigator.

Patients with at least three observations on one instrument were used in order to be able to estimate the within-patient variability of change. Estimates are distinguished from parameters by the use of the "hat" notation; for example, $\hat{\sigma}^2_b$, the observed variability of the changes, estimates $\sigma^2_b$. Estimates of reliability were calculated as a function of the length of time patients were observed. This was done for two reasons; first,
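The dependence of $\rho_c$ on the observation schedule can be illustrated numerically. The short Python sketch below evaluates $\rho_c = R/(R + 1/[t^2])$ for equally spaced visits; the value $R = 1.0$ is purely hypothetical and is not an estimate from the MMSE or DRS data.

```python
import numpy as np

def rho_c(R, times):
    """Reliability of the change score, rho_c = R / (R + 1/[t^2]),
    for signal-to-noise ratio R and a schedule of observation times (years)."""
    t = np.asarray(times, float)
    t2 = np.sum((t - t.mean()) ** 2)  # [t^2]: sum of squared deviations of the times
    return R / (R + 1.0 / t2)

R = 1.0  # hypothetical signal-to-noise ratio, for illustration only

# Fixed 2-year span, increasing number of equally spaced visits:
for k in (3, 5, 9):
    print(f"{k} visits over 2 yr: rho_c = {rho_c(R, np.linspace(0, 2, k)):.2f}")

# Fixed number of visits (3), increasing span:
for span in (0.5, 1.0, 2.0):
    print(f"3 visits over {span} yr: rho_c = {rho_c(R, np.linspace(0, span, 3)):.2f}")
```

With these hypothetical inputs, tripling the number of visits within a fixed 2-year span raises $\rho_c$ only modestly (from about 0.67 to 0.79), whereas extending a 3-visit schedule from 6 months to 2 years raises it from about 0.11 to 0.67, the pattern summarized in the abstract.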

Table 1. Pattern of number of tests by length of time observed in half-yearly intervals (MMSE). [Rows: number of tests; columns: length of time observed (yr); table body not recoverable.]
