A Decision Theory Approach to Measuring Severity in Illness

by David H. Gustafson and Donald C. Holloway

The purpose of this study was to evaluate the applicability of a multiattribute utility model for measuring the severity of a patient's illness. A single medical problem (an analysis of the costs and benefits of different burn care systems) was used to test the model. Physicians estimated the relative importance of and severity functions for criteria influencing severity. The model's estimates of severity were compared with survival rates of more than 6000 actual patients and with physicians' rankings of hypothetical patients. Although continued validation is needed, the multiattribute utility model appears to have potential as an index for illness severity and, possibly, health status.

Burn research units have been established in many locations around the country. New approaches in burn treatment are expensive, though, and their cost/benefit ratios do not clearly demonstrate their worth. A national information system on burn treatment has been established in order to facilitate comparison of different treatment methods and systems, but since the case mix of patients treated in the new centers differs markedly from that encountered in traditional settings, it is still quite difficult to compare the efficacy of new treatment techniques. The study reported here was conducted in an attempt to provide cost and benefit information about the patient care systems in one of these centers.

So that treatment effectiveness could be compared, it was necessary to rate patients according to severity of condition, using some stratification mechanism that would be acceptable to a broad range of medical care providers. A number of statistical attempts at developing an index of illness severity had failed in the past, and it was clear that even if these attempts had worked, the indexes would not have been accepted by the providers because they did not understand the techniques used. A decision theory approach was therefore selected because it required that providers of care be actively involved in developing the severity index and estimating its parameters. This meant that the index would likely be acceptable if it were shown to perform well. This article describes the development and evaluation of the index.

Research supported by Public Health Service Grant No. HS-0031-05. Address communications and requests for reprints to David H. Gustafson, Ph.D., Department of Industrial Engineering, University of Wisconsin, Madison, WI 53706.

Spring 1975


THE SEVERITY INDEX MODEL

Assuming that the severity of a patient's illness is some aggregate of individual criteria, illness severity (S) may be represented as follows:

\[
S = \sum_{i=1}^{n} w_i B(x_i) + \sum_{j=n+1}^{n+m} w_j R_j
\]

where

i = quantitative variable (1, 2, ..., n)
j = qualitative variable (n + 1, n + 2, ..., n + m)
w = relative weight
x = extent to which a quantitative variable is present
B(x_i) = severity function associated with the ith variable
R = presence or absence of the jth variable
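The additive form of the model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the piecewise-linear representation of a severity function, the 0/1 coding of R, and the example curve and weights are all hypothetical and are not taken from the study.

```python
from bisect import bisect_right
from typing import Callable, Iterable, Sequence, Tuple


def piecewise_linear(points: Sequence[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a severity function B(x) by interpolating between (x, severity) points.

    Assumption: a physician's hand-drawn curve is approximated by straight
    segments between a few points sorted by x.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def b(x: float) -> float:
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        k = bisect_right(xs, x)  # xs[k - 1] <= x < xs[k]
        t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
        return ys[k - 1] + t * (ys[k] - ys[k - 1])

    return b


def severity_index(
    quantitative: Iterable[Tuple[float, Callable[[float], float], float]],
    qualitative: Iterable[Tuple[float, float]],
) -> float:
    """Compute S = sum_i w_i * B_i(x_i) + sum_j w_j * R_j.

    quantitative: (weight, severity function, observed value) triples
    qualitative:  (weight, R) pairs, with R coded 1 if present and 0 if absent
                  (an assumed coding)
    """
    s = sum(w * b(x) for w, b, x in quantitative)
    s += sum(w * r for w, r in qualitative)
    return s


# Hypothetical curve and weights, for illustration only.
b_hypo = piecewise_linear([(0, 0), (50, 40), (100, 100)])
example = severity_index(
    quantitative=[(0.6, b_hypo, 25.0)],  # B(25) = 20, contributing 0.6 * 20
    qualitative=[(0.4, 1)],              # criterion present, contributing 0.4 * 1
)
```

Keeping each severity function as a separate callable mirrors the structure of the model, in which every criterion has its own function and weight and the contributions are simply added.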

All variables are assumed to be independent of one another. The term severity is used to represent the extent to which a patient is perceived by physicians to meet illness criteria. It differs from the term utility in that it does not possess any risk or trade-off considerations.

Although the decision-theory model used here [1] has not been used in assessing severity of illness, this approach has proved useful in evaluating other complex problems. Gustafson used a multiattribute utility model to choose research projects [2]. A similar model has been used in a computerized job-selection system for rating job desirability [3]. The decision theory approach has also been validated in experimental contexts [4]. MacCrimmon's review article on subjective decision-making models [5] suggests other possible applications and reports on his attempt to develop such a model for selection of students. Huber has also reviewed field studies using multiattribute utility models [6].

DEVELOPMENT OF THE INDEX

In developing the index, (1) criteria were selected that were considered to be important in evaluating severity of illness, (2) measures of each criterion were developed, (3) the severity function associated with each criterion was determined, (4) the relative importance of each criterion was estimated, and (5) the evaluations of all physicians were aggregated. Each of these steps is described in detail in the following sections.

Selection of Criteria

Criteria for evaluating the severity of a burned patient's condition were selected on the basis of a review of the literature and structured problem identification sessions [7]. Four physicians, one nurse, and one industrial engineer were involved in these sessions. The following criteria were selected as the most important: size of full-thickness burn, the patient's age, the patient's medical history, size of partial-thickness burn, and burn site.


Health Services Research

ESTIMATING ILLNESS SEVERITY

Fig. 1. Relative contribution of size of burn to severity of full-thickness burn, as independently conceptualized by four physicians. (Horizontal axis: percent full-thickness burn, 0 to 100; vertical axis: severity, least to most.)

Fig. 2. Relative contribution of size of burn to severity of partial-thickness burn, as independently conceptualized by four physicians. (Horizontal axis: percent partial-thickness burn, 0 to 100; vertical axis: severity, least to most.)

Fig. 3. Relative contribution of age to severity of burn, as independently conceptualized by four physicians. (Horizontal axis: normalized age, 0 to 100; vertical axis: severity, least to most.)

Development of Criteria Measures

Criteria measures were developed using nominal and interval scales. Size of burn was measured on an interval scale as a percentage of the body surface involved. Patient age was measured in single years rather than age categories. A list of 29 medical problems was developed for evaluating a patient's medical history; since it was considered that any of the problems on this list would significantly increase the severity of a burned patient's condition, the 29 problems were assigned equal weights. Nine primary burn sites were identified: face, chest and abdomen, perineum, back of head, neck, back, arms and hands, legs and thighs, and buttocks.

Development of Severity Functions

Of the available techniques for estimating utilities, the method of order has proved to be the most satisfactory for interval-scaled measures [4, 8, 9]. With this technique, subjects select the values of the criteria with which they would be most and least satisfied and then connect these two points with a curve that describes how satisfied they would be with values between these points. The four physicians on the panel used this approach for developing severity functions for full- and partial-thickness burn, age, and medical history. Results for size of full-thickness burn, size of partial-thickness burn, and age of patient are shown in Figs. 1-3. The similarity of the curves in these figures indicates that the judgments of the four physicians differed very little from each other.

The odds approach was used to estimate severity as a function of burn site. This method was chosen on the basis of work on the odds technique by Huber [4] and by Gustafson [2]; the work of Huber and Gustafson is consistent with the results of Phillips et al. [10] on subjective probability estimation. The four physicians on the panel selected the most important burn site in terms of severity and then estimated how much more severe a burn would be at this site than a burn at each remaining site. These estimates were made on a logarithmically calibrated scale of odds ranging from 1:1 to 1000:1. (This scale was chosen because the literature indicates that it tends to reduce estimation bias.) The results for the four physicians were averaged, and the aggregate odds estimates were then converted into weights on a 0-100 scale. A patient whose burns covered more than one site would thus be assigned a burn site severity function on the basis of the sum of the functions of the individual sites. The severity function for a patient burned at every site would be 100.
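One way the odds estimates might be turned into weights is sketched below. The article does not spell out the conversion, so the steps here are assumptions: each judge's odds are averaged arithmetically, each site's relative severity is taken as the reciprocal of its averaged odds against the most important site, and the results are normalized so that all sites together sum to 100. The site names and odds are hypothetical.

```python
from typing import Dict, List


def average_odds(panel: List[Dict[str, float]]) -> Dict[str, float]:
    """Arithmetic mean of each site's odds across judges (assumed aggregation)."""
    sites = panel[0].keys()
    return {s: sum(judge[s] for judge in panel) / len(panel) for s in sites}


def odds_to_weights(odds: Dict[str, float], scale: float = 100.0) -> Dict[str, float]:
    """Convert 'the top site is k times more severe than this site' into weights.

    A site with odds k:1 gets relative severity 1/k; the top site has odds 1:1.
    Weights are normalized so that they sum to `scale`.
    """
    relative = {site: 1.0 / k for site, k in odds.items()}
    total = sum(relative.values())
    return {site: scale * r / total for site, r in relative.items()}


# Hypothetical two-judge panel rating three hypothetical sites.
panel = [
    {"site_a": 1.0, "site_b": 2.0, "site_c": 3.0},
    {"site_a": 1.0, "site_b": 2.0, "site_c": 5.0},
]
weights = odds_to_weights(average_odds(panel))
```

Under this normalization a patient burned at every site receives a site severity of 100, which agrees with the property stated in the text; other conversions with the same property are possible.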

Weighting of Criteria

The five criteria (size of full- and partial-thickness burn, age, medical history, and burn site) were weighted by the four panel physicians using the odds technique described above, and the results were averaged. Table 1 shows the resulting scheme for weighting criteria, together with the severity functions for individual criteria.


Table 1. Criteria Used to Measure Burn Severity

Criterion                                            Criterion weight   Criterion severity function
Percent of body covered by full-thickness burn       0.371              See Fig. 1
Age of patient                                       0.292              See Fig. 3
Number of past medical problems                      0.218              Four or more = 100; Three = 90; Two = 75; One = 55; None = 0
Percent of body covered by partial-thickness burn    0.071              See Fig. 2
Burn site                                            0.048              Face = 32.5; Chest and abdomen = 12.6; Perineum = 12.6; Back of head = 9.0; Neck = 9.0; Back = 8.1; Arms and hands = 5.4; Legs and thighs = 5.4; Buttocks = 5.4

APPLICATION OF THE INDEX

The severity of a burn patient's condition is calculated on the basis of the five criteria described above. The severity associated with each measure is multiplied by the relative weight of the criterion, and the products are added. For instance, a 38-year-old patient with a history of one important medical problem who had full-thickness burns covering 40 percent of his body and partial-thickness burns covering 20 percent of his body and whose burns covered all of his body except his face, head, back, and arms would be assigned a severity index of 37, as shown in Table 2.

VALIDATION OF THE INDEX

Testing for Predictive Validity

Fifteen hypothetical patients were created, with different values across all five criteria, and the patients' conditions were described on cards in conventional terms. The same four physicians who determined relative weights and severity functions for the index rank-ordered the patient descriptions according to severity. The Spearman rank correlation coefficient between the average physician ranking and the index ranking was 0.89, indicating that the index is predictive.
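For readers unfamiliar with the statistic, a Spearman rank correlation of the kind reported here can be computed as sketched below. The sketch assigns average ranks to ties (which can occur when physician rankings are averaged) and then takes the Pearson correlation of the ranks. The scores shown are hypothetical; the study's 15 patient ratings are not reproduced in the article.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1 (lowest) upward, giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b)


# Hypothetical data: index scores and averaged physician ranks for five patients.
index_scores = [37.4, 21.1, 46.8, 12.0, 26.9]
physician_avg_rank = [3.5, 2.0, 5.0, 1.0, 3.5]
rho = spearman(index_scores, physician_avg_rank)
```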


Table 2. Sample Calculation of Burn Severity

Criterion                                            Patient information                                  Severity function value   Criterion weight (from Table 1)   Index (value × weight)
Percent of body covered by full-thickness burn       40                                                   50                        0.371                             18.55
Age of patient                                       38                                                   15                        0.292                             4.38
Number of past medical problems                      1                                                    55                        0.218                             11.99
Percent of body covered by partial-thickness burn    20                                                   4                         0.071                             0.28
Burn site                                            Chest, abdomen, perineum, neck, legs, buttocks       45                        0.048                             2.16
Severity index value                                                                                                                                                  37.36
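The arithmetic of the sample calculation in Table 2 can be checked with a short script. The weights and the lookup values for medical history and burn site come from Table 1; the curve readings for burn size and age (50, 4, and 15) are taken directly from Table 2, since the curves in Figs. 1-3 are not available in numeric form. The dictionary keys and function names are illustrative choices, not part of the study.

```python
# Criterion weights from Table 1.
CRITERION_WEIGHTS = {
    "full_thickness": 0.371,
    "age": 0.292,
    "medical_history": 0.218,
    "partial_thickness": 0.071,
    "burn_site": 0.048,
}

# Severity by number of past medical problems (Table 1); four or more = 100.
MEDICAL_HISTORY_SEVERITY = {0: 0, 1: 55, 2: 75, 3: 90}

# Severity by burn site (Table 1); sites are additive.
BURN_SITE_SEVERITY = {
    "face": 32.5,
    "chest and abdomen": 12.6,
    "perineum": 12.6,
    "back of head": 9.0,
    "neck": 9.0,
    "back": 8.1,
    "arms and hands": 5.4,
    "legs and thighs": 5.4,
    "buttocks": 5.4,
}


def medical_history_severity(n_problems: int) -> float:
    return 100 if n_problems >= 4 else MEDICAL_HISTORY_SEVERITY[n_problems]


def burn_site_severity(sites) -> float:
    return sum(BURN_SITE_SEVERITY[s] for s in sites)


# The patient of Table 2: age 38, one past problem, 40% full-thickness and
# 20% partial-thickness burn, burned everywhere except face, head, back, arms.
severities = {
    "full_thickness": 50,     # reading of Fig. 1 at 40%, as given in Table 2
    "age": 15,                # reading of Fig. 3 at age 38, as given in Table 2
    "medical_history": medical_history_severity(1),
    "partial_thickness": 4,   # reading of Fig. 2 at 20%, as given in Table 2
    "burn_site": burn_site_severity(
        ["chest and abdomen", "perineum", "neck", "legs and thighs", "buttocks"]
    ),
}

contributions = {k: CRITERION_WEIGHTS[k] * v for k, v in severities.items()}
severity_index_value = sum(contributions.values())
print(round(severity_index_value, 2))
```

The burn site lookup sums to 45, matching the severity function value listed for burn site in Table 2, and the five weighted contributions add up to the tabulated index of 37.36 after rounding.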

Testing for Content Validity

Since the purpose of the burn severity index is to measure directly the severity of a burned patient's condition, and not merely to predict physician estimates of severity, further tests were required for validation. Content validity testing [11] is often used for validation of models of this type, but since no one test can measure content validity, developers of such models must accumulate circumstantial evidence that indicates that their models behave as expected.

Comparison with Subjective Ratings by Physicians. The index ratings were first checked for content validity by comparing them with ratings of burn specialists unfamiliar with the index. Eight physicians at two other burn centers rated the 15 hypothetical patients according to perceived severity. Their ratings were compared to the index ratings and the ratings made by the four original physicians. The Spearman rank correlation coefficient between the index ratings and the ratings of six physicians at Burn Center 2 was 0.74. The correlation with the ratings of two physicians at Burn Center 3 was 0.87, which is comparable to the correlation of 0.89 registered by the physicians at Burn Center 1 who developed the index.

Comparison with Another Model. The performance of the index was also tested by comparing its ratings with the ratings of a probit analysis model developed at Burn Center 2 that is used to estimate probability of death from burns on the basis of age and percent of body surface affected (both variables are used as independent indexes) [12]. When the probit analysis model and the severity index model were applied to the 15 hypothetical patients, the correlation coefficient was 0.64. As shown in Fig. 4, this relatively low correlation is due in great part to the different ratings assigned to four of the patients (K, N, V, and EE) by the two models. The conditions of all four of these patients were more severe according to the severity index, in which the variables are not independent (for example, in applying the severity index model the relative effect of age cannot be estimated except as a function of size of burn). It is conceivable that the severity index model would not work well in instances that required the independent weighting of criteria, but recent comparisons of additive and nonadditive models suggest that this problem may be overstated [13].

Fig. 4. Relationship between severity index scores and percent chance of death according to probit analysis model. Each letter indicates the position of an individual patient on the two scales. (Horizontal axis: severity index; vertical axis: percent chance of death; r = 0.64.)

On the other hand, the severity index model considers other factors besides age and burn size. In this case the key additional factor seems to be past medical problems. If past problems influence severity, one would expect significant differences in results for the two models. In Table 3, probit analysis results and data on age and medical problems are given for the four patients for whom the severity index ratings did not correlate with chance of death as computed by probit analysis. Along with the data for these four patients are the comparable data for the four patients with the closest severity ratings (patients S, FF, G, and Q). Differences in ratings between K and S and between N and FF are seen to result from the treatment of age and burn size; differences between V and G and between EE and Q, however, result from differences in medical histories. Patients V and EE had significant past medical problems (myocardial infarction, hypertension, and marked obesity for EE; thrombophlebitis, peptic diathesis, and marked obesity for V), and patients G and Q had no record of significant medical problems. Results for these patients indicate that the severity index may be a more adequate measure of severity than the probit analysis model using only age and burn size.

Table 3. Comparison of Severity Index Model to Probit Analysis Model Using Selected Patients

Patient   Severity index rating   Percent chance of death according to probit analysis model   Age of patient   Percent of body burned   Number of past medical problems
K         21.1                    2                                                            1                20                       0
S         20.7                    47                                                           3                55                       0
N         26.9                    3                                                            48               20                       1
FF        26.3                    47                                                           8                55                       1
V         36.8                    8                                                            53               25                       3
G         36.2                    72                                                           29               70                       0
EE        45.9                    14                                                           70               15                       3
Q         46.8                    70                                                           70               40                       0

It is interesting to note how much influence a model can have on physicians' thinking. Physicians at Burn Center 2 had been using the probit analysis model for several years to estimate percent chance of death for all burn patients. The severity estimates made by physicians at Burn Center 2 correlated very well with probit analysis estimates (r = 0.97) but less well with the severity index ratings (r = 0.74). At Burn Centers 1 and 3, however, correlations between physicians' estimates and probit analysis were not nearly so high (0.56 and 0.65, respectively). Similarly, the correlation between physicians' estimates and severity index ratings was highest at Burn Center 1 (r = 0.89), where the index was developed.

Comparison with Survival Rates of Patients. A third test of content validity was the comparison of severity index ratings with actual outcome of care. For this test, data on 6243 burn patients from the National Burn Information Exchange [14] were used. Severity indexes were calculated and divided into eight severity ranges, and survival rates were then calculated for each patient group. Ranges for the higher severity ratings had to be very wide (e.g., 40.0-99.9) in order to provide a sufficient number of observations per cell for each of five burn centers. Table 4 shows the results. Except where noted, each cell represents at least 30 patients. Survival rate is more or less monotonically related to the severity index in the column combining all five burn centers and approximates monotonicity in the five individual sites. Burn Center B shows a monotonic relationship. Burn Centers A and E have order reversals between categories four and five. Burn Center C shows an order reversal between the second and third categories. Burn Center D shows an order reversal between the third and fourth categories.


Table 4. Relationship Between Survival Rates and Severity Index Ratings

Severity index   Survival rate (%), all centers
0-3.9            98.4
4.0-7.9          94.3
8.0-11.9         94.4
12.0-15.9        87.9
16.0-19.9        87.6
20.0-27.9        70.7
28.0-39.9        49.1
40.0-99.9        9.9

Survival rate (%) by burn center (A, B, C, D, E):
99.4 99.5 97.5 100 0-3.9 ........ 100 97.9 85.6 100 94.8 4.0-7.9 ........ 100 97.2 79.6 97.0 8.0-11.9 ....... 97.8 96.8 90.8 87.1 86.9 83.8 12.0-15.9 ....... 91.9 16.0-19.9 ....... 92.9 89.3 86.6 77.6 92.3 74.7 20.0-27.9 ....... 70.7 66.3 62.0 83.3 54.5 45.9 39.9 35.6 28.0-39.9 ....... 60.9 * * 9.1 14.3 4.3 40.0-99.9

* There were not enough observations in these cells to develop reliable estimates.
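The monotonicity claim for the combined column of Table 4 can be checked mechanically. The sketch below uses only the eight all-centers survival rates and reports any adjacent pair of severity categories in which survival rises instead of falling; the function name and the handling of missing cells (None) are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

SEVERITY_RANGES = [
    "0-3.9", "4.0-7.9", "8.0-11.9", "12.0-15.9",
    "16.0-19.9", "20.0-27.9", "28.0-39.9", "40.0-99.9",
]

# Survival rate (%) for all five burn centers combined, from Table 4.
ALL_CENTERS = [98.4, 94.3, 94.4, 87.9, 87.6, 70.7, 49.1, 9.9]


def order_reversals(rates: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return 1-based category pairs (k, k + 1) where survival increases.

    Cells with too few observations can be passed as None; a pair is only
    compared when both of its cells are present.
    """
    reversals = []
    for k in range(len(rates) - 1):
        lower, higher = rates[k], rates[k + 1]
        if lower is not None and higher is not None and higher > lower:
            reversals.append((k + 1, k + 2))
    return reversals


reversals = order_reversals(ALL_CENTERS)
for first, second in reversals:
    print(SEVERITY_RANGES[first - 1], "->", SEVERITY_RANGES[second - 1])
```

For the combined column the only reversal is the 0.1-point rise between the second and third categories (94.3 to 94.4), which is consistent with the description of the relationship as "more or less" monotonic.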

This analysis was performed in 1968, and the comparison with probit analysis was made in 1972. Testing the probit analysis model against the 6243 records would have been desirable. However, these records of actual cases were no longer available.

DISCUSSION

This study, completed in 1969, includes the first known use of a multiattribute utility model to measure health or health service needs. The results of the study suggest that such a model can predict the behavior of the persons who develop it and, very possibly, the behavior of other decision-makers in similar situations. Its success in predicting behavior in settings other than the one in which it was developed suggests that there is a common wisdom among physicians around which decision models can be built.

The relatively low correlation (0.74) in one test setting occurred in a hospital in which another model using different severity criteria had been developed and was in operation. That model correlated better than the severity index model with decisions of physicians in that care setting. Does this suggest that the physicians involved use slightly different criteria in their decision making, or does it suggest that decision makers can be influenced by the very models built to aid them? If the latter is the case, it points to both the existing opportunities and the critical need for quality in models of this type.

Results of the present study suggest that it may be feasible to use models of this nature to ascertain medically underserved areas [15] and, possibly, to measure health status, an approach that would not require the major investments in data collection and processing that characterize other approaches to these problems. If the multiattribute utility model works, it can provide a quick and easy index to aid in a number of health service decisions.

Acknowledgments. The authors wish to express their appreciation to the physicians and staff of the burn research centers of the University of Michigan, University of Wisconsin, and St. Mary's Hospital, Milwaukee, for their cooperation in this effort. We also wish to acknowledge the valuable contributions to this work of Dr. I. Feller and Mr. K. Crane of the University of Michigan and Mr. R. Lemer of the University of Wisconsin.

REFERENCES

1. Miller, J. R. A Systematic Procedure for Assessing the Worth of Complex Alternatives. Report MTR-260, Defense Documentation Center #ESD-TR-67-90, 1967.
2. Gustafson, D. H., G. K. Pai, and G. C. Kramer. A "weighted aggregate" approach to R & D project selection. IEEE Trans. 3(No. 1), Mar. 1971.
3. Daneshgar, R., G. P. Huber, and D. L. Ford. An empirical comparison of five utility models for predicting job preferences. Firm and Market Workshop Paper No. 6822, Social Systems Research Institute, University of Wisconsin, 1969.
4. Huber, G. P., V. K. Sahey, and D. L. Ford. A study of subjective evaluation models. Firm and Market Workshop Paper No. 6817, Social Systems Research Institute, University of Wisconsin, 1968.
5. MacCrimmon, Kenneth. An overview of multiple objective decision making. In J. Cochrane and M. Zeleny (eds.), Multiple Criteria Decision Making. Columbia, SC: University of South Carolina Press, 1973.
6. Huber, G. P. Multi-attribute utility models: A review of field and field-like studies. Internal report, University of Wisconsin, 1972.
7. Delbecq, A. L., A. Van de Ven, and D. H. Gustafson. Group Techniques for Program Planning: A Guide to Nominal Groups and Delphi Process. Chicago: Scott, Foresman, 1975.
8. Fishburn, P. C. Methods of estimating additive utilities. Mgmt. Sci. 13:435, Mar. 1967.
9. Pai, G. K. A comparison of three methods of determining a utility function for money. Unpublished master's thesis, University of Wisconsin, 1970.
10. Phillips, L. D., W. L. Hays, and W. Edwards. Conservatism in complex probability inference. IEEE Trans. Hum. Fact. Electron. 7:7, 1966.
11. Nunnally, J. C. Psychometric Theory. New York: McGraw-Hill, 1967.
12. Waisbren, B. A., M. Stem, and G. E. Collentine. Comparison of Methods of Burn Treatment by Probit Analysis. St. Mary's Hospital Burn Center, Milwaukee, WI, 1974.
13. Keeny, R. Multidimensional utility functions: Theory, assessment and application. Operations Research Center Technical Report No. 43, Cambridge, MA, 1969.
14. Feller, I. The National Burn Information Exchange, An Information System for Burns. Internal report, Burn Center, University of Michigan Medical School, Ann Arbor, 1966.
15. Bureau of Community Health Services, DHEW. Report to Congress on the Criteria to Be Used for Designation of Medically Underserved Areas and Population Groups as Required by the Health Maintenance Organization Act of 1973. Unpublished report, Bureau of Community Health Services, 1974.
