Assessing Utilities by Means of Conjoint Measurement: An Application in Medical Decision Analysis

ARNE MAAS, PhD, LUKAS STALPERS, MD

A method is presented for helping patients who have laryngeal cancer to decide between laryngectomy and radiotherapy in cases where these treatments are deemed medically equivalent. The method is based on the model of additive conjoint measurement. The treatment with the higher utility is determined from pair comparisons among outcomes that vary in quality and quantity of life. Pair comparisons enable a (partial) test of the axioms of additive conjoint measurement. This is in contrast to earlier work on decision making for patients with laryngeal cancer, and to most of the work in medical decision making in general, in which the underlying axioms have almost never been tested. Besides permitting tests of the axioms, pair comparisons have another important advantage: by presenting only riskless alternatives, they avoid the difficulties of other, risk-based, assessment procedures. Encouraging results have been found in a study among patients. Key words: utilities; additive conjoint measurement; pair comparisons; laryngeal cancer. (Med Decis Making 1992;12:288-297)

Received May 21, 1991, from the Nijmegen Institute for Cognition and Information, Department of Mathematical Psychology, Nijmegen, The Netherlands (AM); and St. Radboud Hospital, Department of Radiotherapy, University of Nijmegen (LS). Revision accepted for publication January 28, 1992. Address correspondence and reprint requests to Dr. Maas: Unilever Research Laboratorium Vlaardingen, P.O. Box 114, 3130 AC Vlaardingen, The Netherlands.

Downloaded from mdm.sagepub.com at Bobst Library, New York University on April 18, 2015

1. Introduction

In this paper we describe additive conjoint measurement and apply it to a medical decision problem in which one of two treatments must be chosen. We aim to show that by presenting pair comparisons in which quality and quantity of life are to be traded off, a preference for one treatment can be derived. It is emphasized that this preference is the preference of the patient and not of the physician, and that it is based on an individual analysis. We report empirical findings from a study among patients showing that the additive conjoint measurement model is useful for choosing between medical treatments on the basis of a patient's tradeoffs.

Two methods for eliciting utilities currently used in medical decision analysis are 1) the time-tradeoff method and 2) the certainty-equivalent method.1 In the time-tradeoff method, respondents are asked to indicate how many life years from a given number of years they are prepared to give up in order to gain an improved quality of life. For example, the first option may be "living 25 years in bad health," and the second option "living 25 minus x years in good health." The respondent selects a number x such that he or she is indifferent between the two options. In the certainty-equivalent method, the respondent is asked to indicate indifference between a number of years of certain survival and a gamble. An example of such a gamble is a situation in which there is a 50% chance to live ten years and a 50% chance to live two years. The respondent indicates the duration of certain survival, i.e., the certainty equivalent, that he or she considers equivalent to the gamble. Slight alterations of this method are possible. For example, the certainty equivalent may be given and subjects asked to fill in the probabilities such that they are indifferent. For a good and extensive survey of these and other possibilities, see von Winterfeldt and Edwards.’ Both methods suffer from shortcomings, which can be avoided or mitigated by using conjoint measurement,’ a fundamental measurement in which respondents directly compare two or more attributes with regard to their preferability. Conjoint measurement permits levels of attributes to be nominal and enables researchers to test axioms, which has made it popular in psychology and in marketing research. Below we mention the shortcomings of the time-tradeoff method and the certainty-equivalent method, and explain how they are handled by conjoint measurement:

1. Consistency (within a judge) of indifference measures is difficult to determine. For example, it is not clear how to interpret a reliability measure such as the correlation between two sets of indifference values with respect to acceptability. That is, it is impossible to state that, say, 0.80 is acceptable and 0.79 is not, because there is no clear criterion. Other difficulties with a measure such as a correlation are that its precision depends on the number of observations on which it is based, and that it is rather insensitive to systematic errors. In this study, another measure of consistency is used, since the answers of the patients are preference judgments. This measure, which tests against randomness of choices, is introduced in section 3.

2. In problems of everyday life, people often choose one of several options (for whatever reasons). However, a statement of complete indifference is a rare occurrence in practice. Thus, asking for preferences is more in accordance with people's experience, and this is what is done in our method.

3. The time-tradeoff method and the certainty-equivalent method are almost never tested against their assumptions. For instance, both methods aggregate utilities of levels of different attributes into an overall utility, but the implicit assumption of independent attributes, necessary for such an operation, is rarely tested. In this study the assumptions of conjoint measurement were tested.

4. The attribute that is traded off, or gambled with, becomes salient. That is, people tend to give more weight to this attribute. This phenomenon has been demonstrated by experiments in which the respondent first had to indicate his or her indifference point (certainty equivalent). In later preference judgments, this certainty equivalent was preferred to the gamble with which it had been judged to be equivalent.4,5 The most likely explanation for this phenomenon is that the attribute that is traded off or gambled with is weighted differently from (i.e., higher than) the same attribute in a situation in which a choice is made. It should be clear that a priori neither attribute should receive more weight merely because of the way a question is posed. A manifestation of this phenomenon occurs when a patient who is not willing to give up any years in a tradeoff question nevertheless prefers, in a pair comparison, an alternative offering a better quality of life but fewer years to live over that tradeoff stimulus.

Additional disadvantages of the certainty-equivalent method are:

5. Risk, as represented in gambles, often induces unknown subjective expectations of probability.6 Hence, if we present a 50%-50% gamble, we cannot be sure that the probabilities of the events are perceived and processed as equal.

6. Though subjective expected utility theory is assumed to be justified for normative purposes, it is unacceptable as a description of actual choice behavior.6,7

Disadvantages 5 and 6 are avoided by conjoint measurement since it does not include risky decisions.

Our study concerns the choice of treatment (radiotherapy or surgery) for laryngeal cancer. The choice of treatment in many cases largely depends on the size of the tumor. In general, if the tumor is small (size T1 or T2), radiotherapy is chosen. If the tumor is large (size T4), surgery is chosen. If the tumor is of size T3 (total immobility of the vocal cords, but no extension to adjacent structures), the choice is less obvious: the two treatments are deemed medically equivalent.8 Surgery implies that the larynx is removed; this forces patients to learn artificial speech, in which they sometimes do not succeed and hence remain mute. Impairment of speech may cause severe problems on the social-psychological and vocational levels.9 Radiotherapy preserves normal (sometimes hoarsened) speech, but carries a higher probability of recurrence of the tumor, and hence yields a shorter life expectancy than surgery. In figure 1 the decision tree for this problem is given, with hypothetical but representative probabilities. The possibility of no speech has been omitted for the sake of clarity. So, the crucial question is how to assess a patient's tradeoff of life expectancy against quality of speech. The analysis checks whether patients make tradeoffs according to the assumptions of the additive conjoint measurement model. If a patient's choices satisfy the assumptions, it is possible to deduce which treatment he or she prefers. Note that such a procedure is based on assumptions about how to choose rationally. The deduction of a preferred treatment can therefore be regarded only as advice, based on the patient's tradeoffs and the assumptions mentioned.

FIGURE 1. Decision tree of the medical problem in a case of laryngeal cancer. RT = radiotherapy, SU = surgery.

2. Additive Conjoint Measurement

In the sequel we apply the additive form of conjoint measurement to utilities. That is, we have two attributes Y and Q, with levels y_1, y_2, ..., y_i, ..., y_m and q_1, q_2, ..., q_j, ..., q_n, respectively; these levels may be nominal. Y can be thought of as number of life years, and Q can be thought of as quality of voice. The overall utility U(y_i, q_j) is the sum of the marginal utilities U_Y(y_i) and U_Q(q_j):

$$U(y_i, q_j) = U_Y(y_i) + U_Q(q_j) \qquad (1)$$

For convenience we let (a, p), (b, q), or (c, r) (a, b, c ∈ Y and p, q, r ∈ Q) denote generic elements of the Cartesian product Y × Q. Let ≿ be a binary preference relation on pairs (a, p), to which we will refer as objects. Let ~ be a binary equivalence relation on the objects, and let ≻ be the asymmetric part of ≿. For instance, (a, p) ≿ [≻] (b, q) means that a patient considers living a years with quality of speech p at least as good as [better than] living b years with quality of speech q. Equation 1 holds if the following axioms of additive conjoint measurement are satisfied’:

A1: ≿ is a weak order, that is: (a) ≿ is connected: either (a, p) ≿ (b, q) or (b, q) ≿ (a, p). If both hold, then (a, p) ~ (b, q). (b) ≿ is transitive: (a, p) ≿ (b, q) and (b, q) ≿ (c, r) imply (a, p) ≿ (c, r).

Say we present three pairs of objects {A, B}, {A, C}, and {B, C} to patients (respondents). They have to indicate for each pair which object they prefer (this is the method of pair comparisons). They violate transitivity [A1(b)] if their preferences are A ≻ B, B ≻ C, C ≻ A, or B ≻ A, C ≻ B, A ≻ C. Such violations are called circular triads or 3-cycles. In section 4 we use a measure of circularity based on these circular triads. Of course, a circular structure of choices may also occur with m > 3 objects, and is then called an m-cycle.

A2: Independence: (a, p) ≿ (a, q) implies (b, p) ≿ (b, q), and (a, p) ≿ (b, p) implies (a, q) ≿ (b, q).

For instance, let a be 2 years, b 5 years, p perfect health, and q moderate health. As an example of the first part of A2: if 2 years in perfect health is preferred to 2 years in moderate health, this implies that 5 years in perfect health is preferred to 5 years in moderate health. This axiom would be violated if the premise holds and the conclusion does not. The second part of A2 can be clarified analogously.

A3: Double cancellation: (a, q) ≿ (b, r) and (b, p) ≿ (c, q) imply (a, p) ≿ (c, r).

As an example, let a, b, p, and q be the same as in the example of A2, and let c be 10 years and r bad health. Then double cancellation requires that if 2 years in moderate health is preferred to 5 years in bad health, and 5 years in perfect health is preferred to 10 years in moderate health, then 2 years in perfect health is preferred to 10 years in bad health (see also figure 2A). If the premises hold but the conclusion does not (see figure 2B), this is a violation of double cancellation. Double cancellation can also be exemplified by addition of scale values: writing both premises in terms of equation 1 and adding them, U_Y(b) and U_Q(q) appear on both sides and cancel, which leaves the conclusion:

$$\begin{aligned} U_Y(a) + U_Q(q) &\ge U_Y(b) + U_Q(r) \\ U_Y(b) + U_Q(p) &\ge U_Y(c) + U_Q(q) \\ \Rightarrow \quad U_Y(a) + U_Q(p) &\ge U_Y(c) + U_Q(r) \end{aligned}$$

FIGURE 2. Double-cancellation axiom. A (left) represents the axiom; B (right) is a violation of the axiom. The white arrows represent the implications.
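The three testable axioms can be checked mechanically on a patient's forced-choice preferences. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: the preference relation is supplied as a hypothetical Python function `better(x, y)` meaning x ≻ y, and the levels and marginal utilities in `U_Y` and `U_Q` are invented to match the A2/A3 examples (2, 5, 10 years; bad, moderate, perfect health). A relation generated from equation 1 without ties cannot violate any axiom, so all counts should be zero; with real data, `better` would return the patient's majority preferences.

```python
from itertools import combinations, product

# Hypothetical marginal utilities (not from the paper), matching the A2/A3
# examples: 2, 5, 10 years and bad, moderate, perfect health.
U_Y = {2: 0.0, 5: 0.45, 10: 0.7}
U_Q = {"bad": 0.0, "moderate": 0.3, "perfect": 0.62}


def additive_better(x, y):
    """Strict preference implied by equation 1: x > y iff U(x) > U(y)."""
    return U_Y[x[0]] + U_Q[x[1]] > U_Y[y[0]] + U_Q[y[1]]


def weakly_better(x, y, better):
    """Weak preference: an object is always at least as good as itself."""
    return x == y or better(x, y)


def count_3_cycles(objs, better):
    """A1(b): number of circular triads A > B > C > A (in either direction)."""
    cycles = 0
    for a, b, c in combinations(objs, 3):
        if (better(a, b) and better(b, c) and better(c, a)) or (
            better(b, a) and better(c, b) and better(a, c)
        ):
            cycles += 1
    return cycles


def independence_violations(years, qualities, better):
    """A2: failures of either part of the independence axiom."""
    violations = 0
    for a, b in product(years, repeat=2):
        for p, q in product(qualities, repeat=2):
            if a == b or p == q:
                continue
            # First part: the ordering of qualities must not depend on the years.
            if better((a, p), (a, q)) and not better((b, p), (b, q)):
                violations += 1
            # Second part: the ordering of years must not depend on the quality.
            if better((a, p), (b, p)) and not better((a, q), (b, q)):
                violations += 1
    return violations


def double_cancellation_violations(years, qualities, better):
    """A3: (a,q) >= (b,r) and (b,p) >= (c,q) must imply (a,p) >= (c,r)."""
    violations = 0
    for a, b, c in product(years, repeat=3):
        for p, q, r in product(qualities, repeat=3):
            if (
                weakly_better((a, q), (b, r), better)
                and weakly_better((b, p), (c, q), better)
                and not weakly_better((a, p), (c, r), better)
            ):
                violations += 1
    return violations


if __name__ == "__main__":
    objs = list(product(U_Y, U_Q))
    print(count_3_cycles(objs, additive_better),
          independence_violations(U_Y, U_Q, additive_better),
          double_cancellation_violations(U_Y, U_Q, additive_better))  # 0 0 0
```

Passing, for example, a relation with A ≻ B, B ≻ C, and C ≻ A to `count_3_cycles` returns 1, which is the circular triad described under A1.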

A4: Solvability: The equation (a, p) ~ (a_1, q) has a solution for a_1 ∈ Y, and the equation (a, p) ~ (b, p_1) has a solution for p_1 ∈ Q. This is a technical assumption (i.e., not testable), and requires that both attributes be sufficiently dense. Stated otherwise, this axiom requires that a solution exist for specific classes of equalities (the axiom can also be formulated for inequalities).

Usually, for mathematical convenience, a so-called Archimedean axiom is added. As shown by Luce et al.,10 this axiom has no empirical content, and hence no relevance for applications. Therefore, we do not give its definition. A1, A2, and A3 can be tested empirically.

The above description of conjoint measurement is the so-called axiomatic approach. The numerical approach, often used in marketing research,11,12 does not test the axioms of conjoint measurement but assumes that these axioms are satisfied. Even if they are not satisfied, conjoint measurement is used as a numerical representation method to gain insight into the preference structure. In our case, however, conjoint measurement is used as a normative decision-analytic tool to help those patients whose preferences show no or only minor violations of the model (see section 4). For patients whose preferences show severe violations, use of the model is not justified.


3. Application to Laryngeal Cancer

In previous studies,13,14 two attributes were identified to be very important to patients with laryngeal cancer, namely number of life years and quality of speech.9 To verify this, we interviewed five laryngectomees.’ Some of them were active in a union of laryngectomees and had much experience with laryngectomees' problems. It was concluded that number of life years and quality of life are the most important attributes. However, if the possible consequences of treatments are extensively explained, then quality of life may, in this circumstance, be reduced to quality of speech.

Based on previous experiments, we decided to use three levels of speech quality: normal speech, artificial speech, and no speech (mute). The first level results from radiotherapy; the other two result from surgery. It can be argued that one should include hoarse speech (resulting from radiotherapy) and distinguish two levels of artificial speech (esophageal and electro-laryngeal). However, in the previous experiments, the utilities of hoarse speech and normal speech appeared to be relatively close; this also held true for the utilities of the two types of artificial speech. We used six levels of numbers of life years. Often the utility function of number of life years is steeper in the first years than in the later years. To obtain a reliable estimate of this function, levels in the first, say, 8 years are chosen close to each other, while the intervals between levels representing high numbers of years (say, >8) are larger. The specific levels of numbers of life years depend on the life expectancy of the patient. With three levels of speech quality and six levels of numbers of life years, 18 different objects (i.e., combinations of speech quality and life duration) can be created. To establish a preference ranking of these objects, the method of pair comparisons is used, since this method enables us to test the axioms A1, A2, and A3. Of course, trivial pair comparisons were left out. That is, if a, b ∈ Y and p, q ∈ Q with a > b and p ≻ q, then it is trivial to compare (a, p) and (b, q), since certainly (a, p) ≻ (b, q). However, the comparison between (a, q) and (b, p) is not trivial. Notice that trivial questions can be identified only if the levels of the attributes are at least ordinally ordered, which is the case for number of life years and quality of speech. In our case, with 3 × 6 levels, we have 45 nontrivial pair comparisons altogether. The choice of the number of levels has, as explained, emerged from previous experiments. There is no requirement in the method with respect to the number of levels to be chosen, though one should take care not to use too many levels, since the number of nontrivial pair comparisons would then grow rapidly.

The preference ranking of numbers of life years is always clear, i.e., the higher of the two numbers of life years is always preferred. The ranking of speech qualities may not always be clear. In such cases it is advisable to establish this ranking by asking a patient before presenting the pair comparisons. This will significantly reduce the number of nontrivial pair comparisons. Though 45 pair comparisons may already seem a large number, it is the authors' experience that it is far easier for patients to make these comparisons than to select time tradeoffs or certainty equivalents. The task is a so-called forced-choice task, that is, the patients are not allowed to give an indifference answer. In all pair comparisons the respondents have to pick the object they prefer. This results in questions such as:

Which do you prefer?
A. Living six years with artificial speech, or
B. Living four years with normal speech.
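The count of 45 nontrivial comparisons can be checked by enumeration. In the minimal sketch below, the six year levels in `YEARS` are hypothetical (the text states only that there are six, spaced more closely below about 8 years), and a pair is treated as trivial when one object is at least as good as the other on both attributes; every remaining pair forces a tradeoff of years against speech.

```python
from itertools import combinations, product

# Hypothetical year levels: the text fixes only that there are six of them.
YEARS = [1, 2, 4, 6, 9, 15]
# Speech qualities ordered from worst to best.
SPEECH = ["mute", "artificial", "normal"]


def objects(years, speech):
    """All (years, speech) combinations that are to be ranked."""
    return list(product(years, speech))


def is_trivial(x, y, speech):
    """True if one object is at least as good as the other on both attributes,
    so that the answer follows without any tradeoff."""
    (a, p), (b, q) = x, y
    rank_p, rank_q = speech.index(p), speech.index(q)
    return (a >= b and rank_p >= rank_q) or (b >= a and rank_q >= rank_p)


def nontrivial_pairs(years, speech):
    """Pairs in which more years come with worse speech (a real tradeoff)."""
    return [(x, y) for x, y in combinations(objects(years, speech), 2)
            if not is_trivial(x, y, speech)]


if __name__ == "__main__":
    print(len(objects(YEARS, SPEECH)), len(nontrivial_pairs(YEARS, SPEECH)))  # 18 45
```

Each nontrivial pair combines one pair of year levels (15 choices) with one pair of speech levels (3 choices), which is why the total is 15 × 3 = 45; a 5 × 5 design would give 10 × 10 = 100.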

Options A and B both imply that after the given number of life years one will die. There is equivalence if one is prepared to give up precisely two life years to preserve normal speech instead of artificial speech. If one is prepared to give up more than two years, one will choose option B. If one is not prepared to give up two years, one will choose option A. If a respondent always chooses to maximize life years (i.e., is not willing to give up life years), or always chooses to maximize quality of speech, he or she is said to show a lexicographic choice pattern. In such cases one attribute clearly outweighs the other. Such people would have no (cognitive) problem in choosing between radiotherapy and surgery. The test then serves only to detect the lexicographic choice pattern at hand, but the data are not analyzed any further, since the choice between treatments for such respondents is always obvious: maximizing life years indicates a preference for surgery, and maximizing quality of speech indicates a preference for radiotherapy. This lexicography should be interpreted in the context of the applied test: for instance, there might be pair comparisons in which a person who was lexicographic on life years would trade off, say, one month of 15 years in order to gain quality of speech. Such comparisons with small differences in life years are not incorporated in the test, because they are not relevant in practice. Where we henceforth use the term lexicography, this should be understood in the context of the test used. To test whether respondents give consistent answers, i.e., do not change preferences over time, the test is applied three times. The three replications are assumed to be independent of each other, given the (unknown) true preference. This assumption may be disputed, but experience has shown that during a replication respondents usually cannot recall what answer they gave to the same question earlier.
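The bookkeeping for three forced-choice replications and for detecting a lexicographic pattern can be sketched as follows. This is a minimal illustration, assuming that answers are stored per nontrivial pair as a list of the chosen objects and that speech qualities have the order given in the hypothetical `SPEECH_RANK`; the mapping from pattern to treatment in `ADVICE` follows the text.

```python
from collections import Counter

# Hypothetical speech ranking, worst to best (the text fixes only the order).
SPEECH_RANK = {"mute": 0, "artificial": 1, "normal": 2}

# Treatment implied by each lexicographic pattern, as stated in the text.
ADVICE = {"years": "surgery", "speech": "radiotherapy"}


def majority_choice(choices):
    """Object chosen in most replications; with an odd number of forced
    choices between two objects there is always a strict majority."""
    if len(choices) % 2 == 0:
        raise ValueError("an odd number of replications is required")
    return Counter(choices).most_common(1)[0][0]


def lexicographic_pattern(majority):
    """Classify majority preferences on nontrivial pairs.

    `majority` maps a pair ((years, speech), (years, speech)) to the chosen
    object. Returns 'years' (always maximizes life years), 'speech' (always
    maximizes quality of speech), or None (the respondent makes tradeoffs)."""
    def more_years(x, y):
        return x if x[0] > y[0] else y

    def better_speech(x, y):
        return x if SPEECH_RANK[x[1]] > SPEECH_RANK[y[1]] else y

    if all(c == more_years(x, y) for (x, y), c in majority.items()):
        return "years"
    if all(c == better_speech(x, y) for (x, y), c in majority.items()):
        return "speech"
    return None


if __name__ == "__main__":
    A, B = (6, "artificial"), (4, "normal")  # the example question above
    C, D = (9, "mute"), (6, "normal")        # a second, hypothetical pair
    answers = {(A, B): [A, A, B], (C, D): [C, C, C]}
    majority = {pair: majority_choice(ch) for pair, ch in answers.items()}
    pattern = lexicographic_pattern(majority)
    print(pattern, ADVICE.get(pattern))  # years surgery
```

Because every nontrivial pair sets more years against better speech, at most one of the two lexicographic patterns can hold for a given respondent.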
Consistency of answers can be estimated by the proportion of comparisons in which a respondent gives the same answer to the same question in all replications. When there are three replications, the following equation can be used for the estimation of consistency:15

$$X = \pi^3 + (1 - \pi)^3 \qquad (2)$$

where X is the proportion of pair comparisons to which the same answer is given in all three replications, and π is the probability that a single answer agrees with the respondent's true preference. The maximum likelihood estimate π̂ of π can be interpreted as a measure of consistency; π̂ lies in the interval [0.5, 1]. Since π = 0.5 represents the randomness of a fair coin, it should be required that π̂ exceed 0.5 significantly, say, by three standard errors. Hence, since the number of nontrivial pair comparisons is 45 in our case, it is required that:

$$\hat{\pi} \ge 0.63$$

Thus, the minimal consistency criterion requires that the estimated consistency exceed the random value of 0.50 by three standard errors of a random proportion. Standard errors of proportions can be computed by means of the binomial distribution.16 This means that, if the answers were actually random, the probability of obtaining π̂ ≥ 0.63 would be 0.003. Substituting π̂ = 0.63 into equation 2, the minimal consistency may also be expressed as X ≥ 0.30, which is equivalent to a consistent answer (in all three replications) to at least 14 pair comparisons. We refer to answers that are the same in all replications as stable answers. In practice π̂ was usually much higher, i.e., π̂ > 0.85, corresponding to at least 28 stable answers. Hence, instability of answers usually was no problem in our study. Notice that this estimate of consistency does not use (estimated) utilities, but is computed directly from the observations.

Tests of axioms are based on majority preferences, i.e., the answer that is given to a certain pair comparison in a majority of the replications is said to be the (majority) preference. Notice that the object that is majority-preferred can always be determined, since the number of replications is odd and indifference judgments are not allowed. If a respondent satisfies all axioms, a preference ranking of the 18 objects can be derived. Utilities are then estimated by Ordmet.17,18 The input format for Ordmet is a preference ranking, which is translated into a matrix of linear inequalities. That is, each inequality is a row in the matrix representing the preference relation between two alternatives. For instance, if alternative A is preferred to alternative B, then in the particular row of the matrix representing this preference a 1 is given in the column representing alternative A, and a -1 is given in the column representing alternative B; all other entries in this row are 0. All preference relations between pairs of alternatives are thus represented. The solution space consists of a polyhedral convex cone, and a maximin criterion searches for the solution whose lowest correlation with the (minimal set of) extreme vectors of this space is as high as possible. In particular, two parts of the output of Ordmet are most important. First, a matrix is given in which the columns represent the extreme solutions of the system of linear inequalities. An example of such a matrix is given in table 1. In the last column of this table, which is not given by Ordmet, the centroid solution is given (the mean value per row over the eight columns). These are the utilities that are used. The lowest levels of both attributes are not included in the presentation since they are set at 0 by Ordmet. Since the highest utility is the one for U(15), namely 5.875 (see table 1), this utility is set at 1.0. Dividing all other centroid solution values by 5.875 yields marginal utilities that lie between 0 and 1.* A logarithmic function is computed such that it best fits the utilities of life years in a least-squares sense. This function is then used in the Markov model. The second important part is the so-called maximin vector, giving correlations of the maximin solution with the extreme solutions. The lowest value of this vector is called the maximin correlation, which is 0.9300 for patient 2.

Table 1. Matrix with Columns Representing the Extreme Vectors of the Solution Space*

*U = utility; 6, 9, 12, and 15 are numbers of years of life; EL = electro-laryngeal speech, ES = esophageal speech, H = hoarse speech, N = normal speech. The utilities for three years of life, U(3), and "mute," U(M), are set at 0.
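The consistency arithmetic and the Ordmet input format can be reproduced with a few lines of code. This is a sketch under stated assumptions, not the authors' software: `pi_hat` inverts equation 2 on [0.5, 1] using the identity X = 3π² - 3π + 1; the value of `THRESHOLD` assumes that the three-standard-error criterion is based on all 3 × 45 answers, an assumption chosen because it reproduces the 0.63 quoted in the text. The helpers `inequality_matrix`, `satisfies`, and `centroid_utilities` are hypothetical names; they build the ±1 rows described above, check a candidate utility vector against them, and apply the table 1 rescaling, but they do not compute Ordmet's extreme vectors or its maximin solution.

```python
import math

N_PAIRS = 45        # nontrivial pair comparisons in the 3 x 6 design
N_REPLICATIONS = 3


def stable_proportion(pi):
    """Equation 2: chance that all three replications agree when each answer
    matches the true preference with probability pi."""
    return pi ** 3 + (1 - pi) ** 3


def pi_hat(x):
    """Invert equation 2 on [0.5, 1]: X = 3*pi**2 - 3*pi + 1 gives
    pi = 0.5 + sqrt(12*X - 3) / 6; X <= 0.25 maps to 0.5."""
    if x <= 0.25:
        return 0.5
    return 0.5 + math.sqrt(12 * x - 3) / 6


def min_stable_answers(threshold, n_pairs=N_PAIRS):
    """Smallest number of stable answers whose pi_hat reaches the threshold."""
    return next(k for k in range(n_pairs + 1) if pi_hat(k / n_pairs) >= threshold)


# Assumption: three standard errors of a random proportion over 3 x 45 answers.
THRESHOLD = 0.5 + 3 * math.sqrt(0.25 / (N_REPLICATIONS * N_PAIRS))


def inequality_matrix(alternatives, preferences):
    """One row per (winner, loser): +1 in the winner's column, -1 in the
    loser's column, 0 elsewhere."""
    column = {alt: j for j, alt in enumerate(alternatives)}
    rows = []
    for winner, loser in preferences:
        row = [0] * len(alternatives)
        row[column[winner]] = 1
        row[column[loser]] = -1
        rows.append(row)
    return rows


def satisfies(rows, utilities):
    """True if a utility vector meets every inequality strictly (row . u > 0)."""
    return all(sum(r * u for r, u in zip(row, utilities)) > 0 for row in rows)


def centroid_utilities(extreme_vectors):
    """Row means over the extreme solutions, divided by the largest mean
    (as 5.875 is rescaled to 1.0 in table 1)."""
    means = [sum(row) / len(row) for row in extreme_vectors]
    top = max(means)
    return [m / top for m in means]


if __name__ == "__main__":
    print(round(THRESHOLD, 2), round(stable_proportion(0.63), 2),
          min_stable_answers(0.63), min_stable_answers(0.85))  # 0.63 0.3 14 28
```

The printed values match the criteria stated in the text: π̂ ≥ 0.63 corresponds to X ≈ 0.30, i.e., at least 14 stable answers, and π̂ > 0.85 needs at least 28.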

* Other normalizations can be used. For instance, utilities can be normalized such that the combination with the highest utility is set at 1.0.

In principle, any program can be used in which data are assumed to be represented by the (in our case) additive conjoint effect of several attributes. Usually, this is done by transforming the observed ordinal values (e.g., rankings) and finding the best-fitting scale values. Programs designed for finding such solutions are widely available, e.g., MONANOVA.19,20 These programs optimize a goodness-of-fit criterion such as Kruskal's stress.19 If a patient's preferences satisfy all (testable) axioms A1-A3 of conjoint measurement, this will usually result in a perfect fit, that is, Kruskal's stress will be 0. For practical purposes, utilities of quality and quantity of life are assumed to lie on an interval scale with a common unit for all (in our case both) attributes. Therefore, transformations of utilities U of the form aU + b, a > 0, are allowed. For convenience, the utilities are transformed such that the highest and lowest marginal utilities are 1 and 0, respectively.

After estimation of the utilities of several (in our case six) numbers of life years by Ordmet, the utility function for life years is estimated, usually by a logarithmic function, which is necessary for application to the Markov simulation (see below). Though the choice of the logarithmic family of functions is, in principle, rather arbitrary, it is intuitively appealing. That is, the subjective difference between living 1 and 2 years is likely a priori to be larger than the subjective difference between living 10 and 11 years. This intuition is adequately represented by a logarithmic function (see, e.g., Pliskin et al.21).

It is debatable whether the (riskless) utility function determined by additive conjoint measurement can be used in a risky context, i.e., in a situation where no outcome will occur for certain. For the moment, we assume that on the basis of the estimated utilities the overall utilities of all relevant treatments can be computed as:

$$U(T) = \sum_i \sum_j p(y_i, q_j)\, U(y_i, q_j)$$

where U(T) is the utility of a treatment, p(y_i, q_j) is the probability of living y_i years with speech quality q_j, and U(y_i, q_j) is the utility of the combination of y_i and q_j, as computed in equation 1. The treatment with the highest utility can then easily be determined. Utilities of treatments are computed by "folding back" a decision tree (see below for an example of folding back). An example of such a decision tree is given in figure 1. We note that with additive conjoint measurement we measure the certain outcomes, e.g., living 4 years with normal speech. The transition from one quality of speech to another is not explicitly measured. In the computation of treatment utilities we assume that such transitions influence neither the valuation of quality of speech nor the valuation of life duration.

Folding back can be done by the program Decision Maker.22 In Decision Maker, a Markov process can be simulated. That is, there are several states a patient can be in (e.g., normal speech, dead), and there are (transition) probabilities to go from one state to another. Notice that such transitions in our study are irreversible: once a patient goes from a particular state to another state (e.g., from normal to artificial speech due to recurrence of the tumor), there is no possibility to return to the previous (in our study, better) state. The transition probabilities are age-dependent (i.e., when a patient gets older, the age-specific mortality increases), and depend on the previous state only (and not on the previous sequence of states). Since all patients will finally be dead (which is called the absorbing state), we have a finite probabilistic system. Once the transition probabilities are estimated, it is possible to compute the utilities of the treatments. This is done by folding back all branches belonging to a treatment.

A worked-out example of folding back for one such branch is: a patient receives radiotherapy and lives 3 years with normal speech; then there is a recurrence of the tumor, and the patient lives one more year with artificial speech. The utility for living 4 years (altogether) is, say, 0.3. Further, suppose the utilities of normal and artificial speech are 0.8 and 0.4, respectively. The computation of utilities for sequences of speech states is rather complicated, and needs further research. In our case, the following heuristic is applied to compute the utility for quality of life: the speech utilities are weighted by the fraction of life lived in each state. In this branch, 75% of the life (the 3 years after radiotherapy) is lived with a normal voice, and 25% (the year after surgery for the recurrence) is lived with an artificial voice. Hence, the utility for quality of life is (0.75)(0.8) + (0.25)(0.4) = 0.7. Going by equation 1, the utility of the combination then is 0.3 + 0.7 = 1.0. This value is multiplied by the probability that this particular branch occurs. For instance, if the sequence living 3 years with normal speech and then living a year with artificial speech has the probability 0.002 of occurring, then the utility of this particular branch is (0.002)(1.0) = 0.002. This branch now has been folded back. Folding back all the branches that start from a particular treatment, and adding the utilities of the branches, gives the overall utility for that treatment. So-called sensitivity analysis is applied to check whether small changes in utility values cause a change of treatment preference. If sensitivity analysis does not show such a change, the treatment with the highest utility will be advised. By simulating a Markov model, we can better approach reality. That is, one can take into account events such as recurrences of tumors and age-specific mortalities, whereas the computation used by others does not allow for changes of health states.
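The folding-back arithmetic and the logarithmic fit can be written out directly. The sketch below is a hand computation under stated assumptions, not a substitute for the Markov simulation in Decision Maker: `fit_log_utility` is a plain least-squares fit of U(y) = α + β ln(y); `branch_utility` implements the time-weighting heuristic and equation 1 for one branch; `treatment_utility` sums probability times utility over branches. All three function names are hypothetical. The numbers in the usage block (U = 0.3 for 4 years, speech utilities 0.8 and 0.4, branch probability 0.002) are those of the worked example in the text.

```python
import math


def fit_log_utility(years, utilities):
    """Least-squares fit of U(y) = alpha + beta * ln(y); returns (alpha, beta)."""
    xs = [math.log(y) for y in years]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_u = sum(utilities) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, utilities))
    beta = sxu / sxx
    return mean_u - beta * mean_x, beta


def branch_utility(u_years, segments, speech_utility):
    """Equation 1 for one branch: utility of the total survival plus the
    time-weighted average of the speech utilities along the branch.
    `segments` lists (years, speech state) in the order they are lived."""
    total_years = sum(t for t, _ in segments)
    u_quality = sum(t * speech_utility[s] for t, s in segments) / total_years
    return u_years + u_quality


def treatment_utility(branches):
    """Fold back a treatment: sum of probability x utility over its branches."""
    return sum(p * u for p, u in branches)


if __name__ == "__main__":
    # Worked example from the text: 3 years normal, then 1 year artificial.
    u = branch_utility(0.3, [(3, "normal"), (1, "artificial")],
                       {"normal": 0.8, "artificial": 0.4})
    print(round(u, 3), round(treatment_utility([(0.002, u)]), 3))  # 1.0 0.002
```

In a full analysis, the utility of the total survival would come from the fitted logarithmic function rather than being stipulated, and the branch probabilities would come from the Markov model.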

4. Results

A pilot study among 45 students was carried out to uncover possible shortcomings of the test. On the basis of the pilot study, we concluded that with more information about the results of the treatments and the purpose of the test, it would be possible to present the test to patients.


Table 2. Utilities for Years of Life and Speech Qualities for Five Patients, and the Treatments Advised*

*This advice would have been given (RT = radiotherapy, SU = surgery, NO = no advice). These results were without implications for the patients. †The maximin correlation can be interpreted as a measure of the tightness of the solution. ‡These patients had T1 carcinomas and would always be treated with radiotherapy.

We presented the test to nine former patients. Three of them had been laryngectomized more than a year previously, and six were undergoing radiotherapeutic treatment at the time of testing. We considered especially the latter group to be important, since they were going through a rather unstable and uncertain period, very similar to the situation before the choice of treatment. All the patients were explicitly informed about the experimental and voluntary nature of the test. For these patients the treatments had already been determined at the time of testing. Three patients were lexicographic: two were not prepared to give up life years, and one was not willing to give up quality of speech. One patient did not complete the test, but stopped after one replication. Two patients' choices were transitive and also satisfied the other testable axioms of additive conjoint measurement. Thus, the preferences of five of the eight patients who completed the test satisfied the axioms. The remaining three patients showed cycles in their answers. The degree of circularity can be measured by Kendall and Smith's ζ.23 An adjusted measure, Ca, has been developed24; like ζ, it is based on circular triads. The difference between ζ and Ca is that Ca is more general in the sense that it takes into account prior knowledge in a design that does not allow all triads to be circular. Both measures are asymptotically normally distributed. Their values lie in the interval [0, 1]. Both can be interpreted as the proportion of the triads that can be circular but are observed not to be circular. See the appendix for more information about this measure. The circularity of preference structures is difficult to interpret statistically, since Bezembinder25 has shown that a test with a null hypothesis of circularity is very weak, i.e., the null hypothesis is nearly always rejected.
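For a complete paired-comparison design, Kendall and Smith's coefficient can be computed from the number of times each object is preferred. The sketch below assumes a complete forced-choice design over at least three objects, given as a hypothetical function `better(x, y)`; it uses the standard circular-triad count d = C(n, 3) - Σ C(s_i, 2) and ζ = 1 - d/d_max, with d_max = (n³ - n)/24 for odd n and (n³ - 4n)/24 for even n. It does not reproduce the adjusted measure Ca, which is designed for incomplete designs such as the one used here, where trivial pairs are omitted.

```python
from itertools import combinations
from math import comb


def circular_triads(objs, better):
    """Circular-triad count for a complete design: d = C(n, 3) - sum C(s_i, 2),
    where s_i is the number of objects that object i is preferred to."""
    wins = {o: 0 for o in objs}
    for x, y in combinations(objs, 2):
        wins[x if better(x, y) else y] += 1
    return comb(len(objs), 3) - sum(comb(s, 2) for s in wins.values())


def zeta(objs, better):
    """Coefficient of consistence 1 - d / d_max (requires at least 3 objects)."""
    n = len(objs)
    d_max = (n ** 3 - n) / 24 if n % 2 else (n ** 3 - 4 * n) / 24
    return 1 - circular_triads(objs, better) / d_max


if __name__ == "__main__":
    cycle = {("A", "B"), ("B", "C"), ("C", "A")}
    in_cycle = lambda x, y: (x, y) in cycle
    print(circular_triads(["A", "B", "C"], in_cycle), zeta(["A", "B", "C"], in_cycle))  # 1 0.0
```

A perfectly transitive set of choices gives ζ = 1, and the single 3-cycle in the usage example gives ζ = 0, since with three objects it is the maximum possible circularity.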
It seems reasonable to reject circularity for those structures in which fewer than 5% of the triads are circular. This corresponds to the requirement that ζa > 0.95. On the basis of this criterion, the response pattern of patient 5 was judged to be too circular. The responses of the other two patients whose answers showed cycles were not too circular; their circularities were resolved by a solving procedure.²⁶,²⁷ An example of solving a cycle for patient 1 is sketched in the appendix. After the intransitivities were solved, the three patients' rankings also satisfied the other axioms of conjoint measurement. The adjusted measure of circularity ζa, the consistency Tf (computed by equation 2), and the treatment that was determined to have the higher utility, and hence is the advised treatment, are given in table 2 for patients 1 to 5. Patients 1 and 2 were offered another version of the test, with five levels of speech (mute, electro-laryngeal, esophageal, hoarse, and normal speech) and five levels of life years (3, 6, 9, 12, and 15). Patient 5 had a longer life expectancy than patients 3 and 4. Since our procedure allows the levels of life years presented in the pair comparisons to be adjusted to the life expectancy of the patient, other levels of life years were presented to patient 5.

The responses of all the patients but one were more consistent than expected under randomness. Patient 5's responses were judged to be too circular, and his answers also did not exceed randomness; the observed circularity may be attributable to this randomness. The discrepancy arose mainly because his answers in the first replication differed completely from his answers in the second and third replications. That is, in the first replication he was prepared to give up many life years to gain quality of speech, whereas in the later replications he was much less willing to give up life years. Probably, after the first replication, he reconsidered the questions and concluded that he would manage to overcome a laryngectomy.

Usually, in our study, the solution space was rather tight, i.e., there was little room for the utilities to take different values while keeping intact the monotone relation between the utilities and the preference ranking (see figure 3). The tightness of the solution can be examined by slightly changing the utilities. Figure 3 shows a solution of Ordmet (i.e., the centroid of the solution, as given in table 1) for patient 2. The left and right vertical axes give the marginal utilities of life years and quality of speech, respectively. The middle axis represents the sum of these marginal utilities, U_Y + U_Q. The ranking of the lines on this middle axis represents the preference ranking of the alternatives as elicited by our procedure. Thus, the line connecting 3 years of life and mute speech intersects the middle axis at 0, which is the first intersection on this axis. Going down, the second intersection is at 0.106, where the line connecting 3 years and electro-laryngeal speech crosses. The order in which the alternatives appear on the axis U_Y + U_Q corresponds to the preference ranking of patient 2. The tightness of this solution can be illustrated by subtracting 0.021 from the utility of normal speech, which would reverse the ordering of the combinations (12 years, normal speech) and (15 years, esophageal speech). Hence, there is little room for the utilities to take different values. A tight solution strongly suggests an interval scale. This does not mean that the decision analysis is more sensitive to changes in utilities than other analyses; it merely suggests that the scale is at the interval level, a property that is commonly assumed in utility measurement. Since the main goal of our procedure, namely, giving advice about treatment choices based on patient preferences, is very delicate, our conclusions about this procedure must be drawn with care. The results of this application lead us to believe that the procedure may be useful and meaningful for many patients. This belief is also supported by the finding of Llewellyn-Thomas et al.²⁸ that laryngeal cancer patients are stable in their preferences for aspects of voice quality, even though they have experienced deterioration in voice quality during radiotherapy.
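Under the additive model, the tightness argument for patient 2 reduces to simple arithmetic. The following minimal Python sketch uses only the four marginal utilities given for patient 2 in figure 3, and its function names are hypothetical. It does not reproduce Ordmet, which searches the whole solution space.

```python
# Marginal utilities of patient 2 as reported in figure 3 (only the four values used there).
U_YEARS = {12: 0.850, 15: 1.0}
U_SPEECH = {"esophageal": 0.128, "normal": 0.298}


def utility(years, speech, u_years=U_YEARS, u_speech=U_SPEECH):
    """Additive model: U(y, q) = U_Y(y) + U_Q(q)."""
    return u_years[years] + u_speech[speech]


def gap(u_speech=U_SPEECH):
    """U(12, normal) - U(15, esophageal); a positive value means (12, N) is preferred."""
    return utility(12, "normal", u_speech=u_speech) - utility(15, "esophageal", u_speech=u_speech)


before = gap()                                               # about 0.020
lowered = dict(U_SPEECH, normal=U_SPEECH["normal"] - 0.021)  # normal speech becomes 0.277
after = gap(lowered)                                         # about -0.001: the order reverses
```

The gap of about 0.020 is the slack for this pair. Any decrease larger than that in the utility of normal speech reverses the two combinations, which is why subtracting 0.021 suffices.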

FIGURE 3. Utilities of patient 2, illustrating the tightness of the solution space computed by Ordmet.

U(15 years, esophageal speech) = U(15, ES) = 1.0 + 0.128 = 1.128
U(12 years, normal speech) = U(12, N) = 0.850 + 0.298 = 1.148

Hence, (12, N) is preferred to (15, ES). If 0.021 is subtracted from U(normal speech), giving 0.277, this preference is reversed, i.e., (15, ES) is preferred to (12, N).

5. Discussion

We have shown that the application of additive conjoint measurement to a medical decision problem is fruitful. Compared with other methods popular in medical decision analysis (the time-tradeoff method and the certainty-equivalent method), additive conjoint measurement avoids certain disadvantages. Its three most important advantages, from a methodologic point of view, are 1) the possibility of testing axioms, 2) the absence of attribute saliency, and 3) the opportunity to estimate consistency. Advantage 1 makes it possible to justify or reject the use of the additive model. Advantage 2 means that the saliency induced by indifference questions, and hence an artificially higher subjective weight of the attribute that is traded off or gambled with, is avoided. Of course, saliency of attributes may also be induced by framing or presentation style; this possibility is not precluded. Advantage 3 enables us to test the patients' answers against randomness. Thus, the disadvantages of the time-tradeoff method and the certainty-equivalent method are mitigated or avoided by using additive conjoint measurement instead of probabilities and/or indifference questions. Judged by the adjusted measure of circularity, the responses of one patient, who completed the test during radiotherapy, were too circular. The same patient's responses also did not exceed randomness. In the future, such a patient should be given a fourth replication; the first replication should be considered a trial session, necessary for the patient to get used to the questions. Such a trial session is supported by patients' statements that during the first replication they began to realize what the essential tradeoff between the treatments entailed. The small number of patients whose responses violated transitivity and did not exceed randomness is rather encouraging, especially since the patients who were undergoing radiotherapy had many mental difficulties to cope with. That is, their cancers had been diagnosed only a few weeks before the test was applied.
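Advantage 1, testing the axioms, can be illustrated on a small design. The minimal Python sketch below checks the two standard cancellation conditions for two-factor additive conjoint measurement:
- independence (single cancellation);
- double cancellation.

It works on a strict preference ranking of a hypothetical 3 × 3 design: rows are life-year levels, columns are speech levels, and a higher rank means more preferred. The paper does not specify exactly how its (partial) axiom tests were carried out, so this is an illustration, not the authors' procedure. The utilities and rankings are invented, not patient data. Passing these checks is necessary, but not sufficient, for an additive representation.

```python
from itertools import product


def ranks_from_additive(u_rows, u_cols):
    """Rank (0 = least preferred) of every cell under U = u_rows[i] + u_cols[j]."""
    values = [[ur + uc for uc in u_cols] for ur in u_rows]
    ordered = sorted(v for row in values for v in row)
    return [[ordered.index(v) for v in row] for row in values]


def satisfies_independence(rank):
    """Single cancellation: two rows are ordered the same way in every column,
    and two columns the same way in every row (strict ranking assumed)."""
    n_rows, n_cols = len(rank), len(rank[0])
    for a, b in product(range(n_rows), repeat=2):
        if len({rank[a][j] > rank[b][j] for j in range(n_cols)}) > 1:
            return False
    for x, y in product(range(n_cols), repeat=2):
        if len({rank[i][x] > rank[i][y] for i in range(n_rows)}) > 1:
            return False
    return True


def satisfies_double_cancellation(rank):
    """If (a, x) >= (f, q) and (f, p) >= (b, x), then (a, p) >= (b, q)."""
    n_rows, n_cols = len(rank), len(rank[0])
    for a, b, f in product(range(n_rows), repeat=3):
        for p, q, x in product(range(n_cols), repeat=3):
            if rank[a][x] >= rank[f][q] and rank[f][p] >= rank[b][x]:
                if rank[a][p] < rank[b][q]:
                    return False
    return True


# Hypothetical marginal utilities: 3 life-year levels (rows), 3 speech levels (columns).
additive_rank = ranks_from_additive([0.0, 0.5, 1.0], [0.0, 0.2, 0.6])
# -> [[0, 1, 3], [2, 4, 6], [5, 7, 8]]

# Hypothetical ranking in which columns 0 and 1 swap order in the second row.
non_additive_rank = [[0, 1, 3], [4, 2, 6], [5, 7, 8]]
```

Any ranking generated from additive utilities passes both checks. The second ranking fails independence, because the order of the two lowest speech levels depends on the life-year level.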

There is a difference between conjoint measurement and Thurstone's Case V model (TC)²⁹: TC assumes that the objects are elements of one set. In our study this set comprised pairs of life duration and quality of speech. However, to use the utilities in a Markov model, they must be decomposed into (marginal) utilities of life duration and (marginal) utilities of quality of life. This cannot be done with TC, but it can be done using conjoint measurement.

The model we used in our analysis is an additive one. It is not distinguishable from a multiplicative model if all utilities are positive, since taking logarithms turns a product of positive utilities into a sum. However, when a health state q that is perceived to be worse than death is possible, a shorter life duration with q is preferred to a longer duration with it. When this occurs, an additive model cannot adequately represent the preference structure, because in an additive model the preference between two durations is the same at every level of quality. In such a case, the more general multiplicative model is called for.³⁰ When there are three or more attributes, still other models are possible (see, e.g., Krantz and Tversky³¹), but their value for representing preference structures within a medical context is not intuitively apparent.
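The argument against the additive model for states worse than death can be checked numerically. The minimal Python sketch below makes these assumptions:
- the utilities are hypothetical, not patient data;
- the multiplicative form is the simplest product, U(y, q) = U_Y(y) · U_Q(q), which is only an illustration and is not claimed to be the exact model of Miyamoto and Eraker³⁰;
- a negative quality utility marks a state judged worse than death.

```python
# Hypothetical marginal utilities; durations have positive utility,
# and a negative quality utility marks a state judged worse than death.
U_YEARS = {2: 0.2, 10: 1.0}
U_QUALITY = {"good": 1.0, "worse than death": -0.5}


def additive(u_y, u_q):
    return u_y + u_q


def multiplicative(u_y, u_q):
    return u_y * u_q


def prefers_shorter(model, quality):
    """True if 2 years in the given state beat 10 years in it under the model."""
    return model(U_YEARS[2], U_QUALITY[quality]) > model(U_YEARS[10], U_QUALITY[quality])
```

Under the product form, the shorter duration wins only in the state worse than death. Under the additive form it never wins, because U_Y(10) − U_Y(2) does not depend on the quality level.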

It should be mentioned that the approach taken here, applying additive conjoint measurement to the problem of choosing a treatment for laryngeal cancer, can rather easily be applied to similar decision problems that arise when patients have other cancers, such as bladder and prostate cancers. The same decision structure, quality versus quantity of life, arises in relation to many other diseases for which several medical treatments are available.

The authors thank Prof. Th. Bezembinder, Prof. W. A. J. van Daal, Dr. Peter Wakker, and two anonymous reviewers for their valuable comments on an earlier draft of this paper.

References

1. Weinstein MC, Fineberg HV. Clinical decision analysis. Philadelphia: W. B. Saunders, 1980.
2. Von Winterfeldt D, Edwards W. Decision analysis and behavioral research. Cambridge, England: Cambridge University Press, 1986.
3. Luce RD, Tukey JW. Simultaneous conjoint measurement: a new type of fundamental measurement. J Math Psychol. 1965;1:1-27.
4. Von Winterfeldt D. Additivity and expected utility in risky multiattribute preferences. J Math Psychol. 1980;21:66-82.
5. Tversky A, Sattath S, Slovic P. Contingent weighting in judgment and choice. Psychol Rev. 1988;95:371-84.
6. Machina MJ. Choice under uncertainty: problems solved and unsolved. Econ Perspect. 1987;1:121-54.
7. Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrica. 1979;47:263-91.
8. Stalpers LJA, Verbeek ALM, van Daal WAJ. Results of radiotherapy and surgery for glottic carcinoma. Cancer Treat Rev. 1987;14:131-41.
9. Maas A. A model for quality of life after laryngectomy. Soc Sci Med. 1991;33:1373-7.
10. Luce RD, Krantz DH, Suppes P, Tversky A. Foundations of measurement. Vol. 3. New York: Academic Press, 1990.
11. Green PE, Rao VR. Conjoint measurement for quantifying judgmental data. J Marketing Res. 1971;8:355-63.
12. Johnson RM. Trade-off analysis of consumer values. J Marketing Res. 1974;11:121-7.
13. McNeil BJ, Weichselbaum R, Pauker SG. Speech and survival: tradeoffs between quality and quantity of life in laryngeal cancer. N Engl J Med. 1981;305:982-7.
14. Stalpers LJA, Verbeek ALM, van Daal WAJ. Radiotherapy or surgery for T2N0 glottic carcinoma? A decision-analytic approach. Radiother Oncol. 1989;14:209-17.
15. Coombs CH, Bezembinder TGG, Goode FM. Testing expectation theories of decision making without measuring utility or subjective probability. J Math Psychol. 1967;4:72-103.
16. Hays WL. Statistics. Third ed. New York: CBS College Printing, 1981.
17. Roskam EE. ORDMET3: an improved algorithm to find the maximin solution to a system of linear (in)equalities. Methodika. In press.
18. McClelland GH, Coombs CH. Ordmet: a general algorithm for constructing all numerical solutions to ordered metric structures. Psychometrika. 1975;40:269-90.
19. Kruskal JB. Analysis of factorial experiments by estimating monotone transformations of the data. J R Statistical Soc, Series B. 1965;27:251-63.
20. Coxon APM. The user's guide to multidimensional scaling. London: Heinemann Educational Books, 1982.
21. Pliskin JS, Shepard DS, Weinstein MC. Utility functions for life years and health status. Oper Res. 1980;28:206-24.
22. Sonnenberg FA, Pauker SG. Decision Maker 6.0, Operating manual. Boston: New England Medical Center, 1987.
23. Kendall MG, Smith BB. On the method of paired comparisons. Biometrika. 1940;31:324-45.
24. Maas A. An adjusted measure of circularity for designs with prior knowledge. University of Nijmegen, NICI, Department of Mathematical Psychology, Nijmegen, The Netherlands, 1991. Preprint available on request.
25. Bezembinder TGG. Circularity and consistency in paired comparisons. Br J Math Statistical Psychol. 1981;34:16-37.
26. Maas A. A method for solving intransitivities. In: Loke WH, ed. Judgment and decision making. Singapore: Times Inc., in press.
27. Maas A, Bezembinder TGG, Wakker PP. On solving intransitivities. University of Nijmegen, NICI, Department of Mathematical Psychology, Nijmegen, The Netherlands, 1991. Preprint available on request.
28. Llewellyn-Thomas HA, Sutherland HJ, Ciampi A, et al. The assessment of values in laryngeal cancer: reliability of measurement methods. J Chronic Dis. 1984;37:283-91.
29. Thurstone LL. A law of comparative judgment. Psychol Rev. 1927;34:273-86.
30. Miyamoto JM, Eraker SA. A multiplicative model of the utility of survival duration and health quality. J Exp Psychol Gen. 1988;117:3-20.
31. Krantz DH, Tversky A. Conjoint-measurement analysis of composition rules in psychology. Psychol Bull. 1978;78:151-69.
32. Moon JW. Topics on tournaments. New York: Holt, Rinehart & Winston, 1968.
33. Roberts FS. Measurement theory: with applications to decisionmaking, utility, and the social sciences. Reading, MA: Addison-Wesley, 1979.

APPENDIX

1. Computation of ζ

Table A is a majority preference matrix, which is read as follows: a 1 [0] in cell (i, j) means a preference of i over j [j over i]. The scores of the objects (1, a) to (3, c) give the number of objects each object is preferred to. These scores are shown to the right of the matrix under the heading $s_i$; under $s_i^2$, the squared scores are given. It is easily verified that the sum of these squared scores is 186. This sum is maximal if the preference structure is transitive, and transitivity implies that there exists an ordering of the objects whose scores $(s_1, s_2, \ldots, s_n)$ satisfy $s_i > s_j$ for $i > j$.³² Since a transitive ordering always contains an object that is not preferred to any other object and an object that is preferred to every other object, $s_1 = 0$ and $s_n = n - 1$. Hence the maximal sum of squared scores is $0^2 + 1^2 + \cdots + 8^2 = 204$. The sum of squared scores is minimal if the scores are as nearly equal as possible;³² if $n = 9$, the score sequence then is (4, 4, 4, 4, 4, 4, 4, 4, 4), and the minimal sum of squared scores is 144. Maas²⁴ shows that

$$\zeta = \frac{\sum_i s_i^2 - \min \sum_i s_i^2}{\max \sum_i s_i^2 - \min \sum_i s_i^2},$$

which is (186 − 144)/(204 − 144) = 0.70 for the matrix in table A. The matrix in table A contains prior knowledge; for instance, it is known a priori that the combination (3, c) is preferred to all other combinations. Hence the scores cannot all be equal: the score of (3, c) is 8, so the minimal sum of squared scores exceeds 144. The adjusted measure, called ζa,²⁴ accounts for such deviations due to prior knowledge in a design. This means that for the adjusted measure, $\min \sum_i s_i^2$ must first be computed (an algorithm for this is given by Maas²⁴). For this particular matrix its value is 186. Hence ζa is 0, indicating that this preference structure is maximally circular.

Table A. Majority Preferences between Nine Objects*
*1, 2, 3 denote life years; a, b, c denote qualities of life; 3 > 2 > 1, c > b > a.

2. Solving intransitivities

Part of the preference structure of patient 1, which is circular, is given in table B. Unlike table A, this table gives not the majority preference but the number of times a preference is observed. Thus, a 2 in cell (i, j) means that i was observed to be preferred over j twice (out of three replications), and hence j was preferred over i once.

Table B. Cyclic Preference Structure of Patient 1, with Numbers of Times Preferences Are Observed*
*This information is needed to apply our solving procedure.

Table C. Transitive Order Determined by the Solving Procedure Applied to the Preference Structure in Table B*
*This structure has been made transitive by transitive closure. The transitive order is g > f > b > e > d > a > c.

The solving procedure starts from the following assumptions:

1. True preferences are transitive.
2. The more intense an observed preference is, the more likely it is to be the true preference.
3. Stable preferences have priority over unstable preferences and should receive special attention.

The intensity of a preference, as mentioned in assumption 2, is measured by the proportion of times i is chosen over j: the higher this proportion, the more intense the observed preference. These assumptions led us to a solving procedure, which is sketched below by means of solving the observed 7-cycle of patient 1. First, according to assumption 3, special attention is given to stable preferences, i.e., those preferences that are the same in all replications. These preferences are easily spotted in table B, since they are represented by a 3. Our procedure does not allow reversal of these preferences. The first step is to incorporate these preferences into a new preference structure, which should be transitive. Then, transitive closure is applied to this new structure; roughly speaking, for all triads of objects a, b, c where a is preferred to b and b is preferred to c, the relation a is preferred to c must also hold (see, e.g., Roberts³³). Relations that result from transitive closure are also incorporated into the new preference structure. If circular triads exist in the new preference structure at this first stage, our solving procedure is not applicable. Otherwise, by an algorithm that is not described here, a pair of objects is selected, and its majority preference is incorporated into the new structure. Then, again, transitive closure is applied, and a new pair is selected. It can be shown that this procedure yields a transitive preference structure. The cycle in table B is solved by incorporating the stable preferences into the new structure and applying transitive closure; the resulting transitive structure is shown in table C.
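The steps of the solving procedure above can be sketched in Python. This minimal sketch makes the following assumptions:
- there is an odd number of replications (three in the study), so every pair has a majority;
- counts[i][j] is the number of replications in which i was chosen over j, as in table B;
- transitive closure uses Warshall's algorithm.

The pair-selection rule after the first stage is not the authors' algorithm, which is not described here. As a hypothetical stand-in, the sketch adds the remaining majority preferences in order of decreasing intensity (assumption 2), re-applying transitive closure after each addition. A majority preference for a pair that the current structure leaves unordered cannot create a cycle, so the result is a total order. The example counts are invented.

```python
from itertools import product


def transitive_closure(rel):
    """Warshall's algorithm on a boolean relation rel[i][j] meaning 'i preferred to j'."""
    n = len(rel)
    r = [row[:] for row in rel]
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = True
    return r


def solve_intransitivities(counts, reps):
    """Return the objects from most to least preferred."""
    n = len(counts)
    # First stage (assumption 3): keep only stable preferences, chosen in every replication.
    rel = transitive_closure([[counts[i][j] == reps for j in range(n)] for i in range(n)])
    if any(rel[i][i] for i in range(n)):
        raise ValueError("stable preferences are circular; procedure not applicable")
    # Hypothetical completion: add the remaining majority preferences, most intense first.
    pairs = sorted(((counts[i][j] / reps, i, j) for i, j in product(range(n), repeat=2)
                    if i != j and counts[i][j] > counts[j][i]), reverse=True)
    for _, i, j in pairs:
        if not rel[i][j] and not rel[j][i]:
            rel[i][j] = True
            rel = transitive_closure(rel)
    # In a total order, the number of objects each object beats determines its position.
    return sorted(range(n), key=lambda i: sum(rel[i]), reverse=True)


# Hypothetical counts (3 replications): the majorities 0 > 1, 1 > 2, 2 > 0 form a cycle,
# but the stable preferences 0 > 1 and 1 > 2 imply 0 > 2 and so resolve it.
counts = [[0, 3, 1, 1],
          [0, 0, 3, 0],
          [2, 0, 0, 0],
          [2, 3, 3, 0]]
order = solve_intransitivities(counts, 3)  # [3, 0, 1, 2]
```

As with table B, the stable preferences and their transitive closure already override the circular majority here. The only pair left open, 3 versus 0, is settled by its majority.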
