Computer Probability Estimates of Angiographic Coronary Artery Disease: Transportability and Comparison with Cardiologists’ Estimates*

ROBERT DETRANO,† MARCO BOBBIO, HAROLD OLSON, ADRIAN SHANDLING, MYRVIN H. ELLESTAD, EDUARDO ALEGRIA, DIEGO MARTINEZ-CARO, ALBERTO RIGHETTI, ANDRAS JANOSI, WALTER STEINBRUNN, MATTHIAS PFISTERER, GIANPIETRO SANNA, GIOVANNI FERRARI, KERN H. GUPPY, JOHANN-JAKOB SCHMID, JAAP DECKERS, HA MINH LE, OLEH BREZDEN, AND ANTONIO COLOMBO

Received September 23, 1991

A computer algorithm for estimating probabilities of any significant coronary obstruction and of triple vessel/left main obstructions was derived, validated, and compared with the assessments of cardiac clinician angiographers. The algorithm performed at least as well as the clinicians when the latter knew the identity of the patients whose angiograms they had decided to perform. The clinicians were more accurate when they did not know the identity of the subjects but worked from tabulated objective data. Referral and value-induced bias may affect physician judgment in assessing disease probability. Application of computer aids or consultation with cardiologists not directly involved with patient management may assist in more rational assessments and decision making. © 1992 Academic Press, Inc.

* This research was supported by grants from the Veterans Administration Health Services Research and Development Agency, the Swiss Federal Government, the National Institutes of Health, the Schiller Corporation, the Columbus Heart Center, and Memorial Heart Institute.
† To whom correspondence and reprint requests should be addressed at: Saint John’s Cardiovascular Research Center, 1124 W. Carson Street, RB-2, Torrance, CA 90502.
0010-4809/92 $5.00 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

Computer-assisted diagnosis of coronary artery disease has recently attracted considerable interest (1-6). This interest reflects the realization that clinicians frequently must accept uncertainty in their patients’ diagnoses, especially when coronary angiography is not warranted or cannot be performed. Most computer algorithms involve the estimation of a disease probability whose accuracy is verified by comparing the computer’s results with those of coronary angiography in a test group of patients. We know of only one attempt to compare the accuracy of such probability estimates with the judgment of clinicians (7). That comparison involved a literature-derived algorithm since shown to be limited when compared to data-based algorithms (1). Furthermore, no attempt was made in that investigation to compare the estimates for the very important angiographic subset of triple vessel/left main coronary disease with those of clinicians. The present investigation compares a logistic regression-derived algorithm with two separate estimates of probability made by the same group of cardiac clinicians. A similar algorithm for predicting triple vessel/left main disease is also derived and its results are compared with those of the clinicians.

METHODS

Training Set

The group of patients from which the algorithms were derived, called the training set, consisted of 3549 individuals consecutively referred for coronary angiography at eight institutions. Referrals were made to confirm the presence or the severity of coronary disease. No patient had known serious valve disease or prior cardiac catheterization. In contrast to our previous work (1-4), patients with myocardial infarction were included in the training set. Clinical data and noninvasive test results were recorded prior to (and therefore without knowledge of) the angiographic results. Data were collected from patient records for 22 clinical and test variables. The clinical variables were age, sex, historical or electrocardiographic evidence of a prior myocardial infarction, chest pain type, resting systolic blood pressure, history of hypertension, smoking history, history of diabetes, family history of coronary artery disease, resting electrocardiogram, serum cholesterol, and fasting blood sugar in milligrams per deciliter. Exercise variables included achieved work load in METS, maximum achieved heart rate, resting heart rate, exercise-induced changes in systolic blood pressure, presence or absence of exercise-induced angina and/or hypertension, exercise-induced ST slope, exercise-induced ST depression relative to rest, and exercise-induced thallium perfusion defect. Of these 22 variables, only the 15 listed in Table 1 were used as candidate variables in deriving the computer algorithm. The rationale for this choice is explained below.

Conventional visual assessment by a panel of at least three cardiologists was used to analyze the coronary angiograms in all centers. Disease was defined as a greater-than-50%-diameter narrowing in at least one major coronary arterial branch. This criterion was chosen because it has been used in more investigations (8) than any other.
Triple vessel/left main disease was defined as a greater-than-50% luminal narrowing of all three major epicardial branches and/or of the left main coronary artery. Histories, physical examinations, and all noninvasive tests were performed within 6 weeks before the date of the coronary angiogram.

Patient data were collected in eight participating centers. The numbers of participating subjects in each center are shown in Table 2. Since the individual data collection proceeded in a forward longitudinal direction, this was a prospective investigation (9). Table 3 shows the distribution of the 15 candidate variables in the training set (10). Table 4 shows the distribution of the 15 variables in the test set.

TABLE 1
CLINICAL AND TEST VARIABLES USED TO DERIVE ALGORITHMS

1. Age (years)
2. Sex
3. Chest pain characteristics
   Typical angina pectoris: Pain that occurs in the anterior thorax, neck, shoulders, jaw, or arms and is precipitated by exertion and relieved within 20 min by rest
   Atypical angina: Pain in one of the above locations but which either is not precipitated by exertion or is not relieved by rest within 20 min
   Nonanginal pain: Pain that is not located in any of the above locations, is not related to exertion, is fleeting (less than 10 sec), or lasts longer than 30 min
   No pain
4. Documented history of prior myocardial infarction
5. Resting systolic blood pressure (mm Hg)
6. Serum cholesterol (mg%)
7. Resting electrocardiogram (Abnormal = left ventricular hypertrophy, resting ST depression, or left bundle branch block; Very abnormal = Q wave (Minnesota Code))
8. Fasting blood sugar (>120 mg%)
9. Exercise workload achieved (METS)
10. Maximum exercise heart rate
11. Presence or absence of angina with exercise
12. Hypotensive changes with exercise (exercise-induced drop in systolic blood pressure relative to baseline)
13. ST slope at peak exercise (Abnormal = horizontal or downsloping)
14. Exercise-induced thallium perfusion defect
15. Exercise-induced ST depression

Derivation of the Algorithm

The derivation of the algorithm followed four guidelines determined by the consensus of a panel of eight clinical cardiologist investigators.
1. (a) The variables used by the algorithm should be significantly associated with the endpoint (angiographic disease) in the group of derivation. (χ² analysis and t tests were used to determine the significance of the association with angiographic disease.)

TABLE 2
PARTICIPATING CENTERS

Center                                                                  Number of subjects    Period of data collection
Training set
  University Hospital, Geneva, Switzerland                              1664                  1978-1989
  Hungarian Institute of Cardiology, Budapest, Hungary                  829                   1983-1989
  Cleveland Clinic (8), Cleveland, U.S.A.                               303                   1981-1984
  Erasmus University, Rotterdam, Holland                                295                   1984-1987
  Northwestern University, Chicago, U.S.A.                              229                   1986-1989
  Santa Anna Hospital, Como, Italy                                      86                    1987-1989
  University Hospital, Basel, Switzerland                               85                    1985
  University Hospital, Zurich, Switzerland                              58                    1985
Test set
  Long Beach Veterans Administration Medical Center, Long Beach, U.S.A. 194
  Memorial Heart Institute, Long Beach, U.S.A.                          136
  University Hospital, Pamplona, Spain                                  67

(b) The variables were judged clinically relevant by the panel. The chosen variables are listed in Table 1. Smoking and family history were excluded from analysis by the first criterion. History of hypertension, history of diabetes, and resting heart rate were excluded by the second, since the panel felt that the first two were less important than the fasting blood sugar and systolic pressure and that the third did not contribute any information toward the diagnosis.
2. The algorithm should allow the clinician user to utilize available information concerning his patients. It should not require him to perform tests whose sole purpose would be to make results available for computer entry. For example, a clinician evaluating a subject with chest pain may or may not have an exercise test result available. His patient should not have to undergo one only because the computer algorithm requires exercise results. To adhere to this guideline and to preserve the interdependence of the clinical and test information, it would be necessary either to calculate all clinically relevant combinations of regression coefficients or to make the training database available to calculate coefficients “on the fly” as needed. The former course was chosen for this research.
3. Guidelines 1 and 2 would require that the algorithm consist of 2^15 - 1 (32,767) logistic regression expressions, one for each possible combination of the 15 variables. Since such an enormous calculation would be difficult and would make the algorithm unwieldy on desktop personal computers, three rules were conceived

TABLE 3
THE 15 VARIABLES ENTERING THE MODEL IN THE TRAINING SET

Variable: mean or % (standard deviation; range)
1. Age (years): 54 (SD 10)
2. Sex (% men): 79
3. Typical angina (%): 47
4. Prior myocardial infarction (%): 24
5. Systolic BP (mm Hg): 136 (SD 19)
6. Cholesterol (mg/dl): 252 (SD 60)
7. Abnormal resting ECG (%): 42
8. FBS > 120 mg/dl (%): 16
9. Workload (METS): 6.0 (SD 2.5; range 1.5-22.0)
10. Peak heart rate (beats/min): 130 (SD 24; range 60-202)
11. Angina with exercise (%)
12. Hypotension with exercise (%): 3
13. Abnormal ST slope (%): 67
14. Exercise-induced thallium defect (%): 83
15. Exercise-induced ST depression: range 0-6.2

[Remaining cells, including the column "Number with missing data", are not legible in the source.]

and applied in order to reduce the number of expressions to those consisting only of “clinically relevant” combinations of the variables in Table 1. The rules were:
i. Age, sex, and chest pain type must be present. Since information about the first two is always available, information about chest pain is the determining factor. The patient must be able to give a coherent history about chest pain type. The chest pain classification used is described in Table 1.
ii. If exercise variables are present, then the workload in METS, the maximum exercise heart rate, and the presence or absence of chest pain during exercise must also be available. All three of these are easy to measure and calculate during almost every exercise test.
iii. If an exercise test is done and the slope of the exercise-induced ST depression can be determined, then the actual measurement of the ST depression must also be determined.
4. Each variable should contribute significant and independent information to disease probability. Therefore, the derivation used a forward stepwise approach (11). This approach avoids the inclusion of insignificantly associated variables, which add only noise to the probability estimates.
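Rules i to iii can be sketched as a filter over all 0/1 inclusion patterns of the 15 variables, written in the pattern-string convention described in the Appendix (1 = variable used, 0 = not used). This is only an illustration, not the authors' SAS/BMDP program. The function names are hypothetical, and it assumes that "exercise variables" in rule ii means items 9 to 15 of Table 1, including the thallium defect. Under that assumption the filter keeps 416 patterns, the number of combinations reported in the Calculations section.

```python
from itertools import product

N_VARS = 15  # variables numbered 1..15 as in Table 1


def is_clinically_relevant(bits):
    """Apply rules i-iii to a tuple of 15 flags (index 0 = variable 1)."""

    def has(k):
        return bits[k - 1] == 1

    # Rule i: age (1), sex (2), and chest pain type (3) must be present.
    if not (has(1) and has(2) and has(3)):
        return False
    # Rule ii (assumption: items 9-15 are the exercise variables): any of
    # them requires METS (9), peak heart rate (10), and exercise angina (11).
    if any(has(k) for k in range(9, 16)) and not (has(9) and has(10) and has(11)):
        return False
    # Rule iii: ST slope (13) requires the ST depression measurement (15).
    if has(13) and not has(15):
        return False
    return True


def pattern(bits):
    """Render the flags as a string of 15 ones and zeros."""
    return "".join(str(b) for b in bits)


all_nonempty = 2 ** N_VARS - 1  # every non-empty subset of 15 variables
relevant = [
    pattern(b) for b in product((0, 1), repeat=N_VARS) if is_clinically_relevant(b)
]

print(all_nonempty)   # 32767
print(len(relevant))  # 416
print(relevant[0])    # 111000000000000 (age, sex, and chest pain type only)
```

The count factors as 32 x 13: the five optional clinical variables (items 4 to 8) are free, and the exercise block is either absent or contains items 9 to 11 with 12 admissible settings of items 12 to 15.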

TABLE 4
THE 15 VARIABLES ENTERING THE MODEL IN THE TEST SET

Variable: mean or % (standard deviation; range)
1. Age (years): 56 (SD 11; range 23-84)
2. Sex (% men): 86
3. Typical angina (%): 44
4. Prior myocardial infarction (%)
5. Systolic BP (mm Hg): 134 (SD 22; range 90-200)
6. Cholesterol (mg/dl): 228 (SD 52; range 105-545)
7. Abnormal resting ECG (%): 57
8. FBS > 120 mg/dl (%): 37
9. Workload (METS): 6.5 (SD 2.8; range 1.5-15.0)
10. Peak heart rate (beats/min): 136 (SD 25; range 68-191)
11. Angina with exercise (%)
12. Hypotension with exercise (%): 9
13. Abnormal ST slope (%): 64
14. Exercise-induced thallium defect (%): 79
15. Exercise-induced ST depression: 1.4 (SD 1.4; range 0-5.5)

[Remaining cells, including the column "Number with missing data", are not legible in the source.]

Calculations

On the basis of the above four guidelines, 416 combinations of the 15 candidate variables listed in Table 1 were chosen to enter the stepwise logistic regression calculations. The derivation differed from that reported in a previous publication in which no attempt was made to choose the best fitting model for each of the combinations of candidate variables. The details of these calculations are described in the Appendix. The final (416 × 16) coefficient matrices (available by mail from the principal author, R.D.) were stored in an indexed random access file for later use in the validation of the algorithm.

Validating the Algorithm

Validation of the algorithm followed two guidelines:
1. The algorithm should be tested in institutions other than those from which it was derived. This is a variation on a standard principle used in validating prediction models (12). It requires not only that the individual subjects used to test the algorithm not be used to derive it but, in addition, that they be from different laboratories. The transportability of the algorithm could thus be evaluated (13).
2. The algorithm should be compared against the judgment of expert cardiac clinicians who are ignorant of the angiographic results. This latter requirement was judged essential since the utility of computer-assisted diagnosis depends on its accuracy relative to that of a practicing clinician who might use such algorithms.

The second guideline requires some discussion. In clinical situations, physicians know not only the objective clinical and test data available to our computer algorithms, but also other information, including subtle cues and other behavioral hints. The cardiologists responsible for conducting the angiograms were board certified staff physicians with an average of 9 years’ experience as practicing cardiologists and angiographers. These physicians, who had intimate knowledge of the clinical cases, were asked to make a preangiographic estimate of the probability of any greater-than-50%-diameter obstruction as well as of triple vessel or left main obstructions. In order to reduce value-induced bias (14, 15), a second assessment of disease probability was obtained from the same angiographers. This was done by mailing anonymous reports of only the data available for the items in Table 1 to each of the angiographers 2 to 3 months after the cardiac catheterization. These reports were randomly scrambled within each center and distributed equally among the angiographers at that center, making sure that no angiographer received a report for a patient for whom he had performed an angiogram. The angiographers were asked to assess the probabilities of any angiographic disease and of triple vessel/left main disease and to return their assessments to the investigators.

Discrimination

The discriminatory power of the computer algorithm and of the physicians’ assessments was determined using receiver operating characteristic (ROC) curves (16).
The areas under these curves represent the probability that a randomly chosen diseased patient has a higher probability estimate than a randomly chosen nondiseased patient. These curves were plotted for the computer and physician assessments of the probability of any greater-than-50% obstruction as well as for the assessments of the probability of triple vessel or left main disease. Areas under the curves were calculated and compared using the method of Hanley and McNeil (17). This method takes into account the correlation between the areas induced by the paired nature of the data.

Reliability

ROC curves measure only discrimination. It is desirable also to obtain a measure of the calibration or reliability. Reliability indicates how well the estimates derived from the rule reflect the disease prevalence in the test group (18, 19). To assess this, we performed a “goodness of fit” analysis (20). The probabilities for each of the three assessments, both for any greater-than-50% obstruction and for triple vessel and left main obstructions, were sorted in ascending order. They were then divided into quintiles of equal size and the expected or average probability in each quintile was calculated. The prevalence of disease or of triple vessel/left main disease in these quintiles was then subtracted from these calculated averages. The difference is a measure of the overestimation per quintile and shows how faithful the estimates are to actual disease prevalence at low, intermediate, and high probabilities.

Clinical Utility (“Making a Diagnosis”)

It has been stated that physicians are not interested in the probability of disease; they are interested in making a diagnosis. Although the statement itself is problematic, one can translate the continuous set of probabilities derived from the computer algorithm or from physicians’ judgment into dichotomous “diagnoses.” Such a translation is made simply by choosing cutpoints of probability and stating that subjects with probabilities higher than the cutpoint have the disease, while those with probabilities below the cutpoint do not. For any method of probability assessment, a certain number of correct and erroneous “diagnoses” will be made. The percentage of erroneous diagnoses was calculated for the computer algorithm and the two physician assessments for any greater-than-50% obstruction as well as for triple vessel/left main obstructions. This error rate was plotted against the probability cutpoints for the three assessments. McNemar’s test was used to compare correlated proportions.

RESULTS

The training set consisted of 3549 patients from eight institutions. The test set consisted of 397 patients from three institutions. Table 5 shows the regression coefficients for the 15 variables for the case when all were allowed to be candidates for the stepwise procedure. Tables 3 and 4 show the means, standard deviations, and ranges for the variables allowed to enter into the derivation of the computer model and into the decisions of the clinicians. No attempt was made to ensure that the data used to derive the computer model (training set) were similar to those on which it was tested (test set). In fact, differences between the populations on which such models are meant to be used are expected. We see from Tables 3 and 4 that significant differences in sex distribution, percentage with prior myocardial infarction, percentage with an abnormal resting electrocardiogram, percentage with an abnormal fasting blood sugar, and percentage with exercise-induced hypotension did occur. Though one might expect disease prevalence to be dissimilar in the training and test sets, Table 6 reveals that the prevalence of any angiographic obstruction and that of triple vessel or left main obstruction were not very different in these two groups.
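The text does not say which test was used to compare the training and test sets. As an illustration only, a difference such as the one in sex distribution could be checked with a pooled two-proportion z test. The sketch below takes the percentages of men as 79% (training) and 86% (test), as read from Tables 3 and 4, and assumes no missing values for sex; the function name is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided normal p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return z, p_value


# Assumed inputs: 86% men among 397 test subjects vs 79% among 3549
# training subjects (percentages as read from Tables 4 and 3).
z, p = two_proportion_z(0.86, 397, 0.79, 3549)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these inputs the difference is significant at the 0.05 level, which is consistent with the statement in the text, although the authors' actual test may have differed.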

TABLE 5
REGRESSION COEFFICIENTS FOR THE 15 VARIABLES WHEN ALL WERE ALLOWED AS CANDIDATES FOR THE STEPWISE PROCEDURE

Intercept
1. Age (years)
2. Sex (male = 0, female = 1)
3. Chest pain type (1 = typical angina, 2 = atypical angina, 3 = nonanginal, 4 = asymptomatic)
4. History of prior infarction
5. Systolic blood pressure
6. Cholesterol
7. Rest ECG (0 = normal, 1 = ST abnormality, 2 = Q waves)
8. Fasting blood sugar
9. METS
10. Peak heart rate
11. Exercise angina (0 = absent, 1 = present)
12. Exercise hypotension
13. Exercise ST slope
14. Exercise thallium defect (0 = absent, 1 = present)
15. Exercise ST depression (mm)

Note. Zero entries signify that the stepwise procedure rejected that variable.

[The coefficient columns are not recoverable from the source. Legible nonzero values, in order of appearance: 0.0142, 1.043, -0.0232, 0.342, -0.0217, 0.814, 0.502, 0.513, 0.552.]

Discrimination

Figure 1 displays the ROC curves for the computer algorithm estimates of probability and those for the clinicians both when they knew the identity of the patients (not blinded) and when they were asked to make estimates without knowing the patients’ identities (blinded). The areas under these curves are indices of the power of estimates to distinguish diseased from nondiseased

TABLE 6
ANGIOGRAPHIC SUBSETS

                                  Training set (N = 3549) (%)    Test set (N = 397) (%)
No disease                        36                             33
Any >50% obstruction              64                             67
Triple vessel/left main disease   18                             34

[Figure 1: ROC curves, sensitivity plotted against false positive rate. Legend: Computer (0.80 ± 0.03), Not blinded, Blinded.]
FIG. 1. Receiver operating characteristic (ROC) curves for the prediction of angiographic disease in the test group. The area under the curve for the blinded physicians is significantly greater than the areas under the other two curves.

subjects. The areas for the computer algorithm and the unblinded physicians are similar (0.80 ± 0.03 and 0.82 ± 0.02). However, the area under the curve for the blinded clinicians (0.88 ± 0.02) is significantly higher (P < 0.05). This indicates that the blinded physicians could discriminate between subjects with and without angiographic disease more accurately than could the same physicians when they knew the patient’s identity, and also more accurately than the computer algorithm. For the more difficult problem of diagnosing triple vessel or left main disease there were no significant differences between the ROC areas for the three methods of estimating probability (Fig. 2). The computer algorithm had a discriminatory power equal to that of the clinicians either with or without personal knowledge of their patients.

[Figure 2: ROC curves, sensitivity plotted against false positive rate. Legend: Computer (0.72 ± 0.03), Not blinded (0.69 ± 0.03), Blinded (0.70 ± 0.03).]
FIG. 2. Receiver operating characteristic (ROC) curves for the prediction of triple vessel/left main disease in the test group. There are no significant differences in the areas under the three curves.

Reliability

Figure 3 shows the extent to which the three methods of estimating probability overshot the mark. The heights of the bars are the differences between the mean probability estimates and the actual disease prevalence in the five ascending quintiles of probability. For example, the first three bars represent the overestimation of probability in the lowest quintile (low probability of disease). When the physicians knew who the patients with such low probabilities were, they overestimated the disease probability by 10%. When they knew only the objective data, they overestimated by only 2.5%. The computer, on the other hand, did not over- or underestimate the probability at all in this quintile. The unblinded physicians were significantly less accurate in the second (low-intermediate probability) and fifth (high-probability) quintiles. Here they overestimated by as much as 25 and 17% (P < 0.05 compared to the computer or the blinded physicians). None of the other differences in the heights of the bars shown in the figure are significant.

[Figure 3: bar chart, overestimation of probability by quintile of probability (first to fifth). Legend: Computer, Not blinded, Blinded.]
FIG. 3. Overestimation of probability of angiographic disease per quintile. The probability overestimations for the nonblinded physicians were significantly greater in the second and fifth quintiles.

[Figure 4: bar chart, overestimation of probability by quintile of probability (first to fifth).]
FIG. 4. Overestimation of probability of triple vessel/left main disease per quintile. The probability overestimations for the nonblinded physicians were significantly greater in the fourth and fifth quintiles.

Similarly, the unblinded physicians overestimated disease probability for triple vessel/left main disease significantly more than did the same physicians when they were blinded to the identity of the patient and more than the computer algorithm for triple vessel/left main disease. This was true in the fourth and fifth quintiles (intermediate-high and high probabilities), as is shown in Fig. 4 (P < 0.05 for comparisons with the other bars). Once again, the reliability of the computer and that of the blinded physicians were not significantly different.
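The two accuracy measures reported above can be sketched in a few lines. The first function computes the ROC area directly from its interpretation in the Methods (the probability that a diseased subject receives a higher estimate than a nondiseased one); the second computes the per-quintile overestimation (mean estimate minus observed prevalence). This is a minimal illustration with made-up data for ten hypothetical subjects. It is not the authors' program, the function names are assumptions, and it does not reproduce the Hanley and McNeil comparison of correlated areas.

```python
def roc_area(probs, diseased):
    """Probability that a random diseased subject gets a higher estimate
    than a random nondiseased subject (ties count one half)."""
    pos = [p for p, d in zip(probs, diseased) if d]
    neg = [p for p, d in zip(probs, diseased) if not d]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def quintile_overestimation(probs, diseased):
    """Mean estimate minus observed prevalence in five ascending groups."""
    pairs = sorted(zip(probs, diseased))
    n = len(pairs)
    result = []
    for k in range(5):
        group = pairs[k * n // 5:(k + 1) * n // 5]
        mean_estimate = sum(p for p, _ in group) / len(group)
        prevalence = sum(d for _, d in group) / len(group)
        result.append(mean_estimate - prevalence)
    return result


# Made-up estimates and angiographic outcomes (1 = disease), for illustration.
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
diseased = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

area = roc_area(probs, diseased)
over = quintile_overestimation(probs, diseased)
print(round(area, 2))                # 0.8
print([round(x, 3) for x in over])   # [0.15, -0.15, 0.05, 0.25, -0.075]
```

A positive entry in the second list means the estimates in that quintile exceed the observed prevalence, which corresponds to a bar above zero in Figs. 3 and 4.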

Clinical Utility (Making a Diagnosis)

Figures 5 and 6 show the percentage of erroneous diagnoses made by the computer, by the physicians when they knew the identity of their patients (not blinded), and by the physicians when they knew only the objective data (blinded). Figure 5 is for the prediction of any greater-than-50% angiographic obstruction and Fig. 6 is for the prediction of triple vessel/left main obstructions. The blinded physicians, who did not know the identity of their patients, made significantly fewer errors for both of these diagnostic problems than did the unblinded physicians. This decrease in errors was most marked for low to intermediate probabilities (between 0.15 and 0.65), where it was significant at the 0.05 level using McNemar’s test. The blinded physicians made significantly fewer erroneous diagnoses than the computer algorithm for the problem of identifying subjects with any greater-than-50% obstruction. However, the computer algorithm performed as well as the blinded physicians and made fewer errors than the unblinded physicians for the problem of identifying subjects with triple vessel/left main disease (P < 0.05 by McNemar’s test between probabilities 0.25 and 0.65).
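The error-rate curves and the paired comparison can be illustrated as follows. For a given cutpoint, a subject is called diseased when the estimate exceeds the cutpoint, and two methods are compared on the same subjects through their discordant errors. This is a hedged sketch with made-up data. The function names are hypothetical, and the form of McNemar's statistic used here (1 degree of freedom, no continuity correction) is an assumption, since the text does not specify which variant was applied.

```python
import math


def error_rate(probs, diseased, cutpoint):
    """Fraction misclassified when 'disease' means estimate > cutpoint."""
    wrong = sum((p > cutpoint) != bool(d) for p, d in zip(probs, diseased))
    return wrong / len(probs)


def mcnemar(probs_a, probs_b, diseased, cutpoint):
    """McNemar chi-square (1 df, uncorrected) on paired diagnostic errors."""
    only_a = only_b = 0
    for pa, pb, d in zip(probs_a, probs_b, diseased):
        err_a = (pa > cutpoint) != bool(d)
        err_b = (pb > cutpoint) != bool(d)
        only_a += err_a and not err_b  # method A wrong, method B right
        only_b += err_b and not err_a  # method B wrong, method A right
    if only_a + only_b == 0:
        return 0.0, 1.0
    chi2 = (only_a - only_b) ** 2 / (only_a + only_b)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df
    return chi2, p_value


# Made-up outcomes and two sets of estimates for the same ten subjects.
diseased = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
method_a = [0.6, 0.7, 0.8, 0.6, 0.3, 0.9, 0.7, 0.8, 0.9, 0.95]
method_b = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

rate_a = error_rate(method_a, diseased, 0.5)
rate_b = error_rate(method_b, diseased, 0.5)
chi2, p_value = mcnemar(method_a, method_b, diseased, 0.5)
print(rate_a, rate_b)  # 0.4 0.2
print(chi2)            # 1.0
```

Repeating `error_rate` over a grid of cutpoints from 0 to 1 gives one curve of the kind plotted in Figs. 5 and 6 for each method.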

[Figure 5: percentage of erroneous diagnoses (0% to 50%) plotted against probability cutpoint (0 to 1).]
FIG. 5. Percentage of erroneous diagnoses at each probability cutpoint for predicting angiographic disease. These percentages were significantly lower for the blinded physicians than for the computer algorithm between probabilities of 0.15 and 0.65 (P < 0.05).

DISCUSSION

Angiographic diagnosis or “subsetting” has been applied to decision making regarding surgical therapy for coronary artery disease (21-25). Despite obvious problems with the visual interpretation of coronary angiograms (20-22), sophisticated digital subtraction (26), ultrasound (27), and invasive radionuclide (28)

[Figure 6: percentage of erroneous diagnoses (0% to 60%) plotted against probability cutpoint (0 to 1). Legend: Computer, Not blinded, Blinded.]
FIG. 6. Percentage of erroneous diagnoses for predicting triple vessel/left main disease. These percentages were significantly higher for the nonblinded physicians than for the computer algorithm between probabilities of 0.25 and 0.65 (P < 0.05).

techniques are rarely used in clinical practice, and decisions are usually based on visual angiographic interpretations along with clinical and exercise findings. Since coronary angiography involves significant costs and risks, the preangiographic determination of angiographic subsets is of considerable interest. Since perfect, discrete diagnosis and/or subsetting is impossible before angiography, clinicians frequently use probabilistic terminology to classify their patients, and thereby make management decisions. A patient classified as “very low probability of significant disease” is unlikely to be chosen for angiography, while one classified as “high probability of triple vessel/left main disease” is quickly listed on the cath board. Because these verbal assessments lack precision, investigators have suggested replacing them with numerical probabilities expressed as percentages or proportions. A 10% probability of disease means that 1 out of every 10 such subjects will have disease, while the interpretation of “very low probability” differs from one clinician to another.

Most of the attention on coronary disease probabilities has focused on the estimates of computer algorithms (1-7, 28-30). Though there has been considerable literature on physicians’ subjective probability estimates of coronary prognosis (32-34), there has been far less on physician-estimated probabilities of diagnosis or subsets (7). We performed the present investigation in order to evaluate the accuracy of physician estimates compared with those of a computer algorithm derived from a multicenter data base on catheterized patients. We have tested and published results of the accuracy of similar algorithms (1, 3, 4). Our algorithms have the distinct advantages of having been validated at institutions other than those at which they were derived (35) and of accounting for the interdependence of the independent variables (36). Still, the results presented here show that:
1. Our algorithms are not more accurate than carefully made decisions of cardiac clinicians who do not know the identity of the subjects whose data they are evaluating.
2. The algorithms are more reliable than the estimates of clinicians knowledgeable of their patients’ identity and data.
3. This latter improvement in accuracy is greater for predicting triple vessel or left main disease than it is for predicting any significant angiographic obstruction.
We will discuss these three results separately.

Equivalence of Physician and Computer Estimates

Many reports comparing physician and computer-assisted estimates of coronary prognosis have shown little difference in accuracy (35). The computer program derived from patient data collected in other institutions and the physician with experience in his own institution each have distinct advantages. The computer has the advantage of cold objectivity. It “knows” the contribution of each item of data, it makes certain assumptions concerning normality of data and similarity of means and variance, and it computes a probability accordingly. The physician has the advantage of experience with similar patients whose prognosis and diagnosis relate to clinical symptoms, signs, and test results in a similar way. We expect, though it has not been shown here, that a computer algorithm derived in the institution where it is validated might achieve higher reliability than that achieved by clinicians. We have already discussed the severe limitations on the applicability of such an eventual result.

Poor Performance of Clinicians Knowledgeable of Patient Identity

The cardiologist angiographers participating in this investigation were asked to estimate their patients’ disease probability twice. Their first estimate was made shortly before the angiographic procedure. They knew both the identity and the clinical and exercise results of their patients at that time. The second estimate was made by the same physicians but approximately 3 months after the angiograms. At that time, an anonymous tabulated report of the available data for the items in Table 1 was made for each patient; the reports were randomly scrambled at each institution and sent to the physicians in proportion to their original estimates. The physicians reported their second estimates for each patient and sent them back to the investigators.

Before the angiograms, the cardiologists had access to clinical and test information (from history, physical examination, and echocardiographic or radionuclide tests) which was available neither to the computer algorithm nor to themselves at their second estimate. Despite this presumed advantage, they overestimated disease probability by significantly greater amounts before the angiograms, when they knew the patients’ identity, than they did 3 months later, when they had only information about the items in Table 1. This apparent paradox can be explained by noting the referral bias (35) and value-induced bias (36) reported by investigators. Referral bias makes a false positive decision more likely than a false negative one. Overestimations occur simply because patients with high probabilities are more likely to undergo an angiogram. Aside from referral bias, value-induced bias may also play a part. Angiographers feel they have a personal stake in their decision to perform an angiogram. They may overestimate disease probability in order to justify their decisions to submit their patients to an invasive and expensive procedure.
Relatively Increased Improvement by Applying the Computer Algorithm for Estimating the Probability of Triple Vessel or Left Main Disease

The improvement in accuracy achieved by using the computer algorithms was greater for predicting triple vessel or left main disease than for predicting any greater-than-50% obstruction. This occurred because the physicians’ overpredictions were greater for the former problem. The clinicians apparently went to great pains not to miss cases of triple vessel or left main disease. This is an example of the value-induced bias noted above.


Increased cost-effectiveness in angiographic referrals would result in fewer “normal angiograms” while keeping the rate of missed diagnoses among severely diseased subjects at an acceptably low level. The cardiac angiographers whose responsibility is to make the final decision to perform an angiogram are subject to pressure from patients and referring physicians in their assessment of disease probability. The results presented here suggest that they might benefit either from the application of a validated computer algorithm or from consultation with other cardiologists who do not know the patients and are not personally involved in the decision to perform an angiogram.

APPENDIX

To facilitate the assignment of combinations to the execution of the regressions, a pattern consisting of a string of 15 1s and 0s was assigned to each combination. A 1 signified that a variable was used and a 0 that it was not used in the combination. For example, the pattern 111000000000000 was assigned to the combination where only the first three items (age, sex, and chest pain type) were used. A computer program was written in the SAS programming language which generated the BMDP programming code necessary to calculate stepwise logistic regression coefficients given any pattern of the above type. BMDP was chosen since, at the time programming was initiated (1988), SAS did not offer a stepwise logistic regression procedure. An asymptotic covariance estimate of the regression coefficients was made at each step, with enter and remove limits set at 0.1 and the tolerance limit for inverting the cross product matrix at 0.0001. The convergence criterion for no improvement was set at 0.001.

After each stepwise series was completed, a FORTRAN program extracted essential information from the voluminous BMDP output. The extracted results included the coefficients from the last step, the number of cases with and without disease, the log likelihood ratio at the last step, and any eventual BMDP error messages. After the FORTRAN program had recorded this information and the operating system had deleted the BMDP source code and output, the SAS program proceeded to generate the code for the next stepwise calculation. Some of the resulting sets of coefficients were redundant because the stepwise procedure frequently entered the same set of variables which had already been entered by a previous stepwise calculation. When this occurred, the sets derived from fewer cases were replaced with that derived from the highest number of cases (patients).

ACKNOWLEDGMENT

The editorial assistance of Clover Behrend and Barbara Vaitovas is deeply appreciated.
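The pattern bookkeeping described in the Appendix can be illustrated with a short sketch. It is written in Python for readability and is not the SAS, BMDP, or FORTRAN code used in the study. Only the first three variable names (age, sex, chest pain type) come from the text; the remaining twelve names, the `StepwiseResult` record, and the function names are hypothetical placeholders introduced for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple

# Only the first three names are given in the text; the other twelve are
# hypothetical placeholders standing in for the remaining Table 1 items.
VARIABLES: List[str] = ["age", "sex", "chest_pain_type"] + [
    f"item_{i}" for i in range(4, 16)
]


def pattern_to_variables(pattern: str) -> List[str]:
    """Return the variables switched on by a 15-character 0/1 pattern."""
    if len(pattern) != len(VARIABLES) or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be 15 characters, each 0 or 1")
    return [name for name, bit in zip(VARIABLES, pattern) if bit == "1"]


def variables_to_pattern(names: List[str]) -> str:
    """Build the 0/1 pattern corresponding to a combination of variables."""
    unknown = set(names) - set(VARIABLES)
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    chosen = set(names)
    return "".join("1" if name in chosen else "0" for name in VARIABLES)


class StepwiseResult(NamedTuple):
    """Hypothetical summary of one stepwise run (assumed layout)."""
    entered: FrozenSet[str]          # variables retained at the last step
    n_cases: int                     # number of patients used in the fit
    coefficients: Dict[str, float]   # coefficients from the last step


def drop_redundant(results: List[StepwiseResult]) -> List[StepwiseResult]:
    """For each set of entered variables, keep the fit with the most cases."""
    best: Dict[FrozenSet[str], StepwiseResult] = {}
    for result in results:
        kept = best.get(result.entered)
        if kept is None or result.n_cases > kept.n_cases:
            best[result.entered] = result
    return list(best.values())
```

Under these assumptions, `pattern_to_variables("111000000000000")` returns `["age", "sex", "chest_pain_type"]`, matching the example in the Appendix, and `drop_redundant` mirrors the rule that a coefficient set derived from fewer cases is replaced by the one derived from the highest number of cases.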

REFERENCES

1. DETRANO, R., JANOSI, A., STEINBRUNN, W., PFISTERER, M., SCHMID, J., SANDHU, S., GUPPY, K. H., LEE, S., AND FROELICHER, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 64, 304-310 (1989).

2. GUPPY, K. H., DETRANO, R., ABBASSI, N., JANOSI, A., SANDHU, S., AND FROELICHER, V. The reliability of probability analysis in the prediction of coronary artery disease in two hospitals. Med. Decis. Making 9, 181-189 (1989).

3. DETRANO, R., GUPPY, K. H., ABBASSI, N., JANOSI, A., SANDHU, S., AND FROELICHER, V. Reliability of Bayesian probability analysis for predicting coronary artery disease in a veterans hospital. J. Clin. Epidemiol. 41, 599-605 (1988).

4. DETRANO, R., JANOSI, A., STEINBRUNN, W., PFISTERER, M., SCHMID, J., MFI FK, M., GUPPY, K. H., AND ABI-MANSOUR, P. Algorithm to predict triple vessel/left main coronary disease in patients without myocardial infarction: An international cross-validation. Circulation 83(III), 89-96 (1991).

5. ROBERT, A. R., MELIN, J. A., AND DETRY, J. R. Logistic ... diagnostic accuracy of exercise testing for coronary artery ... (1991).

6. DECKERS, J., RENSING, B. J., TIJSSEN, J. G. P., VINKE, R. V. H., AZAR, A. J., AND SIMMONS, M. L. A comparison of methods of analyzing exercise tests for diagnosis of CAD. Br. Heart J. 62, 438-444 (1989).

7. HLATKY, M., BOTVINICK, E., AND BRUNDAGE, B. Diagnostic accuracy of cardiologists compared with probability calculations using Bayes’ Rule. Am. J. Cardiol. 49, 1917-1932 (1982).

8. DETRANO, R., GIANROSSI, R., AND FROELICHER, V. The diagnostic accuracy of the exercise electrocardiogram: A meta-analysis of 22 years of research. Prog. Cardiovasc. Dis. 32, 173-206 (1989).

9. FEINSTEIN, A. R. “Clinical Epidemiology ... Diseases of the Heart and Circulation,” pp. 226-... Saunders, Philadelphia, 1985.

10. DETRANO, R., YIANNIKAS, J., SALCEDO, E. E., RINCON, G., GO, R. T., WILLIAMS, G., AND LEATHERMAN, J. Bayesian probability analysis: A prospective demonstration of its clinical utility in diagnosing coronary disease. Circulation 69, 541-547 (1984).

11. BROWN, M. B., ENGLEMAN, L., FRANE, J. W., HILL, M. A., JENNRICH, R. I., AND TOPOREK, ...
