The performance of normal-tissue complication probability models in the presence of confounding factors Eva Onjukka, Colin Baker, and Alan Nahum Citation: Medical Physics 42, 2326 (2015); doi: 10.1118/1.4917219 View online: http://dx.doi.org/10.1118/1.4917219 View Table of Contents: http://scitation.aip.org/content/aapm/journal/medphys/42/5?ver=pdfcov Published by the American Association of Physicists in Medicine Articles you may be interested in Technical Note: Calculation of normal tissue complication probability using Gaussian error function model Med. Phys. 37, 4924 (2010); 10.1118/1.3483097 Self-consistent tumor control probability and normal tissue complication probability models based on generalized EUDa) Med. Phys. 34, 2807 (2007); 10.1118/1.2740010 The theoretical benefit of beam fringe compensation and field size reduction for iso-normal tissue complication probability dose escalation in radiotherapy of lung cancer Med. Phys. 30, 1086 (2003); 10.1118/1.1573208 A variable critical-volume model for normal tissue complication probability Med. Phys. 28, 1338 (2001); 10.1118/1.1380432 Volume and Kinetics in Tumor Control and Normal Tissue Complications Med. Phys. 26, 1407 (1999); 10.1118/1.598815

The performance of normal-tissue complication probability models in the presence of confounding factors Eva Onjukkaa) Section of Radiotherapy Physics and Engineering, Department of Medical Physics, Karolinska University Hospital, Stockholm 171 76, Sweden

Colin Baker and Alan Nahum Physics Department, Clatterbridge Cancer Centre, Clatterbridge Road, Bebington, Wirral CH63 4JY, United Kingdom

(Received 15 July 2014; revised 6 December 2014; accepted for publication 8 December 2014; published 15 April 2015) Purpose: This work explores different methods for accounting for patient-specific factors in normaltissue complication probability (NTCP) modeling, and compares the performance of models using pseudoclinical datasets for “lung” and “rectum” complications. Methods: Datasets consisting of dose distributions and resulting normal-tissue complications were simulated, letting varying levels of confounding factors (i.e., nondosimetric factors) influence the outcome. The simulated confounding factors were patient radiosensitivity and health status. Seven empirical NTCP models were fitted to each dataset; this is analogous to fitting alternative models to datasets from different populations, treated with the same technique. The performance of these models was compared using the area under the ROC curve (AUC) and the impact of confounding factors on the model performance was studied. The patient-specific factors were then accounted for by (1) stratification and (2) two ways of modifying the traditional NTCP models to include these factors. Results: Confounding factors had a greater impact on model performance than the choice of model. All models performed similarly well on the rectum datasets (except the maximum dose model), while critical-volume type models were slightly better than the mean dose-, the Lyman–Kutcher–Burman-, and the relative seriality models for lung. This difference was more apparent without confounding factors in the dataset. The two alternative functions including patient-specific factors used in this work (one logistic and one cumulative normal function) were found to be equivalent, and more efficient than stratifying datasets according to patient-specific factors and fitting models to subgroups individually. For datasets including confounding factors, the performance improved greatly when using models accounting for these; AUC increased from around 0.7 to close to unity. Conclusions: This work shows that identifying confounding factors, and developing methods to quantify them, is more important than the choice of NTCP model. Most dose–volume histogram (DVH)-based NTCP models can be generalized to include confounding factors. C 2015 American Association of Physicists in Medicine. [http://dx.doi.org/10.1118/1.4917219] Key words: NTCP, model fitting, confounding factors, outcome simulation

1. INTRODUCTION Normal-tissue complication probability (NTCP) models have long been used in the analysis of clinical data, and are recommended by ICRU for reporting planned dose to organs at risk (OAR).1 More radically, the AAPM Task Group 166 proposes that “the dose–volume criteria, which are merely surrogate measures of biological responses, should be replaced by biological indices in order for the treatment process to more closely reflect clinical goals of radiotherapy.”2 As more clinical data, of improving quality, are becoming available, models based on dose–volume histograms (DVH) can be used with increasing confidence. Parameter values for some endpoints and models have been collated in the QUANTEC report. Assuming that the DVH represents the dominating features of the treatment plan (in terms of the risk 2326

Med. Phys. 42 (5), May 2015

of normal-tissue damage), some NTCP models might perform better than others for any given endpoint. There is a multitude of studies where model parameter values have been fitted to local datasets; in most of these a single DVH-based NTCP model is used, but there are a few where a number of models have been fitted to the dataset. In these studies, it is generally concluded that the performance of all models is similar.3–10 Whether this is due to limitations in the datasets or because all models are equal is not clear. NTCP models generally estimate the risk of complications as a function of the dose distribution only. However, many studies have demonstrated the importance of considering nondosimetric factors such as health status, surgery, chemotherapy, comorbidities, base-line organ function, and smoking status.11–20 When fitting NTCP models with the dose distribution as the only risk factor, any other parameters

0094-2405/2015/42(5)/2326/16/$30.00

© 2015 Am. Assoc. Phys. Med.

2326

2327

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

influencing the outcome of the treatment act as confounding factors. This means that the best fit parameters intrinsically incorporate averages over confounding factors, which results in shallower dose–response curves.21 When confounding factors influence the outcome (e.g., chronic obstructive pulmonary disease for radiation pneumonitis19), the predictive power of the models is reduced since only a subset of the relevant factors is taken into account. Inevitably, all models (using dose–volume parameters only) perform less well when applied to datasets with high levels of confounding factors, but it is also expected that any differences in performance between models might be obscured. The performance of NTCP models should ideally be studied using datasets without any confounding factors. The aim of the study is to explore the relative importance of the choice of NTCP model and the level of confounding factors in the dataset, and to compare different methods of modeling the influence of confounding factors on NTCP. This was achieved by fitting DVH-based models to pseudoclinical datasets generated by a previously developed mechanistic 3D model,22 and comparing their performance. Datasets were simulated with varying levels of confounding factors (varying radiosensitivity and health status) and represent two different organs at risk: lung (radiation pneumonitis) and rectum (rectal bleeding). These two endpoints were chosen in order to study the behavior of the models for organs with different volume effects; the lung has a large functional reserve23 while rectal bleeding is related to high local doses to a rather small area of the rectal wall.24 By using simulated data for model comparison, rather than real patient data, it was possible to control both dosimetric and nondosimetric variables in the datasets. This is a novel method of studying the nature of NTCP models, which offers the possibility to optimize dataset characteristics for the question to be studied. However, the simulated nature of the datasets means that no inference can be made from this analysis regarding the dose–response of organs. 2. METHODS 2.A. Simulated datasets

The datasets used for comparing the performance of empirical NTCP models were generated by simulations with the mechanistic model developed by Rutkowska et al.22 In this model, a 3D distribution of tissue damage, represented by inactivated functional subunits (FSUs), is evaluated against a threshold within a “critical functioning volume” (CFV). The CFV governs the volume dependence of an organ since, in the case of a large CFV, some local tissue damage can be compensated for by a high survival fraction of FSUs elsewhere in the CFV, while an organ with a small CFV is sensitive to small volumes of concentrated tissue damage. The inactivation of functional subunits represents the distribution of local tissue damage resulting from the dose distribution and is quantified by the LQ model and Poisson statistics. Some simulations included a variation in patient-specific factors. Two types were considered: one which influences the Medical Physics, Vol. 42, No. 5, May 2015

2327

local tissue damage to a given local dose, and one which makes different “patients” respond differently to the same level of tissue damage. The former will be referred to in this work as a variation in radiosensitivity and was modeled by a standard deviation in the LQ-model radiosensitivity parameter α. The variation in global organ response to a given distribution of tissue damage, on the other hand, is referred to in this work as a variation in “health status,” which is an unspecific parameter which might depend on baseline organ function, comorbidities, concurrent treatments, or smoking status. As motivated in Ref. 25, the variation in health status was modeled by a standard deviation in the number of FSUs before irradiation (N0FSU). This mechanistic model was used to generate pseudoclinical datasets with different characteristics, analogous to different patient populations given the same radiotherapy treatment. Parameter values corresponding to both pseudolung and pseudorectum22,25 were used; see summary in Table I. By using these two parameter sets in the simulations, the mechanistic model produced data with different dose–volume response, inspired by the lung and the rectum. In this paper, these two types of pseudoclinical datasets will be referred to as “lung” and “rectum”. Each patient in a dataset was characterized by a unique dose distribution and individually sampled values of α and N0FSU. Thus, the outcome for a patient depended on both dosimetric and nondosimetric factors. All datasets had different combinations of standard deviations when sampling α and N0FSU (σα and σFSU), from a log-normal distribution, but used the same set of 1000 dose distributions (separate sets for “lung” and “rectum”). When empirical models were fitted to these datasets (DVHs), the variation in radiosensitivity and health status acted as confounding factors. The standard deviation for α was estimated by considering the range among patients in local tissue response required to explain differences in response to whole-lung irradiation seen in the clinic. Also the range of radiosensitivity measured in animals, and the relative contribution of a stochastic component and a deterministic component to the development of telangiectasia were considered.25,26 The standard deviation for N0FSU, on the other hand, was estimated by adjusting its value until after repeated simulations a distribution of p-values for a correlation between a dose–volume parameter and outcome were similar to p-values published in clinical studies.25,26 The lung dose distributions were generated as described in our previous work.22 In the case of the rectum, beams were T I. Parameters used in simulations: the LQ-model α/β-ratio, population mean α(α), ¯ population standard deviation (σ α ), the number of stem cells per FSU, the CFV, the FSU inactivation threshold within the CFV (FSUmin), the mean number of FSUs per voxel ( N¯ 0FSU), and the population standard deviation in the number of FSUs per voxel (σ FSU).

Lung Rectum

α/β (Gy)

α¯ (Gy−1)

σα (%)

Nsc

CFV (%)

FSUmin

N¯ 0FSU

σ FSU (%)

3 6

0.24 0.05

15 15

67 50

75 45

0.47 0.60

3600 370

25 10

2328

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2328

F. 1. Fifty sample DVHs for (a) lung and (b) rectum.

directed at the target (prostate) in a four-field box technique, such that the anterior and posterior beams always covered the rectum while the portion covered by the lateral fields varied from patient to patient. The DVH of the rectum was generated based on the voxels of the rectal wall (excluding rectal filling). The variation in beam widths and doses was adjusted to make the DVHs resemble clinical rectal DVHs from prostate treatments, using the methods described in Refs. 22 and 26. For each organ, a thousand dose distributions were generated; the DVHs for fifty of these are displayed in Fig. 1. Both for “lung” and “rectum,” five datasets were generated; see Table II: one without confounding factors (σα = σFSU = 0), one with the values of σα and σFSU derived in Refs. 25 and 26, respectively, and three with either σα or σFSU set to zero. Out of these, one was generated with σα or σFSU set to the value derived for the other in order to compare the relative effect of these confounding factors on the model performance. 2.B. DVH-based NTCP models

Empirical NTCP models look at different features of the DVH (such as the volume receiving a certain dose, mean dose, or maximum dose) and make assumptions about their relative influence on the magnitude of the predicted NTCP. The confounding factors act as random noise in the datasets, so reducing the predictive power of models and hiding any differences between models’ accuracy in predicting complications for individual DVHs. By fitting the models to

T II. Standard deviations used when sampling patient-specific variables for each dataset; a large σ α or σ FSU indicates a great interpatient variation in radiosensitivity or health status, respectively. Dataset 1 (%)

Dataset 2 (%)

Dataset 3 (%)

Dataset 4 (%)

Dataset 5 (%)

Lung σα σ FSU

0 0

15 0

0 15

0 25

15 25

Rectum σα σ FSU

0 0

0 10

10 0

15 0

15 10

Medical Physics, Vol. 42, No. 5, May 2015

simulated data with known levels of confounding factors, the effect of these factors can be studied. Seven DVH-based NTCP models, described in the appendix, were fitted to each dataset. For ease of comparison of parameter values and model structure, a common notation is used for all models. For most models, NTCP is calculated as a sigmoid function of a DVH summary measure, here called φ. The slope of this curve, and the value of φ corresponding to NTCP = 50% are given by m and φ50, respectively. Additional parameter values are used to reduce the DVH to a summary measure. The Lyman–Kutcher–Burman (LKB) model27,28 uses three parameters, best known as TD50, m, and n, where n is a volume-effect parameter (values close to unity correspond to a large volume effect). The mean dose model is a special case of the LKB model with n fixed at unity (for which the DVH summary measure reduces to the radiobiologically adjusted mean dose). Similarly, the “maximum dose model” uses the maximum dose as the DVH summary measure. Three versions of the critical volume model9,23,29 were used, the first with a total of four parameters, the second with only three parameters, as proposed by Jin et al.,30 and the third, the VDth model (named thus by Seppenwoolde et al.9), also with three parameters. The VDth model simply reduces the DVH to the volume that receives at least Dth Gy. After the LKB model, the relative seriality model31 is the most widely used NTCP model. The parameters used by this model are D50, γ, and s. The first two determine the response of the organ when uniformly irradiated, and s is the volume-effect parameter (large values correspond to a small volume effect). Unlike the other models, the relative seriality model cannot be parametrized with φ50 and m using the cumulative normal function. 2.C. Fitting NTCP models to datasets

In most cases, the models were fitted using the maximum likelihood method but when the solutions were unstable, simulated annealing and/or visual inspection of the parameter likelihood space were used to find the best fit. The performance of the model fits was evaluated in terms of the area under the ROC curve (AUC) when applying the model to the dataset it was fitted to. This assesses the ability of the model to link the outcome to the DVH.

2329

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

A likelihood ratio statistic and the Akaike information criterion (AIC) were also calculated for each model fit. A likelihood ratio statistic which can be used with binary outcome data is the difference in deviance, DD, between the fitted model and a logistic model with only a constant as explanatory parameter32 (the intercept model), DD = −2(L i − L c ).

(1)

Here, L i is the maximum log-likelihood of the intercept model, and L c is the maximum log-likelihood of the current fitted model. A high value indicates that the current model is much better than the intercept model at describing the data. Unlike DD, the AIC takes into account the number of parameters Npar in the fitted model,33 and the lower its value, the better the model fit, AIC = 2Npar − 2L c .

(2)

For each dataset of 1000 patients, the seven NTCP models were fitted to the full dataset, and the model fits were evaluated on the same data. A validation of the fitted models was not necessary, as they will not be used for patient data. In order to test for overfitting, the models were also fitted independently to half the datasets (500 patients), and each model fit was then evaluated on the second half of the dataset. To two decimal points, the AUC was identical as when evaluated on the same (full) dataset as the models were fitted to. Therefore, the presented results in this work are for models fitted and evaluated on the same datasets. The models were first fitted to the datasets using only the DVHs and the outcome (complication or no complication); here the variation in radiosensitivity and health status in datasets 2–5 acted as confounding factors. However, since these factors were known, the datasets were then stratified into subgroups with similar radiosensitivity and health status, respectively, before fitting the models to each subgroup. Finally, the models were adapted to include the patientspecific values of radiosensitivity and health status and again fitted to the complete datasets. Two methods of adapting models where NTCP is a sigmoid function of a DVH summary measure are developed below, for a logistic regression and a cumulative normal function, respectively.

2329

in the model. The NTCP is then given by 1 . (4) 1 + e−(β0+β1 X1+β2 X2+···+β N X N ) The parameters of this model ( β0, β1, . . . , β N , and any parameters of the summary measure) can be fitted to a dataset with the maximum likelihood method. Also the cumulative normal function can be used with a set of covariates.34 Using the following expression:  t 1 2 NTCPcn = √ e−x /2dx, (5) 2π −∞ t can include both a DVH summary measure and nondosimetric variables, NTCPlr =

t = β0 + β1 X1 + β2 X2 + ··· + β N X N .

(6)

Alternatively, this can be expressed in terms of φ50 and m, as defined in Sec. 2.B, t=

φ − φ50 + β2 X2 + ··· + β N X N , m · φ50

(7)

where β2, . . . , β N are the model parameters for the nondosimetric factors X2,...,X N . 3. RESULTS 3.A. The influence of confounding factors

Figures 2 and 3 show the performance of the seven empirical NTCP models when fitted to the “lung” and “rectum” datasets, respectively, with increasing levels of confounding factors (see Table II). As expected, the performance of the DVH-based models decreased with increasing variation in radiosensitivity and health status, as these were not accounted for by the models. This pattern was confirmed by both the likelihood ratio statistic DD, and the AIC, as listed in Tables III and IV.

2.D. Models including patient-specific factors

Since the argument of the logistic regression model is the sum of any number of covariates (numerical or categorical), nondosimetric variables can easily be included; the covariates can be DVH-summary measures (φ) as well as biological or clinical parameters. A logistic regression model32,34 calculates the logarithm of the odds of a complication occurring as a sum of the contribution from the N covariates X i , ) ( NTCPlr = β0 + β1 X1 + β2 X2 + ··· + β N X N . (3) ln 1 − NTCPlr Here, β0 is called the intercept and the magnitude of the parameter βi depends on the strength of the influence of X i on the odds. X1 would typically be the DVH summary measure, and X2 ... X n any patient-specific variables included Medical Physics, Vol. 42, No. 5, May 2015

F. 2. The performance (AUC) of seven DVH-based models fitted to pseudoclinical lung datasets which were simulated with different levels of confounding factors, given by the standard deviations of radiosensitivity, σ α , and health status, σ FSU.

2330

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2330

performance between the models was more apparent at low levels of confounding factors. Thus, although the different models have so far performed equally well for real clinical datasets, differences might be discovered if confounding factors are taken into account. In contrast to these results, both types of models performed equally well for the simulated rectum datasets; the intermediate CFV of the rectum could make the spatial distribution of damage especially important, which is lost in all DVH reduction schemes (see Sec. 4.D).

3.C. Fitted parameters

F. 3. The performance (AUC) of seven DVH-based models fitted to pseudoclinical rectum datasets which were simulated with different levels of confounding factors, given by the standard deviations of radiosensitivity, σ α , and health status, σ FSU.

All DVH-based models fitted to the datasets, except the maximum dose model, performed similarly well; the figures show that the influence of the confounding factors on the model performance was more important than the choice of model. The superiority of the mean dose model compared to the maximum dose model, for both “lung” and “rectum” datasets, is consistent with the high values of the LKB volume parameter n, and the low values of the relative seriality parameter s, in Tables III and IV (see below).

3.B. The relative model performance

Although the performance of most models was similar, for the lung datasets the three critical-volume type models (the VDth and the 3- and 4-parameter critical-volume models) were consistently superior (p < 0.05, except VDth compared to LKB for the second dataset with p = 0.08). The reason for this could be that the outcome for the lung simulations is predominantly linked to the damaged volume, due to the large CFV, and thus better predicted by the critical-volume models which reduce the DVH to a volume parameter. The difference in AUC between the critical-volume type models and the other three well-performing models (the LKB, mean dose, and relative seriality models) was studied in more detail, as a function of σFSU. This difference was expected to decrease as the confounding factors grew stronger (with decreasing AUC). For values of σFSU between 0% and 50% (σα = 0), models were fitted to five simulated 200-patient lung datasets, in order to provide information about the variation in model performance for different datasets, and the difference between the model groups. Figure 4(a) shows the mean AUC at each σFSU, for all models in the comparison, and Fig. 4(b) shows the difference in mean AUC between the criticalvolume type models and the other three well-performing models, together with the standard error of the difference, based on the five datasets. As expected, the difference in Medical Physics, Vol. 42, No. 5, May 2015

Tables III and IV show the fitted parameter values for all seven models and datasets, using only DVHs and outcome data (with radiosensitivity and health status as confounding factors). The values of φ50 and D50 were generally consistent among the datasets, except the unrealistically high φ50 for the maximum dose model in Table III, which confirms that this is an unsuitable model for organs like the lung. For the lung datasets (Table III), φ50 of the LKB model was lower than φ50 for the mean dose model, and n was consistently greater than one. This means that the LKB model found lower doses to be important (or the volume receiving lower doses). The performance of the relative seriality model was very similar to the mean dose model, and Table III shows that φ50 and D50 for these models were very similar. As expected, the value of the slope parameter reflected the amount of interpatient variation in the datasets; in both tables m increased and γ decreased with increasing variation. This is also illustrated in Figs. 5 and 6; the patients in lung datasets 1 and 5, respectively, were binned into groups, and the complication rate (CR) in the bins was plotted together with the NTCP as a function of φ, as calculated by the four 3-parameter models. The steeper curves in Fig. 5 correspond to a mean AUC of 0.96, and the shallower curves in Fig. 6 to a mean AUC of 0.67. For the rectum datasets (Table IV), n had intermediate values. Thus, the LKB model fits suggest that the volume effect is smaller for rectum (as represented by the current simulations) than for lung. In contrast, the relative seriality model fits resulted in surprisingly low values of s, indicating a large volume effect, and D50 was lower than φ50 of the LKB model, but higher than φ50 of the mean dose model. Interestingly, φ50 and m of the critical-volume type models were similar to the corresponding parameters fitted to the lung datasets, though D50 was higher. In a few cases, the probability distribution of the criticalvolume model parameters showed two peaks when fitted to rectum datasets, with a low D50 and high φ50 or a high D50 and a low φ50, respectively. However, in most cases the probability of D50 ∼ 35 Gy was higher than D50 ∼ 63 Gy, as shown by the maximum likelihood estimates in Table IV. This illustrates that although the parameters of the critical-volume type models have a biological interpretation, as empirical models they cannot be assumed to represent the true dose–volume response relationship.

2331

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2331

T III. Maximum likelihood estimates of the parameter values for fits of NTCP models to pseudoclinical lung data presented in Fig. 2. The last two columns list the likelihood ratio statistic and AIC for each fit. γ

s

DD

AIC

21.1 20.1 20.7 19.9 20.2

2.71 1.60 1.29 0.800 0.787

7.78 × 10−4

344 197 146 64 61

511 953 1023 1260 1258

T D50 (φ 50) (Gy)

m

n

12.5 15.8 13.2 12.4 13.2

0.136 0.261 0.320 0.555 0.562

3.01 1.61 2.50 2.69 2.34

380 191 154 69 64

474 959 1015 1255 1255

T D50 (φ 50) (Gy)

m

20.6 20.3 20.6 19.9 20.1

0.112 0.230 0.271 0.471 0.481

353 187 145 65 60

500 961 1021 1257 1256

T D50 (φ 50) (Gy)

m

Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5

638 115 413 280 281

0.869 0.663 1.36 2.31 2.22

Critical volume (4p)

φ 50

m

D50 (Gy)

0.465 0.464 0.459 0.505 0.427

0.0305 0.189 0.254 0.304 0.446

13.6 13.3 14.4 11.5 15.0

φ 50

m

D50 (Gy)

0.507 0.531 0.573 0.473 0.434

0.0313 0.182 0.435 0.354 0.449

15.1 13.9 13.8 15.6 14.9

φ 50

m

D50 (Gy)

0.478 0.476 0.459 0.422 0.427

0.0429 0.243 0.256 0.470 0.387

14.1 14.0 14.7 15.3 16.6

Relative seriality Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 LKB Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Mean dose Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Maximum dose

Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Critical volume (3p) Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 VDth Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5

D50 (Gy)

3.D. Model performance after stratifying for patient-specific factors

Clinical outcome data inevitably include significant levels of confounding factors. However, if the radiosensitivity and Medical Physics, Vol. 42, No. 5, May 2015

4.10 × 10−4 9.42 × 10−5 5.15 × 10−5 8.04 × 10−5

0 1.50 0 0 0

853 1147 1167 1322 1317

k 10.3 6.79 31.9 58.9 46.4

699 253 236 124 117

157 899 934 1202 1203

682 253 236 121 111

173 897 933 1203 1208

666 242 234 121 96

189 908 934 1203 1222

health status (and any other important nondosimetric factors) were known for all patients, and could be quantified, the dataset could be stratified for such patient-specific factors, and the models fitted separately to each subset. This would lead to a lower level of confounding factors since in each

2332

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2332

T IV. Maximum likelihood estimates of the parameter values for fits of NTCP models to pseudoclinical rectum data presented in Fig. 3. The last two columns list the likelihood ratio statistic and AIC for each fit. γ

s

DD

AIC

40.2 42.4 43.7 43.2 45.9

8.20 4.53 3.36 2.75 2.51

2.96 × 10−5 8.51 × 10−3 0.0370 0.0686 0.165

330 271 202 178 148

194 437 589 755 861

T D50 (φ 50) (Gy)

m

n

46.4 51.8 50.8 48.7 50.5

0.0433 0.0607 0.0974 0.134 0.152

0.466 0.293 0.335 0.391 0.341

313 278 200 170 146

211 429 591 764 863

T D50 (φ 50) (Gy)

m

38.7 38.6 39.3 39.0 39.0

0.0651 0.109 0.152 0.391 0.220

280 240 174 151 130

242 466 614 781 878

T D50 (φ 50) (Gy)

m

Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5

78.7 76.4 77.7 78.6 78.0

0.0683 0.0606 0.0833 0.113 0.118

57 94 58 38 37

465 612 730 893 970

Critical volume (4p)

φ 50

m

D50 (Gy)

k

0.458 0.456 0.462 0.475 0.443

0.0770 0.161 0.221 0.266 0.336

34.7 34.8 35.5 33.3 36.4

8.84 8.91 8.16 9.53 10.1

343 263 192 177 147

183 446 600 758 865

φ 50

m

D50 (Gy)

0.535 0.524 0.548 0.545 0.522

0.0564 0.122 0.172 0.210 0.264

35.6 36.2 35.4 35.0 37.3

358 279 199 177 148

165 429 591 756 862

φ 50

m

D50 (Gy)

0.481 0.538 0.548 0.528 0.340

0.115 0.224 0.283 0.323 0.393

34.6 31.0 31.1 31.1 62.5

295 209 152 153 138

229 498 638 781 872

Relative seriality Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 LKB Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Mean dose Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Maximum dose

Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 Critical volume (3p) Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5 VDth Dataset 1 Dataset 2 Dataset 3 Dataset 4 Dataset 5

D50 (Gy)

subset all patients would have similar radiosensitivity and health status. DVH-based models would then be expected to perform better for each subset compared to the total dataset. However, the parameter values would be different for each Medical Physics, Vol. 42, No. 5, May 2015

subset, and the appropriate parameter set would have to be chosen when applying the model to a new patient. This hypothesis was tested by stratifying dataset 5, which included interpatient variations in both radiosensitivity and

2333

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2333

F. 4. (a) Mean AUC for six well-performing models for five equivalently generated datasets of 200 lung patients each, at different levels of σ FSU. (b) The difference in performance (mean AUC) between two groups of DVH-based NTCP models: in the first group, three critical-volume type models (the VDth and the 3- and 4-parameter critical-volume models) and in the second, the other three models (the LKB, mean dose, and relative seriality models).

health status, into four groups of similar α, and these groups into four further subgroups of similar N0FSU. The models were then fitted to each of the 16 subgroups and their performance evaluated. The results for two of the models (the LKB and the 3-parameter critical-volume models), fitted to the simulated lung dataset 5, are presented in Table V. Similarly, Table VI shows the AUCs for these models fitted to the simulated rectum dataset 5. As expected, a great improvement was seen in some of the subgroups, compared to when the models were fitted to the total dataset. For example, when the LKB model was fitted to the total dataset of Table V (dataset 5 in Fig. 2), the AUC was 0.65 and for the corresponding dataset without confounding factors (dataset 1 in Fig. 2) it was 0.93. Most of the values in Table V are within this range, and some close to the upper limit. A certain variability in the patient-specific

factors still exists within the subsets, since each variable was divided into only four bins. This is reflected by the values of the AUC generally being lower than that for the dataset without confounding factors. A disadvantage of splitting the dataset into subgroups before fitting the models is the resulting loss of statistical power. Many of the subgroups did not have enough responders to give reliable model parameters, and where no or all patients suffered the complication the models could not be fitted at all. To instead include the patient-specific factors in the models is much more efficient since, instead of fitting three parameters to, e.g., sixteen different subsets (48 parameter values in total), five parameters (three dosimetric and two nondosimetric) can be fitted to the total dataset, and all information from the data is considered when fitting each parameter. This method was explored below.

F. 5. Fitted NTCP models compared to the CR in bins of 50 patients with similar EUD or relative damaged volume, for the first lung dataset (σ α = 0% and σ FSU = 0%). The fitted models are (a) the LKB model, (b) the relative seriality model, (c) the 3-parameter critical volume model, and (d) the VDth model. Medical Physics, Vol. 42, No. 5, May 2015

2334

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2334

F. 6. Fitted NTCP models compared to the CR in bins of 50 patients with similar EUD or relative damaged volume, for the fifth lung dataset (σ α = 15% and σ FSU = 25%). The fitted models are (a) the LKB model, (b) the relative seriality model, (c) the 3-parameter critical volume model, and (d) the VDth model.

3.E. Including nondosimetric variables

So far only the DVHs were considered as input data to the models fitted to the simulated datasets. For most models, this was done by reducing the DVH to a summary measure and linking it to NTCP using the cumulative normal function. In order to also account for the patients’ radiosensitivity and health status, α and N0FSU were now introduced as covariates alongside the model-specific φ (see Sec. 2.D). This could not be done for the relative seriality model, which has a different structure. As in Sec. 3.D, dataset 5 for both the simulated “lung” and “rectum” data were used, which included clinically relevant levels of the confounding factors. The different models were adapted to include patient-specific factors using both the logit function and the cumulative normal function, as described in Sec. 2.D. Tables VII and VIII show the resulting maximum

likelihood fitted parameter values. β0 is the intercept, and β1, β2, and β3 the coefficients associated with φ, the radiosensitivity, and the health status, respectively. The other parameters are the model specific parameters required to calculate φ. In order to facilitate optimization, the health status was represented by N0FSU/10 000, giving values more similar to the radiosensitivity parameter α. The positive values of β1 and β2 indicate that higher values of φ and α increase the risk of complications, while the negative value of β3 indicates an increase in the risk of complications for decreasing N0FSU. Figure 7 shows the performance of these models and, for comparison, the performance of the corresponding standard models for the datasets without confounding factors (identical to the first datapoints in Figs. 2 and 3). As expected, the performance improved greatly compared to when only the DVH was considered for the same datasets (dataset 5 in Figs. 2 and 3), reaching levels similar to

T V. AUC of the LKB and 3-parameter critical-volume models fitted to subsets of lung dataset 5, stratified according to radiosensitivity (α) and health status, represented by the initial FSU density (N0FSU). The number of complications and total number of patients in each subset are given in brackets. The values of α and N0FSU are the midpoint values of the bins. α (Gy−1) N0FSU

0.19

0.23

0.25

0.33

LKB model

2200 3300 3900 5800

0.61 (25/62) 1.0 (1/62) – (0/62) – (0/62)

0.82 (40/62) 0.74 (19/62) 0.61 (3/62) – (0/62)

0.83 (59/62) 0.80 (36/62) 0.95 (13/62) 1.0 (1/62)

– (62/62) 0.94 (54/62) 0.79 (32/62) 0.90 (13/62)

Critical-volume model

2200 3300 3900 5800

0.77 (25/62) 1.0 (1/62) – (0/62) – (0/62)

0.84 (40/62) 0.92 (19/62) 0.90 (3/62) – (0/62)

0.95 (59/62) 0.97 (36/62) 0.95 (13/62) 1.0 (1/62)

– (62/62) 0.96 (54/62) 0.79 (32/62) 0.91 (13/62)

Medical Physics, Vol. 42, No. 5, May 2015

2335

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2335

T VI. AUC of the LKB and 3-parameter critical-volume models fitted to subsets of rectum dataset 5, stratified according to radiosensitivity (α) and health status, represented by the initial FSU density (N0FSU). The number of complications and total number of patients in each subset are given in brackets. The values of α and N0FSU are the midpoint values of the bins. α (Gy−1) N0FSU

0.038

0.048

0.052

0.070

LKB model

310 360 380 450

− (0/62) − (0/62) − (0/62) − (0/62)

0.93 (9/62) 0.92 (3/62) − (0/62) 0.98 (2/62)

0.86 (33/62) 0.88 (13/62) 0.91 (7/62) 0.98 (2/62)

0.91 (43/62) 0.97 (36/62) 0.93 (33/62) 0.82 (12/62)

Critical-volume model

310 360 380 450

− (0/62) − (0/62) − (0/62) − (0/62)

0.95 (9/62) 1.0 (3/62) − (0/62) 1.0 (2/62)

0.93 (33/62) 0.94 (13/62) 0.92 (7/62) 1.0 (2/62)

0.90 (43/62) 0.97 (36/62) 0.93 (33/62) 0.81 (12/62)

the performance for datasets without confounding factors. Interestingly, the poorest models performed better on dataset 5 when patient-specific factors were considered, than for dataset 1 which did not include confounding factors. This does not indicate that the maximum dose model was a good summary measure for these datasets, but rather that including more predictors for a dataset with patient-specific factors made it possible for the model to classify more patients correctly, since now not only the dose distribution influenced the outcome. The performance of the models based on the logistic regression function and the cumulative normal function was essentially identical.

4. DISCUSSION 4.A. Testing model performance on simulated datasets

The choice to use simulated data to study NTCP-model performance was motivated primarily by the need to control the level of confounding factors in the datasets. For this

approach to be meaningful, it is essential that the model used for generating the datasets is sufficiently different from the DVH-based models fitted to these datasets. The simulation model was a mechanistic model, which was built to mimic the mechanisms of radiation damage in the “lung” and “rectum.” Thus, this model is structurally different from the empirical models fitted to the simulated data, and parameter values were chosen based on a range of different types of information in the literature, rather than by a fit to clinical datasets. Admittedly, there are parallels between the structure of the 3D simulation model and the original critical-volume model,23,35 in that both use the LQ model to derive local tissue damage, and a fractional damaged volume to determine whether a complication has occurred. However, in the 3D model, the outcome depends on the spatial distribution of damage caused by the 3D dose distribution. More importantly, the original mechanistically motivated critical volume model was not fitted to the simulated data; instead, simpler empirical models, which use different parameters from the original model, were included in the study. Although these models use a “critical volume” parameter, the very different value

T VII. Maximum likelihood estimates of the parameter values for fits of NTCP models including nondosimetric factors [Eqs. (4)–(6)] to pseudoclinical lung dataset 5, and the corresponding goodness-of-fit values. The parameters are explained in the text. β0 Logit model LKB Mean dose Maximum dose Critical volume (4p) Critical volume (3p) VDth Cumulative normal model LKB Mean dose Maximum dose Critical volume (4p) Critical volume (3p) VDth

Medical Physics, Vol. 42, No. 5, May 2015

−12.1 −13.6 −2.71 −30.2 −24.3 −22.5 −6.40 −7.28 −1.38 0.518 0.534 0.501

β1

0.830 0.581 −0.0140 54.8 44.4 41.9 0.457 0.325 −0.008 69 0.069 7 0.068 2 0.090 3

β2

β3

n

93.0 90.0 66.2 159 134 148

−57.8 −55.6 −38.6 −99.1 −83.3 −91.2

2.75

49.7 48.2 35.9 76.1 77.3 72.9

−31.7 −30.5 −21.0 −49.3 −49.9 −46.9

2.69

D 50

12.3 15.4 14.1

12.5 14.9 14.0

k

14.0

12.0

DD

AIC

919 905 747 1089 1077 1073

403 418 573 232 244 248

916 902 740 1075 1070 1063

407 418 580 250 252 259

2336

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

2336

T VIII. Maximum likelihood estimates of the parameter values for fits of NTCP models including nondosimetric factors [Eqs. (4)–(6)] to pseudoclinicalquery rectum dataset 5, and the corresponding goodness-of-fit values. The parameters are explained in the text. β0 Logit model LKB Mean dose Maximum dose Critical volume (4p) Critical volume (3p) VDth

−54.6 −30.5 −33.0 −28.0 −33.5 −24.9

Cumulative normal model LKB Mean dose Maximum dose Critical volume (4p) Critical volume (3p) VDth

63.0 −16.9 −18.1 0.699 0.751 0.749

β1

β2

β3

n

0.855 0.586 0.373 41.1 45.6 34.5

690 535 330 680 665 643

−642 −516 −338 −649 −612 −633

0.315

354 286 177 341 351 318

−327 −267 −177 −310 −318 −295

0.324

0.0346 0.322 0.203 0.0669 0.0535 0.0760

compared to the CFV of the generated lung datasets (0.75) listed in Table III, as well as the values of D50 listed in Table IV being about half the value resulting in 50% local tissue damage in the 3D model, confirm that there is no direct correspondence between the simulation model and the fitted critical volume models. Therefore, the latter are similar to the former only in that they seem to consider local tissue damage and its result on overall organ function as separate levels in the model. However, the other DVH-based models can be understood to implicitly do this as well. It is also important to recognize that when empirical models are fitted to datasets, their parameters cannot be assumed to directly relate to any “biological” parameters underlying the response of the population. Although efforts were made to select parameters for the 3D simulation model which would give a plausible representation of the damage to lung and rectum from irradiation,25,26 the datasets generated in this study cannot be treated as clinical datasets. In particular, information about the dose–response of organs cannot be inferred from these results. It is interesting to note that when fitting the empirical models to the simulated datasets, the best-fit parameter values were somewhat different compared to the range of values derived in clinical studies,

D50

36.2 38.4 35

36.3 36.9 35.0

k

12.7

12.4

DD

AIC

755 687 488 757 752 743

257 325 524 255 260 269

749 684 484 748 751 730

265 328 527 267 263 284

for each of the organs simulated in this study. It is not clear why the parameter values in Tables III and IV are not more representative of the literature for radiation pneumonitis and rectal bleeding, respectively, but model parameter values also differ significantly between clinical studies, as seen in reviews such as the QUANTEC report. The intermediate to large volume effect for rectal bleeding suggested by some of the values in Table IV, which is perhaps the most striking difference, is related to the value of CFV = 0.45 used in the simulations, which is supported by studies on rectal dosemaps.36,37 More data on the mechanisms of rectal bleeding are required in order to determine the relative importance of high and intermediate doses, respectively. However, we would like to stress that a close agreement between the simulated data and corresponding clinical patient data, in terms of fitted model parameters, was not expected, given the limitations of any mechanistic model developed, while there are still significant gaps in the understanding of the mechanisms of normal-tissue damage from radiation. Nevertheless, the current approach allows models to be compared using datasets tailored to highlight their differences, as well as the influence of confounding factors. For our purposes, it was sufficient to generate datasets for endpoints

F. 7. Performance of the logit and cumulative normal (probit) models including patient-specific factors for (left) lung dataset 5 and (right) rectum dataset 5, compared to the performance of the models for the dataset without confounding factors. Medical Physics, Vol. 42, No. 5, May 2015

2337

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

with different dose–volume dependence, and importantly, with different levels of confounding factors. Thus, a representative range of confounding factors in the datasets is of higher importance than typical parameter values for fitted NTCP models. The former is supported by the values of AUC for the datasets which included the full range of confounding factors (∼0.65–0.75), which are typical for clinical studies. Currently, the value of the 3D simulation model is to enable new analyses, which cannot be carried out on clinical data, rather than to represent all possible clinical endpoints. In the 3D simulation model, the confounding factors were modeled mechanistically by changing the effect of the dose distribution, on a local tissue-damage level based on the patient-specific value of α, and a global organ-function level based on the patient-specific value of N0FSU. The empirical NTCP models, on the other hand, make no assumptions about the effect of included parameters, but when the patient-specific parameters were included, a correlation was found and the predictive power increased. 4.B. The importance of including patient-specific factors in NTCP models

NTCP modeling is an integral part of the individualization of radiotherapy treatments, which has become feasible with recent technical and scientific advancements.38,39 Current DVH-based models were developed to account for the influence of the dose distribution on the complication probability; this is achieved with increasing confidence as more clinical data become available. However, it is becoming evident that the performance of NTCP models, and thus their usefulness in clinical practice, is limited by confounding factors such as interpatient variations in health status and radiosensitivity. Therefore, these factors need to be identified and studied in order to further improve the models. It is expected that where treatments are planned based on iso-NTCP,38 the dose distribution does not predict the outcome,40 but rather, the complications arise in patients who are particularly susceptible to the complication due to patient-specific factors (e.g., high radiosensitivity). One way of accounting for patient-specific factors is to stratify the dataset to be analyzed into subgroups and fit the model to each independently. However, as demonstrated in Sec. 3.D, this means a loss of statistical power and difficulty to fit the model to subgroups with a very unequal division between responders and nonresponders. A better method is to account for the nondosimetric factors in the model itself. This has been done either by including both dosimetric and nondosimetric variables in a logistic regression model,16,18 or by the use of a “dose modifying factor.”17,41 With the dose modifying approach, nondosimetric factors are included in, e.g., the LKB model by adjusting the effective T D50 for different subgroups. Thus, the EUD response curve is shifted along the EUD axis, while using fixed values for the other parameters. However, in this work, an alternative for models using a cumulative normal function more similar to the logistic regression approach has been presented, where nondosimetric variables are naturally incorporated in the models. Medical Physics, Vol. 42, No. 5, May 2015

2337

When nondosimetric factors are not included in the models, the population heterogeneity is reflected in existing parameter values, in particular in the slope. This is analogous to the use of the parameter σα in the Marsden tumor control probability model,42 to account for the population “heterogeneity” in radiosensitivity. This parameter could be excluded if the patient’s radiosensitivity was known. The great improvement in model performance when including patient-specific factors highlights the importance of identifying any nondosimetric factors influencing the risk of toxicity; a comparison of Figs. 2 and 3 with 7 shows an increase in AUC from around 0.7 to values approaching unity. Some of this improvement may be due to simply including more parameters in the model (two extra when including radiosensitivity and health status), but as indicated by the relatively similar performance for models with two to four parameters in Figs. 2 and 3, the number of parameters does not make a large difference on this scale. This is also confirmed by the AIC values. The extent of the improvement of model performance when accounting for nondosimetric factors, in the current analysis, depends on the values of σα and σFSU used in the simulations. As these values are rather uncertain, the exact potential for improvement of AUC values when using real patient data cannot be predicted from the results of this study. The chosen values were briefly motivated in Sec. 2.A, and described fully in Refs. 25 and 26. When selecting these values, more information was found for the case of the lung, but moderate values were selected for the rectum simulations in order to not overestimate the impact. The interpatient variability is more likely to have been underestimated in this study, rather than overestimated, which gives confidence to the conclusion about the importance of taking nondosimetric factors into account. In order to be able to include nondosimetric factors, methods to classify individuals into risk groups need to be developed. For some factors, such as smoking status or comorbidities, this should be feasible, while for others, such as intrinsic radiosensitivity, there is currently no practical method to find out where on the scale any given patient belongs.

4.C. The AUC as a measure of model performance

In this work, the performance of the empirical NTCP models has been compared in terms of the AUC. The AUC is the probability of a responder having a higher NTCP than a nonresponder when one of each kind is randomly picked from the dataset, and thus measures how well the model rankorders the DVHs. The AUC was found to be influenced by the NTCP values (and thus the complication rate) in the dataset which the model fits were applied to; it was more difficult for the models to correctly classify DVHs with NTCP∼50% than DVHs with low or high NTCP, and thus performed worse on a dataset with many DVHs with NTCP∼50%. This effect might have influenced the results in Figs. 2 and 3, since the datasets have increasing complication rate with increasing levels of confounding factors. However, when similar datasets were generated with parameter values adjusted to produce similar

2338

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

complication rates, the results still showed a decrease in AUC with increasing levels of confounding factors. 4.D. The relative performance of NTCP models

This study has shown that the performance of NTCP models is influenced more by confounding factors than by the choice of NTCP model. This is in agreement with a recent study fitting three models to a large dataset of rectal complications following prostate radiotherapy;43 all models performed similarly but including clinical factors significantly improved the fits. However, two of the models were based on the EUD: the LKB model and a logistic model with EUD as DVH summary measure. Thus, they were essentially the same model, and the third model, the relative seriality model, was shown to perform very similarly to the LKB model. Greater differences in model performance might have been found if also a critical-volume type model had been included in the study, although our results suggest rectal bleeding is not necessarily represented better by using a critical-volume model (see Sec. 3.B). Indeed, given recent reports that the spatial distribution of damage in the rectal wall is important,36,37 the predictive power of all DVH-based models is expected to be limited. The limited heterogeneity among the DVHs used in the current simulations (Fig. 1) may have contributed to the limited difference in model performance observed in this study. The DVHs were generated with a particular treatment technique in mind, but including different techniques (e.g., IMRT and 3D conformal radiotherapy) might have pushed the models to more heterogeneous behavior. A good empirical NTCP model should convert the DVH into a summary measure which is relevant for organ function. Ideally, the model would predict either 0% or 100% NTCP with a very small confidence interval for all plans, because then a safe plan could be distinguished with certainty from one which would cause a complication. However, this is generally not possible since (1) an empirical NTCP model cannot perfectly summarize the relevant features of a dose distribution, (2) not only the dose distribution but also patientspecific factors (confounding factors) influence the outcome. Our results show that accounting for patient-specific factors can reveal differences in model performance. The superior performance of the critical-volume models for our lung datasets might, or might not, translate to clinical data. However, the performance of an empirical model also depends on how easily it can be fitted to a dataset, and it has been reported that one of the four parameters of the critical-volume model cannot be determined with accuracy from clinical datasets.8,9,44 Also, as shown by van Luijk et al.,44 strong correlations between model parameters make it difficult to determine confidence intervals for the parameter values. On a practical note, the mean dose model and the 4parameter critical-volume models had the advantage of easily converging to the same optimal solution for different starting points in the parameter space, using the maximum likelihood method. The VDth model, on the other hand, was quite unstable Medical Physics, Vol. 42, No. 5, May 2015

2338

and the starting point often had to be close to the optimal solution for the maximum likelihood method to find it. In this case, the more time-consuming simulated annealing approach was used, as well as a visual inspection of the full parameter probability distribution. The fact that the empirical models considered in this work are all based on the DVH means that they ignore any spatial effects of the dose distribution. Thus, such effects would act as another confounding factor in terms of linking the DVH to the outcome. Accounting for relevant spatial effects would further improve the performance of the models. 5. CONCLUSIONS Our results indicate that the influence of confounding factors on the performance of NTCP models is greater than the choice of model. The great improvement in the performance of the models when relevant nondosimetric factors were included motivates efforts to find methods to identify and quantify such factors. Confounding factors can be accounted for by extending NTCP models to include nondosimetric variables. The relative importance of different parameters is expected to vary between datasets. In order to increase the usefulness of predictive NTCP models for clinical practice, relevant (endpoint-specific) parameters summarizing the effect of a dose distribution need to be identified, as well as the patient-specific factors influencing the outcome. The identification of relevant dosimetric parameters on the one hand, and nondosimetric parameters on the other, needs to be done in parallel, as any unknown factors act as confounding factors which might prevent the detection of differences in model performance.

ACKNOWLEDGMENT Project funded by School of Health Sciences, University of Liverpool, Liverpool, L69 3BX, UK.

APPENDIX: DVH-BASED NTCP MODELS Most empirical NTCP models reduce the DVH to a single summary measure, φ, under some assumptions of an organspecific volume effect, reasoning that the architecture of an organ can make it especially vulnerable to specific features of a dose distribution. Generally, one or two parameters are used in the DVH reduction expression. The resulting summary measure is then related to the NTCP using a sigmoid function which uses two additional parameters, e.g., the value of φ resulting in 50% complication probability, and the slope of the curve at this point.8,9 The most common sigmoid functions for modeling binary data are the logit and the cumulative normal functions32 (Chap. 3), while in NTCP modeling also an exponential binomial or Poisson function is often used,31,45,46 since this provides a direct link to radiation cell kill through the LQ model.47 The most commonly used NTCP model, the LKB model27,28 uses the cumulative normal function.

2339

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

For a chosen summary measure, φ, the cumulative normal function gives the NTCP,  t 1 2 e−x /2dx, (A1) NTCP = √ 2π −∞

2. The critical-volume model

This section will focus on an empirical model used by the Dutch NKI group,9,29 which was developed as a simplification of the original mechanistic critical-volume model. For the critical-volume model, the local dose-effect function is

where φ − φ50 t= . m · φ50

(A2)

Here, the value of φ associated with 50% NTCP, φ50, and the relative standard deviation, m, are two parameters which are fitted to clinical data. Some of the DVH summary measures most commonly used in Eq. (A2) are summarized below. A commonly used model with a “reversed” structure is the relative seriality model (Appendix 3). With this model, doses are first translated to probabilities, and then a probability–volume histogram is reduced to an NTCP, based on assumptions about the volume effect. In the presentations which follow, it is assumed that all doses have been normalized with respect to fraction size and expressed as 2 Gy/fraction-equivalent dose (EQD2). 1. The LKB model

The original LKB model27,28 uses three parameters: T D50 and m, which are equivalent to φ50 and m above, and n which is the volume-effect parameter. A large value of n indicates a large volume effect. This is used to calculate an “effective volume”  ( Di ) 1/n Vi , (A3) Veff = DR Vtot i where D R is a reference dose, normally the maximum or prescription dose. The upper bound in the integration of the normal distribution (A1) is

E(Di ) =

1 , 1 + (D50/Di )k

(A7)

and the summary measure (which can be used with the cumulative normal function) is the relative damaged volume,  Vi φ = * E(Di ) + . (A8) V tot , i The model is named after the critical damaged volume parameter, φ50. The other three parameters of the criticalvolume model are m, D50, and k. When this model is fitted to clinical or experimental data, it is often difficult to find a unique best estimate for both local dose-effect parameters, especially the parameter representing the slope of the local dose–response function, k.8,9,44 Therefore, it is more efficient to assume that the slope of the curve is known, which reduces the model to a threeparameter model. Jin et al.30 have proposed the following one-parameter sigmoid function for the local dose effect: Di /D50 − 1   + 1/2   E(Di ) =  1 + (Di /D50 − 1)2  1 

when Di < 2D50 when Di ≥ 2D50,

,

(A9)

which assumes a moderate slope and is symmetric around D50. This function replaces Eq. (A7) in Eq. (A8). Where a steep dose–response function is more appropriate, the VDth model is to be preferred.9 This model assumes that volumes receiving doses above D50 are completely damaged, while volumes with lower doses are spared. The local doseeffect function is  0 E(Di ) =  1 

D R −T D50PR t= , m ·T D50PR

2339

Di ≤ D50 . Di ≥ D50

(A10)

where −n T D50PR = T D50 · Veff .

(A4)

However, the most common parametrization of LKB model reduces the DVH to the generalized equivalent uniform dose (EUD), introduced by Niemierko48 n  1/n Vi + * . EUD = Di Vtot , i

(A5)

Here, φ = EUD and φ50 is generally called T D50, and NTCP is calculated using Eqs. (A1) and (A2). This parametrization is equivalent to the original one. A two-parameter version of the LKB model is the mean dose model. Here, n = 1 and EUD is equal to the mean dose  Vi +. EUD = * Di V tot i , Medical Physics, Vol. 42, No. 5, May 2015

(A6)

3. The relative seriality model

Unlike the models described above, the relative seriality model31 is not based on the cumulative normal function. The DVH is expressed as a probability–volume histogram (also called a risk histogram23), P(Di ) = 2−exp(eγ(1−D i /D50)),

(A11)

where D50 is the dose causing 50% NTCP if the organ is uniformly irradiated, and γ is the slope at the steepest point of the curve. The NTCP is given by 1/s  (1 − P(Di )s )Vi /Vtot+ , NTCP = *1 − (A12) i , where s is the volume-effect parameter. For high values of s (∼1), this function assumes that functional subunits are organized predominantly in a serial manner, while for low

2340

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

values of s, a large number of functional subunits are assumed to function independently of each other (i.e., a parallel-type response). a)Electronic

mail: [email protected] “ICRU report 83: Prescribing, recording, and reporting intensity-modulated photon-beam therapy (IMRT),” J. ICRU 10, 27–40 (2010). 2X. A. Li, M. Alber, J. O. Deasy, A. Jackson, K.-W. K. Jee, L. B. Marks, M. K. Martel, C. Mayo, V. Moiseenko, A. E. Nahum, A. Niemierko, V. A. Semenenko, and E. D. Yorke, “The use and QA of biologically related models for treatment planning: Short report of the tg-166 of the therapy physics committee of the aapm,” Med. Phys. 39, 1386–1409 (2012). 3A. C. Houweling, M. E. P. Philippens, T. Dijkema, J. M. Roesink, C. H. J. Terhaard, C. Schilstra, R. K. T. Haken, A. Eisbruch, and C. P. J. Raaijmakers, “A comparison of dose–response models for the parotid gland in a large group of head-and-neck cancer patients,” Int. J. Radiat. Oncol., Biol., Phys. 76, 1259–1265 (2010). 4N. Pettersson, J. Nyman, and K.-A. Johansson, “Radiation-induced rib fractures after hypofractionated stereotactic body radiation therapy of nonsmall cell lung cancer: A dose- and volume-response analysis,” Radiother. Oncol. 91, 360–368 (2009). 5S. L. Tucker, H. H. Liu, S. Wang, X. Wei, Z. Liao, R. Komaki, J. D. Cox, and R. Mohan, “Dose-volume modeling of the risk of postoperative pulmonary complications among esophageal cancer patients treated with concurrent chemoradiotherapy followed by surgery,” Int. J. Radiat. Oncol., Biol., Phys. 66, 754–761 (2006). 6A. I. Blanco, K. S. C. Chao, I. E. Naqa, G. E. Franklin, K. Zakarian, M. Vicic, and J. O. Deasy, “Dose-volume modeling of salivary function in patients with head-and-neck cancer receiving radiotherapy,” Int. J. Radiat. Oncol., Biol., Phys. 62, 1055–1069 (2005). 7T. Rancati, C. Fiorino, G. Gagliardi, G. M. Cattaneo, G. Sanguineti, V. C. Borca, C. Cozzarini, G. Fellin, F. Foppiano, G. Girelli, L. Menegotti, A. Piazzolla, V. Vavassori, and R. Valdagni, “Fitting late rectal bleeding data using different NTCP models: Results from an Italian multi-centric study (AIROPROS0101),” Radiother. Oncol. 73, 21–32 (2004). 8S. L. Tucker, R. Cheung, L. Dong, H. H. Liu, H. D. Thames, E. H. Huang, D. Kuban, and R. Mohan, “Dose-volume response analyses of late rectal bleeding after radiotherapy for prostate cancer,” Int. J. Radiat. Oncol., Biol., Phys. 59, 353–365 (2004). 9Y. Seppenwoolde, J. V. Lebesque, K. de Jaeger, J. S. A. Belderbos, L. J. Boersma, C. Schilstra, G. T. Henning, J. A. Hayman, M. K. Martel, and R. K. T. Haken, “Comparing different NTCP models that predict the incidence of radiation pneumonitis. Normal tissue complication probability,” Int. J. Radiat. Oncol., Biol., Phys. 55, 724–735 (2003). 10C. Schilstra and H. Meertens, “Calculation of the uncertainty in complication probability for various dose–response models, applied to the parotid gland,” Int. J. Radiat. Oncol., Biol., Phys. 50, 147–158 (2001). 11G. Defraene, L. V. den Bergh, A. Al-Mamgani, K. Haustermans, W. Heemsbergen, F. V. den Heuvel, and J. V. Lebesque, “The benefits of including clinical factors in rectal normal tissue complication probability modeling after radiotherapy for prostate cancer,” Int. J. Radiat. Oncol., Biol., Phys. 82, 1233–1242 (2012). 12E. X. Huang, J. D. Bradley, I. E. Naqa, A. J. Hope, P. E. Lindsay, W. R. Bosch, J. W. Matthews, W. T. Sause, M. V. Graham, and J. O. Deasy, “Modeling the risk of radiation-induced acute esophagitis for combined washington university and RTOG trial 93-11 lung cancer patients,” Int. J. Radiat. Oncol., Biol., Phys. 82, 1674–1679 (2012). 13P. Jenkins and J. Watts, “An improved model for predicting radiation pneumonitis incorporating clinical and dosimetric variables,” Int. J. Radiat. Oncol., Biol., Phys. 80, 1023–1029 (2011). 14T. Rancati, C. Fiorino, G. Fellin, V. Vavassori, E. Cagna, V. C. Borca, G. Girelli, L. Menegotti, A. F. Monti, F. Tortoreto, S. D. Canne, and R. Valdagni, “Inclusion of clinical risk factors into NTCP modelling of late rectal toxicity after high dose radiotherapy for prostate cancer,” Radiother. Oncol. 100, 124–130 (2011). 15C. Dehing-Oberije, D. D. Ruysscher, A. van Baardwijk, S. Yu, B. Rao, and P. Lambin, “The importance of patient characteristics for the prediction of radiation-induced lung toxicity,” Radiother. Oncol. 91, 421–426 (2009). 1ICRU,

Medical Physics, Vol. 42, No. 5, May 2015

16O.

2340

Gayou, S. K. Das, S.-M. Zhou, L. B. Marks, D. S. Parda, and M. Miften, “A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes,” Med. Phys. 35, 5426–5433 (2008). 17S. L. Tucker, H. H. Liu, Z. Liao, X. Wei, S. Wang, H. Jin, R. Komaki, M. K. Martel, and R. Mohan, “Analysis of radiation pneumonitis risk using a generalized Lyman model,” Int. J. Radiat. Oncol., Biol., Phys. 72, 568–574 (2008). 18I. El Naqa, J. Bradley, A. I. Blanco, P. E. Lindsay, M. Vicic, A. Hope, and J. O. Deasy, “Multivariable modeling of radiotherapy outcomes, including dose-volume and clinical factors,” Int. J. Radiat. Oncol., Biol., Phys. 64, 1275–1286 (2006). 19T. Rancati, G. L. Ceresoli, G. Gagliardi, S. Schipani, and G. M. Cattaneo, “Factors predicting radiation pneumonitis in lung cancer patients: A retrospective study,” Radiother. Oncol. 67, 275–283 (2003). 20L. B. Marks, M. T. Munley, G. C. Bentel, S. M. Zhou, D. Hollis, C. Scarfone, G. S. Sibley, F. M. Kong, R. Jirtle, R. Jaszczak, R. E. Coleman, V. Tapson, and M. Anscher, “Physical and biological predictors of changes in wholelung function following thoracic irradiation,” Int. J. Radiat. Oncol., Biol., Phys. 39, 563–570 (1997). 21Here ‘dose’ is any dose-volume parameter of relevance for the endpoint of interest, e.g., mean dose. 22E. Rutkowska, C. Baker, and A. Nahum, “Mechanistic simulation of normaltissue damage in radiotherapy–implications for dose-volume analyses,” Phys. Med. Biol. 55, 2121–2136 (2010). 23A. Jackson, G. J. Kutcher, and E. D. Yorke, “Probability of radiation-induced complications for normal tissues with parallel architecture subject to nonuniform irradiation,” Med. Phys. 20, 613–625 (1993). 24J. M. Michalski, H. Gay, A. Jackson, S. L. Tucker, and J. O. Deasy, “Radiation dose-volume effects in radiation-induced rectal injury,” Int. J. Radiat. Oncol., Biol., Phys. 76, S123–S129 (2010). 25E. S. Rutkowska, I. Syndikus, C. R. Baker, and A. E. Nahum, “Mechanistic modelling of radiotherapy-induced lung toxicity,” Br. J. Radiol. 85, e1242–e1248 (2012). 26E. Rutkowska, “Mechanistic simulation of normal-tissue damage in radiotherapy,” Ph.D. thesis, University of Liverpool, 2010. 27G. J. Kutcher and C. Burman, “Calculation of complication probability factors for non-uniform normal tissue irradiation: The effective volume method,” Int. J. Radiat. Oncol., Biol., Phys. 16, 1623–1630 (1989). 28J. T. Lyman, “Complication probability as assessed from dose-volume histograms,” Radiat. Res. 104(Suppl. 8), S13–S19 (1985). 29L. J. Boersma, E. M. Damen, R. W. de Boer, S. H. Muller, C. M. Roos, R. A. V. Olmos, N. van Zandwijk, and J. V. Lebesque, “Dose–effect relations for local functional and structural changes of the lung after irradiation for malignant lymphoma,” Radiother. Oncol. 32, 201–209 (1994). 30J.-Y. Jin, F.-M. Kong, I. J. Chetty, M. Ajlouni, S. Ryu, R. T. Haken, and B. Movsas, “Impact of fraction size on lung radiation toxicity: Hypofractionation may be beneficial in dose escalation of radiotherapy for lung cancers,” Int. J. Radiat. Oncol., Biol., Phys. 76, 782–788 (2010). 31P. Källman, A. Ågren, and A. Brahme, “Tumour and normal tissue responses to fractionated non-uniform dose delivery,” Int. J. Radiat. Biol. 62, 249–262 (1992). 32D. Collett, Modelling Binary Data, 2nd ed. (Chapman & Hall/CRC, Boca Raton, FL, 2003). 33K. P. Burnham and D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information-theoretic Approach, 2nd ed. (SpringerVerlag, New York, NY, 2002). 34P. McCullagh and J. A. Nelder, Genearalized Linear Models (Chapman and Hall, New York, London, 1983). 35A. Niemierko, “Modeling of normal tissue response to radiation: The critical volume model,” Int. J. Radiat. Oncol., Biol., Phys. 25, 135–145 (1993). 36S. L. Tucker, M. Zhang, L. Dong, R. Mohan, D. Kuban, and H. D. Thames, “Cluster model analysis of late rectal bleeding after IMRT of prostate cancer: A case-control study,” Int. J. Radiat. Oncol., Biol., Phys. 64, 1255–1264 (2006). 37F. Buettner, S. L. Gulliford, S. Webb, M. R. Sydes, D. P. Dearnaley, and M. Partridge, “Assessing correlations between the spatial distribution of the dose to the rectal wall and late rectal toxicity after prostate radiotherapy: An analysis of data from the MRC RT01 trial (ISRCTN 47772397),” Phys. Med. Biol. 54, 6535–6548 (2009).

2341

Onjukka, Baker, and Nahum: Modelling NTCP including confounding factors

38A. E. Nahum and J. Uzan, “(Radio)biological optimization of external-beam

radiotherapy,” Comput. Math. Methods Med. 2012, 329214. 39J. Uzan and A. E. Nahum, “Radiobiologically guided optimisation of the prescription dose and fractionation scheme in radiotherapy using BioSuite,” Br. J. Radiol. 85, 1279–1286 (2012). 40M. Alber, “Normal tissue dose–effect models in biological dose optimisation,” Z. Med. Phys. 18, 102–110 (2008). 41S. T. H. Peeters, M. S. Hoogeman, W. D. Heemsbergen, A. A. M. Hart, P. C. M. Koper, and J. V. Lebesque, “Rectal bleeding, fecal incontinence, and high stool frequency after conformal radiotherapy for prostate cancer: Normal tissue complication probability modeling,” Int. J. Radiat. Oncol., Biol., Phys. 66, 11–19 (2006). 42A. E. Nahum and B. Sanchez-Nieto, “Tumour control probability modelling: Basic principles and applications in treatment planning,” Phys. Med. 17(Suppl. 2), 13–23 (2001). 43G. Defraene, L. V. den Bergh, A. Al-Mamgani, K. Haustermans, W. Heemsbergen, F. V. den Heuvel, and J. Lebesque, “Importance of including clinical

Medical Physics, Vol. 42, No. 5, May 2015

2341

factors in rectal NTCP modeling after radiotherapy for prostate cancer,” Int. J. Radiat. Oncol., Biol., Phys. 82, 1233–1242 (2012). 44P. van Luijk, T. C. Delvigne, C. Schilstra, and J. M. Schippers, “Estimation of parameters of dose-volume models and their confidence limits,” Phys. Med. Biol. 48, 1863–1884 (2003). 45J. O. Deasy, A. Niemierko, D. Herbert, D. Yan, A. Jackson, R. K. T. Haken, M. Langer, and S. Sapareto, “Methodological issues in radiation dosevolume outcome analyses: Summary of a joint AAPM/NIH workshop,” Med. Phys. 29, 2109–2127 (2002). 46R. P. Hill, H. P. Rodemann, J. H. Hendry, S. A. Roberts, and M. S. Anscher, “Normal tissue radiobiology: From the laboratory to the clinic,” Int. J. Radiat. Oncol., Biol., Phys. 49, 353–365 (2001). 47G. Steel, “Dose fractionation in radiotherapy,” in Handbook of Radiotherapy Physics, edited by P. Mayles, A. E. Nahum, and J. Rosenwald (Taylor & Francis, Boca Raton, London, 2007), Chap. 9, pp. 163–178. 48A. Niemierko, “A generalized concept of equivalent uniform dose (EUD),” Med. Phys. 26, 1100 (1999).

The performance of normal-tissue complication probability models in the presence of confounding factors.

This work explores different methods for accounting for patient-specific factors in normal-tissue complication probability (NTCP) modeling, and compar...
4MB Sizes 2 Downloads 7 Views