International Journal of

Radiation Oncology biology

physics

www.redjournal.org

Clinical Investigation

Quantifying Unnecessary Normal Tissue Complication Risks due to Suboptimal Planning: A Secondary Study of RTOG 0126 Kevin L. Moore, PhD,* Rachel Schmidt, BS,y Vitali Moiseenko, PhD,* Lindsey A. Olsen, MS,z Jun Tan, PhD,z Ying Xiao, PhD,x James Galvin, PhD,x Stephanie Pugh, PhD,jj Michael J. Seider, PhD, MD,{ Adam P. Dicker, MD,x Walter Bosch, DSc,z Jeff Michalski, MD,z and Sasa Mutic, PhDz *Department of Radiation Medicine and Applied Sciences, University of California San Diego, La Jolla, California; yDepartment of Physics, Fort Hays State University, Hays, Kansas; zDepartment of Radiation Oncology, Washington University in St. Louis, St. Louis, Missouri; xThomas Jefferson University Hospital, Philadelphia, Pennsylvania; jjNRG Oncology Statistics and Data Management Center, Philadelphia, Pennsylvania; and {Akron City Hospital, Akron, Ohio Received Nov 7, 2014, and in revised form Jan 27, 2015. Accepted for publication Jan 28, 2015.

Summary Intensity modulated radiation therapy prostate plans from Radiation Therapy Oncology Group study 0126 were analyzed with knowledge-based dose-volume histogram (DVH) predictions to quantify clinical effects of plan quality deficiencies. Focusing on grade 2 late rectal toxicities with an outcomes-validated Lyman-Kutcher-Burman

Purpose: The purpose of this study was to quantify the frequency and clinical severity of quality deficiencies in intensity modulated radiation therapy (IMRT) planning in the Radiation Therapy Oncology Group 0126 protocol. Methods and Materials: A total of 219 IMRT patients from the high-dose arm (79.2 Gy) of RTOG 0126 were analyzed. To quantify plan quality, we used established knowledge-based methods for patient-specific dose-volume histogram (DVH) prediction of organs at risk and a Lyman-Kutcher-Burman (LKB) model for grade 2 rectal complications to convert DVHs into normal tissue complication probabilities (NTCPs). The LKB model was validated by fitting dose-response parameters relative to observed toxicities. The 90th percentile (22 of 219) of plans with the lowest excess risk (difference between clinical and model-predicted NTCP) were used to create a model for the presumed best practices in the protocol (pDVH0126,top10%). Applying the resultant model to the entire sample enabled comparisons between DVHs that patients could have received to DVHs they actually received. Excess risk quantified

Reprint requests to: Kevin L. Moore, PhD, Department of Radiation Medicine and Applied Sciences, University of California-San Diego, 3960 Health Sciences Dr, MC: 0865, La Jolla, CA 92093. Tel: (858) 8226056; E-mail: [email protected] This project was supported by Radiation Therapy Oncology Group grant U10 CA21661, Advanced Technology Consortium grant U24 CA81647 from the National Cancer Institute, and an American Association of Physicists in Medicine undergraduate research fellowship (R.S.). Int J Radiation Oncol Biol Phys, Vol. 92, No. 2, pp. 228e235, 2015 0360-3016/$ - see front matter Ó 2015 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ijrobp.2015.01.046

The contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute. Conflict of interest: Drs Moore and Olsen have received research grants from, are consultants for, and hold licensing agreements with Varian Medical Systems. Dr Mutic has received research grants from, is a consultant for, and holds a licensing agreement with Varian Medical Systems; is an owner of Radialogica LLC; and holds a licensing agreement with Modus Medical.

Volume 92  Number 2  2015

model, comparisons between rectal and model-predicted DVHs yielded absolute excess risks from suboptimal planning of 94 of 219 patients (42.9%) with >5% excess risk, 20 of 219 patients (9.1%) with >10% excess risk, and 2 of 219 patients (0.9%) with >15% excess risk.

Quantifying risks of suboptimal planning

229

the clinical impact of suboptimal planning. Accuracy of pDVH predictions was validated by replanning 30 of 219 patients (13.7%), including equal numbers of presumed “high-quality,” “low-quality,” and randomly sampled plans. NTCP-predicted toxicities were compared to adverse events on protocol. Results: Existing models showed that bladder-sparing variations were less prevalent than rectum quality variations and that increased rectal sparing was not correlated with target metrics (dose received by 98% and 2% of the PTV, respectively). Observed toxicities were consistent with current LKB parameters. Converting DVH and pDVH0126,top10% to rectal NTCPs, we observed 94 of 219 patients (42.9%) with 5% excess risk, 20 of 219 patients (9.1%) with 10% excess risk, and 2 of 219 patients (0.9%) with 15% excess risk. Replanning demonstrated the predicted NTCP reductions while maintaining the volume of the PTV receiving prescription dose. An equivalent sample of high-quality plans showed fewer toxicities than low-quality plans, 6 of 73 versus 10 of 73 respectively, although these differences were not significant (PZ.21) due to insufficient statistical power in this retrospective study. Conclusions: Plan quality deficiencies in RTOG 0126 exposed patients to substantial excess risk for rectal complications. Ó 2015 Elsevier Inc. All rights reserved.

Introduction Although intensity modulated radiation therapy (IMRT) has become the standard treatment for several cancers, singleinstitution (1, 2) and single-dataset (3) plan quality investigations have shown that patients who should have had low-risk organ-at-risk (OAR) dose-volume histograms (DVHs) were at considerably higher risk due to suboptimal planning. Furthermore, several studies of large-scale clinical trials found that nonadherence to protocol guidelines was correlated with worsened outcome (4-8). Recent work has focused on developing models trained from previous patients to predict achievable OAR DVHs for individual patients (9, 10). These methods assist clinicians by identifying suboptimal treatment plans as those where DVHs deviate greatly from model predictions. As these estimations are available for new patients based on their individual anatomy, this quality control (QC) mechanism can also form the basis for automated treatment planning (9, 11, 12). The full clinical impact of suboptimal planning is, as yet, unknown. The purpose of this work was to combine model-based treatment plan QC with a multi-institutional clinical trial to assess the frequency and clinical severity of suboptimal treatment planning on a large scale. NRG Oncology’s Radiation Therapy Oncology Group (RTOG) 0126 protocol, A Phase III Randomized Study of High Dose 3DCRT/IMRT versus Standard Dose 3DCRT/IMRT in Patients Treated for Localized Prostate Cancer (13), was selected to study treatment plan quality variations. With accrual of 1532 patients from 88 participating institutions, RTOG 0126 ran from October 2004 to March 2010, with the final trial results recently presented (14). This work focused on high-dose IMRT patients, with 219 fitting inclusion criteria. To our knowledge, this represents both the largest and the most institutionally diverse IMRT plan quality survey to date. Although this study can only draw conclusions on the analyzed plans, by extension, the

observed quality variations in RTOG 0126 give insight into the need for enacting QC measures in other multiinstitutional RT trials, as well as the importance of eliminating suboptimal planning in the wider practice of RT.

Methods and Materials Patient data A total of 226 IMRT patients were available for analysis in the RTOG 0126 high-dose arm (79.2 Gy in 44 fractions). Clinical target volume (CTV) was defined as prostate-plusproximal seminal vesicles; PTV margins were 5 to 10 mm (13). Because of known negative correlations between target coverage and OAR sparing, 6 outliers with respect to PTV coverage (D98% Z dose received by 98% of the volume) were censored with a threshold of D98% > 95%, where D98% Z [84.0%; 88.4%; 93.0%; 93.1%; 93.4%; and 94.9%]. Plans in 1 case (D98% Z 88.4%) underdosed the PTV protecting the penile bulb, but the remaining cases protected the rectum. One additional patient was censored because a hip prosthesis necessitated nonstandard beam angles. Thus, 219 patients were used for this study.

Predictive DVH program Following Appenzoller et al (9), we used a model-based QC paradigm that predicted achievable DVHs based on statistical analysis of previously treated patients. Briefly, this method quantifies the correlation between dose in an OAR voxel and its geometric relationship to the PTV. The primary geometric quantity of interest is the boundary distance, defined as the distance between an OAR voxel and the closest voxel on the PTV. OAR subvolumes, defined as voxels that share a range of boundary distances, represent finite volumes with their own differential DVHs. Subvolume DVHs of prior plans comprise a training cohort,

230

International Journal of Radiation Oncology  Biology  Physics

Moore et al.

subsequently fitted by a 3-parameter skew-normal distribution. The parameter evolution feeds the model’s DVH predictions for individual subvolumes; summation over all subvolumes yields full DVH prediction (pDVH) based on the quantitative experience of the training cohort.

“Best practices” determination with pDVH models An existing prostate pDVH model was used to evaluate rectal and bladder sparing in the RTOG 0126 cohort. This pDVH model was trained from 20 high-quality prostate plans culled from a random sample of 100 patients treated at the University of California, San Diego (UCSD), for which selection criteria were maximal OAR sparing; PTV coverage was consistently set at V100%  97%, where V100% is equal to the volume receiving the prescription dose. Training plans were treated in the 2008 to 2012 time frame by using 7-field technique, 15-MV photons, dynamic multi-leaf collimator delivery, and optimization in Eclipse (Varian Medical Systems, Palo Alto, CA). All parameters would satisfy the RTOG 0126 protocol constraints, although we note for completeness that standard PTV margins (3 mm posterior, 7 mm elsewhere) were different than those used in the RTOG 0126 sample (minimum 5mm around the CTV); ultimately, these margin differences make no difference to the results of this study because this cohort was used only for initial quality stratification. The existing pDVH model was used to initially assess quality variations in the OARs and to locate presumed highquality plans for best practices model training. DVH cutpoints at V40, V65, and V75 assessed intermediate, high, and near-prescription doses, respectively for bladder and rectum (15, 16). As described in Results, the highest quality plans were found to be those that maximally spared the rectum. To ensure that PTV metrics were not unduly sacrificed, Pearson correlation coefficients (r) were computed among PTV D98% and D2% and rectum, DV75ZDVHð75 GyÞ  pDVHð75 GyÞ, to determine whether achieving predicted rectal sparing was correlated with compromise of target coverage and/or heterogeneity.

Excess risk quantification Normal tissue complication probability (NTCP) models provide quantification of the hazard posed by particular DVHs. The well-established Lyman-Kutcher-Burman (LKB) model is commonly cast with the effective dose to the whole organ, calculated as: X n 1=n Deff Z ðDi Þ vi ð1Þ i

where Di P is the dose to the differential volume vi for the ith dose bin i vi Z1. The parameter n describes the volume effect. Complication probability is thus: Zt 1 x2 NTCP Z pffiffiffiffiffiffi e 2 dx ð2Þ 2p N

where Deff  TD50 ð3Þ m  TD50 TD50 is the dose at which 50% are predicted to develop complications, and m describes the steepness of the dose-response curve. For grade 2 rectal toxicities, quantitative analysis of normal tissue effects in the clinic (QUANTEC) recommendations for LKB parameters were fTD50Z76:9 Gy; mZ0:13; nZ0:09g (17). The excess risk posed by clinical DVHs over pDVH predictions is NTCPðDVHÞ  NTCPðpDVHÞ. The presumed highest quality plans were identified as those plans with the smallest excess risk over the NTCP predicted by the UCSD pDVH models. The 10% (22 of 219 patients) of plans with the smallest NTCPðDVHÞ  NTCPðpDVHUCSD Þ were used to train a new model, pDVH0126,top10%. The resultant pDVH0126,top10% model represents attainable DVHs according protocol best practices. By applying the pDVH0126,top10% model to the entire cohort, we benchmarked all plans relative to the presumed 90th percentile of plan quality in RTOG 0126. This approach yielded the primary objective of this analysis: quantifying the excess complication risk to which patients were exposed due to suboptimal IMRT planning. Wilcoxon rank-sum test was used to compare excess risk quantification between groups of NTCPðpDVH0126;top10% Þ to assess any correlations of plan quality with the degree of predicted risk. tZ

Validation 1: Comparing the NTCP model with observed late rectal toxicities QUANTEC cautions against naı¨vely extending NTCP models based on 3-dimensional conformal RT (3D-CRT) data to IMRT (17). The first stage of validation involved confirming that the QUANTEC LKB model was consistent with toxicities in the protocol sample. The QUANTEC LKB parameters TD50 and m were refitted using the 219patient sample. Best fitting values were obtained using maximum likelihood analysis (18), 95% confidence intervals were calculated using profile likelihood method, and c2 or degree-of-freedom goodness-of-fit were used to compare dose-response curves.

Validation 2: Replanning the study to confirm model predictions for rectal sparing and NTCPs Definitive validation of pDVH predictions can only be accomplished by direct replanning. For samples of this size it was not practical to manually replan every patient, so a subset was selected for replan validation. Using the pDVH0126,top10% model, we selected the 10 presumed “high-quality” plans (lowest excess risk), the 10 presumed “low-quality” plans (highest excess risk), and 10 plans selected at random, collectively representing 13.7% (30 of 219) of the cohort.

Volume 92  Number 2  2015

Quantifying risks of suboptimal planning

predicted NTCP, high excess risk; HL Z high predicted NTCP, low excess risk; and HH Z high predicted NTCP, high excess risk. The number of patients (N) in each group was NLL, NLH, NHL, and NHH; and the number of toxicity events (E) in each group was ELL, ELH, EHL, and EHH. Regardless of low/high thresholds for predicted NTCP and excess risk, it was found that very few patients could be reasonably classified in the HH group. Thus, the respective low/high thresholds were set by attempting to equalize the populations NLL Z NLH Z (NHL þ NHH). Given the inequality between NHL and NHH, it was not possible to compare outcomes in the high absolute risk group, but as NLL Z NLH, the observed toxicities in the suboptimal group (ELH/NLH) could be compared against a roughly equivalent group with higher quality plans (ELL/NLL). Onesided Fisher exact test was used to assess whether LH and LL stratification yielded statistically significant differences in outcomes. The LKB model also allowed a comparison of the observed toxicities in each category to the predicted rate PNxy from the expectation value hExx =Nxx i Z N1xy iZ1 NTCPi .

All replanning used 7 static fields and 15-MV photons and was optimized using a commercial planning system (Eclipse version 10; Varian Medical Systems). Replanning guidelines were: (1) to maintain PTV volume receiving prescription dose (V100%); (2) to ensure the PTV high dose did not exceed minor deviation levels (D2% < 110%); (3) to maintain or improve bladder DVH; and (4) to use pDVH0126,top10% as a guide to improve rectal sparing as much as possible. In addition to comparisons among clinical DVHs, replanned DVHs, and pDVHs, the 30 replanned rectal NTCPs were compared to model-predicted NTCPs to validate excess risk predictions.

Validation 3: Effect of suboptimal planning on observed toxicities Excess risk gives the means to compare outcomes between patients who received suboptimal plans and those who received superior IMRT. However, a complicating factor emerged in that patients with larger predicted NTCP generally had lower excess risk, so simply grouping the sample into high and low excess risk categories did not yield statistically equivalent samples in terms of exposed NTCP. Thus, each patient was put into 1 of 4 categories: LL Z low predicted NTCP, low excess risk; LH Z low

b

RECTUM V40

Results Applying existing pDVH models to the RTOG 0126 cohort showed that a clear majority of plans gave excess dose to

c

RECTUM V65

Clinical V65

Clinical V40

80% 60% 40%

40%

20%

20% 0%

0% 20%

40%

60%

80%

0%

100%

Model predicted V40

40%

Average clinical rectum DVH

60% 40%

Average model rectum pDVH

20%

0

20

40

60

80

60

80

Dose (Gy)

f

100%

BLADDER

60%

Clinical V65

80%

Clinical V40

80%

0%

60%

BLADDER V65

e

100%

60% 40%

40%

20%

20% 0%

0% 0%

RECTUM

Model predicted V65

BLADDER V40

d

20%

Normalized Volume

0%

100%

60%

100%

Normalized Volume

a

231

20%

40%

60%

80%

Model predicted V40

100%

80% Average clinical bladder DVH

60% 40%

Average model bladder pDVH

20% 0%

0%

20%

40%

Model predicted V65

60%

0

20

40

Dose (Gy)

Fig. 1. Rectum and bladder DVHs compared to University of California, San Diego model-predicted pDVHs. Scatter plots of specific rectum DVH points (a) V40 and (b) V65 show that the large majority of treatment plans are delivering more rectal dose than needed. (c) Averaging across the 219-patient sample shows the spread between clinical rectum DVHs and pDVHs. Scatter plots of (d) V40 and (e) V65 in the bladder, as well as the (f) average bladder DVH and pDVH demonstrate greater agreement between clinical DVHs and bladder pDVHs. DVH Z dose-volume histogram; pDVH Z predicted DVH.

232

International Journal of Radiation Oncology  Biology  Physics

Moore et al.

the rectum, as seen in scatter plots of V40 and V65 for clinical DVHs and institutional pDVHs (Fig. 1a and b). Averaging all rectum DVHs and pDVHs (Fig. 1c) showed the degree to which rectal sparing was unrealized in the clinical sample. Comparison of V40 and V65 cutpoints for bladder DVHs and pDVHs (Fig. 1d and e) exhibited less dramatic overdosing than that for the rectum, and comparison of the average bladder DVH and pDVH (Fig. 1f) similarly showed much less disagreement. The better concordance between DVHs and pDVHs in the bladder could have been due to better planning for this organ or, perhaps as likely, the fact that bladder DVHs are simply less sensitive than the rectum to plan quality deficiencies. Given this, as well as the challenges in converting bladder DVHs to urinary complications (19), rectal sparing was chosen as the primary quality marker. To ascertain whether increased rectal sparing was detrimental to PTV quality measures, the protocolspecified metrics D98% and D2% were compared to DV75Z DVHð75 GyÞ  pDVHð75 GyÞ. There was no correlation for either variable, with rZ0.11 for D98% and DV75 and rZ0.12 for D2% and DV75, implying that achieving the pDVH-predicted rectal sparing would not have affected the target metrics. LKB model parameter values obtained for the protocol toxicity data were TD50 Z 75.5 Gy (95% confidence interval: 72.1, 100.8) and m Z 0.10 (95% confidence interval: 0.06, 0.33), with c2/d Z 0.974 and logML Z 94.72. 1.0 0.9

RTOG 0126 Probit, fit to RTOG 0126 Probit, QUANTEC

Probability of complications

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

55

60

65

70

75

Deff, Gy

Fig. 2. Validation of QUANTEC LKB NTCP model. Observed grade 2 late GI toxicities agree with predictions based on QUANTEC (17) recommended LKB parameters (dotted line). Dose-response curve obtained with parameters fitted to RTOG 0126 data is shown for comparison (solid line). Vertical error bars are 68% binomial confidence intervals; error bars on effective dose values are standard deviations. GI Z gastrointestinal; LKB Z Lyman-Kutcher-Burman; NTCP Z normal tissue complication probabilities; QUANTEC Z quantitative analysis of normal tissue effects in the clinic; RTOG Z Radiation Therapy Oncology Group.

Figure 2 shows late rectal toxicities in the sample, the doseresponse curve obtained as fit to the trial data, and the QUANTEC-recommended dose-response curve. Parameter values obtained for RTOG 0126 data are in excellent agreement with QUANTEC values of TD50 Z 76.9 Gy and m Z 0.13. Calculating the QUANTEC dose-response by using the sample yields c2/d Z 0.87 and logML Z 95.29, virtually equivalent to directly fitting the data, so the QUANTEC-recommended model parameters were deemed suitable for this study. After generating the pDVH0126,top10% model from the identified 90th percentile plans, using the pDVH0126,top10% model to pick out the top 10% in the overall cohort resulted in 20 of 22 plans in common. We thus conclude that the impact of the UCSD model used to initially filter the RTOG 0126 cohort had little to no impact on the final results. Using the pDVH0126,top10% model and the QUANTEC LKB model for grade 2 rectal complications, Figure 3a shows each patient’s clinical complication probability NTCPðDVHrect Þ relative to the model-predicted NTCPðpDVH0126;top10% Þ. The bands of excess risk show the large population of patients that reside in the positive excess risk region. A total of 94 of 219 patients (42.9%) received 5% excess risk, 20 of 219 (9.1%) received 10% excess risk, and 2 of 219 (0.9%) received 15% excess risk. Data below 0% excess risk represent patients whose plans improved the NTCPðpDVH0126;top10% Þ prediction by reducing high doses to rectum. Figure 3b shows the same data as a histogram, with the entire cohort exhibiting a mean excess risk of 4:7%  3:9%. Grouping the patients by quartiles of predicted risk, Figure 3c shows that patients most at risk for rectal complications NTCPðpDVH0126;top10% Þ < 18:0% showed a statistically significant difference (P5.5% threshold) yielded equal

Volume 92  Number 2  2015

233

Clinical Grade 2+ Rectal NTCP

–5

%

0%

% +5

+1

+1

5%

0%

40%

+2

a

0%

Quantifying risks of suboptimal planning

30%

20%

10%

0% 0%

10%

20%

30%

40%

Predicted Grade 2+ Rectal NTCP

b

c

35

p

Quantifying Unnecessary Normal Tissue Complication Risks due to Suboptimal Planning: A Secondary Study of RTOG 0126.

The purpose of this study was to quantify the frequency and clinical severity of quality deficiencies in intensity modulated radiation therapy (IMRT) ...
602KB Sizes 0 Downloads 6 Views