IJC International Journal of Cancer

Accuracy of ultrasound for predicting pathologic response during neoadjuvant therapy for breast cancer

Michael Luke Marinovich1, Nehmat Houssami1, Petra Macaskill1, Gunter von Minckwitz2, Jens-Uwe Blohmer3 and Les Irwig1

1 Screening and Test Evaluation Program (STEP), Sydney School of Public Health, The University of Sydney, Sydney, NSW, Australia
2 German Breast Group and Universitäts-Frauenklinik Frankfurt, Neu-Isenburg, Germany
3 Gynecology and Breast Center, Charité, Berlin, Germany

Cancer Therapy

Early assessment of response to neoadjuvant chemotherapy (NAC) for breast cancer allows therapy to be tailored; however, optimal response assessment methods have not been established. We estimated the accuracy of ultrasound (US) to predict pathologic complete response (pCR) using common response criteria and pCR definitions, and estimated incremental accuracy over known prognostic variables. Participants undergoing US after two cycles in the GeparTrio trial randomised to no change in NAC were eligible. US response by World Health Organisation (WHO) criteria (1D or 2D) and Response Evaluation Criteria In Solid Tumours (RECIST) was assessed. Four pCR definitions were applied. Sensitivity (correct prediction of pCR), specificity (correct prediction of no-pCR) and diagnostic odds ratios (DORs) were calculated. Areas under the curve (AUCs) were derived from logistic regression including patient variables with and without US. In 832 patients, DORs decreased as pCR definitions became less stringent (p = 0.01). For WHO-2D, DORs were as follows: 4.07 (ypT0,ypN0), 3.75 (ypT0/is,ypN0), 3.14 (ypT0/is,ypN+/−) and 2.65 (ypT0/is/1a,ypN+/−). DORs did not differ between US criteria (p = 0.60). High sensitivity and lower specificity were found for WHO-2D and RECIST; WHO-1D was highly specific with low sensitivity. Sensitivity was highest for WHO-2D predicting ypT0,ypN0 (sensitivity = 81.7%, specificity = 47.6% vs. 42.3% and 80.4% for WHO-1D). Adding US to models including patient variables (age, T-stage, histology and subtype) improved AUCs for predicting pCR by 2–3%. In conclusion, US accuracy is highest for predicting ypT0,ypN0, shown to be most prognostic of long-term survival. WHO-2D and RECIST maximise sensitivity; WHO-1D maximises specificity. US modestly improves the prediction of pCR by patient characteristics.

Key words: breast neoplasms, ultrasonography, sensitivity and specificity, neoadjuvant therapy, drug monitoring

Abbreviations: AUC: area under the receiver operating characteristics curve; DCIS: ductal carcinoma in situ; DFS: disease-free survival; DOR: diagnostic odds ratio; ER: estrogen receptor; HER2: human epidermal growth factor receptor 2; MRI: magnetic resonance imaging; NAC: neoadjuvant chemotherapy; OS: overall survival; pCR: pathologic complete response; PR: progesterone receptor; RECIST: Response Evaluation Criteria in Solid Tumours; US: ultrasound; WHO: World Health Organisation

Additional Supporting Information may be found in the online version of this article.

Grant sponsor: National Health and Medical Research Council (NHMRC Australia); Grant number: 633003; Grant sponsors: NHMRC Postgraduate Scholarship, National Breast Cancer Foundation (NBCF Australia) Practitioner Fellowship

DOI: 10.1002/ijc.29323

History: Received 1 Aug 2014; Accepted 23 Oct 2014; Online 12 Nov 2014

Correspondence to: Mr. Michael Luke Marinovich, Screening and Test Evaluation Program (STEP), Sydney School of Public Health, A27, Edward Ford Building, The University of Sydney, Sydney, NSW 2006, Australia, Tel.: +61-2-9114-1160, Fax: +61-2-93517420, E-mail: [email protected]

© 2014 UICC. Int. J. Cancer: 136, 2730–2737 (2015)

Neoadjuvant chemotherapy (NAC) has a well-established role in the management of breast cancer, with randomised controlled trials showing similar long-term disease-free survival (DFS) and overall survival (OS), and increased rates of breast conserving surgery compared with upfront surgery and adjuvant therapy.1 A key advantage of NAC is the opportunity to assess response early during treatment as a predictor of pathologic complete response (pCR) at the end of therapy2 (a surrogate marker for prolonged DFS and OS), and for treatment modification to increase DFS and OS in hormone receptor-positive breast cancers.3 Randomised controlled trials that assessed early response by clinical examination and ultrasound (US) have indicated benefits of therapy modification in responders and non-responders. Increased rates of pCR, breast conservation and improved OS were found in the Aberdeen trial for responders randomised to taxanes after four cycles of anthracycline-based NAC relative to those continuing treatment.4,5 The GeparTrio trial randomised early responders to standard or extended anthracycline/taxane-based NAC and found longer DFS and a trend towards longer OS when therapy was extended.6 Improved DFS and treatment tolerability were found in early non-responders after a switch to vinorelbine and capecitabine, with rates of pCR and breast conservation similar to those with continued anthracycline/taxane-based NAC.7,8


Marinovich et al.

What’s new?

Ultrasound is commonly used to assess “early” responses to neoadjuvant chemotherapy, which is given before primary treatment of breast cancer to shrink the tumor. However, it remains unclear whether ultrasound can accurately predict the absence of all residual invasive cancer in the breast tissue (pathological complete response). The authors performed an extensive study using Response Evaluation Criteria in Solid Tumors and World Health Organization criteria for response. Accuracy was higher than previously reported, indicating that the role of ultrasound is currently underestimated in response assessment guidelines.

Methods

This is a retrospective study of patients prospectively enrolled between August 2002 and April 2005 in the GeparTrio trial, which investigated treatment modification based on clinical and sonographic response after two cycles of docetaxel, doxorubicin and cyclophosphamide. Ethical clearance for this 2014 analysis was obtained from the University of Sydney Health Research Ethics Committee. Eligibility criteria and informed consent have been reported previously.6,7 Patients with operable (cT2-3,cN0-2a,M0) or locally advanced (cT4a-d,cN0-2b,M0) disease were enrolled if they had histologic confirmation of primary breast cancer, no previous treatment and at least one of the following risk factors: age < 36 years; clinical tumour size > 5 cm; estrogen receptor (ER) and progesterone receptor (PR) negative; clinically involved axillary lymph nodes and/or undifferentiated tumour grade. Patients randomised to GeparTrio control arms were eligible for this analysis. The protocol for those patients was a continuation of planned therapy (an additional four cycles of docetaxel, doxorubicin and cyclophosphamide), regardless of response after two cycles (i.e., therapy was not tailored based on early response). In addition, patients were included if the primary tumour was measurable by US in two dimensions at baseline and after two cycles, and complete pathologic assessment of the primary tumour and axillary lymph nodes was undertaken after therapy.

Ultrasound imaging protocol

Before the first NAC cycle, patients underwent systematic sonographic examination of both breasts and axillae with at least 7.5 MHz breast US systems with electronic linear US probe, performed by gynaecologists or radiologists. If possible, the tumour was measured in one to three diameters, and measurements were registered in study case report forms. Because of the definition of inflammatory breast cancer and the size of the US device (3.8–5.0 cm), measurements were not possible in every patient. US was repeated after the first two cycles and before surgery.

Ultrasound response criteria and definitions of pathologic response

US response after two cycles was assessed by World Health Organisation criteria14 applied to the primary tumour in two dimensions (WHO-2D: 50% reduction of the product of the longest and perpendicular diameters after two cycles compared with baseline). Additional analyses also applied RECIST 1.1 criteria9 (30% reduction of the longest diameter of the primary tumour), and WHO criteria applied to one dimension only (WHO-1D: 50% reduction of the longest diameter). All criteria were applied to measurements of the primary tumour only; nodal metastases were not considered. pCR after NAC was defined as no evidence of residual tumour [invasive or ductal carcinoma in situ (DCIS)] in the breast and nodes (ypT0,ypN0), consistent with the definition found to be most prognostic for long-term OS and DFS.3 Common alternative pCR definitions were also applied, including: no invasive cancer in the breast and nodes but DCIS may be present (ypT0/is,ypN0); no invasive cancer in the breast but DCIS may be present, irrespective of nodal status (ypT0/is,ypN+/−) and “near-pCR,” defined as the presence of limited invasive cancer (≤0.5 cm), irrespective of nodal status (ypT0/is/1a,ypN+/−).

Baseline patient variables

Baseline patient characteristics included: age; clinical T-stage; clinical N-stage; histology; grade; ER and PR status; human epidermal growth factor receptor 2 (HER2) status and menopausal status. For this analysis, grade, ER, PR and HER2 status were used to define tumour subtypes according to von Minckwitz et al.3

The Response Evaluation Criteria In Solid Tumours (RECIST) guideline states that US is unsuitable for monitoring tumour size owing to high operator dependence and low reproducibility, instead recommending magnetic resonance imaging (MRI) for assessing response to NAC.9 Studies investigating the accuracy of testing mid-treatment have therefore focussed on MRI.10 However, it is not clear whether the potential benefits of US early response assessment found in randomised trials are applicable to MRI,11 and despite the RECIST recommendations, US is commonly used in clinical practice.12 Few studies have investigated the accuracy of US for predicting pCR, and the optimal method for response assessment has not been established.13 US has potential advantages over MRI in being less costly, less complex and more accessible. Therefore, our study seeks to estimate the accuracy of US early during NAC to predict pCR, using standard criteria for US response,9,14 and commonly applied pCR definitions of varying prognostic significance.3 The additional contribution of US over baseline patient characteristics in predicting pCR will also be investigated.
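As an illustration of the three US response criteria defined in the Methods (WHO-2D, WHO-1D and RECIST), the following is a minimal Python sketch and not the study's own code. The `TumourMeasurement` structure, the function names and the example diameters are hypothetical, and treating the 50% and 30% thresholds as inclusive is an assumption, since the text states only the size of the required reduction.

```python
from dataclasses import dataclass


@dataclass
class TumourMeasurement:
    """US measurement of the primary tumour (hypothetical structure)."""
    longest_cm: float        # longest diameter
    perpendicular_cm: float  # diameter perpendicular to the longest


def relative_reduction(baseline: float, follow_up: float) -> float:
    """Fractional reduction from baseline (negative if the tumour grew)."""
    if baseline <= 0:
        raise ValueError("baseline measurement must be positive")
    return (baseline - follow_up) / baseline


def us_response(baseline: TumourMeasurement, cycle2: TumourMeasurement) -> dict:
    """Classify response after two cycles under each criterion.

    Assumption: thresholds are inclusive (>=).
    """
    product_drop = relative_reduction(
        baseline.longest_cm * baseline.perpendicular_cm,
        cycle2.longest_cm * cycle2.perpendicular_cm,
    )
    longest_drop = relative_reduction(baseline.longest_cm, cycle2.longest_cm)
    return {
        "WHO-2D": product_drop >= 0.50,  # product of the two diameters
        "WHO-1D": longest_drop >= 0.50,  # longest diameter only
        "RECIST": longest_drop >= 0.30,  # longest diameter only
    }


# Hypothetical tumour shrinking from 4.0 x 3.0 cm to 2.6 x 2.0 cm.
example = us_response(TumourMeasurement(4.0, 3.0), TumourMeasurement(2.6, 2.0))
print(example)
```

In this made-up case the tumour counts as a responder under WHO-2D and RECIST but not under WHO-1D, which shows why the one-dimensional 50% threshold is the strictest of the three.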


Statistical analyses

Differences in characteristics between patients included in and excluded from the analysis were tested using the t-statistic for age as a continuous variable, and χ² statistics for categorical variables. Sensitivity (correct prediction of pCR) and specificity (correct prediction of no-pCR) were calculated for each US response criterion and pCR definition; 95% confidence intervals were calculated using the binomial distribution. Diagnostic odds ratios (DORs) were calculated as an overall measure of US accuracy: DOR = [sensitivity/(1 − sensitivity)]/[(1 − specificity)/specificity]. The DOR is the ratio of the odds of US identifying mid-NAC response when pCR is truly present relative to the odds of US identifying response when pCR is not achieved. A DOR of 1 means that the test does not discriminate between patients with and without pCR; higher values indicate better test performance.15 Logistic regression using generalised estimating equations (PROC GENMOD in SAS) to allow for correlations in the data was used to test for a trend in DORs across definitions of pCR (based on evidence of differing accuracy as pCR is less stringently defined16), and for differences in DORs between US response criteria. Multiple logistic regression (PROC LOGISTIC in SAS) was used to model the probability of pCR based on baseline patient variables only, and the US result in addition to patient variables. In the primary analysis, casewise deletion was applied to observations with missing data. Variables were selected using stepwise backwards elimination, based on the likelihood ratio test (χ²) statistic computed as the difference between the −2 log likelihood for successive models, with degrees of freedom equal to the number of additional parameters fitted. The criterion for variable inclusion was p < 0.05 for main effects and p < 0.01 for interactions. Univariate logistic regression of the association between pCR and the US result was also undertaken.
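The accuracy measures defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the example inputs are the rounded sensitivity and specificity values reported in the abstract for predicting ypT0,ypN0, not the underlying 2 × 2 counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of pCR cases that US classified as responders."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of no-pCR cases that US classified as non-responders."""
    return true_neg / (true_neg + false_pos)


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sens / (1.0 - sens)) / ((1.0 - spec) / spec)


# Rounded values from the abstract for predicting ypT0,ypN0.
dor_who_2d = diagnostic_odds_ratio(sens=0.817, spec=0.476)
dor_who_1d = diagnostic_odds_ratio(sens=0.423, spec=0.804)
print(f"WHO-2D DOR: {dor_who_2d:.2f}")
print(f"WHO-1D DOR: {dor_who_1d:.2f}")
```

With these rounded inputs the WHO-2D value comes out at roughly 4.06, close to the reported 4.07; the small gap presumably reflects rounding of the published percentages. A test with sensitivity and specificity both at 50% gives a DOR of 1, matching the statement that such a test does not discriminate.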
Model fit was summarised by the area under the receiver operating characteristics curve (AUC); the incremental contribution of US over baseline characteristics in predicting pCR was assessed by the difference between the AUC for models with and without US. As a sensitivity analysis, multiple imputation (PROC MI in SAS) using the Markov chain Monte Carlo method was used to impute missing values for model covariates. All patient characteristics, US response and pCR variables were included in the imputation model, and the number of imputations was set to 40. AUCs from multiply imputed data sets were averaged,17 and differences in combined AUCs were derived as per the primary analysis.
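To make the incremental AUC comparison concrete, here is a small Python sketch. It is not the SAS analysis used in the study: the `auc` helper computes the AUC by pairwise comparison of cases (equivalent to the Mann-Whitney statistic), and the outcome vector and predicted probabilities are entirely hypothetical stand-ins for the output of fitted models with and without the US result.

```python
from typing import Sequence


def auc(outcomes: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen pCR case scores higher
    than a randomly chosen no-pCR case (ties count one half)."""
    pos = [s for y, s in zip(outcomes, scores) if y == 1]
    neg = [s for y, s in zip(outcomes, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = pCR, 0 = no pCR, with predicted probabilities
# from a model of baseline variables only and a model that adds US.
pcr = [1, 1, 1, 0, 0, 0, 0, 0]
baseline_only = [0.40, 0.30, 0.15, 0.35, 0.20, 0.15, 0.10, 0.05]
baseline_plus_us = [0.45, 0.30, 0.20, 0.35, 0.15, 0.15, 0.10, 0.05]

auc_without_us = auc(pcr, baseline_only)
auc_with_us = auc(pcr, baseline_plus_us)
increment = auc_with_us - auc_without_us
print(f"AUC without US: {auc_without_us:.3f}")
print(f"AUC with US:    {auc_with_us:.3f}")
print(f"Increment:      {increment:.3f}")
```

The increment in this toy data set is far larger than anything the study reports and carries no clinical meaning; the sketch only shows how the difference between the two AUCs is formed.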

Results

Baseline patient characteristics

From 1,025 patients in the GeparTrio control groups, 193 patients were excluded owing to missing US tumour measurements at baseline (N = 43), after two cycles (N = 104), or both (N = 22); or insufficient pathologic data to determine pCR by all definitions (N = 24) (see Supporting Information Appendix 1). Table 1 describes characteristics of the 832 included patients compared with exclusions. There were no statistically significant differences in baseline characteristics between included and excluded patients, with the exception of a higher proportion of T3 tumours in the excluded group (p < 0.001). The mean age of included patients was 49.9 years (range: 23–80). The majority of patients had T2 tumours (82.5%), and invasive ductal carcinoma was the predominant histologic type (77.8%).

Clinical and pathologic response

There were no statistically significant differences between included and excluded patients in the rate of pCR, regardless of definition [there was only weak evidence of a higher rate of ypTis,ypN+/− among inclusions (p = 0.07)] (Table 1). As might be expected, pCR rates were higher for less stringent definitions, ranging from 16.5% for ypT0,ypN0 to 34.1% for “near-pCR.” Rates of response assessed by US were similar for WHO-2D (57.2%) and RECIST (53.9%), but were lower for WHO-1D (23.3%).

Univariate analyses of US for predicting pCR

Sensitivities, specificities and DORs for all combinations of US response criteria and pCR definitions are presented in Table 2 (see also Supporting Information Appendix 2 for 2 × 2 tables). Across all US criteria, DORs decreased significantly as the pCR definition became less stringent (p = 0.01, trend). The highest DORs were observed for the prediction of ypT0,ypN0. Within pCR definitions, sensitivity and specificity were comparable for WHO-2D and RECIST, with moderate-to-high US sensitivity paired with lower specificity. By contrast, WHO-1D displayed low sensitivity with high specificity. There was no evidence of a difference in DORs for US response criteria within pCR definitions (p = 0.60).

Multivariate analyses of US for predicting pCR

One or more baseline patient characteristics were unavailable for 229 patients, leaving 603 patients with complete covariate data for inclusion in multivariate modelling. Table 3 presents AUCs from univariate models of US for those patients, as well as multivariate models of baseline patient characteristics only, and US in addition to patient characteristics; p-values for included variables are also presented. Nodal status, menopausal status and tumour grade were not significant in any of the models, and were excluded from further consideration. Similarly, interactions between the US result and patient characteristics were not statistically significant. The US result and tumour subtype were highly significant in all models which included those variables (p < 0.001). Beta coefficients and intercepts for


Table 1. Characteristics of included and excluded patients

Characteristic  Levels        Included, N (%)                      Excluded, N (%)                      p
Age             (Continuous)  Mean (95% CI) = 49.93 (49.22–50.64)  Mean (95% CI) = 49.61 (48.12–51.08)  0.70
T stage         T2            606 (82.5)                           88 (60.3)
