Annals of Oncology Advance Access published May 14, 2014


Comprehensive gene expression meta-analysis of head and neck squamous cell carcinoma microarray data defines a robust survival predictor

Key Message: "Meta-analysis of head and neck squamous cell carcinoma microarray data enabled the identification and validation of a prognostic 172-gene model, unrelated to an HPV signature, that improves assessment of a patient’s risk of relapse compared to existing clinical and molecular parameters. Additional work is needed to transpose our model into a useful clinical-grade assay."

© The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: [email protected].

Downloaded from http://annonc.oxfordjournals.org/ at University of Windsor on May 30, 2014

L. De Cecco1, P. Bossi2, L. Locati2, S. Canevari1,3,* and L. Licitra2

1 Functional Genomics and Bioinformatics, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
2 Head and Neck Medical Oncology Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
3 Molecular Therapies, Dept. of Experimental Oncology and Molecular Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy

* Correspondence to: Dr. Silvana Canevari, PhD, Department of Experimental Oncology and Molecular Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, via Amadeo 42, Milan 20133, Italy; phone: 39-02-23902567; e-mail: [email protected]


Abstract

Background: Head and neck squamous cell carcinoma is a heterogeneous disease that is frequently aggressive in its biologic behavior. Despite improvements in therapeutic modalities, the long-term survival rate has remained unchanged over the past decade, and patients with this type of cancer are at high risk of developing recurrence. There is therefore a great need to find better ways to foresee outcome, to improve treatment choices, and to enable a more personalized approach.

Patients and methods: Nine microarray gene expression datasets, reporting survival data for a total of 841 samples, were retrieved from public repositories. Three datasets, profiled on the same version of microarray chips, were selected and merged following a meta-analysis approach to build a training set. The remaining 6 studies were used as independent validation sets.

Results: Analysis of the training set identified a 172-gene signature able to stratify patients into low or high risk of relapse (log-rank, P=2.44e-05; HR=2.44, 95% CI: 1.58-3.76). The model based on the 172 genes was validated on the 6 independent datasets. The performance of the model was challenged against other proposed prognostic signatures (RSI, 13-gene OSCC signature, Hypoxia metagene, 42-gene high risk signature) and was compared to an HPV signature: our model proved to be independent and gave better prediction.

Conclusions: We have identified and validated a prognostic model based on the expression of 172 genes, independent of HPV status and able to improve assessment of a patient’s risk of relapse compared to other molecular signatures. In order to transpose our model into a useful clinical-grade assay, additional work is needed following the framework established by the Institute of Medicine and the REMARK guidelines.

Key words: HNSCC, gene expression, microarray, meta-analysis, survival prediction


Introduction

The long-term survival rate for head and neck squamous cell carcinoma (HNSCC) patients, with the sole exception of those with HPV-positive oropharyngeal cancer, has remained unchanged over the past decade [R1]. In early-stage disease, surgery and radiation therapy (RT) can each be curative as a single modality. In locally advanced disease (stage III and IV), the upfront treatment options are surgery followed by RT +/- chemotherapy (CT), or CT/RT followed by salvage surgery, if feasible [R2]. In this setting, despite the improvements in surgery, RT and CT, 40-50% of patients still recur [R1]. Thus, there is an urgent need to find better ways to foresee outcome, improve treatment choices, and enable a more personalized approach that may globally improve treatment outcome.

Since it is conceivable that specific biology might be implicated in dictating a patient’s outcome, the “omic” technologies have gained ever-increasing interest in the last decade. For major cancer types such as breast and lung tumors, analyses of gene expression and epigenetic mechanisms have contributed to improving diagnosis, prognosis, and prediction of treatment response, and the extraordinary advances in DNA sequencing technologies have also increased the knowledge of cancer-associated mutations and genetic aberrations [R3]. However, at present in the HNSCC area only a limited number of studies have been published, each analyzing a limited number of cases that were often non-homogeneously treated [R4]. Gene-expression profiling has provided a huge amount of potentially useful data, but concerns regarding model over-fitting remain. In fact, high-throughput technologies make it possible to test a very large number of genes, but to contain costs the studies often include a limited number of samples. These limitations could be overcome through the computational integration (meta-analysis) of different microarray datasets addressing similar clinical/biological questions [R5]. Meta-analysis, when compared to the independent analysis of each dataset, may lead to the identification of more robust survival predictors, as already reported for tumors other than HNSCC [R6, R7, R8].

Here we applied a meta-analysis approach to three publicly available HNSCC datasets to identify a signature able to stratify patients according to relapse-free survival (RFS). We validated our prognostic model in six independent datasets and, in addition, we assessed its performance in comparison to other proposed prognostic signatures and to the available clinical parameters.


Materials and Methods

Data processing

We systematically searched for publicly available gene expression datasets on HNSCC reporting full clinical annotations. As of June 2013, nine studies listing 841 samples were retrieved from different sources. Three datasets [9, 10, 11] profiled on the same array platform (Affymetrix U133 Plus 2.0), containing 20,827 unique gene symbols, were selected in order to build a uniform training set. By applying analytical methods for data normalization and batch effect correction [12] (Supplementary methods, SM1) to a collection of 195 gene expression arrays belonging to the 3 abovementioned datasets, we obtained a single merged training dataset (hereafter named the Meta-H&N dataset). For the independent validation, 6 studies were retrieved from the literature: five datasets [13, 14, 15, 16, 17] are gene expression microarray-based studies, and the sixth is a next-generation sequencing (NGS) study made available along with the clinical annotations, as of June 2013, on the TCGA website [18]. Information about the datasets entering our study, including platform, number of samples and clinical endpoint, is summarized in Table S1 (Supplementary methods).

Statistical Methods

A gene expression signature able to stratify patients into two RFS risk classes was identified in the Meta-H&N dataset through a semi-supervised survival method based on principal components [19], using the R package superpc (http://www-stat.stanford.edu/~tibs/superpc), and a prognostic index was developed as detailed in Supplementary methods SM2. The algorithm to compute the prognostic index and the list of genes entering the model are reported in Table S2 (Supplementary material). Statistical analysis was performed using R [R20], version 2.15, BioConductor [R21], release 2.10, and BRB-ArrayTools, developed by Dr. Richard Simon and the BRB-ArrayTools Development Team (v4.2.0, http://linus.nci.nih.gov/BRB-ArrayTools.html). Prediction error and model performance were evaluated using the peperr [22], SurvJamda [23], and survcomp [24] R packages, as detailed in Supplementary methods SM3.
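The merging of datasets described under Data processing can be illustrated with a deliberately simple stand-in. The sketch below assumes per-gene mean-centering within each dataset before pooling the samples; this is not necessarily the method of reference [12], and the function name, dataset labels, gene symbols, and expression values are all hypothetical.

```python
from statistics import mean


def center_and_merge(datasets):
    """Mean-center each gene within each dataset, then pool the samples.

    datasets: dict mapping a dataset label to a list of samples, each
              sample being a dict {gene_symbol: log2 expression value}.
    Only genes measured in every sample of every dataset are kept.
    """
    # Find the genes shared by all samples across all datasets.
    shared = None
    for samples in datasets.values():
        for sample in samples:
            genes = set(sample)
            shared = genes if shared is None else shared & genes
    shared = sorted(shared or [])

    merged = []
    for label, samples in datasets.items():
        # Per-gene mean inside this dataset (treated as one batch).
        batch_mean = {g: mean(s[g] for s in samples) for g in shared}
        for s in samples:
            centered = {g: s[g] - batch_mean[g] for g in shared}
            merged.append((label, centered))
    return merged


# Hypothetical toy data: two studies with different baseline intensities.
toy = {
    "study_A": [{"TP53": 8.0, "EGFR": 10.0}, {"TP53": 9.0, "EGFR": 12.0}],
    "study_B": [{"TP53": 5.0, "EGFR": 7.0, "MYC": 6.0},
                {"TP53": 7.0, "EGFR": 7.5, "MYC": 6.5}],
}
merged = center_and_merge(toy)
for label, values in merged:
    print(label, values)
```

After centering, each gene has mean zero within every study, so a constant per-study offset no longer dominates the pooled data. Dedicated batch-correction methods also adjust variance and borrow information across genes, which this sketch does not attempt.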

External independent validation was performed on six datasets as detailed in Supplementary methods SM4. Four signatures previously reported in the literature as associated with patient outcome were tested [13, 14, 25, 26]. In addition, Slebos’ HPV signature [27] was applied to the Meta-H&N dataset to predict HPV status. A description of the external signatures used in the present study can be found in Supplementary methods SM5.
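The prognostic index mentioned in the Statistical Methods is defined in Supplementary methods SM2 and Table S2, which are not reproduced here. As a rough illustration only, the sketch below assumes the index is a weighted sum of per-gene expression values and that patients are assigned to a risk class by comparing the index to a fixed threshold. The gene names, weights, threshold, and patient values are hypothetical placeholders, not the published ones.

```python
def prognostic_index(expression, weights):
    """Weighted sum of expression values over the genes in the model.

    expression: dict {gene: expression value} for one patient.
    weights:    dict {gene: model weight}; every gene must be measured.
    """
    missing = [g for g in weights if g not in expression]
    if missing:
        raise KeyError(f"expression missing for genes: {missing}")
    return sum(weights[g] * expression[g] for g in weights)


def risk_class(index, threshold):
    """Assign 'high' risk when the index exceeds the threshold."""
    return "high" if index > threshold else "low"


# Hypothetical three-gene model; the real model uses 172 genes.
weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
threshold = 0.0

patients = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "P2": {"GENE_A": -2.0, "GENE_B": 2.0, "GENE_C": 0.5},
}

classes = {
    pid: risk_class(prognostic_index(expr, weights), threshold)
    for pid, expr in patients.items()
}
print(classes)
```

Raising an error on a missing gene is a design choice in this sketch: it avoids silently scoring a patient on a partial model when an array platform lacks some of the signature genes.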

Results

The primary end-point of this study was the identification of a signature prognostic of RFS. Applying a semi-supervised method of risk prediction, we identified a gene signature consisting of 172 genes (Fig 1A) that correlated with RFS, and we defined a threshold that split the Meta-H&N dataset into 93 and 102 cases at high or low risk of relapse, respectively. The predicted high-risk group had significantly worse RFS than the predicted low-risk group (log-rank, P=2.44e-05); Fig 1B shows the Kaplan-Meier analysis for the cross-validated risk groups. The genes entering our model, along with their annotation on Affymetrix chips, their GO terms, and the corresponding weights used to calculate the prognostic index, are reported in Table S2 (Supplementary material). According to the assessment of prediction error (Fig S1A) and prediction performance (Fig S1B), as described in Supplementary results 1, the 172-gene model showed a reasonable fit and a HR=4.97 (CI=4.80-5.15; P-value [...] 0.6, in all but one dataset (GSE39369), where a borderline trend was observed (concordance index=0.574; p=0.0552). Repeating the analysis using the D index as the performance criterion led to similar conclusions (Supplementary results 5 and Fig S3).

The performance of the RSI and of the 13-gene OSCC signature was also assessed, since these two competing models were associated with RFS in univariate analysis on the Meta-H&N dataset (see Table S5). The concordance indices were calculated for both models in each dataset, with the exclusion of the 13-gene signature in the GSE686 dataset, where too few of its genes are present. As shown in Fig 3B, the RSI provided a significant estimate in the Meta-H&N, GSE31056 and GSE41613 datasets along with a trend in GSE2837, while significant performances were found for the 13-gene signature in all datasets, with the exception of GSE39369.

Table S6 (Supplementary results 6) reports the concordance indices with their confidence limits and corresponding P-values for all the single datasets, together with an overall meta-estimate that takes into account all datasets except the training set, for the 172-gene and 13-gene OSCC signatures. In an attempt to identify the best risk model, the concordance indices were compared. The comparison was based on 4 datasets (GSE2837, GSE31056, GSE39368, and TCGA). The results (Table S7 in Supplementary results 6) suggest that the 172-gene model and the 13-gene OSCC signature outperform the RSI.
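The concordance index used in these comparisons estimates how often, among comparable pairs of patients, the one with the higher predicted risk relapses first: 0.5 corresponds to chance and 1.0 to perfect ranking. The study computed it with the survcomp R package. The block below is an independent pure-Python sketch of Harrell's C on invented toy data, not that package's implementation. It assumes that ties in risk score count as half-concordant and, for simplicity, that pairs with equal follow-up times are skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time for each patient.
    events: 1 if relapse was observed, 0 if the patient was censored.
    risks:  predicted risk score; higher means worse expected outcome.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / usable


# Invented toy cohort: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)
```

The double loop is quadratic in the number of patients, which is adequate for cohorts of a few hundred samples such as those discussed here. Confidence limits and P-values, as reported in Table S6, require additional variance estimation that this sketch omits.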

Association of the 172-gene model to clinical parameters

Uni- and multi-variate Cox regression analyses were used to test the effect of the 172-gene model (high versus low risk) on RFS taking into account the following demographic and clinical parameters: i) age at diagnosis (≥70 versus
