Page 1 of 34
Accepted Preprint first posted on 13 May 2014 as Manuscript ERC-14-0111
1
Title page
2
Biology of breast cancer during pregnancy using genomic profiling
3
Hatem A. Azim Jr. 1, 2*, Sylvain Brohée 2, Fedro A. Peccatori 3, Christine Desmedt 2, Sherene
4
Loi 4, 5, Diether Lambrechts 6, 7, Patrizia Dell’Orto 8, Samira Majjaj 2, Vinu Jose 2, Nicole
5
Rotmensz 9, Michail Ignatiadis 2, 10, Giancarlo Pruneri 8, Martine Piccart 10, Giuseppe Viale 8,
6
Christos Sotiriou 2, 10
7 8
1. Department of Medicine, BrEAST Data Centre, Institut Jules Bordet, Université Libre de
9
Bruxelles (ULB), Brussels, Belgium
10
2. Breast Cancer Translational Research Laboratory (BCTL) J. C. Heuson, Institut Jules
11
Bordet, Université Libre de Bruxelles (ULB), Brussels, Belgium
12
3. Department of Gynecologic Oncology, Fertility and Procreation Unit, European Institute of
13
Oncology, Milan, Italy
14
4. Translational Breast Cancer Genomic Lab, Cancer Therapeutics Program, Peter
15
MacCallum Cancer Centre, East Melbourne, VIC Australia
16
5. Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC
17
Australia
18
6. Vesalius Research Centre, VIB, Leuven, Belgium
19
7. Laboratory of Translational Genetics, Department of Oncology, University of Leuven,
20
Belgium
21
8. Department of Pathology, European Institute of Oncology, Milan, Italy
22
9. Division of Epidemiology and Biostatistics, European Institute of Oncology, Milan, Italy
23
10. Department of Medicine, Medical Oncology Clinic, Institut Jules Bordet, Université
24
Libre de Bruxelles (ULB), Brussels, Belgium
25
1 Copyright © 2014 by the Society for Endocrinology.
Page 2 of 34
26
* Correspondence
27
Hatem A. Azim Jr., MD, PhD
28
Department of Medicine, BrEAST Data Centre
29
Boulevard de Waterloo, 121
30
1000 Brussels
31
Belgium
32
Tel: +32 2 541 3854
33
Fax: +32 2 541 3477
34
Email:
[email protected] 35 36
This work was presented as an oral presentation in the 9th European Breast Cancer
37
Conference (EBCC), March 2014, Glasgow, UK
2
Page 3 of 34
38
Abstract
39
Introduction: Breast cancer during pregnancy is rare and is associated with relatively poor
40
prognosis. No information is available on its biological features at the genomic level.
41
Methods: Using a dataset of 54 pregnant and 113 non-pregnant breast cancer patients, we
42
evaluated the pattern of hot spot somatic mutations and did transcriptomic profiling using
43
Sequenom® and Affymetrix®, respectively. We performed gene set enrichment analysis to
44
evaluate pathways associated with diagnosis during pregnancy. We also evaluated the
45
expression of selected cancer-related genes in pregnant and non-pregnant patients and
46
correlated the results with changes occurring on the normal breast using a pregnant murine
47
model. We finally investigated aberrations associated with disease-free survival (DFS).
48
Results: No significant differences in mutations were observed. 18.6% of pregnant and 23%
49
of non-pregnant patients had a PIK3CA mutation. Around 30% of tumors were basal, with no
50
differences in the distribution of breast cancer molecular subtypes between pregnant and non-
51
pregnant patients. Two pathways were enriched in tumors diagnosed during pregnancy; G-
52
protein coupled receptor pathway and serotonin receptor pathway (FDR90% tumor cellularity) were sent to the Breast Cancer
105
Translational Research Laboratory at Institut Jules Bordet for mutational and transcriptomic
106
profiling. DNA and RNA extraction were performed using the ZYMO Research Pinpoint TM
107
Slide DNA Isolation System and stored at -80 C until use. Quality and quantity were
108
evaluated using the NanoDrop.
109
All patients provided consent to donate biological samples for research purposes per
110
IEO institutional policies. The study was approved by the ethics committee of Institut Jules
111
Bordet in 2010 (Number: 1782).
112
113 114 115
Mutational profiling
The samples were genotyped using the Sequenom MassARRAY ® Assay Design 3.1 with the default parameters. Multiplex PCR was done in 5µL volume containing 5-10ng of
6
Page 7 of 34
116
DNA. Samples were considered to be of sufficient quality when genotyping could be
117
performed for more than 75% of the mutations.
118
We queried the COSMIC database to identify a broad range of genes for hotspot
119
somatic mutation profiling (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/). In
120
total, we examined 84 hotspot somatic mutations on 19 cancer related genes. Details
121
regarding the genes and mutations that were evaluated are provided in Supplement 1. We
122
ultimately covered 94% (n=29) of the reported PIK3CA mutations.
123
124 125
Transcriptomic profiling
All samples were hybridized using Affymetrix® Human Genome U219 array plates
126
following standard Affymetrix protocols. This platform was used based on pilot studies
127
performed in-house showing high reproducibility between transcriptomic information
128
obtained from paired frozen and FFPE tumor samples. Detailed information on the used
129
protocol is provided in Supplement 2.
130
We used two methods to determine breast cancer molecular subtypes; PAM50 (Parker,
131
et al. 2009) and 3-Gene (Haibe-Kains, et al. 2012), as previously defined (Supplement 2). To
132
compare the transcriptomic profiles of patients diagnosed with breast cancer during
133
pregnancy and those not diagnosed during pregnancy, we performed gene set enrichment
134
analysis (GSEA) using the Gene Ontology classes as reference gene set using the
135
Bioconductor package Piano (Varemo, et al. 2013). The genes were sorted based on their
136
differential expression between pregnant and non-pregnant patients using a conditional
137
logistic regression model
138
(http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/Epi/html/clogistic.html). Results were
139
cross-checked with those obtained using the hypergeometric probabilities delivered by the
140
DAVID web tool (Huang da, et al. 2009).
7
Page 8 of 34
141
We then evaluated the expression of selected cancer related genes like BRCA1,
142
BRCA2, PD1 and PDL1 in addition to previously published gene sets representing several
143
important oncological biological processes in pregnant and non-pregnant patients. These gene
144
sets included the following: RAS, SRC, MYC, Wnt/β-catenin (Bild, et al. 2006), PI3K (Loi,
145
et al. 2010), Insulin Growth Factor 1 (IGF1) (Creighton, et al. 2008), AKTmTOR (Majumder,
146
et al. 2004), MAPK (Creighton, et al. 2006), PTEN (Saal, et al. 2008); proliferation gene sets;
147
GGI (Sotiriou, et al. 2006) and GENE70 (van 't Veer, et al. 2002), mammary stem cell and
148
luminal progenitor gene sets (Lim, et al. 2009), stroma gene sets; DCN (Farmer, et al. 2009)
149
and PLAU (Desmedt, et al. 2008), and immune gene sets; STAT1(Desmedt et al. 2008) and
150
IRM (Teschendorff, et al. 2007).
151
Gene sets (or signature score) were computed by taking the sum of the products of the
152
gene coefficient in the module (si) by the corresponding gene expression value (ei) according
153
to the following formula:
154 155 156 157
Supplement 3 summarizes the size of the different gene sets and number of genes that
158
were available in the Affymetrix platform that we used in the current analysis. The raw gene
159
expression data, together with the patients’ characteristics are publically available on GEO
160
http://www.ncbi.nlm.nih.gov/geo/, with accession number GSE53031
161
162
In-Silico analysis on a publically available gene expression data derived from normal breast
163
of a pregnant mouse model
164 165
In order to evaluate the potential role of pregnancy on the normal breast tissue, we investigated the expression of genes or gene sets that were found to be differentially
8
Page 9 of 34
166
expressed between the pregnant and non-pregnant groups on a publicly available murine data
167
set derived from the normal breast of pregnant mice [GEO ID: GSE8191] (Anderson, et al.
168
2007). In the murine model, the breast tissue samples were obtained on days 1, 3, 7, 12, and
169
17 of pregnancy and also on day 19 (post-partum). Four replicates were available for each
170
time point. Similar to the procedure that was applied in the human data set, the data were re-
171
normalized using the robust multiarray analysis (RMA) normalization and the expression
172
values of each set of probe sets corresponding to one Ensembl identifier were averaged
173
(Bolstad, et al. 2003).
174
175
Survival analysis
We performed updated DFS comparison between pregnant and non-pregnant breast
176 177
cancer patients using a multivariate Cox regression model adjusted for age, tumor size, nodal
178
status, histological grade, ER, HER2 and type of systemic therapy.
179
We performed an exploratory analysis to evaluate, genes/gene sets or somatic
180
mutations that were associated with DFS. We used the median expression value in the whole
181
population (i.e. pregnant + non-pregnant) to define the cut-off of high and low gene or gene
182
set expression. Analyses were performed with the SAS software version 9.2 (SAS Institute,
183
Cary, NC) and R software Version 2.12.2 (available at www.r-project.org). All tests were two-
184
sided.
185 186
Results
187
Study population
188
The original data set included 195 breast cancer patients; 65 pregnant and 130 non-
189
pregnant. For the current genomic analysis, we opted to exclude patients who received
190
neoadjuvant therapy to avoid potential impact on the obtained results (n=21). Seven more
9
Page 10 of 34
191
patients were excluded due to poor RNA quality resulting in a total of 167 patients (85% of
192
original data set); 54 pregnant and 113 non-pregnant who were included in the current
193
analysis (Supplement 4). Table 1 summarizes patient characteristics in comparison to the
194
original data set. No major differences were observed.
195
196 197
Mutational analysis
No significant differences in the pattern of hot spot somatic mutations evaluated in
198
this study were observed between pregnant and non-pregnant patients (Table 2). Considering
199
all eligible patients, 36 (21.6%) and 9 (5.4%) had a mutation in PIK3CA and TP53,
200
respectively.
201
202 203
Molecular subtyping
We found high correlation between PAM50 and the 3-Gene classifiers in defining
204
breast cancer molecular subtypes (r=0.857, p0.7, p