Cancer Causes Control (2014) 25:759–769 DOI 10.1007/s10552-014-0379-1

ORIGINAL PAPER

Colorectal cancer risk and patients’ survival: influence of polymorphisms in genes somatically mutated in colorectal tumors

Stefanie Huhn • Melanie Bevier • Barbara Pardini • Alessio Naccarati • Ludmila Vodickova • Jan Novotny • Pavel Vodicka • Kari Hemminki • Asta Försti

Received: 15 January 2014 / Accepted: 27 March 2014 / Published online: 5 April 2014
© Springer International Publishing Switzerland 2014

Abstract
Purpose The first two studies aiming at high-throughput identification of the somatic mutation spectrum of colorectal cancer (CRC) tumors were published in 2006 and 2007. Using exome sequencing, they described 69 and 140 candidate cancer genes (CAN genes), respectively. We hypothesized that germline variants in these genes may influence CRC risk, similar to APC, which causes CRC through both germline and somatic mutations.
Methods After excluding the well-established CRC genes APC, KRAS, TP53, and ABCA1, we analyzed 35 potentially functional single-nucleotide polymorphisms (SNPs) in 10 CAN genes (OBSCN, MLL3, PKHD1, SYNE1, ERCC6, FBXW7, EPHB6/TRPV6, ELAC1/SMAD4, EPHA3, and ADAMTSL3) using KBiosciences Competitive Allele-Specific PCR™ genotyping assays. In addition to CRC risk (1,399 CRC cases, 838 controls), we also considered the influence of the SNPs on patients’ survival (406 cases).
Results Although our in silico analyses suggested functional relevance for the studied genes and SNPs, our data did not support a strong influence of the studied germline variants on CRC risk and survival. The strongest association with CRC risk and survival was found for MLL3 (rs6464211, OR 1.50, p = 0.002, dominant model; HR 2.12, p = 0.020, recessive model). Two SNPs in EPHB6/TRPV6 (dominant model) showed marginal associations with survival (rs4987622, HR 0.58, p = 0.028, and rs6947538, HR 0.64, p = 0.036, respectively).
Conclusion Although somatic mutations in the CAN genes have been related to the development and progression of various types of cancer in several next-generation sequencing or expression analyses, our study suggests that the studied potentially functional germline variants are not likely to affect CRC risk or survival.

Keywords Colorectal cancer · Risk · Survival · SNP · CAN genes

Electronic supplementary material The online version of this article (doi:10.1007/s10552-014-0379-1) contains supplementary material, which is available to authorized users.

S. Huhn (✉) · M. Bevier · K. Hemminki · A. Försti
Department of Molecular Genetic Epidemiology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
e-mail: [email protected]

B. Pardini · A. Naccarati
Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, 14200 Prague, Czech Republic

B. Pardini · A. Naccarati · L. Vodickova · P. Vodicka
Human Genetics Foundation (HuGeF), 10126 Turin, Italy

L. Vodickova · P. Vodicka
1st Faculty of Medicine, Institute of Biology and Medical Genetics, Charles University, 12000 Prague, Czech Republic

J. Novotny
Department of Oncology, General Teaching Hospital, 12808 Prague, Czech Republic

K. Hemminki · A. Försti
Center of Primary Health Care Research, Clinical Research Center, Lund University, 20502 Malmö, Sweden

Introduction

Cancer research has made great progress through the technological achievements in molecular genetics. Next-generation sequencing (NGS) and genome-wide association studies (GWAS) have offered new insights into the origin, development, and progression of cancer. Identification of new genetically defined tumor subtypes, driver mutations, and molecular genetic markers may point to new options for risk assessment and new therapeutic targets and may help clinicians make informed treatment decisions, making cancer therapy more effective and less harmful. Although colorectal cancer (CRC) is among the most common human cancers and its ‘‘adenoma to carcinoma sequence’’ is well defined [14, 50], little is known about predictive and prognostic markers that could guide treatment decisions with reasonable accuracy. To date, only very few genetic markers (KRAS, BRAF, microsatellite instability) are included in the European treatment guidelines for CRC [12]. In addition to the prognostic markers, large efforts have been made to identify genes that predispose to CRC and may allow risk estimations for this cancer [48]. However, many studies, both candidate gene and GWAS approaches, have shown contradictory results and together have still failed to explain much of the genetic susceptibility to CRC [11, 20, 48], not least because of the possible influence of environmental risk factors [21, 22]. If it were possible to predict an individual’s CRC risk with adequate accuracy, individuals with additional personal risk factors, such as obesity, diabetes, or inflammatory bowel disease, could benefit from early detection screenings or lifestyle interventions [4, 21]. It is known that germline mutations in a number of tumor suppressor genes, including APC, TP53, and VHL, cause various cancers, but these genes are also somatically mutated in sporadic tumors [18]. As NGS data have been generated from many tumor types over the past years, we posit that, in analogy to the above tumor suppressors, there may also be cancer-related germline variants in the genes commonly mutated in sporadic tumors.
The first high-throughput exome sequencing study on CRC tumors was published in 2006 and described 69 candidate cancer genes (CAN genes) out of a list of 519 somatic mutations that were found in tumor samples [39]. In 2007, a second, similar study described a longer list of 140 CAN genes, confirming most of the previous findings [55]. Despite the high mutation load of the tumors, the individual driver mutations were rare somatic mutations in some relevant genes. We hypothesized that germline mutations or polymorphisms in these genes may have functional or regulatory effects on CRC susceptibility or prognosis. Thus, we investigated whether polymorphisms in the CAN genes may be modifiers of CRC risk or prognosis.

Materials and methods

Study population

The study was carried out on a CRC case–control population from the Czech Republic (Supplementary Material, Table S1). All cases and controls were of Czech ancestry. Between September 2004 and October 2010, 1,399 CRC cases (58.5 % males) were recruited by nine oncological departments in the Czech Republic [30]. All patients were diagnosed with colon or rectal malignancy by colonoscopy (67.1 % colon), and the diagnosis of CRC was confirmed histologically. Patients who met the Amsterdam criteria I or II for hereditary nonpolyposis colorectal cancer (HNPCC) [49] were excluded from the study in order to collect a sample set of nonsyndromic CRC. The control population consisted of 838 blood donors (51 % males) who were recruited during the same time period by a blood donor center in one hospital in Prague, Czech Republic. The health status of the blood donors was verified during a standard health examination in the course of the blood donation, and the donors were cancer free at the time of sampling [30]. A subgroup of 406 CRC patients diagnosed between 2003 and 2010 was available for a survival analysis with comprehensive clinical data at the time of diagnosis. Besides general information about age, sex, TNM stage classification (size or direct extent of the primary tumor [T], degree of spread to regional lymph nodes [N], presence of metastasis [M]) [41], and grade, information about distant metastasis, relapse, and date of death was available with a follow-up until 31 August 2011.

Gene and SNP selection for genotyping

For this study, we selected and genotyped 35 single-nucleotide polymorphisms (SNPs) to test their associations with CRC risk or prognosis (Fig. 1 and Supplementary Material, Table S2). The SNPs were located in genes that showed somatic mutations in at least four of the 35 CRC tumor samples (>10 % of tumors) analyzed by Sjöblom et al. [39] and Wood et al. [55]. The well-known and extensively studied CRC genes APC, KRAS, TP53, and ABCA1 were not considered as candidates for our study.
Ten somatically mutated genes (OBSCN, MLL3, PKHD1, SYNE1, ERCC6, FBXW7, EPHB6/TRPV6, ELAC1/SMAD4, EPHA3, ADAMTSL3) were analyzed for potentially functional nonsynonymous SNPs (nsSNPs), using the search terms ‘‘nonsense,’’ ‘‘missense,’’ ‘‘frameshift,’’ and ‘‘stop gained,’’ listed in NCBI dbSNP 136 [10] and validated by HapMap (Supplementary Material, Table S2) [45, 46]. All these functional SNPs were considered promising candidates for the association study. We analyzed the relevant gene regions for linkage disequilibrium (LD) in order to prevent parallel analysis of highly linked polymorphisms (r² ≥ 0.85) (Haploview: V2, R24, analysis panel CEU) [2]. This approach further allowed us to select only those SNPs for genotyping that are located in DNA sequences acceptable for the design of the genotyping assays. If the flanking sequence of the SNP of interest was not acceptable, we selected a highly linked SNP (r² ≥ 0.85) located in an acceptable DNA sequence as a tagging SNP. We analyzed the polymorphisms for allele frequency differences (AFDs) among the HapMap populations (YRI vs. CEU, CHB, and JPT). The AFD was used as an ‘‘easy to access’’ value that may indicate selective processes in genes and SNPs and point to hidden functionality beyond coding/noncoding categories [31]. Additional methods were used to confirm these signatures of selection in all genes and SNPs that were found to be associated with CRC risk or patients’ survival (cf. paragraph ‘‘Functional prediction and signatures of selection’’ below). With the exception of MLL3, which mostly harbored SNPs with very low allele frequency (<0.05), only SNPs with a minor allele frequency (MAF) >10 % were considered for genotyping.

Fig. 1 Workflow of the stepwise approach of candidate gene and SNP selection and of the downstream analysis of the data. *Exception to the MAF cutoff: MLL3, MAF > 1 %; **the list of functional SNPs and candidate SNPs is shown in Supplementary Table S2

Genotyping

The DNA used in this study was obtained from peripheral blood lymphocytes taken from the study participants at the collaborating hospitals. All DNA extracts underwent whole genome amplification before genotyping (illustra GenomiPhi V2 DNA Amplification Kit, GE Healthcare™). For genotyping, the ‘‘KBiosciences Competitive Allele-Specific PCR’’ (KASP™) system was used. PCR reactions were carried out in a 384-well format using 3 ng of whole genome amplified DNA per reaction in a 4 µl reaction volume. The PCR conditions for the individual assays were set according to the recommendations by KBiosciences. Endpoint genotype detection was carried out on an ABI PRISM 7900-HT Sequence Detection System with SDS 2.2 software (Applied Biosystems). As an internal quality control, 7 % of the samples were randomly selected as duplicates. The concordance rate between the original and the duplicate samples was ≥99 %. The average call rate was 94.7 % (89.5–97.1 %). Thirty samples were excluded from the study due to poor overall performance (<50 % of all 35 genotypes called, i.e., ≤17 genotypes).

Statistical analysis

The observed genotype frequencies in the controls were tested for Hardy–Weinberg equilibrium (HWE) using Pearson’s goodness-of-fit χ² tests. Deviation from HWE was assumed at p < 0.01. Odds ratios (ORs) and 95 % confidence intervals (CIs) for associations between genotypes and CRC risk were estimated by logistic regression (PROC LOGISTIC, SAS Version 9.2; SAS Institute, Cary, NC). The estimated effects for all SNPs refer to the minor allele. p values <0.05 were considered statistically significant.
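As an illustration of the HWE check described above, the following minimal Python sketch applies Pearson's goodness-of-fit χ² test (1 degree of freedom) to the control genotype counts reported for MLL3 rs6464211 in Table 1 (511 C/C, 198 C/T, 26 T/T). The function name and the closed-form p value via the complementary error function are illustrative assumptions; the study itself does not state which software performed this test.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Hypothetical helper for illustration; returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p                      # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Control genotype counts for MLL3 rs6464211 as listed in Table 1
chi2, p_value = hwe_chi_square(511, 198, 26)
deviates_from_hwe = p_value < 0.01  # threshold used in the study
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, deviation: {deviates_from_hwe}")
```

With these counts the test statistic stays well below the level needed for p < 0.01, which is consistent with this SNP having been retained in the analysis.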
To account for the differences in the age and sex distributions between the cases and the controls (mean age 62 vs. 45.6 years, p < 0.0001; 58.5 vs. 51.0 % males, p = 0.005), the ORs were adjusted for age and sex. For all SNPs with significant p values per genotype, the best model (dominant or recessive) was calculated. For polymorphisms with MAF > 10 %, we had >90 % power to detect an OR of 1.5 at a significance level of 0.05, using the dominant model (Quanto V1.2.4) [34]. Differences in survival between patients carrying different genotypes were estimated by hazard ratios (HRs) and 95 % CIs using Cox regression (PROC PHREG, SAS Version 9.2; SAS Institute, Cary, NC). For all SNPs with significant p values per genotype (p value <0.05), the best model was calculated (dominant or recessive model), and they were analyzed further, adjusting the data separately for age, sex, T, N, and M status, and grade. Additionally, Kaplan–Meier plots were generated for these SNPs, and the differences between the survival functions among the groups were estimated by the log-rank test (PROC LIFETEST, SAS Version 9.2; SAS Institute, Cary, NC).

Functional prediction and signatures of selection

Functional polymorphisms are more likely to have a pronounced effect on a disease [47]. Signatures of selection are suggested to indicate functionality beyond coding/noncoding categories [31] and to provide a value for the functional relevance of a SNP. The SIFT [26, 38] and PolyPhen-2 [1, 33] browsers were used to predict functional consequences of the nsSNPs. SNPnexus [40] and HaploReg v2 [19, 52] were used to predict regulatory consequences of the analyzed SNPs and any linked SNP (r² > 0.85). The fixation index (FST) and AFD values were used to assess the degree of population differentiation among the HapMap populations, indicating selective processes [8, 56]. FST values >0.25 indicate strong genetic differentiation, while values in the range between 0.05 and 0.1 indicate moderate genetic differentiation [8, 56]. Fay and Wu’s H was used to detect signatures of selection via the estimation of local changes in the frequency spectrum of the SNPs within the candidate genes [13]. The integrated haplotype score (iHS) is defined by the local LD in a gene or a gene region, and it was used to detect signatures of selection in the haplotype structure of the candidate genes [35, 51]. Strongly negative Fay and Wu’s H values were considered signatures of a selective sweep, and iHS values < −2 or > 2 give evidence for a powerful selection signal [13, 42, 51]. Together, these four estimates provide sufficient evidence of whether a locus has been under natural selection and shed light on its functional relevance [7, 31].
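To make the two population-differentiation measures above more concrete, the following Python sketch computes the AFD and a simple FST for one biallelic SNP in two populations. It assumes Wright's FST in its heterozygosity form, (HT − HS)/HT, with both populations weighted equally; the source does not state which FST estimator was used, and the allele frequencies below are hypothetical illustration values, not HapMap data.

```python
def allele_frequency_difference(freq_pop1, freq_pop2):
    """Absolute difference of the same allele's frequency in two populations."""
    return abs(freq_pop1 - freq_pop2)


def fst_two_populations(freq_pop1, freq_pop2):
    """Wright's FST = (HT - HS) / HT for one biallelic SNP.

    Assumes two populations of equal weight (an illustrative simplification).
    """
    mean_freq = (freq_pop1 + freq_pop2) / 2.0
    # Expected heterozygosity of the pooled population
    h_total = 2.0 * mean_freq * (1.0 - mean_freq)
    # Mean expected heterozygosity within the subpopulations
    h_within = (2.0 * freq_pop1 * (1.0 - freq_pop1)
                + 2.0 * freq_pop2 * (1.0 - freq_pop2)) / 2.0
    if h_total == 0.0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return (h_total - h_within) / h_total


# Hypothetical frequencies of one allele in two populations
freq_a, freq_b = 0.10, 0.80
afd = allele_frequency_difference(freq_a, freq_b)
fst = fst_two_populations(freq_a, freq_b)
strong_differentiation = fst > 0.25  # threshold quoted in the text
print(f"AFD = {afd:.2f}, FST = {fst:.3f}, strong: {strong_differentiation}")
```

Under these assumptions, a large frequency gap between two populations translates into an FST far above the 0.25 threshold for strong differentiation.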

Results

Case–control study

For one SNP out of the 35 tested (rs765525 in PKHD1), a significant deviation from HWE was detected. This SNP was excluded from the analysis. For the majority of tested SNPs, the observed genotype frequencies did not differ significantly between the case and the control groups (Supplementary Material, Table S3). However, an association with CRC risk was detected for three SNPs in a dominant model, adjusted for age and sex (Table 1): rs6464211 in MLL3 (OR 1.50, 95 % CI 1.16–1.93, p = 0.002), rs10252263 in MLL3 (OR 2.41, 95 % CI 1.08–5.37, p = 0.03), and rs4253038 in ERCC6 (OR 0.71, 95 % CI 0.55–0.93, p = 0.01). The associations did not remain significant after adjustment for multiple testing using a highly conservative Bonferroni correction (p = 0.05/35 = 0.0014).

Survival analysis

For three polymorphisms, significant differences in survival between the carriers of the minor allele and the carriers of the major allele were detected in either the dominant or the recessive model (p value <0.05). According to the recessive model, homozygous carriers of the minor allele in rs6464211 (MLL3: HR 2.12, 95 % CI 1.12–4.01, p = 0.02, Table 2) showed decreased survival rates compared to the carriers of the major allele. In contrast, carriers of the minor allele in rs4987622 and rs6947538 in EPHB6/TRPV6 showed increased survival rates compared to the homozygous carriers of the major allele [(HR 0.58, 95 % CI 0.36–0.94, p = 0.03, Table 2) and (HR 0.64, 95 % CI 0.42–0.97, p = 0.04, Supplementary Material, Table S4), respectively]. These two SNPs were in LD in the Czech control population (r² = 0.68) and supported the results of each other. Kaplan–Meier plots were generated for these three polymorphisms (Fig. 2; of the EPHB6/TRPV6 SNPs, only rs4987622 is shown). All three analyzed polymorphisms showed significant differences in the survival function (log-rank p value ≤0.05). The associations did not remain significant after adjustment for multiple testing using Bonferroni correction (p = 0.05/35 = 0.0014).

Functional prediction and selective pressure

The three SNPs associated with CRC risk or patients’ survival are coding synonymous (rs6464211 in MLL3) and intronic (rs4987622 and rs6947538 in EPHB6/TRPV6) polymorphisms, respectively, for which no regulatory consequences were predicted by SNPnexus. However, HaploReg v2 predicted effects on regulatory motifs for all three SNPs and several linked variants (r² > 0.85). The SNP rs6464211 in MLL3 and four linked SNPs (r² > 0.85) may affect 42 regulatory motifs. For the two SNPs in EPHB6/TRPV6, the prediction resulted in two non-overlapping lists of 40 SNPs highly linked to rs4987622 (r² > 0.9) and of 33 SNPs highly linked to rs6947538 (r² > 0.8) that may affect 126 and 78 regulatory motifs, respectively. Signatures of selective pressure were analyzed as an indicator of functionality, such as regulatory function of a locus or genetic hitchhiking due to a highly selected unknown functional variant [3, 7, 13, 31]. For all three SNPs, several signatures of selection were detected (Table 3). We found signatures of strong population differentiation between CEU and YRI indicated by AFD (rs6464211 0.7, rs4987622 0.57, and rs6947538 0.64, respectively), FST values (0.49, 0.40, and 0.47, respectively), and significant changes in the frequency spectrum of genetic variants around the tested SNPs (Fay and Wu’s H values: -129.6, -47.24, and -43.78, respectively; Table 3). We did not find significantly elevated |iHS| values.

Table 1 CRC risk in relation to MLL3 rs10252263, MLL3 rs6464211, and ERCC6 rs4253038 (risk of CRC, adjusted for age and sex)

Genotype      Control   Case     OR      95 % CI       p value

MLL3 rs10252263
C/C           728       1,185
C/T           16        41       2.407   [1.08–5.37]   0.03*
T/T           –         –        –       –             –
C/T + T/T     16        41       2.407   [1.08–5.37]   0.03*

MLL3 rs6464211
C/C           511       781
C/T           198       408      1.545   [1.18–2.02]   0.001**
T/T           26        34       1.137   [0.59–2.19]   0.70
C/T + T/T     224       442      1.496   [1.16–1.93]   0.002**

ERCC6 rs4253038
T/T           219       434
C/T           340       570      0.736   [0.56–0.97]   0.03*
C/C           154       212      0.663   [0.47–0.94]   0.02*
C/T + C/C     494       782      0.713   [0.55–0.93]   0.01*

Associations between genotypes and CRC risk were considered significant at p < 0.05; level of significance * <0.05; ** <0.005
OR odds ratio, CI confidence interval
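The Bonferroni statement in the Results can be checked with a few lines of Python. The sketch below, a minimal illustration rather than the authors' procedure, recomputes the threshold 0.05/35 and compares it with the p values reported in the text for the risk and survival associations; the dictionary keys are labels chosen here for readability.

```python
ALPHA = 0.05   # nominal significance level used in the study
N_TESTS = 35   # number of genotyped SNPs

# p values as reported in the text (best model per SNP)
reported_p = {
    "MLL3 rs6464211 (risk, dominant)": 0.002,
    "MLL3 rs10252263 (risk, dominant)": 0.03,
    "ERCC6 rs4253038 (risk, dominant)": 0.01,
    "MLL3 rs6464211 (survival, recessive)": 0.020,
    "EPHB6/TRPV6 rs4987622 (survival, dominant)": 0.028,
    "EPHB6/TRPV6 rs6947538 (survival, dominant)": 0.036,
}

# Bonferroni: divide the nominal level by the number of tests
threshold = ALPHA / N_TESTS
surviving = [name for name, p in reported_p.items() if p < threshold]

print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Associations below threshold: {surviving}")
```

Even the smallest reported p value (0.002) lies above the corrected threshold, which matches the statement that no association remained significant after correction.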

Discussion

The 10 genes tested in the present study were mutated in 10–19 % of the tumor samples analyzed by Sjöblom et al. [39] and Wood et al. [55]. Only the already well-studied genes APC, KRAS, and TP53 were mutated more frequently. According to our data, the studied germline variants do not have a strong influence on CRC risk and survival. Among the 35 studied SNPs, we found marginal effects on CRC risk or patients’ survival for three different genes.


Table 2 Overall survival in relation to MLL3 rs6464211 and EPHB6/TRPV6 rs4987622

Genotype / model                    At risk  Died (%)     HR     95 % CI       p value   Log-rank p value

MLL3 rs6464211
C/C                                 242      134 (55.4)                                  0.044a*
C/T                                 135      69 (51.1)    0.90   [0.67–1.20]   0.483
T/T                                 13       10 (76.9)    2.04   [1.07–3.89]   0.030*
Recessive model (CC + CT vs. TT)                          2.12   [1.12–4.01]   0.020*    0.017*

Adjusted for age
C/C                                 242      134 (55.4)
C/T                                 135      69 (51.1)    0.90   [0.67–1.21]   0.455
T/T                                 13       10 (76.9)    2.11   [1.10–4.02]   0.024*
Recessive model (CC + CT vs. TT)                          2.19   [1.15–4.14]   0.016*

Adjusted for sex
C/C                                 242      134 (55.4)
C/T                                 135      69 (51.1)    0.90   [0.67–1.20]   0.479
T/T                                 13       10 (76.9)    1.98   [1.04–3.77]   0.038*
Recessive model (CC + CT vs. TT)                          2.05   [1.09–3.88]   0.027*

Adjusted for tumor size (T)
C/C                                 228      123 (53.9)
C/T                                 119      57 (47.9)    0.90   [0.66–1.23]   0.502
T/T                                 11       8 (72.7)     2.07   [1.01–4.25]   0.048*
Recessive model (CC + CT vs. TT)                          2.15   [1.05–4.37]   0.036*

Adjusted for degree of lymph node involvement (N)
C/C                                 212      110 (51.9)
C/T                                 110      55 (50.0)    0.95   [0.68–1.31]   0.736
T/T                                 9        6 (66.7)     2.23   [0.97–5.13]   0.059
Recessive model (CC + CT vs. TT)                          2.27   [1.00–5.19]   0.051

Adjusted for presence of distant metastasis (M)
C/C                                 232      128 (55.2)
C/T                                 129      67 (51.9)    0.87   [0.65–1.17]   0.357
T/T                                 13       10 (76.9)    1.79   [0.94–3.43]   0.077
Recessive model (CC + CT vs. TT)                          1.89   [1.00–3.58]   0.052

Adjusted for the grade of the cancer cells
C/C                                 206      112 (54.4)
C/T                                 113      56 (49.6)    0.87   [0.63–1.20]   0.389
T/T                                 9        6 (66.7)     1.57   [0.69–3.58]   0.283
Recessive model (CC + CT vs. TT)                          1.65   [0.73–3.74]   0.229

EPHB6/TRPV6 rs4987622
T/T                                 321      185 (57.6)                                  0.069a
C/T                                 45       18 (40.0)    0.60   [0.37–0.97]   0.038*
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.58   [0.36–0.94]   0.028*    0.025*

Adjusted for age
T/T                                 321      185 (57.6)
C/T                                 45       18 (40.0)    0.59   [0.37–0.96]   0.034*
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.57   [0.35–0.93]   0.025*

Adjusted for sex
T/T                                 321      185 (57.6)
C/T                                 45       18 (40.0)    0.58   [0.36–0.95]   0.029*
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.56   [0.35–0.91]   0.020*

Adjusted for tumor size (T)
T/T                                 291      162 (55.7)
C/T                                 43       16 (37.2)    0.60   [0.36–1.00]   0.048*
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.59   [0.35–0.98]   0.042*

Adjusted for degree of lymph node involvement (N)
T/T                                 271      148 (54.6)
C/T                                 38       14 (36.8)    0.66   [0.38–1.14]   0.135
C/C                                 0        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.66   [0.38–1.14]   0.135

Adjusted for presence of distant metastasis (M)
T/T                                 307      178 (58.0)
C/T                                 43       17 (39.5)    0.66   [0.40–1.08]   0.097
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.60   [0.36–0.98]   0.042*

Adjusted for the grade of the cancer cells
T/T                                 267      149 (55.8)
C/T                                 40       17 (42.5)    0.67   [0.41–1.11]   0.117
C/C                                 1        0 (0.0)      –      –             –
Dominant model (TT vs. CT + CC)                           0.65   [0.39–1.07]   0.091

Genotypes with individual adjustments for age, sex, T, N, M, and grade
HR hazard ratio, CI confidence interval
Associations between genotypes and survival were assumed significant at p ≤ 0.05; level of significance * <0.05, ** <0.005
a Overall log-rank p value for the survival distribution function per genotype
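The hazard ratios, confidence intervals, and p values in Table 2 are mutually consistent under a Wald-type approximation, which the following Python sketch illustrates. It assumes that the 95 % CI is symmetric on the log scale (estimate ± 1.96 standard errors) and that the p value comes from a two-sided normal test; this is a plausibility check on rounded published numbers, not a reproduction of the PROC PHREG output, and the function name is hypothetical.

```python
import math


def wald_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald p value from a ratio estimate and its 95 % CI.

    Assumes the CI was built as exp(log(estimate) +/- z_crit * SE).
    """
    # Recover the standard error of the log ratio from the CI width
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2.0))


# MLL3 rs6464211, recessive model, unadjusted (Table 2): HR 2.12 [1.12-4.01]
p_recessive = wald_p_from_ci(2.12, 1.12, 4.01)
print(f"approximate p = {p_recessive:.3f}")
```

For this row the approximation gives a p value of roughly 0.02, in line with the reported 0.020; small deviations are expected because the inputs are rounded to two decimals.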

The strongest association with CRC risk and survival was found for the MLL3 gene. Furthermore, we found that polymorphisms in EPHB6/TRPV6 may be associated with survival. Common SNPs in these three genes have not previously been described as associated with risk or survival of CRC [9, 36]. The location of the SNPs genotyped in each of the three genes in relation to the mutations found in CRC tumors is shown in Fig. 3 [39, 55]. Strong signatures of natural selection detected in MLL3 and EPHB6/TRPV6 and the predicted effects on regulatory motifs support the hypothesis that the variants in these genes may play a functional role for the genes and related pathways [3, 7, 13, 31]. As a component of a histone 3-lysine 4 (H3K4) modification complex, MLL3 plays an important role in the epigenetic regulation of gene transcription. Furthermore, it has been described to be involved in the p53-dependent DNA damage response as part of the H3K4 methyltransferase coactivator complex of p53 [17, 27]. In this function, MLL3 was suggested to act as a tumor suppressor. In NGS and whole genome studies, mostly published in 2012, MLL3 has been found to be mutated in CRC and several other types of cancer [15, 39, 55]. Strikingly, MLL3 was frequently found mutated in microsatellite-unstable CRC with nonfunctional DNA repair mechanisms [5, 25, 28, 37, 53]. In our study, we found one SNP in MLL3 to be marginally associated with CRC risk and survival in microsatellite-stable CRC. The genomic location of MLL3 on chromosome 7q36.1 is within a region with very high recombination rates; however, MLL3 only harbors 13 SNPs with a MAF > 0.05 (Haploview V2, R24, CEU). In contrast, there are 54 SNPs with a MAF between 0.01 and 0.05, which tightly link MLL3 to the neighboring gene GALNT11 (Supplementary Material, Figure S1). GALNT11 is involved in mucin-type O-glycosylation, a pathway that has been related to CRC, especially via MUC1 and


Fig. 2 Kaplan–Meier plots for the survival distribution function of the SNPs in (a) EPHB6/TRPV6 and (b) MLL3 (best model)

Table 3 Signatures of selection in the SNPs significantly associated with CRC risk and/or overall survival

                                               Association                                                          Signatures of selection
Gene          SNP        Function(a)           Risk of CRC               Survival                                   AFD (%)  FST   Fay and Wu's H  iHS

EPHB6/TRPV6   rs4987622  Intronic variant      No association            HR 0.58, p = 0.028;                        56.6     0.4   -47.24          -1.22
                                                                         log-rank p = 0.025 (dominant model)
EPHB6/TRPV6   rs6947538  Intronic variant      No association            HR 0.64, p = 0.036;                        64.2     0.47  -43.78          -1.58
                                                                         log-rank p = 0.034 (dominant model)
MLL3          rs6464211  Synonymous variant    OR 1.50, p = 0.002        HR 2.12, p = 0.020;                        70.1     0.49  -129.61         -0.18
                                               (dominant model)          log-rank p = 0.017 (recessive model)

AFD allele frequency difference YRI versus CEU, estimation based on the minor allele in CEU, updated with dbSNP137, Jan 2013, validation status HapMap CEU; FST fixation index; iHS integrated haplotype score
(a) dbSNP137 (Jan 2013)


Fig. 3 Location of the genotyped SNPs within EPHB6/TRPV6 and MLL3. Plot of the genomic position of the genotyped SNPs in relation to the somatic mutations and in relation to common nsSNPs within the four genes described in Sjöblom et al. [39]. *SNPs found to be significantly associated with CRC risk and/or survival; plots adapted from http://bio.ieo.eu/fancygene/. Red lines indicate nonsynonymous SNPs, and green lines indicate synonymous SNPs. (Color figure online)

MUC2 [6, 24]. The SNP rs6464211 is characterized by a strong signal of natural selection with highly significant values for the fixation index (FST) and Fay and Wu’s H estimates [13, 16, 31, 43]. EPHB6 belongs to the Eph family of receptor tyrosine kinases and is expressed in almost every human tissue [29]. It is located at 7q33-q35 in close proximity to another gene, TRPV6. TRPV6 belongs to the superfamily of cation channel proteins and is frequently expressed in placenta, pancreas, prostate, and CRC cell lines, with lower expression in kidney and small intestine [32]. In our study, we analyzed two SNPs within a high LD block linking the two genes EPHB6 and TRPV6, with an r² ≥ 0.76 (Haploview V2, R24, CEU) and an r² = 0.68 in the Czech control population (Supplementary Material, Figure S2). Our data suggest that either rs4987622 in TRPV6 or rs6947538 in EPHB6 may be associated with survival. Because of the LD between the two genes, it is not clear whether the detected effect can be attributed to one of the two genotyped variants in EPHB6 or TRPV6 or to any other linked variant in this particular region. However, Sjöblom et al. and Wood et al. addressed EPHB6 as the actual CAN gene. The estimates analyzed for signatures of natural selection are very similar for both SNPs and point to a selective process that has shaped the gene region. HaploReg v2 predicted effects on regulatory motifs for both SNPs and several linked variants (r² > 0.85). Additionally, both genes, EPHB6 and TRPV6, have been found mutated and differentially expressed in various tumor types such as breast, lung, and gastric cancer (COSMIC Release v63) [15].

Our study has both strengths and limitations. Strengths include the candidate gene study design and the use of a well-defined and genetically homogeneous study population of sufficient size. Of the 1,399 cases, only the 406 consecutively collected incident cases diagnosed in 2003 or later were available for the survival analysis. This ensured that only newly diagnosed CRC cases (diagnosed within one year before enrollment in this study) were included in the study, avoiding a survival bias. For this subgroup, nearly complete clinical data were available, allowing evaluation of the SNPs as independent prognostic markers. However, the limitation to newly diagnosed CRC cases made some subgroup analyses small and decreased the power to detect associations with genotypes. Another limitation of the study may be that only a few SNPs were studied in each gene. Based on our hypothesis that germline variants in the genes commonly mutated in sporadic CRC may influence CRC risk or survival, the study intentionally focused on potentially functional coding SNPs. Therefore, other SNPs in or near the selected genes that may contribute to the risk or survival of CRC might have been missed.

In conclusion, our study did not provide evidence of a strong association of the 35 studied potentially functional SNPs with CRC risk or survival. However, an effect of


other germline variants within frequently mutated CAN genes cannot be excluded. A retrospective analysis of the latest publications (as of January 2014) on the mutational landscape of CRC lists three of our candidate genes among the most commonly mutated and important genes in CRC: SYNE1, FBXW7, and SMAD4 [23, 44]. MLL3 and EPHB6/TRPV6 have also been repeatedly found mutated in CRC tumor samples, although with varying mutation frequencies [44]. Further and larger studies on frequently mutated CAN genes will be necessary to understand their functional role in CRC risk, progression, and prognosis.

Acknowledgments We would like to thank all patients and blood donors for their participation in this research study on CRC susceptibility and prognosis. This work has been supported by the Grant Agency of the Czech Republic (GACR) and the Ministry of Education, Youth and Sport of the Czech Republic (grant numbers CZ:GA CR:GA304/12/1585, CZ:GA CR:GA304/10/1286, and Prvouk-P27/LF1/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflict of interest The authors declare no conflict of interest.

Ethical standard In accordance with the Declaration of Helsinki, all patients and blood donors provided written informed consent and approved the use of their biological samples for genetic studies (Medical ethics manual; WMA) [54]. The study was approved by the Ethics Committees of the Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, Prague (Czech Republic), and of the Institute for Clinical and Experimental Medicine and Faculty Thomayer Hospital, Prague (Czech Republic).

References
1. Adzhubei IA, Schmidt S, Peshkin L et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249
2. Barrett JC, Fry B, Maller J et al (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265
3. Barton NH (2000) Genetic hitchhiking. Philos Trans R Soc Lond B Biol Sci 355:1553–1562
4. Bernstein CN, Blanchard JF, Kliewer E et al (2001) Cancer risk in patients with inflammatory bowel disease: a population-based study. Cancer 91:854–862
5. Biswas S, Trobridge P, Romero-Gallo J et al (2008) Mutational inactivation of TGFBR2 in microsatellite unstable colon cancer arises from the cooperation of genomic instability and the clonal outgrowth of transforming growth factor beta resistant cells. Genes Chromosom Cancer 47:95–106
6. Brokx RD, Revers L, Zhang Q et al (2003) Nuclear magnetic resonance-based dissection of a glycosyltransferase specificity for the mucin MUC1 tandem repeat. Biochemistry 42:13817–13825
7. Carlson CS, Thomas DJ, Eberle MA et al (2005) Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15:1553–1565
8. Coop G, Pickrell JK, Novembre J et al (2009) The role of geography in human adaptation. PLoS Genet 5:e1000500
9. dbCPCO. http://www.med.mun.ca/cpco/default.aspx
10. NCBI dbSNP. http://www.ncbi.nlm.nih.gov/
11. De La Chapelle A (2004) Genetic predisposition to colorectal cancer. Nat Rev Cancer 4:769–780
12. Duffy MJ, Lamerz R, Haglund C et al (2014) Tumor markers in colorectal cancer, gastric cancer and gastrointestinal stromal cancers: European group on tumor markers 2014 guidelines update. Int J Cancer 134:2513–2522
13. Fay JC, Wu CI (2000) Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413
14. Fearnhead NS, Wilding JL, Bodmer WF (2002) Genetics of colorectal cancer: hereditary aspects and overview of colorectal tumorigenesis. Br Med Bull 64:27–43
15. Forbes SA, Bindal N, Bamford S et al (2011) COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 39:D945–D950
16. Fu YX, Li WH (1993) Statistical tests of neutrality of mutations. Genetics 133:693–709
17. Guo C, Chang CC, Wortham M et al (2012) Global identification of MLL2-targeted loci reveals MLL2’s role in diverse signaling pathways. Proc Natl Acad Sci USA 109:17603–17608
18. Haber DA, Fearon ER (1998) The promise of cancer genetics. Lancet 351(Suppl 2):SII1–SII8
19. HaploReg v2. http://www.broadinstitute.org/mammals/haploreg/haploreg.php
20. Hemminki K, Forsti A, Lorenzo Bermejo J (2009) Surveying the genomic landscape of colorectal cancer. Am J Gastroenterol 104:789–790
21. Huxley RR, Ansary-Moghaddam A, Clifton P et al (2009) The impact of dietary and lifestyle risk factors on risk of colorectal cancer: a quantitative overview of the epidemiological evidence. Int J Cancer 125:171–180
22. Jasperson KW, Tuohy TM, Neklason DW et al (2010) Hereditary and familial colon cancer. Gastroenterology 138:2044–2058
23. Kandoth C, McLellan MD, Vandin F et al (2013) Mutational landscape and significance across 12 major cancer types. Nature 502:333–339
24. Kawashima H (2012) Roles of the gel-forming MUC2 mucin and its O-glycosylation in the protection against colitis and colorectal cancer. Biol Pharm Bull 35:1637–1641
25. Kloosterman WP, Hoogstraat M, Paling O et al (2011) Chromothripsis is a common mechanism driving genomic rearrangements in primary and metastatic colorectal cancer. Genome Biol 12:R103
26. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073–1081
27. Lee J, Kim DH, Lee S et al (2009) A tumor suppressive coactivator complex of p53 containing ASC-2 and histone H3-lysine-4 methyltransferase MLL3 or its paralogue MLL4. Proc Natl Acad Sci USA 106:8513–8518
28. Markowitz S, Wang J, Myeroff L et al (1995) Inactivation of the type II TGF-beta receptor in colon cancer cells with microsatellite instability. Science 268:1336–1338
29. Matsuoka H, Iwata N, Ito M et al (1997) Expression of a kinase-defective Eph-like receptor in the normal human brain. Biochem Biophys Res Commun 235:487–492
30. Naccarati A, Pardini B, Stefano L et al (2012) Polymorphisms in miRNA-binding sites of nucleotide excision repair genes and colorectal cancer risk. Carcinogenesis 33:1346–1351
31. Oleksyk TK, Smith MW, O’Brien SJ (2010) Genome-wide scans for footprints of natural selection. Philos Trans R Soc Lond B Biol Sci 365:185–205
32. Peng JB, Brown EM, Hediger MA (2001) Structural conservation of the genes encoding CaT1, CaT2, and related cation channels. Genomics 76:99–109
33. PolyPhen-2. http://genetics.bwh.harvard.edu/pph2/
34. Quanto. http://hydra.usc.edu/gxe/
35. Sabeti PC, Reich DE, Higgins JM et al (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837
36. Savas S, Younghusband HB (2010) dbCPCO: a database of genetic markers tested for their predictive and prognostic value in colorectal cancer. Hum Mutat 31:901–907
37. Shin N, You KT, Lee H et al (2011) Identification of frequently mutated genes with relevance to nonsense mediated mRNA decay in the high microsatellite instability cancers. Int J Cancer 128:2872–2880
38. SIFT. http://sift.jcvi.org/
39. Sjöblom T, Jones S, Wood LD et al (2006) The consensus coding sequences of human breast and colorectal cancers. Science 314:268–274
40. SNPnexus. http://snp-nexus.org/citation.html
41. Sobin LH, Gospodarowicz MK, Wittekind C et al (2010) TNM classification of malignant tumours. Wiley-Blackwell, Hoboken
42. Southam L, Soranzo N, Montgomery SB et al (2009) Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants? Diabetologia 52:1846–1851
43. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
44. The Cancer Genome Atlas Network (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487:330–337
45. The International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467:52–58
46. The International HapMap Consortium (2003) The international HapMap project. Nature 426:789–796
47. Tomlinson I (2012) Colorectal cancer genetics: from candidate genes to GWAS and back again. Mutagenesis 27:141–142
48. Tomlinson IP, Dunlop M, Campbell H et al (2010) COGENT (COlorectal cancer GENeTics): an international consortium to study the role of polymorphic variation on the risk of colorectal cancer. Br J Cancer 102:447–454
49. Vasen HF, Watson P, Mecklin JP et al (1999) New clinical criteria for hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the International Collaborative group on HNPCC. Gastroenterology 116:1453–1456
50. Vogelstein B, Kinzler KW (2004) Cancer genes and the pathways they control. Nat Med 10:789–799
51. Voight BF, Kudaravalli S, Wen X et al (2006) A map of recent positive selection in the human genome. PLoS Biol 4:e72
52. Ward LD, Kellis M (2012) HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 40:D930–D934
53. Watanabe Y, Castoro RJ, Kim HS et al (2011) Frequent alteration of MLL3 frameshift mutations in microsatellite deficient colorectal cancer. PLoS ONE 6:e23320
54. Williams JR, World Medical Association (2009) Medical ethics manual. WMA, Ferney-Voltaire Cedex
55. Wood LD, Parsons DW, Jones S et al (2007) The genomic landscapes of human breast and colorectal cancers. Science 318:1108–1113
56. Wright S (1978) Evolution and the genetics of populations: variability within and among natural populations. University of Chicago Press, Chicago
