Multiple Sclerosis and Related Disorders 1 (2012) 39–42


Journal homepage: www.elsevier.com/locate/msard

Concealed effects of gene–environment interactions in genome-wide association

Adam E. Handel a,b,*,1, Alexander J. Williamson c,1, Sreeram V. Ramagopalan a,b

a Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
b Department of Clinical Neurology, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom
c Department of Physical and Theoretical Chemistry, University of Oxford, Oxford, United Kingdom

Article info

Article history: Received 29 May 2011; received in revised form 5 August 2011; accepted 22 August 2011

Keywords: Genome-wide association study; Gene–environment interaction; Genetic epidemiology; Modelling; Multiple sclerosis; Vitamin D

Abstract

Genome-wide association studies (GWAS) have been conducted on many complex diseases with the aim of identifying common genetic variants important in disease susceptibility. It is rare for these to consider environmental factors when calculating the odds ratios for variants associated with disease. We modelled the effect of an environmentally-dependent genetic risk locus and showed that, despite a relatively low overall odds ratio, environmental exposure was able to conceal a large true effect size on disease susceptibility in exposed individuals. This model was applied to vitamin D deficiency in multiple sclerosis and raised the possibility of large magnitudes of concealed effect in loci responsive to vitamin D in this disease. In the post-GWAS era, efforts should be made to elucidate potential gene–environment interactions involved in complex traits in order for true effects on disease risk to be realised.

© 2011 Elsevier B.V. All rights reserved.

* Corresponding author at: The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, United Kingdom. Tel.: +44 1865 287659; fax: +44 1865 287501. E-mail address: [email protected] (A.E. Handel).
1 These authors contributed equally to this manuscript.

2211-0348/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.msard.2011.08.005

1. Introduction

Genome-wide association studies (GWAS) are a commonly used hypothesis-free method for identifying genetic variation associated with disease susceptibility (Hirschhorn and Daly, 2005). These studies rest on the theory that common genetic variation, generally either single nucleotide polymorphisms (SNPs) or copy number variants (CNVs), contributes to the risk of disease. By comparing the frequency of genetic variants in cases and controls, it is possible to identify which are increased or decreased in cases relative to controls. These variants are presumably involved in the aetiology of disease or lie close to loci that are. However, recent attention has focused on the potential limitations of this approach to exploring disease susceptibility (Wang et al., 2005). Typically the odds ratios (ORs) found in these studies are modest in magnitude (Manolio et al., 2008). For example, recent GWAS efforts identified several loci associated with increased risk of multiple sclerosis (MS) (Hafler et al., 2007; IMSGC, 2010). However, outside of the previously known main locus, the major histocompatibility complex (MHC), the ORs for these SNPs were around 1.2–1.3. It is possible that gene–environment interactions may conceal the true effect size of these genes (Thomas, 2010).

The interaction between genetic and environmental risk factors is likely to be of importance in many complex conditions. This interaction might take the form of epigenetic regulation of gene expression or of interaction with molecular pathways (Handel et al., 2009). Again using MS as an example, there is evidence that genetic and environmental risk factors interact on a population-wide level to determine susceptibility to disease (Ramagopalan et al., 2010). Expression of the key MHC risk allele is controlled by vitamin D (Ramagopalan et al., 2009), and it is likely that at least some of the newer genes shown to be associated with MS by GWAS are also regulated by vitamin D (Wang et al., 2005). Genetic epidemiological work using twin pairs has shown an interaction between latitude (perhaps a proxy for vitamin D levels) and disease concordance, arguing that MS susceptibility is strongly determined by the interplay of vitamin D and genetic risk factors (Islam et al., 2006).

Case-control studies are beginning to shed light on the nature of interactions between different risk factors in determining MS susceptibility. The evidence so far suggests that interactions exist between the main MHC risk loci, Epstein–Barr virus (EBV) antibody titres, infectious mononucleosis and smoking (Nielsen et al., 2009; Simon et al., 2010). One polymorphism in the vitamin D receptor (VDR) appears to interact with vitamin D levels in establishing the relative risk of MS conferred by that locus (Simon et al., 2010). Given the likely importance of gene–environment interactions in complex disease susceptibility, we aimed to model the magnitude of true ORs of environmentally-dependent susceptibility genes and then apply this to the particular case of vitamin D deficiency in MS.

2. Materials and methods

2.1. Matrix characteristics of GWAS

Fig. 1 defines the mathematical symbols used throughout this manuscript and their derivation from GWAS and epidemiological studies. Importantly, these include: allele 1 and allele 2, which are different alleles of the same SNP; a, cases exposed to the environmental factor (e), who have allele 1; α, cases not exposed to e, who have allele 1; b, cases exposed to e, who have allele 2; β, cases not exposed to e, who have allele 2; c, controls exposed to e, who have allele 1; γ, controls not exposed to e, who have allele 1; d, controls exposed to e, who have allele 2; δ, controls not exposed to e, who have allele 2; $OR_{GWAS}$ is the overall OR from the GWAS; $OR_{e+}$ is the OR in the presence of e; $OR_{e-}$ is the OR in the absence of e; E is the proportion of controls exposed to e; k is a point estimate of the ratio of exposure to e between cases and controls, obtained from epidemiological studies. For this paper the minor allele frequency (MAF) is assumed to be 0.3 throughout.

2.2. Mathematical model

Our derivation of a mathematical model based upon the above variables is detailed in the Supplementary Information. The model was composed using Mathematica 7.0.1.0 and figures were manipulated using GnuPlot 4.3.0.
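Since the derivation itself is only available in the Supplementary Information, the following is one possible formalisation of the quantities defined in Section 2.1, offered as a sketch rather than as the authors' own derivation. The symbols p, f, f_{+} and f_{-} are introduced here for illustration and do not appear in the original text. The sketch assumes that the allele 1 frequency in controls equals the MAF regardless of exposure, and that a proportion kE of cases is exposed.

```latex
% Assumed notation (not from the original text):
%   p     = frequency of allele 1 in controls, taken equal to the MAF (0.3)
%   f     = frequency of allele 1 in all cases, implied by OR_{GWAS}
%   f_{-} = frequency of allele 1 in unexposed cases, implied by OR_{e-}
%   f_{+} = frequency of allele 1 in exposed cases
% Assumptions: allele frequency in controls is independent of exposure;
% a proportion kE of cases is exposed to e.
\begin{align}
f &= \frac{OR_{GWAS}\,p}{1 - p + OR_{GWAS}\,p},
&
f_{-} &= \frac{OR_{e-}\,p}{1 - p + OR_{e-}\,p}, \\
f &= kE\,f_{+} + (1 - kE)\,f_{-}
\;\Longrightarrow\;
f_{+} = \frac{f - (1 - kE)\,f_{-}}{kE}, \\
OR_{e+} &= \frac{f_{+}/(1 - f_{+})}{p/(1 - p)}.
\end{align}
```

Under these assumptions, a small exposed fraction kE forces f_{+} to be large in order to account for the overall case allele frequency f, which is the sense in which a modest $OR_{GWAS}$ can conceal a large $OR_{e+}$.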

Fig. 1. Definitions and their place in GWAS and epidemiological studies. In this case, e is the environmental factor.

3. Results

3.1. Concealed effect size with equal exposure of cases and controls to an environmental factor

We first modelled possible effect sizes for SNPs with $OR_{GWAS}$ of 1.2 and 1.3 for different values of E, assuming that in the absence of e, $OR_{e-}$ = 1 (Fig. 2). This clearly illustrated that gene–environment interactions could conceal large magnitude ORs, especially when the value of E was low. For example, when 10% of cases and controls are exposed (E = 0.1), $OR_{e+}$ could take values as high as 5.3 ($OR_{GWAS}$ = 1.2) or 16.8 ($OR_{GWAS}$ = 1.3).

Fig. 2. Maximum concealed effect of susceptibility loci with equal exposure of cases and controls to environmental factors.

3.2. Concealed effect size with differing exposure of cases and controls to an environmental factor

However, given that e is likely to have an effect on disease susceptibility in its own right, it is highly improbable that cases and controls would have an equal exposure. We plotted $OR_{e+}$ against k and E to model the effect of varying the proportion of cases relative to controls exposed to e, again assuming $OR_{e-}$ = 1 (Fig. 3A). This shows that although the effect is attenuated by increased values of k, a high degree of susceptibility will be concealed by gene–environment interactions. For example, when 10% of controls are exposed (E = 0.1) and k = 1, 2, 3 and 5, $OR_{e+}$ could take values as high as 5.3, 2.3, 1.8 or 1.4, respectively ($OR_{GWAS}$ = 1.2). We next went on to model how this would differ when $OR_{e-}$ ≠ 1 and $OR_{GWAS}$ = 1.2 (Fig. 3B). Again, the effect size was attenuated, but we showed that a conventional GWAS would still conceal a high $OR_{e+}$ for relatively rare exposures (low E). For example, when k = 2 and $OR_{e-}$ = 0.5, 1 and 1.1, $OR_{e+}$ could still take values as high as 300, 2.3 and 1.7, respectively, when E = 0.1.

3.3. Application to multiple sclerosis candidate SNPs

We next attempted to apply our model to data from MS GWAS and epidemiological studies, again using vitamin D as an example environmental factor. In the study by Munger and colleagues on serum vitamin D and MS susceptibility, the risk of MS falls off precipitously in individuals outside of the lowest quintile of vitamin D levels (Munger et al., 2006). If one hypothesises that this effect may be due in part to gene–environment interactions, one can postulate that E ≈ 0.2 (where e is vitamin D deficiency). From this study, one can estimate that k is 1.46. From these values, it can be calculated that, for a SNP with $OR_{GWAS}$ = 1.3 and assuming that $OR_{e-}$ = 1, $OR_{e+}$ = 2.43 (see Supplementary Information for modelling assumptions). Thus from real data, making some assumptions, one can show that it is likely that gene–environment interactions conceal large effects of candidate genes in GWAS.
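The worked examples in Sections 3.1 and 3.2 can be explored numerically with a minimal Python sketch. This is not the authors' Mathematica model, which is described only in the Supplementary Information. It is a simplified reconstruction that assumes the allele 1 frequency in controls equals the MAF (0.3) regardless of exposure, and that a proportion k × E of cases is exposed. The function and parameter names (`concealed_or`, `allele_freq_from_or`, `or_unexposed`) are hypothetical. Under these assumptions the output comes close to the E = 0.1 values quoted above; it is not intended to reproduce the Section 3.3 estimate, which depends on further assumptions given in the Supplementary Information.

```python
def allele_freq_from_or(odds_ratio, control_freq):
    """Allele frequency in cases implied by an odds ratio and the control frequency."""
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    return case_odds / (1.0 + case_odds)


def concealed_or(or_gwas, exposure, k=1.0, or_unexposed=1.0, maf=0.3):
    """Odds ratio in exposed individuals (OR_e+) under the simplifying assumptions.

    or_gwas      -- overall odds ratio reported by the GWAS (OR_GWAS)
    exposure     -- proportion of controls exposed to the factor (E)
    k            -- ratio of exposure between cases and controls
    or_unexposed -- odds ratio in unexposed individuals (OR_e-)
    maf          -- allele 1 frequency in controls (assumed independent of exposure)
    """
    case_exposure = k * exposure  # assumed proportion of cases exposed
    if not 0.0 < case_exposure <= 1.0:
        raise ValueError("k * exposure must lie in (0, 1]")

    f_all = allele_freq_from_or(or_gwas, maf)         # all cases
    f_unexp = allele_freq_from_or(or_unexposed, maf)  # unexposed cases

    # Overall case frequency is a mixture of the exposed and unexposed strata,
    # so solve the mixture equation for the exposed-stratum frequency.
    f_exp = (f_all - (1.0 - case_exposure) * f_unexp) / case_exposure
    if not 0.0 < f_exp < 1.0:
        raise ValueError("no valid allele frequency for these inputs")

    return (f_exp / (1.0 - f_exp)) / (maf / (1.0 - maf))


if __name__ == "__main__":
    # Equal exposure of cases and controls (k = 1), E = 0.1
    for or_gwas in (1.2, 1.3):
        print(or_gwas, round(concealed_or(or_gwas, 0.1), 1))
    # Differing exposure, OR_GWAS = 1.2, E = 0.1
    for k in (1, 2, 3, 5):
        print(k, round(concealed_or(1.2, 0.1, k=k), 1))
```

Because the exposed-stratum frequency is obtained by solving a mixture equation, small values of k × E amplify any difference between the overall and unexposed case frequencies, which is why the concealed odds ratio grows rapidly as E falls.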


Fig. 3. Map projection of 3-dimensional plots of maximum concealed effect of susceptibility loci with differing exposure to environmental factors. (A) The odds ratio in exposed individuals ($OR_{e+}$) for differing frequencies of population exposure (E), ratios of exposure between cases and controls (k) and overall GWAS odds ratios ($OR_{GWAS}$). The odds ratio in unexposed individuals ($OR_{e-}$) was defined as 1. (B) The odds ratio in exposed individuals ($OR_{e+}$) for differing odds ratios in unexposed individuals ($OR_{e-}$), ratios of exposure between cases and controls (k) and frequencies of population exposure (E). The overall GWAS odds ratio ($OR_{GWAS}$) was 1.2. In all plots the magnitude of $OR_{e+}$ is represented by the colour scale.

4. Discussion

We have shown that gene–environment interactions have the potential to mask large effect sizes in GWAS. Many of the published ORs for susceptibility loci may therefore be underestimates of the true effects, as very few studies take environmental factors into account when stratifying cases and controls for statistical testing. Gene–environment interactions may thus explain at least some of the so-called 'missing heritability' in the post-GWAS era. New methods are rapidly evolving to consider environmental factors in GWAS (Murcray et al., 2009; Kraft et al., 2007).

Even with a relatively common environmental exposure such as vitamin D deficiency, there is still potential for concealing large effects if these interactions are not taken into account. The problem becomes even more pressing if one considers synergistic, permissive environmental factors, i.e. situations in which two or more factors have to be present to realise the true effect of genetic variation on disease susceptibility. Here the effective population exposure to both would be very low, since the probability of joint exposure, if the factors were independent, would be the product of the individual probabilities. Some of this effect would be attenuated by the almost certain concentration of such environmental exposures in cases relative to controls; however, as we have shown graphically, there is still scope for very large concealed biological effect sizes.

In MS this is important, as two of the key risk factors, vitamin D and Epstein–Barr virus (EBV), have been shown to interact with the MHC susceptibility locus. The interaction between EBV and the MHC highlights the potential for concealed effect sizes particularly clearly. The overall relative risk of MS in individuals positive for HLA-DRB1*1501 is 1.94 (95% confidence interval 1.37–2.74), but this increases to 9.67 (3.20–29.22) in the group with the highest titres of antibodies against EBV (De Jager et al., 2008). However, this study also highlights an important difficulty inherent in studying gene–environment interactions: statistical power is inevitably lost when subdividing cohorts based on exposure to environmental factors, which makes important susceptibility effects difficult to detect.

Our model has some limitations. We explore only the situation of a single locus interacting with a single putative environmental exposure. In reality, there are likely to be multiple gene–gene, gene–environment and environment–environment interactions. Our model also assumed a binary effect of environmental exposure, whereas the true effect is likely to be some function of the level of exposure. The intricacy of gene–environment interactions will likely only be resolved by methods that test the impact of candidate environmental factors on genetic risk and by functional work exploring the expression of susceptibility genes in relation to these factors.

Identifying important gene–environment interactions is one of the major challenges facing researchers in the field of multiple sclerosis aetiology. Because of the loss of power inherent in studying gene–environment interactions, as illustrated above, only large cohort studies will be sufficient to dissect out these important effects. Such studies will have to be designed very carefully: the environmental factors measured must be potentially relevant to disease causation, so exposure data should ideally be collected prospectively. For this to occur, researchers will need to recruit large, unbiased prospective cohorts of high-risk individuals and study them comprehensively. Accruing such cohorts will require international collaboration, but the gain in both understanding MS susceptibility and potentially intervening to prevent development of disease should justify the considerable effort required. By investigating gene–environment interactions we will gain further insight into the role of genetic variation in disease susceptibility (Khoury and Wacholder, 2009) and uncover potential routes for disease prevention.
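The point made earlier in this section about synergistic, permissive factors can be illustrated with a short worked equation. The exposure frequencies E_1 and E_2 and their values are hypothetical and chosen only for illustration; independence of the two exposures is assumed, as in the text.

```latex
% E_1, E_2: hypothetical population frequencies of two independent exposures
% (illustrative values only, not taken from the paper).
\begin{equation}
E_{\mathrm{joint}} = E_1 \times E_2,
\qquad \text{e.g. } E_1 = 0.2,\; E_2 = 0.1
\;\Rightarrow\; E_{\mathrm{joint}} = 0.02 .
\end{equation}
```

An effective exposure this low sits in the region where the model permits the largest concealed $OR_{e+}$, although, as noted above, concentration of the exposures in cases (higher k) would attenuate the effect.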

Conflicts of interest

The authors declare no conflicts of interest.

Author contributions

AEH, AJW and SVR conceived the idea. AEH and AJW analysed the data. AEH, AJW and SVR wrote the manuscript.

Appendix A. Supplementary materials

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.msard.2011.08.005.

References

De Jager PL, Simon KC, Munger KL, et al. Integrating risk factors: HLA-DRB1*1501 and Epstein–Barr virus in multiple sclerosis. Neurology 2008;70:1113–8.
Hafler DA, Compston A, Sawcer S, et al. Risk alleles for multiple sclerosis identified by a genomewide study. The New England Journal of Medicine 2007;357:851–62.
Handel AE, Ebers GC, Ramagopalan SV. Epigenetics: molecular mechanisms and implications for disease. Trends in Molecular Medicine 2009.
Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics 2005;6:95–108.
IMSGC. Comprehensive follow-up of the first genome-wide association study of multiple sclerosis identifies KIF21B and TMEM39A as susceptibility loci. Human Molecular Genetics 2010;19:953–62.
Islam T, Gauderman WJ, Cozen W, et al. Differential twin concordance for multiple sclerosis by latitude of birthplace. Annals of Neurology 2006;60:56–64.
Khoury MJ, Wacholder S. Invited commentary: from genome-wide association studies to gene–environment-wide interaction studies – challenges and opportunities. American Journal of Epidemiology 2009;169:227–30 (discussion 234–5).
Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene–environment interaction to detect genetic associations. Human Heredity 2007;63:111–9.
Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. The Journal of Clinical Investigation 2008;118:1590–605.
Munger KL, Levin LI, Hollis BW, Howard NS, Ascherio A. Serum 25-hydroxyvitamin D levels and risk of multiple sclerosis. JAMA 2006;296:2832–8.
Murcray CE, Lewinger JP, Gauderman WJ. Gene–environment interaction in genome-wide association studies. American Journal of Epidemiology 2009;169:219–26.
Nielsen TR, Rostgaard K, Askling J, et al. Effects of infectious mononucleosis and HLA-DRB1*15 in multiple sclerosis. Multiple Sclerosis 2009;15:431–6.
Ramagopalan SV, Dobson R, Meier UC, Giovannoni G. Multiple sclerosis: risk factors, prodromes, and potential causal pathways. Lancet Neurology 2010;9:727–39.
Ramagopalan SV, Maugeri NJ, Handunnetthi L, et al. Expression of the multiple sclerosis-associated MHC class II allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genetics 2009;5:e1000369.
Simon KC, van der Mei IA, Munger KL, et al. Combined effects of smoking, anti-EBNA antibodies, and HLA-DRB1*1501 on multiple sclerosis risk. Neurology 2010;74:1365–71.
Simon KC, Munger KL, Xing Y, Ascherio A. Polymorphisms in vitamin D metabolism related genes and risk of multiple sclerosis. Multiple Sclerosis 2010;16:133–8.
Thomas D. Methods for investigating gene-environment interactions in candidate pathway and genome-wide association studies. Annual Review of Public Health 2010;31:21–36.
Wang WY, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nature Reviews Genetics 2005;6:109–18.
Wang TT, Tavera-Mendoza LE, Laperriere D, et al. Large-scale in silico and microarray-based identification of direct 1,25-dihydroxyvitamin D3 target genes. Molecular Endocrinology 2005;19:2685–95.
