Enhanced evolution by stochastically variable modification of epigenetic marks in the early embryo

Sergio Branciamore1,2, Andrei S. Rodin1,2, Arthur D. Riggs2, and Sergei N. Rodin3

Department of Diabetes and Metabolic Diseases Research, Beckman Research Institute of the City of Hope, Duarte, CA 91010

Contributed by Arthur D. Riggs, March 4, 2014 (sent for review December 6, 2013)

DNA methylation | mathematical modeling | computational biology | molecular evolution | developmental biology

www.pnas.org/cgi/doi/10.1073/pnas.1402585111

One of the crucial driving forces behind gaining new complexity and functions in the evolutionary process is evolution by gene duplication, an idea that was first proposed and substantiated in the seminal work of Susumu Ohno (1). Due to technological progress in whole genome sequencing and omics in general, we recently have gained a much better qualitative and quantitative understanding of the extent and patterns of gene duplication in extant genomes. However, the actual evolutionary dynamics of gene duplication events are still a matter of considerable debate. Though the molecular underpinnings of generating duplicate gene copies are well understood, it is during the subsequent fixation of gene duplicates that a serious difficulty arises: namely, how to avoid pseudogenization, which is statistically much more likely than new or diversified functions. Substantial biological evidence, as well as theoretical considerations, suggests that epigenetic events influence not only development of individual organisms but also evolutionary processes (2–7). For example, epigenetic silencing by DNA methylation, which is generally a repressive mark, has the potential for tissue-specific gene silencing and thus, as we and others have reported (8–10), may greatly accelerate the rate of evolution by gene duplication. In this paper we consider stochastic epigenetic modifications (SEM) during a window of opportunity in the postfertilization zygote and very early embryo, with emphasis on stochastically variable removal of repressive epigenetic marks established during gametogenesis. It is well recognized that haploid organisms sometimes use persistent but stochastically variable epigenetic states for survival advantage. One genotype can code for two metastable phenotypes, allowing the microbe to survive in two quite different

environments (11). An example is the metastable on/off switch for pili in bacteria that cause nephritis. There are also examples for mammals, although the mechanisms as well as the evolutionary consequences tend to be more diverse, complex, and thus more difficult to model and interpret (see, e.g., refs. 12 and 13 for a theoretical treatment from the population genetics viewpoint). One well-studied system is the viable yellow agouti gene [A(vy)] (14) in which the phenotype of a mouse depends on the DNA methylation state of the LTR of a retrotransposon inserted near the agouti gene. If the LTR is methylated, the phenotype is wild-type agouti, but if the LTR is unmethylated, then the mouse has yellow hair and tendency toward obesity, metabolic syndrome, and diabetes. This allele and others, such as the Axin1 (Fu) allele, have been termed metastable epialleles (14), showing variation between individuals but little variation across tissues within an individual. These agouti and axin alleles are sensitive to nutritional environment, but other examples, also in mice, show phenotypic variation even in a constant environment. One such example results from duplication of a DNA segment near the imprint control region (ICR) of the Igf2 and H19 genes (15). Upon paternal transmission, the duplication produces two phenotypes for the same genotype. Some of the pups in a litter express the paternal Igf2 gene and are large, whereas other pups with the same genotype in the same litter do not express paternal Igf2 and are smaller. For the large phenotype the ICR is methylated, whereas for the small phenotype the ICR is not methylated. Within an individual pup the methylation level at the ICR was similar in all tissues tested, consistent with the epigenetic differences being established in the zygote. 
These results were interpreted as due to the duplication changing the probability of methylation during a window of opportunity and then maintenance of the methylation state during further development. Here we suggest that the stochastically variable event caused by the duplication may not be (only) methylation

Significance

In this paper we investigate by quantitative modeling the effect on evolution of epigenetic variation during a window of opportunity in the early embryo. It is generally accepted that generation of new functions is primarily driven by gene duplication. However, pseudogenization (degradation of a new gene copy) is statistically much more likely than gaining a new function, and thus this remains a serious conceptual problem. We find that epigenetic variation, even in a constant environment, can essentially eliminate the pseudogenization problem and dramatically improve the efficacy of evolution by gene duplication.

Author contributions: S.B., A.D.R., and S.N.R. conceived the study; S.B., A.S.R., A.D.R., and S.N.R. designed research; S.B., A.S.R., A.D.R., and S.N.R. performed research; S.B., A.S.R., A.D.R., and S.N.R. analyzed data; and S.B., A.S.R., A.D.R., and S.N.R. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

1 S.B. and A.S.R. contributed equally to this work.

2 To whom correspondence may be addressed. E-mail: [email protected], arodin@coh.org, or [email protected].

3 Deceased November 11, 2011.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1402585111/-/DCSupplemental.

PNAS | April 29, 2014 | vol. 111 | no. 17 | 6353–6358

EVOLUTION

Evolution by gene duplication is generally accepted as one of the crucial driving forces for the gain of new complexity and functions, but the formation of pseudogenes remains a problem for this mechanism. Here we expand on earlier ideas that epigenetic modifications can drive neo- and subfunctionalization in evolution by gene duplication. We explore the effects of stochastic epigenetic modifications on the evolution (and thus development) of complex organisms in a constant environment. Modeling is done both using a modified genetic drift analytical treatment and computer simulations, which were found to agree. A transposon silencing model is also explored. Some key assumptions made include (i) stochastic, incomplete removal (or addition) of repressive epigenetic marks takes place during a window(s) of opportunity in the zygote and early embryo; (ii) there is no statistical variation of the marks after the window closes; and (iii) the genes affected are sensitive to dosage. Our genetic drift treatment takes into account that after gene duplication the prevailing case upon which selection operates is a duplicate/singlet heterozygote; to the best of our knowledge, this has not been considered in previous treatments. We conclude from our modeling that stochastic epigenetic modifications, with rates consistent with experimental observation, can both increase the rate of gene fixation and decrease pseudogenization, thus dramatically improving the efficacy of evolution by gene duplication. We also find that a transposon silencing model is advantageous for fixation of recessive genes in diploid organisms, especially with large effective population sizes.

during gametogenesis but also the removal of methylation in the zygote and preimplantation embryo. In the context of random monoallelic expression, such as seen for most mammalian X-linked genes and some autosomal genes, Chess (16) has reviewed the literature and pointed out that these stochastic events may affect development and evolution. We also propose here that stochastic epigenetic events during developmental windows may provide evolutionary advantages. More specifically, we make three primary assumptions for all of the models presented in this paper. First, we assume that during gametogenesis, many genes are modified with repressive epigenetic marks, probably DNA methylation, that will render them transcriptionally silent in somatic cells unless the marks are removed. The second assumption is that there is a window of opportunity in the early embryo during which variable incomplete removal of epigenetic marks takes place. The third assumption is that after the window of opportunity is closed, the epigenetic marks are then propagated faithfully. Though our general model does not require specifying actual mechanistic underpinnings, we also tacitly assume that (i) the epigenetic mark is DNA methylation, i.e., 5-methylcytosine (5mC), and (ii) the window of opportunity for stochastic removal of 5mC is in the zygote and very early embryo, starting in mammals at fertilization and continuing only until implantation. Although we focus on stochastic removal of the epigenetic mark, the model is quite robust with respect to the actual molecular minutia of the epigenetic mechanisms, similar to the modeling framework in (12). Our results are essentially the same if the stochastic events in the early embryo include the addition of epigenetic marks, and, of course, the epigenetic mark is not required to be 5mC. Similarly, stochastic epigenetic variation introduced during meiosis is compatible with our modeling. 
Our focus on removal of epigenetic marks is in keeping with the fact that the genomes of both oocytes and sperm are highly methylated, but following fertilization most (not all) 5mC is lost before and during the first few cell divisions (17). Recently it was discovered that in the mouse zygote, much of the 5mC in the male pronucleus is rapidly converted to 5-hydroxymethylcytosine (5hmC), which is then diluted away during DNA replication or is removed by base excision repair (17). For the female genome, the removal of 5mC is a somewhat slower process, and may be passive due to lack of maintenance methylation. Sperm and oocyte DNA are highly methylated, so the demethylation process erases most (but, again, not all) methylation marks established during gametogenesis (17). We propose that erasure after fertilization is stochastically incomplete for many genes. 5mC is strongly associated with the transcriptionally silent state, so if 5mC is not removed from a critical promoter, enhancer, or nearby transposon, the methylated gene is likely to be silent, or at least less active, compared with the “normal” demethylated gene. Because enhancers act at a distance and heterochromatin sometimes spreads for a variable distance from certain DNA sequences, such as transposons (18), the affected gene may sometimes be some distance from the DNA sequence change that alters the frequency of stochastic variation. Importantly, it has been shown (18) that retroposon-related epigenetic marks established in the very early embryo can indeed be maintained to mature somatic cells, with the methylation pattern seen in adult liver cells reflecting the pattern established in preimplantation embryo. For some genes, such as imprinted genes and those on the X chromosome, the methylation patterns established at or before the time of implantation can be different between alleles; and these differences are maintained throughout subsequent development (19). 
Thus, it is well established that there can be a window of opportunity for epigenetic change in the early vertebrate embryo followed by faithful maintenance during later development. In this study, we first consider models centered around gene duplication, and then a model based on transposable elements (TEs) affecting reactivation. Though the former approach is well established in the population genetics field, the latter is relatively novel, and we plan to develop it further in subsequent research. In any case, both newly duplicated genes and newly inserted TEs are likely to be recognized during meiosis, probably by lack of a pairing

partner during meiosis. In Neurospora it is clear that homologous pairing during meiosis is used to recognize duplicated genes or DNA newly inserted into the genome. The presence of unpaired DNA regions during meiosis triggers epigenetic marking by DNA methylation, which subsequently leads to elimination of the insert (20). In mammals there is evidence that genes unpaired during pachytene are subject to postmeiotic transcriptional repression (21), and it is well established that both plants and animals, including mammals, have evolved elaborate mechanisms to silence TEs (22–27). We thus assume that vertebrates recognize and epigenetically mark insertions. We propose that instead of the marked DNA region being eliminated, the epigenetic mark is retained until fertilization, at which time there is stochastically variable removal of the mark. There also could be variable spreading of marked chromatin, as seen with position-effect variegation in Drosophila and in a model system in mouse where a transposon-derived sequence drives heterochromatinization over at least a 1-kb region (18), and this could affect the probability of reactivation in the early embryo. Accepting these assumptions for the SEM model, we pursued two complementary analytical approaches: (i) direct computer simulations and (ii) extending the standard genetic drift population genetics framework to take into account the fact that a new gene “doublet” resulting from a duplication event will be paired with a single gene, forming a doublet/singlet heterozygote. The results of both the simulation experiments and the population genetics analysis concurred in suggesting that if the organism fitness is sensitive to gene dosage, the duplicated genes tend to be fixed more rapidly under SEM, also largely avoiding pseudogenization; thus, SEM has the potential to greatly accelerate and stabilize evolution by gene duplication. 
Conversely, if a duplicated copy happens to be located near a TE, under SEM even a recessive allele would then be exposed to selection pressure. Current theories of evolution by gene duplication, though in principle comprehensive and mature (see refs. 28–35; also see ref. 36 for the practical implications of the concept), still tend to tacitly gloss over, at least to some extent, two important issues. The first issue is the pseudogenization trap—during the time it “normally” takes a duplicate to be fixed in a population by neutral drift (leading to neo- or at least subfunctionalization), the duplicate will likely have already degenerated to a pseudogene, because mutations detrimental to a gene function are much more frequent than the functionalization mutations in relative terms, and sufficiently frequent in absolute terms (37). To avoid this pseudogenization trap, selection pressure must be (re)established shortly after birth of the duplicate. Second, most current evolution-by-gene-duplication scenarios take for granted that the duplication per se has already spread (mostly by means of stochastic drift) across the entire population. In reality, on the microevolution/population level, all such basic evolutionary processes (mutagenesis, drift, fixation, selection) are concurrent. Therefore, a more realistic assumption is that the newly emerged doublet/singlet heterozygotes will continue to exist in the population for a relatively long time. It is probably at this stage that the majority of duplicated genes are reduced to pseudogenes. We have extended the standard stochastic drift population genetics framework to the case of doublet/singlet heterozygotes, and demonstrated that SEM aids the retention (increases the fixation probability) of functional gene duplicates. 
It should be noted that for the purposes of this modeling exercise we restricted ourselves to the “precise dosage” situation, in which fitness of the organism is dependent on the precise dosage of gene expression product (directly proportional to the number of gene copies). For example, if a duplication event takes place in an environment where two doses of gene expression product are optimal, then without immediate epigenetic silencing of one copy the duplication will find itself under negative selection pressure.

Results

Fig. 1 shows three alternative models that we have analyzed quantitatively; for each, we postulate that (i) there is a window of

opportunity during early embryonic development for repressive epigenetic marks to be removed so that the gene can become active or potentially active, (ii) the probability of (re)activation is independent for each gene, (iii) the epigenetic status (active or inactive) is maintained over the life of the organism, thus directly affecting its fitness, but is not transmitted to the next generation, and (iv) there is no recombination between the gene and its duplicate copy. An important consequence of incorporating epigenetic modifications in the evolutionary model is that we no longer have a one-to-one deterministic genotype/phenotype mapping. Thus, the fitness of the organism depends not only on the genotype, but also on the probability of epigenetic modification in a window of opportunity during development. In this context, we define the epiphenotype as the specific phenotype ultimately realized after SEM takes place and the window of variability for this particular gene closes.

Gene Duplication Models and Analytical Treatment Using Diffusion Equations. Fig. 1 A and B show two gene duplication models incorporating SEM, with a randomly mating population consisting of N diploid individuals (effective population size Ne). Let A be a normal wild-type allele at a particular locus. The model in Fig. 1A assumes that the epigenetic silencing of the gene of interest occurs only during spermatogenesis; then, during a development window after fertilization, the repressive mark is removed. For the model in Fig. 1B we assume that epigenetic silencing occurs during both spermatogenesis and oogenesis; then, after fertilization, the marks are removed with equal probability from each copy. The Fig. 1A model is the most straightforward to treat quantitatively, and was derived first, but due to space limitations is described in SI Text (Tables S1 and S2). The model in Fig. 1B, which is arguably the most biologically realistic model, will now be detailed (the analytical treatment broadly applying to other models in this study). A consideration of two extremes may be useful. If there is no reactivation (ρ = 0), then the probability of fixation is zero. If there is complete reactivation (ρ = 1), then triploid expression will greatly reduce, and effectively eliminate, fixation. The question is, then, what is the effect of partial reactivation, averaged over many generations, on the probability of fixation? We can evaluate the mean fitness for a given genotype as the weighted (by expected frequencies) fitness (W) averaged over all possible epiphenotypes, the frequencies depending on the probability (ρ) of (re)activation of a single gene copy. Table 1 shows the relative frequencies and associated fitness values for the different genotypes and epiphenotypes for the model in Fig. 1B. The expected frequencies of epiphenotypes in which k (out of n) gene copies are reactivated, so that n − k remain silenced, follow a binomial distribution (Table 1):

\[ f(n, k, \rho) = \binom{n}{k} \rho^{k} (1 - \rho)^{n-k}. \]

Assuming that the probability of epigenetic event (ρ) and selection coefficient (s) remain constant during fixation time, and the mean fitness of each genotype is constant as well, we can normalize the relative fitness coefficients of genotypes A/A, A/AA, and AA/AA. Under the basic diffusion model, one way to estimate the probability of fixation of both duplicated gene copies is (38, 39)

\[ \Pi(S, p, h) = \frac{\int_{0}^{p} e^{S[(2h-1)x^{2} - 2hx]}\,dx}{\int_{0}^{1} e^{S[(2h-1)x^{2} - 2hx]}\,dx}, \]

where S is the selection coefficient, p is the population frequency of the new allele, and h is the dominance parameter (see SI Text for more detail). Fig. 2 shows the fixation probability of locus AA under SEM (pS) and standard neutral model (pN = 1/2Ne), demonstrating the effect of ρ on the probability of fixation of duplicated locus under SEM assumption for different values of effective population size. Fig. 2A shows the comparative advantage of SEM model in Fig. 1B, with a broad peak around reactivation probability of 0.3, and the advantage increases with Ne. The results obtained under the model in Fig. 1A (Fig. S1) clearly indicate that the probability of fixation (reflecting relative advantage of SEM) of both gene copies is significantly increased by SEM if the removal probability (ρ) of the epigenetic mark ranges around 0.3, a reasonable number in keeping with experimental data (17). Moreover, even for small effective population size, the relative advantage is always larger than 1 (if ρ is less than ∼0.4). For Ne = 1,000, a modest yet realistic effective population size, the probability of fixation of a duplicated locus is ∼20× higher than what would be expected under neutral drift. The model in Fig. 1A gives similar results, and these are shown in Fig. S1A. We have also modeled a third version, a “relaxed” model in Fig. 1B (see Tables S3 and S4), where we maintained the basic assumptions of the previous model (specifically, that the processes of epigenetic silencing and stochastic reactivation of the


Fig. 1. Stochastic epigenetic reactivation models. (A) Variable reactivation of a duplicated gene with only the paternally transmitted duplication being marked during gametogenesis with 5mC, which is then variably removed in the zygote. Fertilization results in a duplicate/singlet heterozygote, which then undergoes stochastic demethylation during a short window of opportunity before implantation, leading to alternate epiphenotypes of different fitness (W). (B) Both maternally and paternally duplicated genes are heavily marked. Fertilization results in a duplicate/singlet heterozygote, which then undergoes stochastic demethylation during a short window of opportunity before implantation, leading to alternate epiphenotypes of different fitness (W). (C) Variable activity due to a nearby transposable element. The transposon (Tr), the wild-type allele (A), and the allele with a recessive mutation (B) are heavily marked in both sperm and oocyte. The reactivation of Tr after fertilization is variable and this affects the activity of nearby gene A. Asterisk and black color indicate a gene or control region that is methylated and will be transcriptionally silent in somatic cells unless subsequently demethylated. Red color indicates a demethylated and active or potentially active gene and/or control region that is active or potentially active in somatic cells.


Table 1. Genotype/epiphenotype/fitness relationships in the SEM model in Fig. 1B with epigenetic modifications occurring independently in sperm and egg (see Results)

Genotype   Epiphenotype   Frequency, f     Fitness, W   Mean fitness, W̄
A/A        A/A            ρ²               1            ρ[2 − ρ − 2(1 − ρ)s]
           A*/A           2ρ(1 − ρ)        1 − s
           A*/A*          (1 − ρ)²         0
AA/A       AA/A           ρ³               1 − s        ρ[(ρ² − 3ρ + 3) + (−4ρ² + 6ρ − 3)s]
           A*A/A          3ρ²(1 − ρ)       1
           A*A*/A         3ρ(1 − ρ)²       1 − s
           A*A*/A*        (1 − ρ)³         0
AA/AA      AA/AA          ρ⁴               1 − 2s       ρ[−(ρ³ − 4ρ² + 6ρ − 4) + (6ρ³ − 16ρ² + 12ρ − 4)s]
           A*A/AA         4ρ³(1 − ρ)       1 − s
           A*A*/AA        6ρ²(1 − ρ)²      1
           A*A*/A*A       4ρ(1 − ρ)³       1 − s
           A*A*/A*A*      (1 − ρ)⁴         0
AA/AX      AA/AX          ρ⁴               1 − s        ρ[(ρ² − 3ρ + 3) − (4ρ² − 6ρ + 3)s]
           A*A/AX         3ρ³(1 − ρ)       1
           AA/AX*         ρ³(1 − ρ)        1 − s
           A*A*/AX        3ρ²(1 − ρ)²      1 − s
           A*A/AX*        3ρ²(1 − ρ)²      1
           A*A*/A*X       ρ(1 − ρ)³        0
           A*A*/AX*       3ρ(1 − ρ)³       1 − s
           A*A*/A*X*      (1 − ρ)⁴         0
AX/AX      AX/AX          ρ⁴               1            ρ[−(ρ − 2) + 2(ρ − 1)s]
           A*X/AX         2ρ³(1 − ρ)       1 − s
           AX*/AX         2ρ³(1 − ρ)       1
           A*X*/AX        4ρ²(1 − ρ)²      1 − s
           A*X/A*X        ρ²(1 − ρ)²       0
           AX*/AX*        ρ²(1 − ρ)²       1
           A*X*/A*X       2ρ(1 − ρ)³       0
           A*X*/AX*       2ρ(1 − ρ)³       1 − s
           A*X*/A*X*      (1 − ρ)⁴         0

X is the inactive pseudogene, ρ is the probability of removal of epigenetic marks, and s is the selection coefficient. Note that for any set of equivalent epiphenotypes (e.g., A*/A and A/A*), only one is listed in the table.
*Preceding gene copy was epigenetically silenced.
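The mean fitness column of Table 1 can be checked numerically. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, and the dosage rule (two active copies give W = 1, each copy above or below two costs s, no active copy gives W = 0) is simply read off the fitness column of Table 1. Pseudogene copies (X) are assumed not to contribute to dosage, so AA/AX is treated as three functional copies and AX/AX as two.

```python
from math import comb

def dosage_fitness(active, s):
    """Fitness for a given number of active functional copies (rule read off Table 1)."""
    if active == 0:
        return 0.0
    return 1.0 - s * abs(active - 2)

def mean_fitness(n_functional, rho, s):
    """Average fitness over epiphenotypes; each copy is reactivated independently with probability rho."""
    total = 0.0
    for k in range(n_functional + 1):
        # Binomial frequency of epiphenotypes with k reactivated copies.
        freq = comb(n_functional, k) * rho**k * (1 - rho) ** (n_functional - k)
        total += freq * dosage_fitness(k, s)
    return total

def table1_closed_forms(rho, s):
    """Closed-form mean fitness from Table 1, keyed by number of functional copies."""
    return {
        2: rho * (2 - rho - 2 * (1 - rho) * s),                      # A/A and AX/AX
        3: rho * ((rho**2 - 3 * rho + 3) - (4 * rho**2 - 6 * rho + 3) * s),  # AA/A and AA/AX
        4: rho * (-(rho**3 - 4 * rho**2 + 6 * rho - 4)
                  + (6 * rho**3 - 16 * rho**2 + 12 * rho - 4) * s),  # AA/AA
    }

if __name__ == "__main__":
    rho, s = 0.3, 0.01
    for n, closed in table1_closed_forms(rho, s).items():
        print(n, mean_fitness(n, rho, s), closed)
```

Summing over the binomial frequencies reproduces the closed-form expressions, which is a quick way to confirm that the reconstructed rows of the table are mutually consistent.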

gene occur in both male and female germlines), but have relaxed gene function to be nonessential so that the epigenetically silenced epiphenotype has a positive (albeit low) fitness (ranging between s and 2s). The results (Fig. S2) are qualitatively similar to the previous model, but the quantitative effect of SEM is significantly reduced, although still substantial. It might be argued that the model in Fig. 1B is the most realistic biologically, so it bodes well for our modeling approach that the results are largely robust with respect to the levels of model complexity.

Pseudogenization. As noted previously, the difficulty of keeping both duplicated gene copies functional, i.e., avoiding pseudogenization, is a crucial aspect (if not a stumbling block) of evolution by gene duplication. The next natural question is whether the same condition (SEM) that favors fixation of duplicated loci preserves them from pseudogenization. Following the above logic (Table 1), we estimated the probability of fixation of a pseudogene (X) in a population where the duplicated locus is already fixed. Results (Fig. 2B) suggest that if, again, the probability of gene (re)activation is sufficiently low, the probability of a pseudogene getting fixed becomes vanishingly small. For Ne = 1,000, the probability of fixation of a pseudogene would be 10⁵ to 10⁷ times lower compared with the standard neutral case. Thus, the pseudogenization problem is essentially solved.

Computer simulations. The SEM process was also explicitly modeled using simulations to corroborate the above analytical results. All simulations assumed constant effective population size. First, an epiphenotype was assigned to each organism following the highest probability depending on the specific model in question. Then, reproduction and selection were modeled simultaneously by drawing with replacement from the same gamete pool following the

fitness of the respective epiphenotype. In each “generation,” an epiphenotype was assigned to each “organism” with a given genotype with probability as a function of ρ. Here we should emphasize that under the assumption of SEM relative genotype fitnesses strongly depend on ρ. For example, if ρ is sufficiently small the heterozygotes (with only one duplicated locus) could have fitness higher than both homozygotes. Fig. 2C and Figs. S1C and S2C show the probability of fixation of duplicated genes (black) and pseudogenes (red) estimated via simulation experiments (designated by open circle) and numerical solution of diffusion equations (solid line) for Ne = 400 (higher Ne values lead to much more computationally difficult modeling). It is satisfying to observe excellent agreement between simulation and diffusion equation approaches; moreover, just by visual inspection of the graphs (Fig. 2C and Figs. S1C and S2C), it is easy to identify cutoff values of ρ that would favor fixation and subsequent preservation of duplicated locus. Our primary conclusion is that the very same (epigenetic) conditions that favor fixation of duplicated loci also preserve them from pseudogenization, and this is supported by both modeling and simulation experiments. In fact, under certain biologically realistic conditions, the main effect is virtual elimination of pseudogenization.

Transposon silencing model. Last, but by no means least, we have considered the effect of epigenetic modification on the fixation probability of a new gene B carrying an advantageous recessive mutation. The specific mechanism of most interest to us was the transposon silencing model (TSM), in which the TE would be silenced together with its immediate neighborhood [including the gene(s) of interest] in an early developmental stage (Fig. 1C).
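The simulation procedure summarized under "Computer simulations" can be sketched as a small Wright-Fisher style loop. This is a hypothetical reimplementation in Python for illustration, not the authors' Perl code: all function names are invented here, the fitness rule is the dosage scheme read off Table 1, the population is assumed to start with a single doublet/singlet heterozygote, and s is assumed to be below 0.5 so that all fitness values stay nonnegative.

```python
import random

def dosage_fitness(active, s):
    """Table 1 dosage rule: 2 active copies -> 1, each copy above or below 2 costs s, 0 copies -> 0."""
    return 0.0 if active == 0 else 1.0 - s * abs(active - 2)

def next_generation(pop, rho, s, rng):
    """One generation of epiphenotype assignment, selection, and reproduction.

    pop is a list of diploid genotypes (c1, c2); each haplotype carries
    1 copy (singlet A) or 2 copies (doublet AA). Returns None if no
    individual is viable.
    """
    weights = []
    for c1, c2 in pop:
        # Each gene copy is reactivated independently with probability rho.
        active = sum(rng.random() < rho for _ in range(c1 + c2))
        weights.append(dosage_fitness(active, s))
    if sum(weights) == 0.0:
        return None
    # Selection and reproduction in one step: parents are drawn with
    # replacement, weighted by the fitness of their epiphenotype.
    parents = rng.choices(pop, weights=weights, k=2 * len(pop))
    gametes = [rng.choice(parent) for parent in parents]
    return list(zip(gametes[0::2], gametes[1::2]))

def doublet_fixes(n_ind, rho, s, rng):
    """Follow one new doublet until it is lost or fixed; True means fixation."""
    pop = [(2, 1)] + [(1, 1)] * (n_ind - 1)
    while pop is not None:
        doublets = sum(c == 2 for ind in pop for c in ind)
        if doublets == 0:
            return False
        if doublets == 2 * n_ind:
            return True
        pop = next_generation(pop, rho, s, rng)
    return False  # population became inviable before fixation

def estimate_fixation(n_ind, rho, s, reps, seed=0):
    """Fraction of replicate runs in which the doublet reaches fixation."""
    rng = random.Random(seed)
    return sum(doublet_fixes(n_ind, rho, s, rng) for _ in range(reps)) / reps
```

Dividing such an estimate by the neutral expectation 1/(2N) gives a quantity comparable in spirit to the relative advantage plotted in Fig. 2C, although many replicates are needed for a stable estimate and the details of the published simulations may differ.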


Fig. 2. Relative advantage in fixation of a duplicated locus AA (A) and pseudogene locus AX (B) under the SEM model in Fig. 1B for selection coefficient s = 0.01 calculated using diffusion equation; ρ is the probability of reactivation of the gene; Ne is the effective population size (shown in log scale in the Inset.). Shown in red, green, blue, and yellow are the results for Ne of 1,000, 10,000, 50,000, and 100,000, respectively. (C) Relative advantage in fixation of a duplicated locus AA (black line) and pseudogene locus AX (red line) under the SEM model in Fig. 1A for Ne = 400 and the selection coefficient s = 0.01 calculated using the diffusion equation. Black and red circles show the relative advantage for the duplicated and pseudogene loci, respectively, as generated in the simulation experiments. The shaded area shows the values of ρ for which both fixation of duplicated genes is favored, and fixation of the pseudogene is prevented. (D) Relative advantage in fixation of recessive gene B under the TSM model for selection coefficient s = 0.01 calculated using the diffusion equation. Shown in red, green, blue, yellow, and black are the results for Ne of 100, 1,000, 10,000, 100,000 and 1,000,000, respectively.
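The diffusion-equation fixation probability Π(S, p, h) behind the curves in Fig. 2 is a ratio of two one-dimensional integrals and is straightforward to evaluate numerically. The following is a minimal sketch that uses composite Simpson integration from the Python standard library rather than the QAG routine cited in the paper; the function names are hypothetical. How S and h are derived from the mean genotype fitnesses is described in the paper's SI Text and is not reproduced here, so both are plain inputs.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def fixation_probability(S, p, h):
    """Pi(S, p, h): ratio of the integrals of exp(S[(2h-1)x^2 - 2hx]) over [0, p] and [0, 1]."""
    def integrand(x):
        return math.exp(S * ((2 * h - 1) * x * x - 2 * h * x))
    return simpson(integrand, 0.0, p) / simpson(integrand, 0.0, 1.0)

def relative_advantage(S, h, Ne):
    """Fixation probability of a single new copy (p = 1/(2Ne)) relative to the neutral value 1/(2Ne)."""
    p0 = 1.0 / (2 * Ne)
    return fixation_probability(S, p0, h) / p0

if __name__ == "__main__":
    print(relative_advantage(4.0, 0.5, 1000))
```

Two limiting cases make useful sanity checks: with S = 0 the integrand is constant and Π reduces to the neutral value p, and with h = 0.5 the exponent is linear in x, so Π equals (1 − e^(−Sp))/(1 − e^(−S)).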

Following the same approach as before, we calculated fitness for each genotype as averaged over the possible epiphenotypes (Table 2). We then used the diffusion model to derive the fixation probability for the newly arisen gene B. We were primarily interested in seeing if epigenetic modification, brought about by a nearby transposable element, could result in significant increases in fixation probability for an advantageous recessive mutation. Fig. 2D illustrates the effect of TSM, as a ratio (pT/pR) of the probability of fixation of a recessive gene when the TSM assumption holds (pT) to the fixation probability of a recessive gene under standard assumptions (pR). An important aspect is that under TSM we also have an efficient mechanism for increasing evolutionary rates which, again, is beneficial for increasing the probability of fixation of the recessive advantageous mutation.

Numerical integration of diffusion equations was carried out via quadratic adaptive integration (QAG package) (40). Simulations were implemented in Perl, with source code, scripts, and other data and resources freely available from the authors.

Discussion

One primary purpose of this paper is to begin quantitatively considering the ramifications for evolution of stochastic, epigenetic changes, differing between homologous local gene regions that are made during a window of opportunity but then are fixed and no longer variable during subsequent development. If a stochastic epigenetic reactivation event takes place in, for example,

one allele of one cell of a two-cell embryo, and the differential allele marking is subsequently inherited by all progeny cells, then the phenotype of the organism can be radically altered and selected for or against. Importantly, true base-sequence mutations that affect the probability of removal (or addition) of an epigenetic mark, such as elimination of a CpG site(s), a duplication, or a retrotransposon insertion near the gene, will also come under selection. Thus, we are considering stochastic generation of epialleles in the early embryo, followed by somatic inheritance of these epialleles combined with standard selection. We assume Mendelian inheritance of the local sequence-change lesion that affects the probability of reactivation and thus stochastic differences in cis-located allele expression. No epigenetic transgenerational effects or other unusual genetics are involved. It is also important to note that only local, locus-specific effects of duplications or insertions were considered. Others have found that mutations that globally affect epigenetic marking and expression variability are potentially advantageous in a variable environment (12). To keep our models more laconic and tractable, we have assumed that the stochastic events take place during the waves of DNA demethylation and methylation that occur in the very early embryo. However, there are likely to be other windows of opportunity during mammalian development, because some epigenetic changes correlate with lineage commitments. Current DNA methylation data are consistent with the possibility that there is considerable stochastic variability for many, perhaps all, CpG sites. At the same time there is good evidence for somatic inheritance of DNA methylation patterns (17, 41). Thus, as a cell population progresses along a particular lineage pathway there may be numerous windows of opportunity for DNA methylation change, and these could be windows for stochastic epigenetic variability.
Table 2. Genotype/epiphenotype/fitness relationships in TSM model with epigenetic modifications occurring independently in sperm and egg (see Fig. 1 and text) for fixation of a new recessive gene B with selective advantage 1 + s (see Table 1 for designations)

Genotype   Epiphenotype   Frequency, f   Fitness, W   Mean fitness, W̄
A/A        A/A            ρ²             1            ρ(2 − ρ)
           A*/A           2ρ(1 − ρ)      1
           A*/A*          (1 − ρ)²       0
A/B        A/B            ρ²             1            ρ[(2 − ρ) + (1 − ρ)s]
           A*/B           ρ(1 − ρ)       1 + s
           A/B*           ρ(1 − ρ)       1
           A*/B*          (1 − ρ)²       0
B/B        B/B            ρ²             1 + s        ρ[(2 − ρ)(s + 1)]
           B*/B           2ρ(1 − ρ)      1 + s
           B*/B*          (1 − ρ)²       0

PNAS | April 29, 2014 | vol. 111 | no. 17 | 6357

Such stochastic variability may be of evolutionary advantage, particularly for the immune system and perhaps even brain and CNS development and function. Interestingly, the dynamics of the models detailed above are largely independent of the epigenetic “directionality,” i.e., whether the stochastic change is the removal or the addition of the epigenetic marks, or whether the marks are repressive or activating. The key assumptions are only that (i) the stochastic, somatically heritable change that is subject to later selection takes place in the early embryo (or progenitor cells of a particular lineage); (ii) the changed mark is then fixed during subsequent development and life of the organism; and (iii) the mark will ultimately influence the expression of a nearby specific gene. We have focused on removal of epigenetic marks because this is consistent with recent data for genome-wide demethylation occurring shortly after fertilization, and our understanding of the key epigenetic mark being DNA cytosine methylation. However, the model essentially remains the same if additional stochastic variability takes place during the wave of methylation that occurs in mammalian embryos at the time
of implantation, or, for that matter, during windows of stochastic opportunity in specific cell lineages. Importantly, several of the assumptions we have made and the predictions generated by our modeling can be tested by use of very recently emerging technology. In particular, it is now possible to measure DNA methylation in single cells at specific CpG sites, and epigenetic variability in sperm and preimplantation embryos has been seen (42, 43). Thus, existing duplications, transposon insertions, and newly engineered genome alterations now can be, and need to be, analyzed.

In the future, we plan to expand our modeling parameters. We are especially interested in developing the TSM model. We feel strongly that the interface of evolution by gene duplication, TEs, and epigenetics is a promising avenue of research. We propose that the evolution of machinery (molecular mechanisms) able to silence TEs and perhaps other “selfish” agents within vertebrate genomes conversely triggered further “constructive” evolution (gene duplication and, eventually, sub- and/or neofunctionalization) by epigenetically silencing duplicated gene copies. Stochastic epigenetic silencing could help adaptive evolution, including the fixation and preservation of functional gene copies, thus opening the road to development of new complex functions and advantageous traits.

In summary, contrary to the standard models of neofunctionalization (applicable predominantly to large populations) and subfunctionalization (small populations), our results suggest that the stochastic epigenetic silencing mechanism favors fixation and preservation of duplicated genes even for small effective population sizes, becoming progressively more dominant with effective population size increase.

Note Added in Proof. We would like to draw attention to two recently published papers, each of which is highly relevant and supportive of the work reported here. These are refs. 44 and 45.

ACKNOWLEDGMENTS. A.S.R. holds the Susumu Ohno Chair in Theoretical Biology and S.B. is a Susumu Ohno Distinguished Fellow at Beckman Research Institute of the City of Hope.

1. Ohno S (1970) Evolution by Gene Duplication (Springer, New York).
2. Feinberg AP, Irizarry RA (2010) Evolution in health and medicine Sackler colloquium: Stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc Natl Acad Sci USA 107(Suppl 1):1757–1764.
3. Henikoff S (2012) Chromatin processes, epigenetic inheritance, centromere structure and function and evolution. Curr Biol 22(4):R106–R107.
4. Hernando-Herraez I, et al. (2013) Dynamics of DNA methylation in recent human and great ape evolution. PLoS Genet 9(9):e1003763.
5. Jablonka E, Lamb MJ (2006) The evolution of information in the major transitions. J Theor Biol 239(2):236–246.
6. Klironomos FD, Berg J, Collins S (2013) How epigenetic mutations can affect genetic evolution: Model and mechanism. Bioessays 35(6):571–578.
7. Paulsen M (2011) Unique patterns of evolutionary conservation of imprinted genes. Clin Epigenetics 2(2):405–410.
8. Rodin SN, Parkhomchuk DV, Riggs AD (2005) Epigenetic changes and repositioning determine the evolutionary fate of duplicated genes. Biochemistry (Mosc) 70(5):559–567.
9. Rodin SN, Parkhomchuk DV, Rodin AS, Holmquist GP, Riggs AD (2005) Repositioning-dependent fate of duplicate genes. DNA Cell Biol 24(9):529–542.
10. Rodin SN, Riggs AD (2003) Epigenetic silencing may aid evolution by gene duplication. J Mol Evol 56(6):718–729.
11. Youngson NA, Chong S, Whitelaw E (2011) Gene silencing is an ancient means of producing multiple phenotypes from the same genotype: Common mechanisms and functions in epigenetic processes can be seen throughout all life forms. Bioessays 33(2):95–99.
12. Carja O, Liberman U, Feldman MW (2013) Evolution with stochastic fitnesses: A role for recombination. Theor Popul Biol 86:29–42.
13. Youngson NA, et al. (2013) No evidence for cumulative effects in a Dnmt3b hypomorph across multiple generations. Mamm Genome 24(5-6):206–217.
14. Dolinoy DC, Weinhouse C, Jones TR, Rozek LS, Jirtle RL (2010) Variable histone modifications at the A(vy) metastable epiallele. Epigenetics 5(7):637–644.
15. Reed MR, Riggs AD, Mann JR (2001) Deletion of a direct repeat element has no effect on Igf2 and H19 imprinting. Mamm Genome 12(11):873–876.
16. Chess A (2012) Mechanisms and consequences of widespread random monoallelic expression. Nat Rev Genet 13(6):421–428.
17. Smith ZD, Meissner A (2013) DNA methylation: Roles in mammalian development. Nat Rev Genet 14(3):204–220.
18. Quenneville S, et al. (2012) The KRAB-ZFP/KAP1 system contributes to the early embryonic establishment of site-specific DNA methylation patterns maintained during development. Cell Rep 2(4):766–773.
19. Ohlsson R, Paldi A, Graves JA (2001) Did genomic imprinting and X chromosome inactivation arise from stochastic expression? Trends Genet 17(3):136–141.
20. Rountree MR, Selker EU (2010) DNA methylation and the formation of heterochromatin in Neurospora crassa. Heredity (Edinb) 105(1):38–44.
21. Turner JM, Mahadevaiah SK, Ellis PJ, Mitchell MJ, Burgoyne PS (2006) Pachytene asynapsis drives meiotic sex chromosome inactivation and leads to substantial postmeiotic repression in spermatids. Dev Cell 10(4):521–529.
22. Akkouche A, et al. (2013) Maternally deposited germline piRNAs silence the tirant retrotransposon in somatic cells. EMBO Rep 14(5):458–464.
23. Blumenstiel JP (2011) Evolutionary dynamics of transposable elements in a small RNA world. Trends Genet 27(1):23–31.
24. Castañeda J, Genzor P, Bortvin A (2011) piRNAs, transposon silencing, and germline genome integrity. Mutat Res 714(1-2):95–104.
25. Feng S, Jacobsen SE, Reik W (2010) Epigenetic reprogramming in plant and animal development. Science 330(6004):622–627.
26. McCue AD, Nuthikattu S, Slotkin RK (2013) Genome-wide identification of genes regulated in trans by transposable element small interfering RNAs. RNA Biol 10(8):1379–1395.
27. Nuthikattu S, et al. (2013) The initiation of epigenetic silencing of active transposable elements is triggered by RDR6 and 21-22 nucleotide small interfering RNAs. Plant Physiol 162(1):116–131.
28. Aiello D, Caffrey DR (2012) Evolution of specific protein–protein interaction sites following gene duplication. J Mol Biol 423(2):257–272.
29. Collins RE, Merz H, Higgs PG (2011) Origin and evolution of gene families in bacteria and archaea. BMC Bioinformatics 12(Suppl 9):S14.
30. Fernández A, Tzeng YH, Hsu SB (2011) Subfunctionalization reduces the fitness cost of gene duplication in humans by buffering dosage imbalances. BMC Genomics 12:604.
31. Katju V (2012) In with the old, in with the new: The promiscuity of the duplication process engenders diverse pathways for novel gene creation. Int J Evol Biol 2012:341932.
32. Khurana E, et al. (2010) Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res 38(20):6997–7007.
33. Kondrashov FA (2012) Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc Biol Sci 279(1749):5048–5057.
34. Sen K, Podder S, Ghosh TC (2010) Insights into the genomic features and evolutionary impact of the genes configuring duplicated pseudogenes in human. FEBS Lett 584(18):4015–4018.
35. Vinogradov AE (2012) Large scale of human duplicate genes divergence. J Mol Evol 75(1-2):25–33.
36. Yanagawa H (2013) Exploration of the origin and evolution of globular proteins by mRNA display. Biochemistry 52(22):3841–3851.
37. Branciamore S, Chen ZX, Riggs AD, Rodin SN (2010) CpG island clusters and pro-epigenetic selection for CpGs in protein-coding exons of HOX and other transcription factors. Proc Natl Acad Sci USA 107(35):15485–15490.
38. Chen CT, Chi QS, Sawyer SA (2008) Effects of dominance on the probability of fixation of a mutant allele. J Math Biol 56(3):413–434.
39. Hartl DL, Clark AG (2007) Principles of Population Genetics (Sinauer, Sunderland, MA).
40. Gough B, ed (2009) GNU Scientific Library Reference Manual (Network Theory, Ltd., Godalming, United Kingdom), 3rd Ed.
41. Cedar H, Bergman Y (2012) Programming of DNA methylation patterns. Annu Rev Biochem 81:97–117.
42. Guo H, et al. (2013) Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res 23(12):2126–2135.
43. Lorthongpanich C, et al. (2013) Single-cell DNA-methylation analysis reveals epigenetic chimerism in preimplantation embryos. Science 341(6150):1110–1112.
44. Gendrel AV, et al. (2014) Developmental dynamics and disease potential of random monoallelic gene expression. Dev Cell 28(4):366–380.
45. Eckersley-Maslin MA, et al. (2014) Random Monoallelic Gene Expression Increases upon Embryonic Stem Cell Differentiation. Dev Cell 28(4):351–365.
