Journal of Theoretical Biology 365 (2015) 23–31


Journal of Theoretical Biology journal homepage: www.elsevier.com/locate/yjtbi

Speed of evolution in large asexual populations with diminishing returns

Maria R. Fumagalli a,b,c,d, Matteo Osella a,c,d, Philippe Thomen e, Francois Heslot e, Marco Cosentino Lagomarsino a,c,f,*

a Université Pierre et Marie Curie Genomic Physics Group, UMR 7238 "Computational and Quantitative Biology", 15 Rue de l'École de Médecine, 75006 Paris, France
b Dipartimento di Fisica, Università degli Studi di Milano, Via G. Celoria 16, Milano, Italy
c Dipartimento di Fisica, Università degli Studi di Torino, Via P. Giuria 1, Torino, Italy
d INFN Sezione di Torino, Via P. Giuria 1, Torino, Italy
e Université Pierre et Marie Curie - CNRS (UMR 8551) Laboratoire Pierre Aigrain, Ecole Normale Supérieure, Université D. Diderot, 24 rue Lhomond, 75005 Paris, France
f CNRS UMR 7238 "Computational and Quantitative Biology", 15 Rue de l'École de Médecine, 75006 Paris, France

Highlights

Experiments with large asexual populations show a sublinear increase of fitness.
We describe the phenomenon using a minimal model with diminishing returns.
We propose a procedure to compare the model with data.
We apply the procedure to data from two different experiments with bacteria.

Article info

Article history: Received 25 July 2013; Received in revised form 22 September 2014; Accepted 30 September 2014; Available online 13 October 2014

Keywords: Evolutionary biology; Clonal interference; Epistasis

Abstract

The adaptive evolution of large asexual populations is generally characterized by competition between clones carrying different beneficial mutations. Interference slows down the adaptation speed and makes the theoretical description of the dynamics more complex with respect to the successional occurrence and fixation of beneficial mutations typical of small populations. A simplified modeling framework considering multiple beneficial mutations with equal and constant fitness advantage is known to capture some of the essential features of laboratory evolution experiments. However, in these experiments the relative advantage of a beneficial mutation is generally dependent on the genetic background. In particular, the general pattern is that, as mutations in different loci accumulate, the relative advantage of new mutations decreases, a trend often referred to as “diminishing return” epistasis. Here, we propose a phenomenological model that generalizes the fixed-advantage framework to include this negative epistasis in a simple way. We evaluate analytically as well as with direct simulations the quantitative consequences of diminishing returns on the evolutionary dynamics. The speed of adaptation decreases in time and reaches a limit value corresponding to neutral evolution in the long time limit. This corresponds to an increase of the diversity in terms of “classes of mutation” in the population. Finally, we show how the model can be compared with dynamic data on fitness and number of beneficial mutations from laboratory evolution experiments. © 2014 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (M. Cosentino Lagomarsino). http://dx.doi.org/10.1016/j.jtbi.2014.09.042 0022-5193/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Thanks to contemporary technologies such as phenotypic characterization and high-throughput sequencing, previously unachievable quantitative measurements of the results of controlled laboratory evolution experiments are now possible. This is guiding theoretical investigations and could make the validation and falsification of phenomenological theories feasible (Hindré et al., 2012; Barrick et al., 2009; Tenaillon et al., 2012), with notable consequences in a wide range of bio-technological and ecological investigations. In the specific case of large asexual (or rarely recombining) populations of microorganisms, a high number of beneficial mutations emerge in different clones, and cannot be mixed because of slow or


absent recombination. These beneficial mutations appearing in parallel coexist and compete to drive adaptation. This phenomenon of concurrent beneficial mutations (sometimes generically termed “clonal interference”), is related to the Fisher-Muller hypothesis (or Hill-Robertson effect) for the advantage of recombination (Felsenstein, 1974). In general, different mutations can bring different fitness advantages. The distribution of these advantages is not known precisely. It has been shown to be species-specific, dependent on the genomic region where the mutation occurs (e.g., coding or non-coding), and differently skewed for beneficial and deleterious mutations. However, it is often approximated by an exponential distribution for modeling simplicity (Gerrish and Lenski, 1998; Orr, 2003; Keightley and Eyre-Walker, 2007; Eyre-Walker and Keightley, 2007). Recent models have generally dealt with the competition between mutations of different strengths and the competition between mutations that arise on different fitness backgrounds separately. The first effect, the role of a distribution of fitness changes, called “clonal interference”,1 is analyzed by several models (Gerrish and Lenski, 1998; Wilke, 2004; Park et al., 2010). In these models any individual is either the wild type or a mutant derived directly from the wild type. Thus, multiple mutations arising in the extant mutants are neglected. Conversely, models that explicitly deal with multiple mutations typically assume that all mutations have the same (positive or negative) effect (Park et al., 2010; Tsimring et al., 1996; Desai and Fisher, 2007; Brunet et al., 2008; Desai et al., 2012) (recent work incorporating both effects (Good et al., 2012) has shown that for peaked distribution of advantage this is the correct effective theory). The latter kind of model has the advantage of being simpler to treat and accessible analytically. 
It is characterized by a Gaussian-like traveling wave for the histogram of log-fitness throughout the population. In absence of epistasis, this wave moves toward higher log-fitness with a constant speed and shape (Park et al., 2010; Tsimring et al., 1996; Rouzine et al., 2003, 2008; Hallatschek and Korolev, 2009). New mutations are fixed in the population if they occur in the high-fitness edge (or “nose”) of the distribution (Desai and Fisher, 2007; Brunet et al., 2008). Consequently, a quantitative law relates the width of the log-fitness histogram and the adaptation speed.

There appears to be one important discrepancy between the models described so far and the behavior of bacteria evolved in the laboratory for longer times (roughly, > 1000 generations). This discrepancy is well represented by the experimental sub-linear increase on long time-scales of the average population log-fitness (or fitness advantage) (Kryazhimskiy et al., 2009). This is in contrast with the linear increase predicted theoretically by many models, even if they have been compared successfully with the diversity and adaptation speed of short-time laboratory evolution experiments (Desai et al., 2007). Therefore, the core issue is to understand the evolutionary mechanisms at the basis of the experimental slowing down of the adaptation process.

Furthermore, two recent experimental studies (Khan et al., 2011; Chou et al., 2011) have shown a common trend in the advantage of combined beneficial mutations occurring in different genes. In most of the cases analyzed, the combined advantage is lower than the sum of that of individual mutations. In other words, when mutations of loci in different genes accumulate, the effective advantage of each of them is lower. This was shown by combinatorial genetics techniques, by constructing all the possible configurations of a small set of mutations, and evaluating their advantage through competition experiments. This decrease of the advantage carried by a mutation as the background fitness increases provides a possible mechanism at the basis of the observed sub-linear increase of the average fitness in long-term evolutionary experiments (Khan et al., 2011). This trend, referred to as “diminishing returns” epistasis, had been previously suggested theoretically on the basis of the general pattern of adaptation observed in long-term microbial experiments (Kryazhimskiy et al., 2009), using a modeling framework that neglected concurrent or multiple mutations. Another study predicts the same principle on the basis of a simple fitness landscape model combined with the distribution of single mutation effects measured experimentally (Martin et al., 2007).

The actual pattern in the fitness advantage associated with the same mutation in different backgrounds observed by the two studies is complex, as, on top of the diminishing return effect, the advantage appears to depend on the mutation identity. Even more recent systematic experiments (Tenaillon et al., 2012) are unveiling a complex scenario where different mechanisms coexist for the interactions of mutations between and within functional “blocks”, which can span multiple genes along the genome. However, the full experimental complexity is difficult to incorporate in a treatable model, and experimental data on linked mutations and interference between them are difficult to obtain. Thus, simplified descriptions, such as the multiple-mutations model, are useful to model evolving populations using a minimal quantity of information on mutations and fitness advantage.

Here, we take a simplified approach to study the diversity and speed of adaptation in presence of diminishing returns. We define a framework that can account for multiple mutations, and incorporates the effect of diminishing return epistasis. Namely, the fitness advantage of a mutation depends only on its order of appearance in a clone, and decreases with it. This generalizes the non-epistatic multiple-mutations model (Desai and Fisher, 2007; Brunet et al., 2008), which is recovered when the advantage does not decrease with the number of acquired mutations. We preserve the model assumption that evolution is driven by beneficial mutations which appear with constant rate (see the Discussion for an evaluation of these assumptions in light of the results). We study the infinite-N behavior of this model using standard techniques, and the diversity of the finite-population behavior across realizations. At finite population sizes, we show that the analytical quantitative estimates of the speed of adaptation of the constant advantage multiple-mutations model can be applied with appropriate modifications.

1 Note that the term has a strict sense in this case. In the following we will reserve the term clonal interference to this stricter meaning of competition between mutations of different strength, and talk of interference between beneficial mutations in the generic case.

2. Basic features of the model

2.1. Model definition

We build a minimal population-genetics model (Park et al., 2010; Wright et al., 1931; Fisher, 1930) including diminishing return epistasis in presence of competition between beneficial mutations. The model describes a population of N haploid individuals, or sequences, in which each individual of type i produces a random number of offspring with average equal to its fitness $w_i$. Inheritance is introduced by assigning the fitness of the parent to the offspring. Mutations change the mean fitness of the offspring $w'_i$ relative to the parental one $w_i$ according to the relation $w'_i = w_i (1 + s) \simeq w_i e^{s}$. While the fitness advantage associated with new mutations is a complex issue (Eyre-Walker and Keightley, 2007), in presence of abundant beneficial mutations, deleterious mutations (negative effect on fitness, $s < 0$) do not typically contribute to the adaptation of large populations and can be neglected (Park et al., 2010; Brunet et al., 2008; Desai and Fisher, 2007).


Following this approach, in the model each offspring has a constant probability per generation $U_b$ (the “beneficial mutation rate”) of acquiring a beneficial mutation. The positive parameter s is the “selection coefficient” whose typical values in laboratory evolution experiments with bacteria are in the range $s \simeq 0.001$–$0.005$ (Hegreness et al., 2006; Perfeito et al., 2007). For the beneficial mutation rate and population size, we consider the regime $N U_b \gg 1$, exploring the range $10^{-10}$–$10^{-3}$ (Hegreness et al., 2006; Perfeito et al., 2007) for $U_b$, and using population sizes between $10^{6}$ and $10^{10}$ (Hindré et al., 2012). The condition $N U_b \gg 1$ allows us to capture the clonal interference regime (see also Supplementary Notes, Sec. S1). Since the dynamics is stochastic, it is possible to define for each mutation a “surviving” probability $\pi(s)$, proportional to its advantage (Kimura, 1983). When a certain mutation is carried by a subpopulation of size larger than $1/\pi(s)$, the mutation is established, and will fix in the whole population.

The model assumes that successive mutations do not lead to the same fitness gain, but the fitness gain is dependent on the mutations already occurred in an individual (Kryazhimskiy et al., 2009). This feature extends the non-epistatic model for multiple mutations (Desai and Fisher, 2007; Brunet et al., 2008) and considers selection coefficients dependent on the number of mutations, i.e. $s = s_0 g'(k)$, where $g'(k)$ is a decreasing function of the number of acquired mutations k. The total advantage is given by the sum of the effects of contributing mutations, i.e., $w(k) = e^{s_0 g(k)}$ and $g(k) = \sum_{k'=1}^{k} g'(k')$. In order to fully specify the model, one has to choose a specific form for the function $g'(k)$, describing the strength of the negative epistasis between mutations. A simple example is given by the choice of a fitness gain that depends on the number of the extant mutations k as a power law. In this case, $g'(k) \propto k^{\alpha - 1}$ with $\alpha \le 1$, where the epistasis grows in strength with decreasing $\alpha$ from the non-epistatic case $\alpha = 1$. In the case $0 < \alpha < 1$, the fitness of an individual with k beneficial mutations has the form

$$w(k) = e^{s_0 k^{\alpha}}. \qquad (1)$$
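The model definition above can be illustrated with a minimal simulation sketch. This is not the authors' simulation algorithm (which is described in the Supplementary Notes and not reproduced here); it is one plausible Wright-Fisher-style update under stated assumptions: selection and drift are implemented as a multinomial resampling with weights proportional to $w(k)$ from Eq. (1), and each offspring then gains one beneficial mutation with probability $U_b$. The function names and the discrete form of the marginal advantage, $s_0 [k^{\alpha} - (k-1)^{\alpha}]$, are hypothetical choices made for illustration.

```python
import numpy as np

def log_fitness(k, s0, alpha):
    """Log-fitness s0 * g(k) with g(k) = k**alpha, as in Eq. (1)."""
    return s0 * np.asarray(k, dtype=float) ** alpha

def marginal_advantage(k, s0, alpha):
    """Assumed discrete advantage of the k-th mutation: s0 * (k**alpha - (k-1)**alpha).
    For large k this behaves like s0 * alpha * k**(alpha - 1)."""
    k = np.asarray(k, dtype=float)
    return s0 * (k ** alpha - (k - 1.0) ** alpha)

def wright_fisher_step(counts, s0, alpha, Ub, rng):
    """One generation at fixed population size N (illustrative sketch).
    counts[k] is the number of individuals carrying k beneficial mutations."""
    if counts[-1] > 0:
        # Keep an empty class at the top so that new mutants have a slot.
        counts = np.append(counts, 0)
    N = int(counts.sum())
    k = np.arange(counts.size)
    weights = counts * np.exp(log_fitness(k, s0, alpha))
    # Selection and drift: offspring classes drawn in proportion to fitness.
    offspring = rng.multinomial(N, weights / weights.sum())
    # Mutation: each offspring moves from class k to k + 1 with probability Ub.
    mutants = rng.binomial(offspring, Ub)
    new_counts = offspring - mutants
    new_counts[1:] += mutants[:-1]
    return new_counts

# Usage with small, purely illustrative parameter values.
rng = np.random.default_rng(0)
counts = np.array([1000])
for _ in range(50):
    counts = wright_fisher_step(counts, s0=0.1, alpha=0.5, Ub=0.01, rng=rng)
mean_k = np.sum(np.arange(counts.size) * counts) / counts.sum()
```

The population size is conserved by construction, and for $0 < \alpha < 1$ the marginal advantage returned by `marginal_advantage` decreases with k, which is the diminishing-returns property the model is built around.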

A more detailed description of the model, the simulation algorithm, and of two additional diminishing returns functions $g'(k)$ considered here (leading to logarithmic or geometric increase of the log-fitness), can be found in the Supplementary Notes, Section S1.

2.2. Phenomenology of the model: adaptation slows down until it becomes effectively neutral

As in the non-epistatic case (Desai and Fisher, 2007; Brunet et al., 2008), the population is composed of classes of individuals with the same number of mutations that are in one-to-one correspondence to fitness advantage classes (Fig. 1A, B). Direct simulation of the model shows that both the mean advantage $\langle s_0 g(k) \rangle$ and average number of mutations $\langle k \rangle$ grow sub-linearly for intermediate to long times (Fig. 1C, D). This trend is independent of $\alpha$ (or of the specific model of epistasis $g'(k)$) and is due to the fact that decreasing advantage and the consequent rise of the establishment threshold for clones together slow down adaptation. The time derivatives of $\langle s_0 g(k) \rangle$ and $\langle k \rangle$ estimate the adaptation speed $v_s$ and the mutation-accumulation speed $v_k$ of a typical realization. Fig. 1E shows an average of $v_k$ over 100 realizations, plotted as a function of time. The simulations indicate that $v_k$ relaxes to a plateau which is close to the beneficial mutation rate $U_b$. Equivalently, for long times, the mean number of fixed mutations shows a linear behavior in time with a rate close to $U_b$ (red line in Fig. 1C). In the same long-time limit, the advantage of a mutation $s = s_0 g'(k)$ drops asymptotically to zero. At long times there is a transition to an effectively neutral regime: the selection coefficient $s_0 g'(k)$ becomes too small to be


Fig. 1. Basic features of the model. Because of competition between beneficial mutations, the population is divided into sub-populations with different frequencies, defined by the number of mutations k (panel A). $L_k$ is the difference, in number of mutations, between the maximum number of mutations found in a clone $k_{max}$ and the mean. This induces a distribution for the log-fitness $s_0 g(k)$ (panel B). Both distributions travel in time, driven by established beneficial mutations, with instantaneous velocities $v_k$ and $v_s$. Middle panels show the increase in time of $\langle k \rangle$ (C) and of $\langle s_0 g(k) \rangle$ (D) obtained by direct simulation of the diminishing return model. The plot in panel C is in log–log scale, and the data are compared with a reference straight line (dashed blue line, with slope $5 \times 10^{-3}$) to highlight the sublinear growth of $\langle k \rangle$. The continuous red line shows the asymptotic long-time linear behavior with slope corresponding to $U_b$. (E) Long-time behavior of the mean speed of fixed mutations $v_k$ (green symbols), averaged over different realizations. For long times, this quantity decreases towards the limit value $v_k = U_b$ (continuous red line), where the assumptions of the model break down and deleterious mutations need to be accounted for (Rouzine et al., 2008). This limit also corresponds to the limit value of $v_k$ obtained by an infinite-N estimate (see the text). Simulations are carried out using the parameters $N = 5 \times 10^{7}$, $s_0 = 0.5$, $\alpha = 0.02$, $U_b = 1 \times 10^{-3}$. Averages are computed over 100 realizations (these averages are implied in the notations for the y-axis labels). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

relevant, and the fixation dynamics is driven solely by genetic drift. The probability of fixation of an essentially neutral mutation is $\approx 1/N$ while the rate of appearance of new mutations is $U_b N$. Therefore, the pace at which new mutations are accumulated is approximately $v_k \approx U_b$ (see for example Kimura, 1983). This neutral long-time regime is outside of the limit of applicability of the model, and has to be regarded as unphysical, since when $v_k = U_b$, deleterious mutations cannot be neglected (Rouzine et al., 2008). Thus, for any finite N the asymptotic trend of $\langle k \rangle$ has to be interpreted as an effective signature of a change of regime for both $v_k$ and $v_s$, where beneficial mutations should be in equilibrium with deleterious ones, which could possibly be captured by a variant of the model including deleterious mutations


(Rouzine et al., 2008). Note also that realistically the beneficial mutation rate itself could change in the later stages of evolution (Park et al., 2010; Park and Krug, 2008), eventually giving a contribution to the adaptation dynamics. In fact, a decrease of the beneficial mutation rate could in principle contribute to the slowing down of the fitness increase in time (see Discussion).

3. Results

3.1. An infinite-N analysis gives a relation between the variance of the advantage distribution and the adaptation speed

In the limit of infinite population, $N \to \infty$, the dynamics of the model can be described using the following equation (Park et al., 2010),

$$f(k,t) = (1 - U_b)\, f(k,t-1)\, \frac{w(k)}{\langle w \rangle (t-1)} + U_b\, f(k-1,t-1)\, \frac{w(k-1)}{\langle w \rangle (t-1)}, \qquad (2)$$

where $f(k,t)$ is the frequency of individuals with k beneficial mutations at generation t, $w(k) = e^{s_0 g(k)}$, and $\langle w \rangle (t) = \sum_k w(k) f(k,t)$ is the mean fitness. Multiplying Eq. (2) by k and summing over k gives the following expression for the dynamics of the mean number of mutations $\langle k \rangle (t) = \sum_k k f(k,t)$,

$$\langle k \rangle (t+1) = \frac{\langle k\, w \rangle (t)}{\langle w \rangle (t)} + U_b. \qquad (3)$$
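As a numerical illustration, the recursion of Eq. (2) can be iterated directly and the identity of Eq. (3) checked at every generation. The sketch below assumes the power-law form of Eq. (1) for $w(k)$; the function names, the truncation at a finite number of classes and the parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def class_fitness(n_classes, s0, alpha):
    """w(k) = exp(s0 * k**alpha) for k = 0 .. n_classes - 1 (power-law model)."""
    k = np.arange(n_classes, dtype=float)
    return np.exp(s0 * k ** alpha)

def infinite_n_step(f, w, Ub):
    """One application of Eq. (2): selection followed by beneficial mutation.
    The top class is assumed empty, so no probability mass is lost by truncation."""
    selected = f * w / np.sum(f * w)
    new_f = (1.0 - Ub) * selected
    new_f[1:] += Ub * selected[:-1]
    return new_f

def mean_k(f):
    """Mean number of mutations <k> for a frequency vector f."""
    return float(np.sum(np.arange(f.size) * f))

# Illustrative parameters; n_classes exceeds the number of generations, so the
# highest class reachable from k = 0 stays inside the truncated array.
s0, alpha, Ub = 0.1, 0.5, 1e-3
n_classes, n_generations = 400, 200
w = class_fitness(n_classes, s0, alpha)
f = np.zeros(n_classes)
f[0] = 1.0

gaps = []
for _ in range(n_generations):
    # Right-hand side of Eq. (3), evaluated before the update.
    predicted = np.sum(np.arange(n_classes) * w * f) / np.sum(w * f) + Ub
    f = infinite_n_step(f, w, Ub)
    gaps.append(abs(mean_k(f) - predicted))
max_gap = max(gaps)  # should vanish up to floating-point error
```

Because Eq. (3) follows exactly from Eq. (2), `max_gap` only measures rounding error; the same loop can be reused to track the variance of the class distribution over time.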

This expression can be further simplified assuming that the frequency distribution is narrowly peaked around the mean (which travels in time) and that it can be expressed as $f(k,t) \approx \delta(k, \langle k \rangle (t))$. In this case it can be easily verified that $v_k = U_b$. A different, more instructive relation, which takes into account the width of $f(k,t)$, can be obtained starting from Eq. (3), and expanding the fitness around $\langle k \rangle$ under the assumption that $\Delta_k \equiv (k - \langle k \rangle) \ll \langle k \rangle$, for every index k of non-empty classes. The assumption $\Delta_k \ll \langle k \rangle$ is verified by simulations (see Supplementary Fig. S3) and by further considerations on the finite-N width of the distribution given in the following sections. For the power law model, the fitness becomes $w(k) = e^{s_0 k^{\alpha}} \approx e^{s_0 \langle k \rangle^{\alpha}} \left( 1 + s_0 \alpha \langle k \rangle^{\alpha - 1} \Delta_k \right)$. Computing averages in Eq. (3) and noticing that $\langle \Delta_k \rangle = 0$ and $\langle k \Delta_k \rangle = \mathrm{Var}_k$, it is possible to obtain an expression for the speed of accumulated mutations as a function of the variance of the fitness class distribution. Estimating $v_k$ as $d\langle k \rangle / dt \approx \langle k \rangle (t+1) - \langle k \rangle (t)$ gives, for the case $g(k) = k^{\alpha}$,

$$\frac{d \langle k \rangle}{dt} \approx s_0 \alpha \langle k \rangle^{\alpha - 1}(t)\, \mathrm{Var}_k(t) + U_b. \qquad (4)$$

According to this equation, $v_k$ is driven by two terms, the increase of mutations due to the beneficial mutation rate $U_b$ and the selection of individuals with larger fitness. This result is connected to Fisher's fundamental theorem and the result obtained by Guess (1974), which relate the speed of adaptation to the variance of the fitness. In this case, the speed of accumulation of successive mutations is related to the width of the mutation class histogram, but rescaled by the factor $\langle k \rangle^{\alpha - 1}(t)$, which decreases with time. For the non-epistatic case ($\alpha = 1$) one recovers the usual linear proportionality, since the advantage is linear in the mutation class index (Park et al., 2010). The infinite-population limit of the model with $\alpha = 1$ has been previously addressed by Park et al. (2010) with a moment generating function approach. In particular, they estimated the distribution variance as $\mathrm{Var}_k \simeq (1 - U_b)/s_0$, which substituted in Eq. (4) gives precisely their expression for the speed $v_k \simeq 1$ (in the limit of small $U_b$), suggesting that our result is a consistent generalization to the epistatic case. Therefore, for $\alpha = 1$ the speed of adaptation does not depend on the mutation rate for infinite populations. On the other hand, for diminishing returns, the increase in the width of the distribution of k does not compensate for the term $\langle k \rangle^{\alpha - 1}$ (which tends to 0), and, for long times, Eq. (4) predicts the limit velocity $v_k = U_b$, as observed in simulations (Fig. 1E).

Since experimental populations are finite it is necessary to discuss under which conditions this infinite-population limit can be considered a valid estimate of the model behavior for finite populations. The infinite-population limit can be seen as a “mean-field” approximation of the stochastic process defining the model. Such description is often used to capture mean trends. However, one must address how well this approximation works and under which conditions (if any). At finite population size, simulated data are in good accordance for intermediate time with the mean field estimate for the variance $\mathrm{Var}_k$ of the mutation class distribution (Fig. 2). Additionally, $\mathrm{Var}_k / \langle k \rangle$ decreases quickly with time (see Supplementary Fig. S3), justifying the assumption of small $\Delta_k / \langle k \rangle$ (since it is expected that $|\Delta_k| \ll (\mathrm{Var}_k)^{1/2}$ for every k). However, the variability of $\mathrm{Var}_k$ over different realizations increases quickly. In order to verify whether these fluctuations are well-behaved, we have evaluated $\gamma = \sigma_R(\mathrm{Var}_k) / \langle \mathrm{Var}_k \rangle_R$, where $\mathrm{Var}_k$ indicates the variance of the mutation class distribution in a single realization (roughly analogous to $L_k^2$), and the suffix R indicates averages over realizations. Specifically, $\langle x \rangle_R$ indicates the average of the quantity x over different realizations, while $\sigma_R(x)$ is its standard deviation. Therefore, $\gamma$, plotted in Fig. 2B as a function of N, represents the relative variability over the realizations of the variance $\mathrm{Var}_k$ of the distribution. For any fixed time, this quantity decreases with N, suggesting that the infinite-population limit is well defined. Conversely, fixing N and increasing t, $\gamma$ appears to reach finite values, hence the typical values of $\mathrm{Var}_k$, and $v_k$, for any finite N at large t, differ considerably from the average over realizations (a property sometimes referred to as lack of self-averaging). This effect and its extent are due to both genetic drift and to the shape of the fitness landscape. In summary, a mean-field description of the population dynamics does not work properly for longer time-scales at finite N, as previously shown in the case of no epistasis ($\alpha = 1$) (Park et al., 2010). However, the infinite-population limit is instructive

Fig. 2. The infinite-N approximation captures a relation between $v_k$ and the width of the fitness class distribution, which is valid at intermediate times for moderate N and until longer times for large population sizes. (A) Simulated variance of the mutation classes distribution, shown as a function of the expected variance from the infinite-N estimate ($\mathrm{Var}_{inf} = (v_k - U_b) \langle k \rangle^{1 - \alpha} (s_0 \alpha)^{-1}$, see Eq. (4)). To avoid ambiguities, averages over realizations are indicated by a suffix R. The continuous red line represents the theoretical prediction $\mathrm{Var}_{inf} = \mathrm{Var}_k$. The error bars (standard deviations over realizations, $\sigma_R(\mathrm{Var}_k)$) become larger with $\langle \mathrm{Var}_k \rangle_R$. (B) While for increasing times $\sigma_R(\mathrm{Var}_k)$ diverges, the relative variability $\sigma_R(\mathrm{Var}_k) / \langle \mathrm{Var}_k \rangle_R$ over realizations decreases with increasing population size N, for any fixed time. This suggests that the infinite-N estimate is well-defined. Simulations are carried out using the parameters $s_0 = 0.5$, $\alpha = 0.02$, $U_b = 1 \times 10^{-3}$. Population size in panel A is $N = 10^{7}$. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)


for understanding the main mechanisms driving the model, and provides reasonably good estimates at intermediate times even for finite populations. Finally, the extreme diversity between realizations at finite population sizes might have some empirical relevance since most of the experimental results concern a single or a few evolutionary trajectories.

3.2. At finite population size, the advantage and fitness-class distributions remain Gaussian, but their width is unsteady, with opposed trends

We now consider in more detail the behavior of finite-size populations at intermediate times (i.e. on the relevant experimental time scale $\approx 10^{2}$–$10^{4}$ generations, with $\langle k \rangle \approx 10$–$10^{2}$). The frequency distributions of the mutation and advantage classes are approximately Gaussian for a constant advantage model (Tsimring et al., 1996; Desai and Fisher, 2007; Brunet et al., 2008; Rouzine et al., 2003, 2008). Direct simulations of the diminishing return model show that this still holds, even when the advantage function has fairly high curvature (Fig. 3A, Fig. S5). The model without epistasis (Desai and Fisher, 2007; Brunet et al., 2008) (and the infinite-N estimates) lead one to expect that the widths of these histograms are related to the speed of adaptation and consequently of mutation accumulation. Given the discrete nature of these distributions, their widths are well represented by the distances $L_k$ and $L_s$ of the foremost bin from the average (shown in Fig. 1A). For a diminishing return model $L_k$ is an increasing function of the mean number of mutation classes $\langle k \rangle$, while $L_s$ decreases with $\langle k \rangle$ (Fig. 3B). The two speeds are connected by the decay of the advantage between k and $k + 1$, as well as by the change in the width of the distributions with increasing $\langle k \rangle$. Indeed, in the long-time limit, even if $v_k > 0$, $v_s$ tends to zero, because of the existence of a large number of sub-populations (mutation classes) with different k but essentially the same fitness, which leads to the effectively neutral behavior discussed in the previous section.

The “stochastic edge” estimates for the finite-N adaptation speed available for the multiple-mutations model are based on the hypothesis that the only class subjected to substantial stochastic effects is the fittest one, i.e. that $\Delta s \gtrsim U_b$ (Rouzine et al., 2008;


Desai and Fisher, 2007). For the diminishing return model, when $\Delta s \approx U_b$ this approximation in general fails. However, for experimentally relevant parameters we can always suppose that the stochastic edge approximation is valid. Supposing that $s_0 = 5 \times 10^{-1}$, $\alpha = 0.02$ (this is a much stronger epistatic effect than might be expected from experimental data, see Section 3.3) and that $L_k \approx 50$ (of the order of the values obtained from simulations with this parameter set and population size $N = 10^{7}$–$10^{13}$), then $\Delta s$ becomes close to a beneficial mutation rate $U_b \approx 10^{-3}$ (which can be considered very large; Hegreness et al., 2006; Perfeito et al., 2007), for $\langle k \rangle \approx 5 \times 10^{2}$. This exceeds the interesting experimental range of $\langle k \rangle$ ($10^{1}$–$10^{2}$). Thus, since the simulated advantage and mutation class histograms are both nearly Gaussian (but not stationary in width), it is possible to generalize the estimates applied for the constant advantage model (see Supplementary Notes, Sec. S1C and Desai and Fisher, 2007).

Supposing a slow increase of the width of the distribution with k and assuming that the width of the histogram is stable during the time necessary for the new class to reach the establishment threshold, the mean of the distribution moves from $\langle k \rangle (t)$ to $\langle k \rangle (t) + 1$ during this time, and $L_k \approx L_{k+1}$. For $\langle k \rangle \gg L_k > 1$ (and $\langle k \rangle$ not too high due to the condition of sufficiently large $\Delta s$) the advantage of the edge with respect to the mean class is $\Delta s_k = s_0 (k^{\alpha} - \langle k \rangle^{\alpha}) \approx s_0 \alpha L_k k^{\alpha - 1}$. As with the constant advantage multiple-mutations model (Brunet et al., 2008; Desai and Fisher, 2007), different estimates can be obtained depending on the approximations taken, which give implicit or explicit formulas. For example, assuming that $k \gg L_k$ and neglecting the logarithmic corrections in $L_k$ one obtains the following closed expression for $L_k$,

$$L_k = \frac{2 \log \left( N s_0 \alpha k^{\alpha - 1} \right)}{\log \left( \dfrac{s_0 \alpha k^{\alpha - 1}}{U_b} \right)}. \qquad (5)$$

Comparison with simulated data shows that the expression for $L_k$ in Eq. (5), including small-L corrections (see also Supplementary Notes), is a reasonably good estimate of the width of the distribution for empirically plausible parameters (Fig. 3B). In particular, the speeds of adaptation and mutation accumulation (Fig. 3C) are well captured by this analytical description.

Fig. 3. The histograms of fitness advantage and mutation classes have nearly Gaussian forms. While adaptation slows down, the latter histogram expands while the former becomes increasingly peaked. (A) Histograms of the mutation classes (top) and advantage classes (bottom) obtained from simulations averaging over 200 realizations at different generations (different symbols, see legend). The parabolic form in the semi-log plot indicates that they are approximately Gaussian (solid lines connecting the symbols). The establishment size $1/\pi(s)$ is represented as a dashed line in the top panel. (B) Simulated data for the widths of the mutation class histogram $L_k$ (green squares, top) and of the fitness advantage histogram, $L_s$ (blue circles, bottom), plotted as a function of the mean number of mutations $\langle k \rangle$. The continuous line represents the theoretical estimates of the width (see Eq. (5) and Eq. (23) in Supplementary Notes). (C) Plots of the speed of mutation accumulation $v_k$ (top, green squares) and of adaptation $v_s$ (bottom, blue circles), as a function of the mean number of mutations $\langle k \rangle$. Continuous lines are the corresponding theoretical estimates (see Eqs. 17 of Supplementary Notes). Note that since the dependence on $L_k$ is logarithmic the estimates for both speeds are in satisfactory agreement with the simulated data even if $L_k$ is approximated more roughly. The parameters used in the simulations are $N = 10^{9}$, $U_b = 6 \times 10^{-6}$, $s_0 = 0.1$, $\alpha = 0.2$, compatible with those estimated (see Section 3.3). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

A mathematical argument for the validity and range of applicability of this extension of the multiple-mutations model (Desai and Fisher, 2007) is presented in Sec. S1 of the Supplementary Notes. The approximations discussed above are valid as long as $\langle k \rangle$ is large enough to exceed $L_k$, but small enough not to make $\Delta s_k$ too small (one can, e.g., impose that $\Delta s_k \gg 1/U_b$ to be far from the neutral limit). The main source of error in this estimate is the neglect of logarithmic corrections in Eq. (S23). It is possible to account for these corrections using the simulated values of $L_k$. We verified that this leads to a small underestimate of the width of the distribution (not shown). Additional sources of deviation are the neglect of non-leading mutation classes in estimating the establishment time (Eq. (S15)), the use of an average growth rate (Eq. (S18)) and the assumption of exponential growth of each class starting from the establishment size. For the empirically plausible values of $N$, $U_b$, and $s_0$ given in the Introduction, we estimate that the model should be valid for values of $\langle k \rangle$ up to $10^{2}$. Depending on the parameters and the specific model chosen, this range of validity can be far larger. Moreover, for even larger $k$ we verified that $v_k$ tends to a constant, $U_b$, with the approximations taken, restoring the correct infinite-$N$ limit.

3.3. Comparison with data from laboratory evolution experiments

Having explored some of the main features of the diminishing return model using theoretical arguments and simulations, we now compare it to data on the increase of fitness. Existing comparisons of similar data with models either neglect diminishing returns (Desai and Fisher, 2007) or do not account for multiple mutations (Kryazhimskiy et al., 2009). We used a simple procedure for choosing a set of parameters and a functional form of the advantage $g(k)$ compatible with existing data.
We considered fitness/mutations data (illustrated below) from two laboratory evolution experiments using the three different variants of the diminishing return model described in the Supplementary Notes (Sec. S1 A). The power law model, where the advantage is described by $s_0 g(k) = s_0 k^{\alpha}$, is the main case presented in the previous sections. The advantage functions of the other two models considered here can be approximated as partial sums of the harmonic series (logarithmic model, $g(k) = \log(k+1)$) and of the geometric series (geometric model, $g(k) = (1 - q^{k})/(1 - q)$), which yields simple expressions for the antagonistic effect of each added mutation. These two model variants share most of the qualitative features of the main formulation. All the variants include a prefactor $s_0$ that sets the scale of the fitness effects. The power-law and geometric model variants also depend on a second parameter that defines the strength of the epistatic effects, i.e., $\alpha$ and $q$ (the ratio of the geometric series), respectively. Note that the advantage of each added mutation decays exponentially in the geometric model, i.e., the strength of epistasis is higher than in the other two scenarios, and the fitness is upper-bounded.

3.3.1. Experiments analyzed

We considered data from two experiments. The first data set comes from the initial 20,000 generations of the well-known "E. coli long-term evolution" experiment (Barrick et al., 2009; E. coli long-term experimental; Lenski et al., 1991), while the second data set was obtained from a chemostat experiment performed by some of the authors (Jezequel et al., 2013). The two experiments concern two different bacteria (Escherichia coli and Acinetobacter baylyi, respectively) and were performed using distinct propagation techniques (serial dilution in batch and chemostat, respectively). Their durations are also quite different, both in terms of generations ($2 \times 10^{4}$ and $3 \times 10^{3}$) and of experimental time ($\approx 10$ years and $\approx 4$ months). Another remarkable difference between the two experiments is the population size, which varies every day by two orders of magnitude (between $\approx 5 \times 10^{6}$–$10^{8}$) for the serial dilution experiment, while it is very large and approximately fixed in the A. baylyi experiment ($\approx 3 \times 10^{10}$).

The two experiments share interesting features suggesting that a diminishing return model might be applicable to describe the evolutionary dynamics of the populations. Firstly, their duration in terms of generations is long enough to observe a deceleration of the fitness increase (Kryazhimskiy et al., 2009). Secondly, the large (effective) population size suggests that the clonal interference regime might be relevant in these experiments. The simultaneous presence of different genotypes within the population has been verified in both experiments (Jezequel et al., 2013; Elena and Lenski, 1997). Moreover, the decrease of the beneficial effect of the first five fixed mutations in the E. coli serial dilution experiment has been recently demonstrated (Khan et al., 2011). The effect is more complex than the description of diminishing returns given here. However, as suggested by the authors, one can surmise that a simplified model including epistatic interactions might be useful to roughly describe this phenomenon. A detailed description of the data and of the choices made for the analysis procedure is given in Section S2 of the Supplementary Notes. Here we give a brief description of the main features of the two experiments.

The A. baylyi experiment studied the population dynamics in a chemostat using a minimal medium supply for about four months, at a dilution rate $D \approx 0.7\ \mathrm{h}^{-1}$. The use of a chemostat made it possible to grow a large population ($N \approx 3 \times 10^{10}$) under controlled conditions for a fairly long time. Since the number of individuals is large, it is expected that clones with different genotypes will grow in parallel in the clonal interference regime. This has been confirmed by population sequencing data (Jezequel et al., 2013).
Additionally, several single clones were isolated from samples collected and frozen at different times. The maximum growth rate ($\mu_{\max}$) of 21 isolated clones and of the original strain introduced in the chemostat (wild type) was measured in batch by fitting the growth curve during the exponential phase. We took the maximum growth rate measured in batch as indicative of the fitness of the population in the chemostat, defining the normalized fitness of clone $i$ as $w_{\mathrm{exp}}(i) = e^{\mu_{\max}(i) - \mu_{\mathrm{WT}}}$. Note that, since the reference fitness value is given by the ancestral growth rate, $w_{\mathrm{exp}}(\mathrm{WT}) = 1$. Whole-genome sequencing was performed on two single clones isolated at the end of the experiment (AB2800b and AB2800a) and on the ancestral strain, which served as a reference for identifying mutations in the evolved clones. A total of 11 mutations were detected in the evolved clones, eight of them common to both. Additional sequencing was performed on the remaining clones, selected at different generations, on PCR fragments encompassing the mutated loci identified in the two end-point clones, in order to reconstruct a sketch of the history (or genealogy) of the appearance of mutations. Thus, the number of accumulated mutations has to be considered a lower bound for all the clones except the ancestor and the two fully sequenced clones, where it is measured directly. However, the indications of the number of mutations from the population sequencing data are compatible with the inferred values of $k$ (Jezequel et al., 2013).

The E. coli long-term evolution experiment concerns twelve E. coli populations evolved in parallel for about $5 \times 10^{4}$ generations in batch. Serial dilution was performed daily, allowing $\approx 6.6$ generations each day and an effective population size of $\approx 2 \times 10^{7}$ individuals. We used the mutation and fitness data of the population designated Ara-1 for the first 20,000 generations, as given by Barrick et al. (2009).
Fitness was measured experimentally through competition experiments as the logarithm of the increase in frequency (de Visser and Lenski, 2002; Lenski et al., 1991). These measurements correspond essentially to the ratio between the Malthusian parameters, $\varphi_i = \mu_i/\mu_0$, which is related to the advantage (and therefore to log-fitness) in the model. We define in this case the mean fitness of the $i$th mutation class (in the model) as $w_{\mathrm{exp}}(i) = e^{(\varphi_i - 1)\mu}$, where $\mu = \log(2)$ is the approximate mean growth rate (see Lenski et al., 1991, and Supplementary Notes, Sec. S2). The fitness of the reference strain is again $w_{\mathrm{exp}} = 1$. Genome sequencing was performed on samples from generations 2000, 5000, 10,000, 15,000 and 20,000, as well as on the ancestor. A total of 45 mutations were found in the most evolved strain, most of which were stable in later clones (Barrick et al., 2009). For all the analyzed clones in both experiments, it is possible to associate fitness values with numbers of mutations, and thus to bridge with the parameter $k$ in the model.
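As a minimal sketch of the two normalizations defined above, the following Python functions convert a measured growth rate (chemostat clones) or a ratio of Malthusian parameters (serial dilution clones) into a normalized fitness and a log-fitness. The function names and argument conventions are illustrative assumptions, not part of the original analysis.

```python
import math

def fitness_from_growth_rates(mu_max, mu_wt):
    """Normalized fitness w_exp(i) = exp(mu_max(i) - mu_WT).

    Both rates are assumed to be expressed in the same units.
    """
    return math.exp(mu_max - mu_wt)

def fitness_from_relative_rate(phi, mu=math.log(2)):
    """Normalized fitness w_exp(i) = exp((phi_i - 1) * mu).

    phi is the ratio of Malthusian parameters mu_i / mu_0, and
    mu = log(2) is the approximate mean growth rate quoted in the text.
    """
    return math.exp((phi - 1.0) * mu)

def log_fitness(w):
    """Log-fitness, the quantity compared with the advantage s0 * g(k)."""
    return math.log(w)

# By construction the reference strain has fitness 1 in both conventions.
print(fitness_from_growth_rates(0.9, 0.9))   # ancestor vs. itself
print(fitness_from_relative_rate(1.0))       # phi = 1 for the ancestor
```

With these conventions the ancestor always maps to a log-fitness of zero, so fitted advantage functions can be compared across the two experiments on the same scale.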

3.3.2. Comparison procedure

Fig. 4 summarizes the procedure adopted to choose model parameters. The first step finds the best-fitting parameters for each functional form of the fitness advantage function $g(k)$, using data on fitness values and number of acquired mutations, with the definitions given in the previous section. Since the number of experimental points is low (5 for the E. coli experiment and 21 for the A. baylyi experiment, the latter corresponding to 9 different values of $k$), it is possible to obtain good fits with different functional forms of the advantage $s_0 g(k)$. The results of the fit using the three functional forms considered here (power law, logarithmic and geometric) are shown in the top panel of Fig. 5.

Fig. 4. Sketch of the comparison procedure with experimental fitness data. Top: The experimental data for fitness as a function of the number of mutations (blue symbols) allow one to estimate the fitness advantage function $s_0 g(k)$ from a fit (continuous green line). Middle: Simulations are run using the estimated advantage function for different values of $U_b$, which remains undetermined. This leads to different predicted dynamics for the number of acquired mutations (dotted, dashed, and dashed-dotted lines) and for the fitness increase in time. These predictions can be compared with the experiment to estimate $U_b$. In particular, the predicted number of time steps necessary to reach the final experimental number of mutations varies with the beneficial mutation rate (filled circles). Bottom: The best value of $U_b$ (blue star) is obtained when the predicted number of time steps necessary to reach the final number of mutations (determined in the experiment) corresponds to the number of experimental generations (horizontal continuous red line). We applied this procedure to the three model variants for the diminishing return described in the main text. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

Fig. 5. Estimate of the beneficial mutation rate from different experiments. Qualitative estimate of the beneficial mutation rate using data from Jezequel et al. (2013) (left panels) and Barrick et al. (2009) (right panels). Top: Estimate of the fitness advantage function $s_0 g(k)$ from data of advantage as a function of mutation number (red symbols), as described in the top panel of Fig. 4. The different lines correspond to the three model variants for the diminishing-return advantage (power law: dot-dashed purple lines; logarithmic: long-dashed green lines; geometric: blue dashed lines, as in legend). For the sake of simplicity, for the A. baylyi experiment (panel A1) only the mean value of the advantage is shown for each $k$. Complete data are reported in Jezequel et al. (2013). Middle: Estimates of $U_b$ from the different model variants, obtained by matching the time for which $\langle k \rangle(t_{\mathrm{exp}}) = \langle k \rangle_{\mathrm{exp}}$ predicted by simulations with a given $s_0 g(k)$ to the experimental number of generations, as described in the bottom panel of Fig. 4 (the different line styles refer to the model variants as above). For the A. baylyi experiment we considered a total of 2850 generations. Bottom: Comparison of the performance of the three model variants using the derived values of $U_b$. For the Jezequel et al. data (left panel), the power law and geometric variants are quite close, but the power law model for the diminishing return gives the best agreement, especially considering the data on the number of acquired mutations as a function of time (not shown). For the Barrick et al. data, the logarithmic and power-law models perform better, and give equivalent estimates of $U_b$. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)

In a second step of the procedure, simulations are repeated for a wide range of values of $U_b$. A qualitative estimate of this parameter can be obtained using the mean number of mutations at the end of the two experiments (Fig. 4). In our case, $\langle k \rangle = 11$ and $\langle k \rangle = 45$ for the A. baylyi and E. coli experiments, respectively. In the model, for a fixed interval of time steps, the number of accumulated mutations increases significantly and monotonically with $U_b$, so that the number of time steps necessary to reach $\langle k \rangle = 11$ and $\langle k \rangle = 45$ depends on $U_b$. Thus, for each model there is a single value of the parameter that satisfies $\langle k \rangle(t_{\mathrm{exp}}) = \langle k \rangle_{\mathrm{exp}}$ (Fig. 4 and middle panel of Fig. 5). Note that, by referring to the experimental values of $k$ as mean values, we are assuming that the sequenced clones are representative of the populations. For the A. baylyi experiment, the uncertainty on the number of mutations present in the clones is a relevant source of error. For the E. coli experiment, the number of mutations refers to single clones, but the associated fitness is a population mean.

The procedure described so far yields estimated values of the beneficial mutation rate. However, these values depend on the advantage model, and vary by up to three orders of magnitude. Nevertheless, the estimates are roughly in the expected biological range ($10^{-8}$–$10^{-5}$, see Perfeito et al., 2007; Hegreness et al., 2006).

The third step of the procedure allows one to select between advantage models, by comparing the simulated and experimentally measured dynamics of the fitness and of the number of mutations as a function of time (similarly to Kryazhimskiy et al., 2009). Note that the latter function is not trivially equivalent to the fitness as a function of $k$, but contains the effects of the population dynamics in the presence of clonal interference, for which the model provides a description. A qualitative comparison between data and simulations indicates that for the A. baylyi experiment the power law and geometric advantage models (with parameters $s_0 = 0.42$, $\alpha = 0.19$ and $s_0 = 0.24$, $q = 0.63$, respectively) best describe the increase of the fitness with time (bottom panel of Fig. 5). Comparing the increase of the number of mutations as a function of time in the two models suggests that the power law model better resembles the data. This leads us to prefer the power law model for the comparison with data. With this choice, the estimated value of the beneficial mutation rate is $U_b \approx 3 \times 10^{-7}$. In the E. coli experiment, the power law and logarithmic advantage models describe the data best, and are roughly equivalent (with parameters $s_0 = 0.22$, $\alpha = 0.26$ and $s_0 = 0.16$, respectively). Note that the beneficial effect of the first fixed mutation is comparable to the value $s_0 \approx 0.1$ estimated from experiments (Lenski et al., 1991; Wiser et al., 2013). The estimated values of the beneficial mutation rate are also essentially equivalent ($U_b \approx 1 \times 10^{-5}$ for the power law advantage model and $U_b \approx 6 \times 10^{-6}$ for the logarithmic advantage model). Note that the data points used in this step are much more abundant, since they include a value of fitness every 1000 generations. A logarithmic increase of the fitness advantage for the E. coli long-term evolution experiment is also suggested by a parallel work (Wielgoss et al., 2013).

Finally, this simple procedure for comparing the model with data should not be affected by the loss of the self-averaging property found in the model, which becomes relevant in a later regime. For example, assuming the estimated parameters for the E. coli long-term evolution experiment, and using the model to explore the relative variance of the speed over realizations, $\sigma(v_k)/v_k$, at $t = 2 \times 10^{4}$ generations one gets an error of the order of 2%. This value is still relatively small. Considering the much larger number of accumulated mutations $k \approx 10^{3}$, corresponding to $t \approx 2 \times 10^{6}$, the speed of evolution is $\approx 3 \times 10^{-4}$ ($\gg U_b$), and its relative variance over the realizations is $\sigma(v_k)/v_k \approx 8\%$, which would still be under control.
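The first two steps of the comparison procedure can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the code used for the analysis: the fit is shown only for the power law variant, via ordinary least squares in log-log space, and the population simulation is replaced by a hypothetical placeholder, `toy_generations_to_reach`, which only mimics the property the matching step relies on (the time needed to reach a given number of mutations decreases monotonically with $U_b$). The data points are synthetic and noise-free.

```python
import math

def g_power(k, alpha):
    """Power-law variant: g(k) = k**alpha."""
    return k ** alpha

def g_log(k):
    """Logarithmic variant: g(k) = log(k + 1)."""
    return math.log(k + 1)

def g_geometric(k, q):
    """Geometric variant: g(k) = (1 - q**k) / (1 - q), bounded by 1 / (1 - q)."""
    return (1 - q ** k) / (1 - q)

def fit_power_law(k_values, advantages):
    """Step 1 (power-law variant only): least squares on
    log(advantage) = log(s0) + alpha * log(k). Returns (s0, alpha)."""
    xs = [math.log(k) for k in k_values]
    ys = [math.log(a) for a in advantages]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return math.exp(mean_y - alpha * mean_x), alpha

def match_mutation_rate(generations_to_reach, t_exp, lo=1e-9, hi=1e-3, iters=60):
    """Step 2: bisection on log10(U_b), assuming generations_to_reach(U_b)
    decreases monotonically with U_b and crosses t_exp inside [lo, hi]."""
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if generations_to_reach(10 ** mid) > t_exp:
            a = mid  # too slow: a larger U_b is needed
        else:
            b = mid
    return 10 ** (0.5 * (a + b))

# Synthetic, noise-free points (NOT experimental data), for illustration only.
ks = list(range(1, 12))
advantages = [0.42 * g_power(k, 0.19) for k in ks]
s0_fit, alpha_fit = fit_power_law(ks, advantages)

# Hypothetical placeholder for the simulation: in practice this would run the
# model with the fitted s0 * g(k) and return the generations needed to reach
# the experimental mean number of mutations.
def toy_generations_to_reach(u_b):
    return 1.0 / u_b

u_b_match = match_mutation_rate(toy_generations_to_reach, t_exp=1e6)
print(s0_fit, alpha_fit, u_b_match)
```

Bisection in $\log_{10} U_b$ is a natural choice here because the candidate values span several orders of magnitude; with stochastic simulations, the function passed to `match_mutation_rate` would need to average over realizations for the monotonicity assumption to hold in practice.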

4. Discussion and conclusions

Different laboratory evolution experiments show a decrease of the fitness advantage due to newly acquired mutations and a decrease of the speed of evolution (Hindré et al., 2012; Kryazhimskiy et al., 2009; Elena and Lenski, 2003). We considered a simplified model, using a minimal number of parameters, which is a direct generalization of the multiple-mutations model with constant advantage, but describes this feature in terms of diminishing returns (Kryazhimskiy et al., 2009). Specifically, it is assumed that the selective advantage of all individuals having $k$ beneficial mutations is identical, but decreases with $k$. We have shown that the basic phenomenology of the model entails a sublinear increase of the mean number of fixed mutations and a steeper sublinear increase of the mean advantage. This is in qualitative agreement with previous results using a similar model applicable in a regime where concurrent mutations do not occur (Kryazhimskiy et al., 2009).

The evolutionary speeds of mutation accumulation and advantage are related to the width of the distribution of coexisting advantage classes. We showed how a theoretical infinite-$N$ argument produces a relation between the speed of fixed mutations $v_k$ and the second moment of the histogram of mutation classes, confirmed by simulations. Interestingly, simulations indicate that for any finite $N$, different model realizations behave increasingly differently with time in terms of both $v_k$ and the width of the mutation class histogram. This non-self-averaging property implies that even at intermediate times, the behavior of a realization can be quite different from the average. However, as we have seen, this effect appears to become relevant in the model on time scales longer than the experimental times considered here.

Finally, for finite population size, we were able to define through analytical arguments the regime where the stochastic edge estimate of the adaptation and mutation speeds can be extended to the case of diminishing returns, provided that the advantage $s$ is substituted with the appropriate function $s_0 g'(\langle k \rangle)$. While more complex and realistic descriptions exist (see Schiffels et al., 2011, which incorporates genetically linked multiple mutations explicitly), the advantage of this approach is that the model depends on few parameters.

We performed a numerical experiment giving a qualitative idea of the comparison of the model with data, and allowing rough estimates. We considered two different experimental data sets from long-term evolution experiments, and defined a simple procedure to compare the model with data. Assuming the model, this procedure yields an order-of-magnitude estimate of the beneficial mutation rate $U_b$. The values obtained for the beneficial mutation rate fall within the range of the available measurements (Hegreness et al., 2006; Perfeito et al., 2007): on the order of $10^{-6}$–$10^{-5}$ mutations per genome per generation for the E. coli long-term evolution experiment, and between $10^{-7}$ and $10^{-6}$ in the case of A. baylyi. As previously mentioned, the constant-advantage model can be seen as an effective description of a multiple-mutation framework with a distribution of advantages (Good et al., 2012), provided an effective advantage is used.
This description also produces an effective beneficial mutation rate that, in a diminishing return framework, would vary with time, and thus would need an extension of the present model to be fully implemented. While we leave a detailed treatment of the problem to future work, it is possible to give a rough but quantitative estimate of the underlying beneficial mutation rate in the simple case of an exponential distribution of the advantages, using the rescaling procedure proposed by Good and coworkers. This estimate (Supplementary Notes, Section S3) indicates that the rescaling should not affect the current order-of-magnitude estimates.

A very recent work (Wiser et al., 2013) examines fitness trajectories of the long-term evolution experiment up to 50k generations and matches them with a theoretical argument based on epistasis, but which neglects multiple mutations (Gerrish and Lenski, 1998). Their results are completely in line with ours: power law behavior of the fitness and an estimated beneficial mutation rate of $\approx 10^{-6}$ or higher. Comparing the model to the first 20k generations of the new dataset (where both mutation data and fitness are available) gives $\alpha = 0.27$, which is in line with our previous estimate, and provides a measure of the intensity of the epistatic interactions. In their work, Wiser and coworkers (Wiser et al., 2013) introduce epistasis using a parameter $g$ to express the decrease of the expected advantage of new mutations. They find that fitness, as a function of time, is proportional to $t^{1/2g}$. In order to obtain an equivalent expression for $w(t)$ in our model, we can relate $k$ and $t$ through the establishment time (see Supplementary Notes, Sec. S1E), obtaining the approximate relation $w(k) = s_0 k^{\alpha} \propto s_0 t^{\alpha/(2-\alpha)}$. Comparing the two expressions allows one to map the parameters of the two models: $g = (2 - \alpha)/(2\alpha)$.
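As a small numerical check of this mapping, the sketch below equates the two exponents, $1/(2g) = \alpha/(2-\alpha)$, and evaluates the result for the fitted $\alpha$. The inverse relation $\alpha = 2/(2g + 1)$ follows from the same identity; the function names are illustrative assumptions.

```python
def wiser_g_from_alpha(alpha):
    """Map the power-law exponent alpha to the epistasis parameter g,
    using 1/(2g) = alpha/(2 - alpha), i.e. g = (2 - alpha) / (2 * alpha)."""
    return (2.0 - alpha) / (2.0 * alpha)

def alpha_from_wiser_g(g):
    """Inverse mapping: alpha = 2 / (2g + 1)."""
    return 2.0 / (2.0 * g + 1.0)

g_est = wiser_g_from_alpha(0.27)
print(round(g_est, 2))  # → 3.2
```

The inverse function is convenient for expressing values reported in the $g$ parametrization on the $\alpha$ scale used here.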
Substituting the estimated value of $\alpha$ gives $g \approx 3.2$, which is close to the range of values $g \approx 4$–$9$ derived by Wiser and coworkers, and in particular similar to the value $g \approx 4$ obtained for the Ara-1 population analyzed here. Moreover, our estimate of the epistatic interaction can be compared to the measurements obtained by Khan et al. (2011) on the first five mutations acquired during the E. coli long-term evolution experiment. A qualitative comparison shows agreement with our results. However, each single mutation seems to have a specific value of $s_0$, which would require a more complex modeling approach.

The constant-advantage multiple-mutations model has also previously been applied to short-term laboratory evolution experiments (Desai et al., 2007). In those early stages, adaptation does not slow down, and the assumption of constant advantage is justified. We can compare our results to those obtained with the same procedure, using a constant advantage function $g(k) = s_0 k$, applied to the increase in fitness during the late stages of both experiments. We performed this test considering different "starting points", i.e., initial generations in the empirical data. The order-of-magnitude values of $U_b$ obtained with the procedure are similar to the ones quoted above. However, one is forced to discard the information about the initial mutations and, as can be expected, the outcome for $U_b$ depends on the chosen starting point. We found that it could vary by almost an order of magnitude for time intervals that appeared equally reasonable to fit with a constant advantage model. By contrast, the diminishing return model allows one to use data from all mutations, and does not leave this freedom. Additionally, it includes the early mutations, where $s$ varies much more and the relative accuracy of its experimental measurement is presumably higher.

Other scenarios have been proposed and possibly co-occur with diminishing returns epistasis (Kryazhimskiy et al., 2009; Khan et al., 2011; Chou et al., 2011), and, accordingly, different models have been formulated in this context. For example, the speed of evolution could decrease because beneficial mutations with larger advantage fix sooner in the population (Schiffels et al., 2011) and because the mutation rate or the number of possible beneficial mutations decreases with time (Rouzine et al., 2008; Park and Krug, 2008).
These different explanations are not necessarily mutually exclusive, and could be stratified in actual laboratory evolution experiments (Tenaillon et al., 2012). It is currently unclear whether the available experimental observables allow one to establish the relative weights of these distinct phenomena, or precisely which different experimental measurements would; we believe that simple and possibly falsifiable models could help in exploring these questions.

Acknowledgments

MO and MCL acknowledge support from the International Human Frontier Science Program Organization (Grant RGY0069/2009-C). The work of MRF, MCL, FH, and PT was supported by a "Convergence" grant from the University Pierre and Marie Curie, Paris. We thank J. Krug, M. Laessig, L. Peliti, G. Malaguti and S. Wielgoss for the useful comments and discussions on this work.

Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jtbi.2014.09.042.

References

Barrick, J.E., Yu, D.S., Yoon, S.H., Jeong, H., Oh, T.K., Schneider, D., Lenski, R.E., Kim, J.F., 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461, 1243–1247.
Brunet, E., Rouzine, I.M., Wilke, C.O., 2008. The stochastic edge in adaptive evolution. Genetics 179, 603–620.
Chou, H.-H., Chiu, H.-C., Delaney, N.F., Segrè, D., Marx, C.J., 2011. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332, 1190–1192.
Desai, M.M., Fisher, D.S., 2007. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798.
Desai, M.M., Fisher, D.S., Murray, A.W., 2007. The speed of evolution and maintenance of variation in asexual populations. Curr. Biol. 17, 385–394.
Desai, M.M., Nicolaisen, L.E., Walczak, A.M., Plotkin, J.B., 2012. The structure of allelic diversity in the presence of purifying selection. Theor. Popul. Biol. 81, 144–157.
de Visser, J.A.G.M., Lenski, R.E., 2002. Long-term experimental evolution in Escherichia coli. XI. Rejection of non-transitive interactions as cause of declining rate of adaptation. BMC Evol. Biol. 2, 19.
E. coli long-term experimental evolution project site, URL: 〈http://myxo.css.msu.edu/index.html〉.
Elena, S.F., Lenski, R.E., 1997. Long-term experimental evolution in Escherichia coli. VII. Mechanisms maintaining genetic variability within populations. Evolution 51, 1058–1067.
Elena, S.F., Lenski, R.E., 2003. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat. Rev. Genet. 4, 457–469.
Eyre-Walker, A., Keightley, P.D., 2007. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8, 610–618.
Felsenstein, J., 1974. The evolutionary advantage of recombination. Genetics 78, 737–756.
Fisher, R., 1930. The Genetical Theory of Natural Selection. Clarendon Press, Oxford.
Gerrish, P.J., Lenski, R.E., 1998. The fate of competing beneficial mutations in an asexual population. Genetica 102–103, 127–144.
Good, B.H., Rouzine, I.M., Balick, D.J., Hallatschek, O., Desai, M.M., 2012. Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc. Natl. Acad. Sci. USA 109, 4950–4955.
Guess, H.A., 1974. Limit theorems for some stochastic evolution models. Ann. Probab. 2, 14–31.
Hallatschek, O., Korolev, K.S., 2009. Fisher waves in the strong noise limit. Phys. Rev. Lett. 103, 108103.
Hegreness, M., Shoresh, N., Hartl, D., Kishony, R., 2006. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615–1617.
Hindré, T., Knibbe, C., Beslon, G., Schneider, D., 2012. New insights into bacterial adaptation through in vivo and in silico experimental evolution. Nat. Rev. Microbiol. 10, 352–365.
Jezequel, N., Lagomarsino, M.C., Heslot, F., Thomen, P., 2013. Long-term diversity and genome adaptation of Acinetobacter baylyi in a minimal-medium chemostat. Genome Biol. Evol. 5, 87–97.
Keightley, P.D., Eyre-Walker, A., 2007. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 77, 2251–2261.
Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E., Cooper, T.F., 2011. Negative epistasis between beneficial mutations in an evolving bacterial population. Science 332, 1193–1196.
Kimura, M., 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge.
Kryazhimskiy, S., Tkacik, G., Plotkin, J.B., 2009. The dynamics of adaptation on correlated fitness landscapes. Proc. Natl. Acad. Sci. USA 106, 18638–18643.
Lenski, R.E., Rose, M.R., Simpson, S.C., Tadler, S.C., 1991. Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2000 generations. Am. Nat. 138, 1315–1341.
Martin, G., Elena, S.F., Lenormand, T., 2007. Distributions of epistasis in microbes fit predictions from a fitness landscape model. Nat. Genet. 39, 555–560.
Orr, H.A., 2003. The distribution of fitness effects among beneficial mutations. Genetics 163, 1519–1526.
Park, S.-C., Krug, J., 2008. Evolution in random fitness landscapes: the infinite sites model. J. Stat. Mech. Theory Exp. 2008, P04014.
Park, S.-C., Simon, D., Krug, J., 2010. The speed of evolution in large asexual populations. J. Stat. Phys. 138, 381–410. http://dx.doi.org/10.1007/s10955-009-9915-x.
Perfeito, L., Fernandes, L., Mota, C., Gordo, I., 2007. Adaptive mutations in bacteria: high rate and small effects. Science 317, 813–815.
Rouzine, I.M., Wakeley, J., Coffin, J.M., 2003. The solitary wave of asexual evolution. Proc. Natl. Acad. Sci. USA 100, 587–592.
Rouzine, I., Brunet, E., Wilke, C.O., 2008. The traveling-wave approach to asexual evolution: Muller's ratchet and speed of adaptation. Theor. Popul. Biol. 72, 24–46.
Schiffels, S., Szöllösi, G., Mustonen, V., Lässig, M., 2011. Emergent neutrality in adaptive asexual evolution. Genetics 189, 1361–1375.
Tenaillon, O., Rodríguez-Verdugo, A., Gaut, R.L., McDonald, P., Bennett, A.F., Long, A.D., Gaut, B.S., 2012. The molecular diversity of adaptive convergence. Science 335, 457–461.
Tsimring, L.S., Levine, H., Kessler, D.A., 1996. RNA virus evolution via a fitness-space model. Phys. Rev. Lett. 76, 4440–4443.
Wielgoss, S., Barrick, J., Tenaillon, O., Wiser, M., Dittmar, W., Cruveiller, S., Chane-Woon-Ming, B., Médigue, C., Lenski, R., 2013. Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc. Natl. Acad. Sci. USA 110, 222–227.
Wilke, C.O., 2004. The speed of adaptation in large asexual populations. Genetics 167, 2045–2053.
Wiser, M.J., Ribeck, N., Lenski, R.E., 2013. Long-term dynamics of adaptation in asexual populations. Sci. Exp.
Wright, S., 1931. Evolution in Mendelian populations. Genetics 16, 97–159.
