Multiple branched adaptive steered molecular dynamics Gungor Ozer, Thomas Keyes, Stephen Quirk, and Rigoberto Hernandez Citation: The Journal of Chemical Physics 141, 064101 (2014); doi: 10.1063/1.4891807 View online: http://dx.doi.org/10.1063/1.4891807 View Table of Contents: http://scitation.aip.org/content/aip/journal/jcp/141/6?ver=pdfcov Published by the AIP Publishing Articles you may be interested in Multiple time step molecular dynamics in the optimized isokinetic ensemble steered with the molecular theory of solvation: Accelerating with advanced extrapolation of effective solvation forces J. Chem. Phys. 139, 244106 (2013); 10.1063/1.4848716 Effect of interaction with coesite silica on the conformation of Cecropin P1 using explicit solvent molecular dynamics simulation J. Chem. Phys. 138, 045103 (2013); 10.1063/1.4788662 Adaptive steered molecular dynamics: Validation of the selection criterion and benchmarking energetics in vacuum J. Chem. Phys. 136, 215104 (2012); 10.1063/1.4725183 Intra-protein hydrogen bonding is dynamically stabilized by electronic polarization J. Chem. Phys. 130, 115102 (2009); 10.1063/1.3089723 Adaptively biased molecular dynamics for free energy calculations J. Chem. Phys. 128, 134101 (2008); 10.1063/1.2844595

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

THE JOURNAL OF CHEMICAL PHYSICS 141, 064101 (2014)

Multiple branched adaptive steered molecular dynamics Gungor Ozer,1 Thomas Keyes,1,a) Stephen Quirk,2,3 and Rigoberto Hernandez3,a) 1

Department of Chemistry, Boston University, Boston, Massachusetts 02215-2521, USA Kimberly-Clark Corporation, Atlanta, Georgia 30076-2199, USA 3 Center for Computational and Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA 2

(Received 10 May 2014; accepted 18 July 2014; published online 8 August 2014) Steered molecular dynamics, SMD, [S. Park and K. Schulten, J. Chem. Phys. 120, 5946 (2004)] combined with Jarzynski’s equality has been used widely in generating free energy profiles for various biological problems, e.g., protein folding and ligand binding. However, the calculated averages are generally dominated by “rare events” from the ensemble of nonequilibrium trajectories. The recently proposed adaptive steered molecular dynamics, ASMD, introduced a new idea for selecting important events and eliminating the non-contributing trajectories, thus decreasing the overall computation needed. ASMD was shown to reduce the number of trajectories needed by a factor of 10 in a benchmarking study of decaalanine stretching. Here we propose a novel, highly efficient “multiple branching” (MB) version, MB-ASMD, which obtains a more complete enhanced sampling of the important trajectories, while still eliminating non-contributing segments. Compared to selecting a single configuration in ASMD, MB-ASMD offers to select multiple configurations at each segment along the reaction coordinate based on the distribution of work trajectories. We show that MB-ASMD has all benefits of ASMD such as faster convergence of the PMF even when pulling 1000 times faster than the reversible limit while greatly reducing the probability of getting trapped in a non-significant path. We also analyze the hydrogen bond breaking within the decaalanine peptide as we force the helix into a random coil and confirm ASMD results with less noise in the numerical averages. © 2014 AIP Publishing LLC. [http://dx.doi.org/10.1063/1.4891807] I. INTRODUCTION

Jarzynski discovered an equality (JE) relating the nonequilibrium work and equilibrium free energy change in 1997.1 Since then many applications in vitro2–4 and in silico5–7 have utilized this relation to extract free energy from work trajectories. Among these, steered molecular dynamics8, 9 (SMD) has found significant use in studying folding/unfolding and ligand binding/unbinding of proteins. The simulations include a wide range of chemical and biological systems and processes, including small polyalanines for benchmarking, unfolding of larger biologically important proteins10 and nucleic acids,11 drug/ligand binding to proteins12 and mobility of K+ cation in the waterdichloromethane interface.13 The primary difficulty for implementing SMD more broadly is that most trajectories used to compute the required equilibrium average (e.g., the free energy) do not contribute, i.e., there is a sampling problem which gets worse as the work distribution diverges from that of a Gaussian. This limitation can be overcome by generating a very large number of nonequilibrium trajectories or by steering the system out of equilibrium very slowly as in the reversible limit. It has also been shown that equilibration at intermediate segments of the nonequilibrium pulling could help achieve better convergence of the PMF.14, 15 However, depending on the system studied, a) Authors to whom correspondence should be addressed. Electronic

addresses: [email protected], Fax: (617) 353-6466 and hernandez@ chemistry.gatech.edu, Fax: (404) 894-0594. 0021-9606/2014/141(6)/064101/8/$30.00

the above-mentioned solutions could increase the compute time so that the efficiency of JE is reduced dramatically. Recently, Hernandez and coworkers16 proposed adaptive steered molecular dynamics (ASMD), an algorithm designed to choose an important path in the nonequilibrium reaction— in their case forced unfolding of neuropeptide Y—and iteratively generate new trajectories near that path. The method has proved to be more efficient in accurate determination of the free energy profile of the helix-coil transformation of decaalanine in vacuum, using fewer trajectories than SMD.15 ASMD works best if the nonequilibrium ensemble generated over the selected path optimally represents instantaneous equilibrium. However, for large systems—i.e., proteins, nucleic acids, etc.—such a single branch might not suffice. In this paper, we propose two new selection algorithms to select multiple paths for ASMD. We will call the new method multiple branched adaptive steered molecular dynamics, MB-ASMD. Beyond introducing MB-ASMD, we also validate and benchmark it with respect to the model system of decaalanine in vacuum which readily folds into an α-helix. The stretching of decaalanine provides a good benchmark for nonequilibrium methods, as it has a smooth free energy profile as a function of end-to-end distance. The stretching (steering) is implemented by pulling a pseudo-particle that is harmonically attached to the N-terminus (NC). When initiated from a compact helix (13 Å end to end distance), its potentials of mean force form a minimum at 16 Å (perfect α-helix) and a highenergy kink at 26 Å (all helical contacts broken), followed by

141, 064101-1

© 2014 AIP Publishing LLC

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

064101-2

Ozer et al.

J. Chem. Phys. 141, 064101 (2014)

a plateau representing the random coil. Studying such small peptides, although not themselves of major biological relevance, can provide significant insights into the more complex challenges of protein folding. Decaalanine, for example, has been a key model system for testing and benchmarking newly developed SMD algorithms since the work of Schulten and coworkers.8, 9 The literature contains extensive information of such systems to analyze, compare, and verify novel results for which the article by Gumbart and coworkers is a recent example.17 Here, we use decaalanine to benchmark the new MB-ASMD and compare the results to Park and Schulten’s SMD,9 and Ozer, Quirk, and Hernandez’s ASMD.15

II. MODELS AND METHODS A. Steered molecular dynamics

The free energy difference, G, between two equilibrium states, A and B, can be computed in a quasistatic way moving the system from A to B reversibly. However, going sufficiently slowly requires a great amount of computation time and is usually not a feasible approach, especially for large systems like biomolecules or nucleic acids. The JE is an exact relation between the nonequilibrium work and the free energy difference between two equilibrium states which works even when the process from A to B is irreversible and is valid for virtually any reaction. This means that, if enough realizations of the process or reaction of interest are sampled then the JE will hold, independent of the rate. The key question for practical computation, then, is how much sampling is required. If the nonequilibrium average converges too slowly for complex systems, then there may be no advantage over conventional methods. Free energy profiles contain more information than G alone and provide additional insight on any chemical, physical, or biological reaction. Over a decade ago, Park and Schulten applied JE in their nonequilibrium MD algorithm, steered molecular dynamics, SMD. They proposed to steer the system along a predefined reaction path via external forces from which they could easily calculate the work done on the system as the steering occurs. By repeating many independent realizations of the same experiment, they were able to generate a series of work trajectories whose exponential average is connected to the potential of mean force (PMF), G(ξ ), along the steering path denoted by ξ , the reaction coordinate, by the Jarzynski-Crooks relation. In SMD, most generally, a reaction coordinate which is a function of the atomic positions, ξ (r), is harmonically constrained to follow a parameter, λ, by addition of a guiding potential, hλ (r) =

k (ξ (r) − λ)2 . 2

(2.1)

Specifically, in the following, the steering force is applied by introducing a pseudo particle, which moves on the reaction path with position specified by λ(t), while ξ (r) is the position of the “real” atom being pulled. The initial pseudo-atom position obeys λ(0) = ξ 0 . It is straightforward that the force

applied on the system during steering is obtained from the derivative of Eq. (2.1). Starting from an equilibrium configuration and randomizing forces, as in the random forces in Langevin dynamics in the original Schulten paper, one can generate as many different trajectories as desired. The work, W, done on the system is then calculated for each trajectory from the external force derived from the steering potential, Eq. (2.1). At this point, JE can be applied, yielding the free energy via an exponential average of the work ensemble, 1 −βWξ ←ξ t 0 ; lne 0 β

G(ξt ) = G(ξ0 ) −

(2.2)

here β = 1/(kB T), kB is Boltzmann’s constant, T is the temperature, .0 denotes an ensemble average over the nonequilibrium trajectories, and subscript “0” indicates that the trajectories are started from configurations sampled from the initial equilibrium (ξ 0 ) canonical ensemble. B. Adaptive steered molecular dynamics

Most trajectories contribute very little to Eq. (2.2), and ASMD was developed to achieve more efficient sampling of the important ones. In ASMD, the reaction coordinate, ξ , is divided into a large number, N, of segments. At the end of each segment, i, the work is calculated for each trajectory. At this point, different criteria might be applied to select an initial configuration for the next segment, i + 1. If enough trajectories are sampled per ASMD segment, the selection criterion would not matter. But such a number may be too large to be feasible, and hence the selection of an optimal pathway is necessary. Hernandez and coworkers15, 16, 18 have suggested the use of (i) the configuration that had the work value closest to the exponential average of the work ensemble (Jarzynski’s average, JA), (ii) minimum work configuration (MW), and (iii) the configuration in which the atom being pulled is nearest to the pseudo particle at the end of segment i (RC). They benchmarked these selection criteria and concluded that JA leads to the smallest fluctuations within the distribution of work trajectories thus leading to a better converged free energy profile. Using any selection criterion, for an arbitrary amount of progress along the path, Eq. (2.2) takes the form 

Gξ ←ξ t

0

N 1 −βWξ  ←ξ  i i−1  =− lne i−1 β i=1

(2.3)

for N  chosen such that ξt ∈ [ξN  −1 , ξN  ], ξi ≡ ξi for i < N  , and ξN  ≡ ξt . Equation (2.3) achieves faster convergence compared to Eq. (2.2) by eliminating less significant events before their work trajectories start to diverge from the main basin of attraction. C. Multiple branched adaptive steered molecular dynamics

Similar to ASMD, MB-ASMD aims to enhance nonequilibrium sampling by forcing the ensemble to fluctuate only around significant events but avoiding the sampling of

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

Ozer et al.

statistically insignificant events. ASMD—selecting a single basin in this context—might achieve this if enough trajectories are sampled at each segment. However, ASMD might fall short if multiple basins of attraction are dominant in the ensemble or if the number of trajectories is not enough to sample the dominant basins. The latter condition is also a limitation to conventional SMD. MB-ASMD aims to address both of these limitations by selecting multiple basins thus reducing the probability of missing some or all important basins. Instead of using a single “best” trajectory to spawn the next segment, the idea is to assign a probability, based upon a selection criterion, that every trajectory might do so. The number N of trajectories is kept constant. After normalization, their relative probabilities form a partition of unity from which new ones are selected from the old according to N random numbers. Usually, the higher-probability trajectories have several progeny. As the higher-probability trajectories also tend to cluster around a given value of the selection criterion, this tends to contract the ensemble space as is desired. More precisely, we can write the potential of mean force at ζ 1 relative to some minimum energy point which we set to be the zero of energy as  e−βG01 (ζ1 ) = Q−1 d rδ(ζ (r ) − ζ1 )e−βV (r ) (2.4a) =Q

−1



dEρζ (E)e−βE , 1

(2.4b)

where ρζ (E) is the density of equilibrium energy states at 1 fixed ζ 1 and energy E. On the other hand, the Jarzynski’s equality can be written as e−βG01 (ζ1 ) = N −1 =Q

−1

N 



−βW(ζ

←ζ ) 0 i

(2.5a)

dWρnoneq (W )e−βW ,

(2.5b)

e

1

i

where ρnoneq (W ) is the density of nonequilibrium work states at fixed ζ 1 for a given work W . It is tempting to ignore the difference between E and W in these last two equations; that would be wrong as it is certainly not the case that ρnoneq (W ) is equal to the equilibrium distribution. In practice ρnoneq (W ) will rise and fall with W , and this shape will only be shifted slightly by the exponential as is indicated in Fig. 1. Jarzynski’s equality states that the average of this product is connected to the exponential of the free energy difference expressed in the LHS of Eq. (2.5). In naive ASMD,15, 16, 18 the integral was assumed to be dominated by structures with work corresponding to a single dominant value of W . The specific choice was the structure whose weight e−βW was closest to the average. This suggests that the absolute value of the energy difference between the nonequilibrium work and the average work is a metric which governs the likelihood that a given structure contributes to the equilibrium average. The purpose of this work is to show that we can use the product, ρnoneq (W )e−βW , to obtain a set of initial structures for a subsequent ASMD segment that is necessarily more compact with respect to this metric than the nonequilibrium distribution ρnoneq (W ) at the end of the segment.

Work PMF exp{-β|W-JA|} exp{-βW}

0

2

4

6

Selection probabilities

J. Chem. Phys. 141, 064101 (2014)

Work Distribution, P(W)

064101-3

8

Work & PMF (kcal/mol) FIG. 1. Representative work distribution at the end of a segment generated by SMD simulations (solid black), the position of the Jarzynski average at the corresponding reaction coordinate (solid red), and values of the MB selection criteria, JA-abs (green) and EW-Bz (blue). X-axis is kcal/mol. This figure is for illustration purpose only and is not based on any physical data.

In this work, we have employed two selection mechanisms, both of which are based on comparing their corresponding work values, as in ASMD: (i) selecting multiple configurations based on their probability of having absolute work values near the Jarzynski’s average (JA-abs), e−|J A−w|/kT and (ii) selecting multiple configurations based on the exponential of negative work (EW-Bz), e−w/kT . The JA-abs selection criterion in MB-ASMD is similar to JA in ASMD in the sense that it will search for important basins near the trajectory closest to the Jarzynski’s average. On the other hand, EW-Bz is similar to MW in ASMD as the minimum work trajectories will have higher probability to be selected in the following segment. EW-Bz can also be attributed to Jarzynski’s observation19 that exp(−βW ) is the reweighing factor that allows calculation of an equilibrium Boltzmann (canonical) average from the nonequilibrium trajectories. The idea is to keep the bulk of the swarm of trajectories as close to equilibrium as possible as they go between segments. The reweighted trajectories are replaced by a selection of trajectories, increasing or decreasing the number in a region of configuration space according to the reweighing factors. The new swarm, with the weights expressed via the number of points alone, as usual, is used for the next ASMD segment. Fig. 1 shows a typical near-Gaussian work distribution generated by many SMD runs. It also illustrates which configurations JA-abs and EW-Bz are likely to select: those that are on or near the Jarzynski average for JA-abs, and those that requires less work for EW-Bz. D. Simulation details

In order to validate the MB-ASMD approach, we return to the determination of the potential of mean force along the stretching coordinate of decaalanine in vacuum. It is a wellstudied case for which the potential of mean force has been obtained earlier using standard force fields and hence serves as a useful benchmark.8, 9, 15 Decaalanine is defined fully in a 104-atom model as illustrated in Fig. 2. The NAMD20 molecular dynamics integrator, and the CHARMM force field21

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

064101-4

Ozer et al.

J. Chem. Phys. 141, 064101 (2014)

FIG. 2. Representative ribbon and atomically detailed snapshots of decaalanine in vacuum at 27 Å end-to-end distance from exp(−βW ) selection criterion. The two larger ribbons are configurations that have the minimum work (MW, on the right) and that has the work closest to Jarzynski’s average (JA, on the left). The smaller ones are the most selected (top) and five randomly picked least selected (bottom) configurations.

have been used for all simulations in this study. The temperature is maintained at 300 K using a Langevin bath via the built-in NAMD plug-in. The peptide is placed in a vacuum, with no explicit or implicit solvent, and the random and friction forces in the Langevin dynamics serve as the thermostat. The initial structure has a compact helical shape, identical to that used in earlier SMD studies.8, 9, 15 Energy is minimized and the molecule is equilibrated through a series of heating steps up to the simulation temperature, 300 K. The coordinates are further relaxed at 300 K for several picoseconds. Multiple equilibrated configurations have been generated using different random forces. The reaction coordinate of the actual steering is defined as the end-to-end distance, that between the N-terminus nitrogen (NN) and the nitrogen at the C-terminus cap (NC). NN is kept fixed at (0,0,0) while NC is pulled along the Z-direction from (0,0,13) to (0,0,33) by pulling the pseudo-particle that is harmonically attached to NC, with a spring constant k = 7.2 kcal/mol Å2 as was used in previous benchmarking studies. The range of parameters specifying the simulations is matched to those of earlier work.15 The MB-ASMD method is benchmarked on the stretching of decaalanine using two pulling speeds, 100 Å/ns and 10 Å/ns which were seen earlier to give different potential energy curves using ASMD15 that nevertheless matched the corresponding ones8, 9 obtained by SMD. The range in the numbers of segments (10, 20, 40, 80) and trajectories per segment, TPS, (50, 100, 200, 400, 800, 1600, 3200) matches that of the earlier work. For each pulling speed, Ns sets of independently equilibrated configurations are generated. MB-ASMD is implemented over this equilibrium ensemble beginning from the first segment, i = 1. At the end of segment, i, the selection criterion is applied to each trajectory and the resulting unnormalized probabilities for the entire ensemble are repartitioned across the unit interval. Ns random numbers between 0 and 1 are generated, and survival of the trajectories is determined accordingly. The next segment, i + 1, starts over on the selected trajectories from the segment, i. This process is repeated until all N  segments have been computed. Note that if a trajectory is selected multiple times, as is usually the case, the random forces cause the velocities of the new trajectories to deviate at

the very first MD step of the next segment, and subsequently sample different paths in configuration space. Trajectories are analyzed both numerically and visually. We have used the NAMD/VMD package20, 22 for visual investigation and in-house scripts for numerical investigation. Hydrogen bond analysis has also been done through the VMD plug-in. Two non-bonded atoms are said to form hydrogen bonds when the distance between them is less than 4 Å and the angle between the two and the shared hydrogen is less than 40◦ , i.e., donor and acceptor atoms are within 4 Å and the angle formed by donor, hydrogen, and acceptor is greater than 140◦ .

III. RESULTS AND DISCUSSION

MB-ASMD is aimed at enhancing the sampling of trajectories during the nonequilibrium steering process by selecting/eliminating particular trajectories that are unlikely to contribute to the average. There is no limit to how many times a specific trajectory, or configuration from the end of the previous segment, will be selected to start a trajectory in the next segment. The more multiple selections, of course, the fewer different configurations are selected. We have observed that more than half of the structures reached by the nonequilibrium trajectories at the end of a given segment are not selected as initial structures for the subsequent segment, several are selected only once and some are selected multiple times. Fig. 2 shows the most selected configurations at the end of the seventh segment. This corresponds to the 27 Å end-to-end distance of the peptide for the 400 TPS 10 Å/ns pulling simulations selected according to the EW-Bz selection criterion. As can be seen in Fig. 3, the seventh segment is also where the kink in the PMF occurs. It is worth noting that all important hydrogen bonds (α-, 310 -, and π -helical contacts) are broken at the seventh segment. This same region corresponds to the segment in which the least number of different configurations were selected to survive for the next segment. Specifically, for the same simulation set, the number of configurations surviving between segments was only 105 in the seventh segment while the average across the other nine segments was 215. There is no clear visual difference in the structures between

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

Ozer et al.

J. Chem. Phys. 141, 064101 (2014)

30

30

20

20

10

10

0

0

20 15 10 5 0

20 15 10 5 0 15

20

25

30

15

20

25

30

end-to-end distance (Å)

RMS error (%)

PMF (kcal/mol)

064101-5

60 50 40 30 20 10 0 7 6 5 4 3 2 1 0 0

2 FIG. 3. Comparison of the PMFs obtained from the MB-ASMD method pulling at 100 Å/ns (top) and 10 Å/ns (bottom) for the JA-abs (left) and EWBz (right) selection criteria. Solid black is the PMF obtained from 10 000 standard SMD simulations, solid gray is the exact PMF obtained from reversible simulations. Dashed curves in red, green, blue, yellow, brown, purple, and cyan represent PMFs obtained from MB-ASMD simulations with 50, 100, 200, 400, 800, 1600, and 3200 trajectories per segment, respectively. Note that the standard SMD PMFs for the 10 Å/ns pulling simulations overlap onto the reversible PMF thus not fully visible in the bottom panels.

the most selected and least—zero times—selected configurations nor in terms of the hydrogen bonds maintained. A. The thermodynamics of decaalanine stretching in vacuum

We first analyze the convergence of the PMFs with respect to the number of trajectories per segment as it takes the values: 50, 100, 200, 400, 800, 1600, and 3200. ASMD, as seen in Ref. 15, showed good convergence using the JA selection criterion both at 100 Å/ns and 10 Å/ns pulling speed. Also in that paper, we noticed that the PMF obtained through the MW selection criterion at 100 Å/ns pulling speed converged to values in agreement with the the exact PMF. However, this was an accidental agreement because the use of the MW at 10 Å/ns pulling slightly underestimates the exact PMF. Consequently the use of the minimum work configuration to select an initial structure at the end of each segment does not accurately describe the reversible reaction. This is not surprising because, although it is very effective and practical for reconstructing PMFs along a particular reaction coordinate, Jarzynski’s equality is exact only when a representative set of realizations of the corresponding reaction are sampled. EW-Bz in ASMD and MB-ASMD, however, overcomes that artifact by not letting the minimum work path drive significantly away from representing the initial equilibrium ensemble. As seen in Fig. 3, EW-Bz never underestimates the exact PMF, but instead converges to it with as few as a 400 trajectories sampled per segment. The convergence is also numerically analyzed in terms of the end-point-deviation of the resulting MB-ASMD PMFs compared to the exact PMF. Hernandez and coworkers showed in Fig. 3 of Ref. 15, that compared to SMD, ASMD has much better convergence in the fast pulling limit, 100 Å/ns, using all selection criteria. In the slow pulling limit, 10 Å/ns, ASMD with JA yielded comparable results to SMD

2

1

2

2

3

2

4

2

5

2

6

2

7

2

8

2

Number of trajectories (X 50) FIG. 4. End-point convergence as a function of trajectories sampled per segment. Top panel shows convergence of fast 100 Å/ns pulling simulations, bottom panel shows convergence of slow 10 Å/ns pulling simulations. The RMS error in the PMF is calculated relative to the SMD result obtained with 10 000 trajectories (shown in filled circles) and to the exact result obtained at a reversible pulling speed (shown in filled squares or diamonds). Data are obtained from MB-ASMD using JA-abs (black), MB-ASMD using EW-Bz (red) and standard SMD (green).

(Fig. 4 of Ref. 15), while MW did not converge as well as JA or standard SMD. As seen in the top panel of Fig. 4, MBASMD with both JA-abs and EW-Bz converges onto the exact PMF much better than SMD in the fast pulling simulations. In 10 Å/ns pulling limit (Fig. 4, bottom), both selection criteria perform better than SMD with small numbers of trajectories. With more trajectories, however, JA-abs fell behind; EW-Bz always yields lower convergence than SMD. In fact, with EWBz the RMS deviation gets flat at 800 trajectories per segment, while SMD gets to the same level and flatness. Comparing MB-ASMD to the standard SMD with 10 000 trajectories, we find that JA-abs performs better than EW-Bz in fast pulling simulations, but EW-Bz performs better for slow pulling (see dashed curves in Fig. 4). Overall we conclude that EW-Bz is the best choice to estimate the exact PMF of a particular reaction. If the pulling speed is much higher than the reversible limit, JA-abs could be the best choice to reconstruct the PMF, which would require significantly more trajectories with standard SMD. Fig. 3 shows the PMFs when the reaction coordinate is divided into 10 segments. Fig. 5 demonstrates how the PMF is affected by varying the number of segments: 5, 20, 40, and 80. EW-Bz again outperforms JA-abs, causing less fluctuation in the PMF. JA-abs especially fluctuates over and under the standard SMD PMF at 100 Å/ns pulling, which limits its potential flexibility in dividing the reaction coordinate into arbitrary numbers of segments, and would potentially require an optimum segment size for the best results. EW-Bz, on the other hand, does not suffer from such fluctuations even at 100 Å/ns pulling. When the system is pulled significantly faster than the reversible limit, it inevitably drives further away from equilibrium. This is exactly why the fast pulling, 100 Å/ns, which is 1000 times the reversible speed, leads to larger deviation from the exact PMF than slow pulling, 10 Å/ns, using both

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

Ozer et al.

30

PMF (kcal/mol)

20 10

J. Chem. Phys. 141, 064101 (2014)

Exact PMF 10,000 regular 5 steps 10 steps 20 steps 40 steps 80 steps

30

30

20

20

10

0

0

20 15 10 5 0

20 15 10 5 0 15

20

25

30

15

20

25

30

PMF (kcal/mol)

064101-6

10

Exact PMF 10,000 regular 2 ps 10 ps 50 ps 100 ps 200 ps

30 20 10

0

0

20 15 10 5 0

20 15 10 5 0 15

20

25

30

15

20

25

30

end-to-end distance (Å)

end-to-end distance (Å)

FIG. 5. Comparison of the PMFs obtained from the MB-ASMD methods pulling at 100 Å/ns (first row of panels) and 10 Å/ns (second row of panels) for the JA-abs (left) and EW-Bz (right) selection criteria. Solid black is the PMF obtained from 10 000 standard SMD simulations, solid gray is the exact PMF obtained from reversible simulations. Dashed curves in red, green, blue, yellow, and brown represent PMFs obtained from MB-ASMD simulations when the reaction coordinate is divided into 5, 10, 20, 40, ad 80 segments, respectively. All MB-ASMD runs are sampled over 400 trajectories per segment. Note that the standard SMD PMFs for the 10 Å/ns pulling simulations overlap onto the reversible PMF.

FIG. 6. Comparison of the PMFs obtained from the MB-ASMD methods pulling at 100 Å/ns (top) and 10 Å/ns (bottoms) for the JA-abs (left) and EWBz (right) selection criteria. Solid black is the PMF obtained from 10 000 standard SMD simulations, solid gray is the exact PMF obtained from reversible simulations. Dashed curves in red, green, blue, yellow, and brown represent PMFs obtained from MB-ASMD simulations when the end configurations obtained at the end of each segment are further relaxed for 2 ps, 10 ps, 50 ps, 100 ps, and 200 ps, respectively. All MB-ASMD runs are sampled over 400 trajectories per segment. Note that the standard SMD PMFs for the 10 Å/ns pulling simulations overlap onto the reversible PMF.

SMD and ASMD. One possible way to overcome this and obtain PMFs as close as the reversible PMF from non-reversible trajectories might be introducing frequent relaxation stages in between segments. Namely, the system is pulled irreversibly till the end of segment i followed by an additional relaxation stage with fixed ξ i before moving on to the next segment, i + 1. Ozer et al. showed that such relaxation would make the PMFs more similar to the reversible PMF even when the system is pulled at 100 Å/ns. However, as an artifact of selecting only one of the further-relaxed configurations, the PMF— although closer to the exact PMF—did not converge well as the duration of the relaxation is increased. With MB-ASMD, on the other hand, not only do the PMFs from both 100 Å/ns and 10 Å/ns simulations become more similar to the exact PMF, but also they converge as the relaxation period is lengthened (Fig. 6). This is a significant finding as it confirms our main hypothesis that the reversible PMF can be obtained from irreversible pulling runs—in which the pulling rate is as high as 1000 times the reversible speed—by alternating pulling segments to ξ i and environment-relaxation segments at fixed ξ i . Note, however, that the numerical integration during relaxation periods adds to the total time needed to unfold the peptide through simulation. B. Decaalanine structure and hydrogen bonding along the stretching coordinate

PMFs calculated using JE, by definition, do not have the same contribution from every trajectory sampled. Any observable that needs to be computed should be treated in the same sense that the contribution from significant events should be larger. We use a weighting scheme based on the normalized exponential work distribution, N Nˆ (i, ξt )e−βWi (ξt ) , (3.1) NH (ξt ) = i=1NH −βW (ξ ) i t i=1 e

where N is the total number of trajectories and Nˆ H (i, ξt ) is the number of hydrogen-bonds for the configuration i at some point t along the reaction coordinate ξ . The results shown here are from simulations with 400 trajectories per segment. This is chosen as a compromise to get reasonably accurate information on the hydrogen bond behavior of the helix-coil transformation of decaalanine without spending too much computing time. The results shown in Fig. 4 (Section III A) indicate that at 400 both JA-abs and EW-Bz converge, which means that sampling with 400 trajectories per segment is good enough to represent the information one would have got from reversible stretching. The overall results and correlation between the broken/formed hydrogen bonds and helix-coil transformation of decaalanine are identical to that reported in Ref. 15. So, herein, we will focus on the noise in the hydrogen bond data obtained from MB-ASMD and compare it to that from ASMD. The hydrogen bond counts in Fig. 7 and in Fig. 8 exhibit lessened discontinuities when obtained by MB-ASMD rather than ASMD. This is to be expected because ASMD samples a single trajectory at the beginning of each segment while MB-ASMD samples an ensemble of trajectories. In the fast pulling simulations (Fig. 7, top panel), MB-ASMD with EW-Bz does not show any big jumps like ASMD has at ca. 23 Å and ca. 28 Å and MB-ASMD with JA-abs has at ca. 23 Å. In the slower pulling simulations (Fig. 7, bottom panel), both ASMD and MB-ASMD yield smoother hydrogen bond data as the nonequilibrium process is carried out much closer to the reversible limit. The hydrogen bond contacts in Fig. 7 are preserved through the first half of the simulation and then are gradually lost thereafter until all are broken near the end of the stretch. Since the simulations take place in vacuum, the total number of hydrogen bonds remains so as to stabilize the helix until after the end-to-end distance is long enough that the

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

Ozer et al.

Average H-bond count

064101-7

6 5 4 3 2 1 0 6 5 4 3 2 1 0

J. Chem. Phys. 141, 064101 (2014)

ASMD MB-ASMD (JA-abs) MB-ASMD (EW-Bz)

15

20

25

30

end-to-end distance (Å) FIG. 7. Average number of hydrogen bonds forming the decaalanine helix is shown as a function of end-to-end distance obtained from 100 Å/ns (top) and 10 Å/ns (bottom) pulling simulations; 400 trajectories/segment in all cases. Black represents ASMD with JA, red MB-ASMD with JA-abs, and green MB-ASMD with EW-Bz selection criteria.

Average H-bond count

donor and acceptor atoms no longer satisfy the hydrogen bond formation criterion. However, the actual bonds change from α helix to 310 helix pairs in order to accommodate the lengthening peptide.15 If the experiment was not carried in vacuum, but rather in a water solvent, these bonds would have been broken more rapidly as the dangling hydrogens of the decaalanine peptide quickly find water oxygens to stabilize themselves.18 Although the average number of hydrogen bonds remains almost constant for the first 10–12 Å of stretching, that does not mean that the decaalanine helix is unchanged. On the contrary, the initial α-helix starts to break apart well before the peptide is stretched 12 Å, i.e., when end-to-end distance is 25 Å . Fig. 8 demonstrates how specific helical contacts are interchanging during that period. Similar to the overall number

6 5 4 3 2 1 0 6 5 4 3 2 1 0 6 5 4 3 2 1 0

6 5 4 3 2 1 0 6 5 4 3 2 1 0 6 5 4 3 2 1 0

i→i+3 i→i+4 i→i+5

15

20

25

30

15

20

25

30

end-to-end distance (Å) FIG. 8. Average number of (black) α-, (red) 310 -, and (green) π -helical contacts made by the decaalanine is shown as a function of end-to-end distance of the peptide obtained from 100 Å/ns (left column) and 10 Å/ns (right column) pulling simulations. Bottom, middle, and top panels represent data from ASMD with JA, MB-ASMD with EW-Bz, and MB-ASMD with JA-abs selection criteria, respectively. The number of trajectories sampled per segment is 400 in all cases.

of hydrogen bonds, MB-ASMD with both JA-abs and EW-Bz yields smoother data than ASMD with JA does. Characteristic α-helix i → i + 4 contacts start to break just after 4 Å of forced stretching. The constancy of the overall number of hydrogen bonds is due to the formation of i → i + 3 contacts; while the α-helix is deforming another stable helix structure, 310 -helix, is forming. The formation of a 310 helix in the absence of solvent was observed previously by Doruker and Bahar.23 It is worth noting that all of the specific helical contacts are broken by 14 Å of stretching in Fig. 8 but the total number of hydrogen bonds is still present until 18 Å of stretching in Fig. 7. This is due to the formation of shorter hydrogen bonds, i → i + 2, which is actually characteristic of a coil structure rather than a helix. One can conclude that the helix-coil transformation is completed when the end-to-end distance of the peptide is 27 Å but hydrogen bonds still play a significant role in stabilizing the decaalanine in vacuum. As expected and shown in Ref. 18, this behavior is quite different from the helix-coil transformation of decaalanine in solvent. In solvent, the stabilizing role of intra-peptide hydrogen bonds is transferred to the neighboring water molecules, i.e., α-helix is broken more rapidly and not replaced by 310 -helix as much as in vacuum. IV. CONCLUSIONS

Jarzynski’s equality (JE) is a beautiful theoretical result, but its practical utility depends upon convergence of the average over nonequilibrium trajectories in an amount of CPU time that is competitive with more mundane methods. As with so many computational techniques, the difficulty is that the average is dominated by trajectories that are “rare events,” so much of the effort does not contribute to the result. For the PMF obtained from JE to accurately describe the correct free energy profile of the corresponding reaction, an adequate number of independent trajectories must be generated to construct the essential features of the work distribution. This, however, is not always computationally feasible especially for complex systems such as macromolecular reactions, e.g., protein folding and ligand binding. Much research has been devoted to address this issue since the discovery of JE. In addition to ASMD,15, 16, 18 Echeverria and Amzel, for instance, used intermediate stage relaxations to generate a distribution that contains millions of work values from a small number of trajectories and hence to obtain a more accurate PMF.14 Chelli et al. later used an adaptive scheme related to ASMD to eliminate trajectories on the fly that diverge significantly away from the same work average.24 Bussi and coworkers used a reweighing scheme that is based on normalizing the work distribution along the reaction coordinate so that the work trajectories fluctuate around Jarzynski’s work average.25 Oberhofer et al. formulated a large-time-step fast-switching methodology to map the phase space of a pulling process using multiple time steps.26 MacFadyen and Andricioaei developed a “skewed momenta” method where the rare transition states with high energetic barriers could be sampled by drawing momenta from an artificially enhanced biased distribution.27

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

064101-8

Ozer et al.

MB-ASMD proposed in this work is based on the ASMD methodology and aims to limit the fluctuations in the nonequilibrium work around multiple significant paths. ASMD is already a considerable step forward towards eliminating noncontributing trajectories, but follows a single path for each complete unfolding or any reaction simulated, keeping only one trajectory through each segment. Here we considered that multiple branching of the paths, MB-ASMD, based on a probabilistic determination of how a trajectory propagates into the next segment, might yield a better, and more efficient, representation of the PMF. As in ASMD, the reaction coordinate is divided into some number of segments, N, and the starting configurations for segment i + 1 are chosen among the end-configurations obtained from segment i. We propose two weighting scheme—based on their corresponding work trajectories—to assign probabilities of survival for each of the end-configurations, and then randomly select some of them based on the assigned probabilities. The two selection criteria we apply are (i) JA-abs, e−|J A−w|/kT , which is basically favoring configurations with work values close to the Jarzynski’s average, and (ii) EW-Bz, e−w/kT , which is favoring configurations with small work values. In the MB-ASMD scheme, the number of trajectories selected or how many times they are selected are completely arbitrary, and based solely on the distribution of work trajectories and weighting scheme. In this way, MB-ASMD is more likely to carry information from the initial equilibrium ensemble throughout the nonequilibrium process and yield a more accurate picture of the PMF along the studied reaction coordinate, depending of course on how many trajectories are sampled and how many basins of attraction are dominant in the free energy profile. This is not an artifact of MB-ASMD but rather a requirement coming directly from the JE itself as JE is exact when all possible realizations are averaged. In the present paper we benchmark MB-ASMD compared to the standard SMD8, 9 and ASMD.15, 16 Indeed, MBASMD is shown to work well. In particular, the EW-Bz selection criterion yields extremely good convergence to the exact PMF—reversible work—even when the pulling is 100 times of the reversible limit where the RMS error is 2.5% (SMD yield is 5.9%) with only 50 trajectories per segment and falls to 1.0% (SMD yield is 3%) for 800 trajectories. In the faster pulling simulations, when the pulling speed is 1000 times higher than the reversible limit, MB-ASMD with EW-Bz again converges significantly faster than SMD as shown in top panel of Fig. 4. Another important note is that, when combined with relaxation after pulling through each segment, MB-ASMD with both selection scheme yields extremely good convergence on to the reversible PMF in both fast and slow pulling limits, proving to be an efficient “enhanced sampling” algorithm of nonequilibrium free energy computations. Having established the method on a simple test system, we will next apply it to more challenging problems

J. Chem. Phys. 141, 064101 (2014)

with complex reaction routes and free energy profiles that are likely to involve multiple basins of attraction. ACKNOWLEDGMENTS

This work has been partially supported by NSF Grant Nos. CHE08-48427 and CHE08-33605 to Boston University and NSF Grant No. CHE-1112067 to Georgia Institute of Technology. The computing resources necessary for this research were provided in part by the National Science Foundation through XSEDE resources provided by the NICS (National Institute for Computational Infrastructure) XT5 Cray Linux Cluster (Kraken) at the University of Tennessee under Grant No. TG-CHE120001. 1 C.

Jarzynski, Phys. Rev. E 56, 5018 (1997); Phys. Rev. Lett. 78, 2690 (1997). 2 J. Liphardt, S. Dumont, S. B. Smith, I. Tinoco, Jr., and C. Bustamante, Science 296, 1832 (2002). 3 F. Douarche, S. Ciliberto, A. Petrosyan, and L. Rabbiosi, Europhys. Lett. 70, 593 (2005). 4 N. C. Harris, Y. Song, and C.-H. Kiang, Phys. Rev. Lett. 99(6), 068101 (2007). 5 S. X. Sun, J. Chem. Phys. 118, 5769 (2003). 6 R. Berkovich, J. Klafter, and M. Urbakh, J. Phys.: Condens. Matter 20, 354008 (2008). 7 J. Torras, G. de M. Seabra, and A. E. Roitberg, J. Chem. Theory Comput. 5, 37 (2009). 8 S. Park, F. Khalili-Araghi, E. Tajkhorshid, and K. Schulten, J. Chem. Phys. 119, 3559 (2003). 9 S. Park and K. Schulten, J. Chem. Phys. 120, 5946 (2004). 10 R. Nome, J. Zhao, W. Hoff, and N. Scherer, Proc. Natl. Acad. Sci. U.S.A. 104, 20799 (2007). 11 G. Hummer and A. Szabo, Proc. Natl. Acad. Sci. U.S.A. 107, 21441–21446 (2010). 12 B. Nandy, D. H. Bindu, N. M. Dixit, and P. K. Maiti, J. Chem. Phys. 139, 024905 (2013). 13 M. Valente, S. F. Sousa, A. Magalhaes, and C. Freire, J. Phys. Chem. B 116(6), 1843 (2012). 14 I. Echeverria and L. M. Amzel, Proteins: Struct., Funct., Bioinf. 78, 1302 (2010). 15 G. Ozer, S. Quirk, and R. Hernandez, J. Chem. Phys. 136, 215104 (2012). 16 G. Ozer, E. Valeev, S. Quirk, and R. Hernandez, J. Chem. Theory Comput. 6, 3026 (2010). 17 A. Hazel, C. Chipot, and J. C. Gumbart, J. Chem. Theory Comput. 10, 2836 (2014). 18 G. Ozer, S. Quirk, and R. Hernandez, J. Chem. Theory Comput. 8, 4837 (2012). 19 C. Jarzynski, Annu. Rev. Condens. Matter Phys 2, 329 (2011). 20 J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale, and K. Schulten, J. Comput. Chem. 26, 1781 (2005). 21 B. R. Brooks, R. E. Bruccoleri, R. E. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem. 4, 187 (1983). 22 W. Humphrey, A. Dalke, and K. Schulten, J. Mol. Graph. 14, 33 (1996). 23 P. Doruker and I. Bahar, Biophys. J. 72, 2445 (1997). 24 R. Chelli, C. Gellini, G. Pietraperzia, E. Giovannelli, and G. Cardini, J. Chem. Phys. 138, 214109 (2013). 25 T. N. Do, P. Carloni, G. Varani, and G. Bussi, J. Chem. Theory Comput. 9, 1720 (2013). 26 H. Oberhofer, C. Dellago, and S. Boresch, Phys. Rev. E 75, 061106 (2007). 27 J. MacFadyen and I. Andricioaei, J. Chem. Phys. 123(7), 074107 (2005).

This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 130.113.86.233 On: Wed, 17 Dec 2014 18:52:44

Multiple branched adaptive steered molecular dynamics.

Steered molecular dynamics, SMD, [S. Park and K. Schulten, J. Chem. Phys. 120, 5946 (2004)] combined with Jarzynski's equality has been used widely in...
2MB Sizes 2 Downloads 4 Views