Protein Engineering, Design & Selection vol. 27 no. 10 pp. 365–374, 2014 Published online May 8, 2014 doi:10.1093/protein/gzu017

Structure-based approach to the prediction of disulfide bonds in proteins Noeris K. Salam1, Matvey Adzhigirey, Woody Sherman and David A. Pearlman1 Schro¨dinger, 120 West 45th Street, 17th Floor, New York, NY 10036, USA 1

To whom correspondence should be addressed. E-mail: noeris.salam@ schrodinger.com (N.K.S.); [email protected] (D.A.P.)

Edited by David Thirumalai

Protein engineering remains an area of growing importance in pharmaceutical and biotechnology research. Stabilization of a folded protein conformation is a frequent goal in projects that deal with affinity optimization, enzyme design, protein construct design, and reducing the size of functional proteins. Indeed, it can be desirable to assess and improve protein stability in order to avoid liabilities such as aggregation, degradation, and immunogenic response that may arise during development. One way to stabilize a protein is through the introduction of disulfide bonds. Here, we describe a method to predict pairs of protein residues that can be mutated to form a disulfide bond. We combine a physics-based approach that incorporates implicit solvent molecular mechanics with a knowledge-based approach. We first assign relative weights to the terms that comprise our scoring function using a genetic algorithm applied to a set of 75 wild-type structures that each contains a disulfide bond. The method is then tested on a separate set of 13 engineered proteins comprising 15 artificial stabilizing disulfides introduced via site-directed mutagenesis. We find that the native disulfide in the wild-type proteins is scored well, on average (within the top 6% of the reasonable pairs of residues that could form a disulfide bond) while 6 out of the 15 artificial stabilizing disulfides scored within the top 13% of ranked predictions. Overall, this suggests that the physics-based approach presented here can be useful for triaging possible pairs of mutations for disulfide bond formation to improve protein stability. Keywords: cysteine mutation/cysteine scanning/disulfide prediction/enzyme design/protein engineering

Introduction The incorporation of disulfide bonds into engineered proteins has been shown to enhance the structural integrity of the native protein conformation, resulting in improvements to several properties (Dombkowski et al., 2013). By lowering the configurational entropy of the unfolded polypeptide (Flory, 1956; Pace et al., 1988; Vaz et al., 2006), as well as improving enthalpic properties of the protein (Kuroki et al., 1992; Vaz et al., 2006), disulfide bonds have the potential to increase

# The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected]

365

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

Received July 24, 2013; revised March 3, 2014; accepted April 11, 2014

thermal stability (Matsumura et al., 1989; Le et al., 2011), enhance chemical and enzymatic resilience (Matsumura et al., 1989; Siadat et al., 2006), decrease the chance of misfolding (Ma˚rtensson et al., 2002) and improve the overall protein functionality (Wahlberg and Ha¨rd, 2006), including enhancements to therapeutic efficacy in biotherapeutics (Kim et al., 2012). However, engineering a stabilizing disulfide bond remains a significant challenge, due to the difficulty of predicting the conformation and thermodynamics of an engineered disulfide. Indeed, random mutations generally lead to less stable proteins (Dobson, 2003). In many cases, rational design of disulfides takes the form of manual inspection and/or sequence comparisons with evolutionary stable homologs, if present (Ma˚rtensson et al., 2002). This approach can be more productive than random experiments but lacks the ability to quantify and rank potential disulfides within a protein. While computational methods to predict stabilizing disulfide bonds do exist (Burton et al., 2000; Dani et al., 2003; Dombkowski, 2003), published validation studies that directly compare predictions to experiment are lacking. In this work, we assess the performance of a new method for predicting protein residue positions where mutation to cysteine might favor disulfide formation and improve thermal stability. As part of this study, we assemble suitable sets of proteins that can be used as a benchmark set for disulfide prediction. The first dataset (dataset 1), representing naturally occurring disulfides, consists of 75 publicly available, highresolution protein crystal structures, each with a single native disulfide bond. To test our method, the disulfide bond in each protein is first mutated to a pair of alanine residues and the structure is relaxed. Then, the method is applied to the mutated protein to determine if the pair of alanine residues is highly ranked as a candidate for formation of a disulfide upon mutation back to a pair of cysteines. The second dataset (dataset 2) consists of 13 engineered proteins comprising a total of 15 artificial stabilizing disulfide mutants (Dani et al., 2003). Here, the method is applied directly to the wild-type protein structure, which lacks a stabilizing disulfide bond, and then the possible disulfide mutants are ranked. Dataset 2 represents a much more challenging but realistic scenario—substantial backbone and side-chain rearrangement might be needed to accommodate the introduced cysteine residues, whereas in dataset 1 the structure is preformed to accommodate the wild-type disulfide. Within this study, the effects of improving the rotamer sampling for each cysteine are assessed, as are the outcomes of different structural relaxation protocols around the disulfide bonds. For each protein, a disulfide bond is introduced at all potential residue pairs within a distance cutoff. The disulfide is then scored based on a combination of non-bonded interactions, strain energy and geometric filters. The score can be used to rank the residue pairs based on their relative stabilizing effects, although no attempt is made in this work to compute

N.K.Salam et al.

Methods

Dataset 1: native disulfides Protein coordinates were retrieved from the RCSB Protein Data Bank (PDB) (Berman et al., 2000) that matched the following seven criteria: (i) macromolecule type: protein ¼ yes; DNA ¼ no; RNA ¼ no; (ii) number of chains (asymmetric unit): 1-1; (iii) number of chains (biological assembly): 1-1; (iv) number of disulfides: 1-1; (v) molecular weight (struc˚ ; (vii) ture): 10 000 – 30 000; (vi) X-ray resolution: 0 – 1.5 A remove similar sequences with 90% identity. A total of 75 protein structures were found, and categorized into two groups: (i) initial set of 15—1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R; and (ii) additional set of 60—1C7K 1DYQ 1KNG 1LJU 1M40 1MF7 1MJN 1NKO 1OAL 1OLR 1P3C 1QGV 1QK8 1R26 1RIE 1SHU 1T2I 1XBU 1Y9L 2A6Z 2AQM 2CE0 2E0Q 2ERF 2FWG 2HSH 2I1U 2ICC 2NWF 2P52 2QO4 2RKQ 2VYO 2XFD 2YXF 3CB9 3E8T 3EDI 3FSA 3FZ4 3GA4 3GNZ 3GUI 3HZ8 3KFF 3M1W 3O22 3RT2 3RXW 3SEB 3SH4 3T0V 3TPK 3VOR 3ZYP 4EQ8 4F0W 4FH4 4FTF 4HWM. See Table V for the list of all PDB IDs and the associated X-ray resolution.

Dataset 2: engineered stabilizing disulfides Starting with a published set of 24 proteins with known engineered disulfides (Dani et al., 2003), 13 proteins with a total of 15 stabilizing disulfides were suitable for studying in this work. Eleven proteins were removed because the wild-type protein contained Cb...Cb distance of the pre-mutated disul˚ . While it is our eventual aim to treat residues fide pair .5.0 A that require large-scale movements to accommodate the engineered disulfide bond, the current method only allows movements accessible through side-chain sampling and backbone minimization. Dataset 2 contains the following proteins (with the chain denoted after the colon, where multiple chains exist): 1FG9:A, 1LMB, 1RNB, 1SNO, 1XNB, 2CBA, 2CI2, 366

2LZM, 2RN2, 2ST1, 3GLY, 4DFR and 9RAT. Within this set, experimental data for 15 stabilizing mutants was available.

Structure preparation For dataset 1, the cysteine residues comprising the disulfide bond within the wild-type protein were first mutated to alanine using the Residue and Loop Mutation task within the BioLuminate software package (BioLuminate version 1.1, Schro¨dinger, LLC, New York, NY, USA). For both datasets 1 and 2, each structure was prepared using the Protein Preparation Wizard (Madhavi Sastry et al., 2013) in Maestro (Maestro version 9.4, Schro¨dinger, LLC). Bond orders and formal charges were assigned, and hydrogens were added to all atoms in the system. To optimize the hydrogen bond network, His tautomers and ionization states were predicted, 1808 rotations of the terminal dihedral angles of Asn, Gln and His residues were assigned and hydroxyl and thiol hydrogens were sampled. Crystallographic waters in all structures were removed before performing calculations. Modest relaxation was introduced through all-atom constrained minimization carried out using the Impact Refinement module (Impref ) (Impact v5.9, Schro¨dinger, LLC) with the OPLS_2005 force field (Jorgensen et al., 1996; Shivakumar et al., 2010), to alleviate steric clashes that may exist in the original PDB structures. The minimization was terminated when the energy converged or the root mean square deviation (RMSD) reached a maximum cutoff of 0.30 kcal/mol-A for heavy atoms.

Disulfide prediction Disulfide predictions were carried out using the cysteine mutation panel within BioLuminate. For each protein, all residue ˚ were considered as pairs with a Cb...Cb distance within 5.0 A potential disulfides and were subsequently mutated to cysteine. In the case of glycine, which does not have a Cb atom, the distance was taken from the hydrogen attached to the Ca that would be replaced by the Cb of the cysteine. Each potential residue pair was mutated independently and the rotamer state of each cysteine was conformationally sampled using a rotamer library containing 49 states determined from an analysis of the PDB (see Supplementary data). The rotamer pair ˚ was chosen to with a sulfur – sulfur distance closest to 2.0 A form the disulfide. Using Prime, energy minimization with the OPLS_2005 force field, either in the gas phase or with the VSGB2.0 implicit solvent model, was then performed to relax the geometry of the introduced disulfide. Relaxation of neighboring residues was also explored. Multiple minimization protocols were explored in this work to determine which performs best: (i) minimization of the two disulfide bond cysteine residues only, (ii) minimization of the two disulfide bond cysteine residues plus the two adjacent residues in the sequence on either side of the cysteine (X1X2 – Cys– Cys – Y1Y2); or (iii) minimization of the two disulfide bond cysteine residues plus ˚ shell from either residues containing any atom within a 5 A residue of the disulfide.

Genetic algorithm The disulfide prediction scoring function consists of a linear weighted sum of several coefficients. The values of these weight terms (W) were determined using a standard GA optimization. The general purpose GA PIKAIA was used (Davis, 1991; Charbonneau, 1995, 2002), along with an energy routine

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

absolute thermal stabilization upon introduction of the disulfide bond. The ranking procedure is initially applied to a training set of 15 proteins to determine optimal parameters for the physics-based scoring function. Once these parameters are determined, we include the geometric filter term and increase the dataset by an additional 60 high-resolution crystal structures (each manifesting a single native disulfide bond), bringing the total dataset to 75 structures (dataset 1). The 75 structures are randomly divided into training and test sets, and a genetic algorithm (GA) is used to adjust the weighting factors for the various terms in the scoring function to optimize the predictivity of the function. Next, we apply the method without any further training to a set of 13 proteins containing a total of 15 engineered disulfide bonds (dataset 2) to demonstrate the performance on the real-world scenario of predicting stabilizing disulfides in a wild-type protein. Dataset 1 represents a best-case scenario because the crystal structures are preformed to make the native disulfide bond; it serves as an initial survey on the method’s accuracy before it is tested on dataset 2 for de novo prediction of stabilizing disulfide bonds. Furthermore, both datasets can be used as the basis for assessing and comparing the performance of other methods.

Disulfide prediction

specially written to calculate the function we wanted to minimize, F(W). F(W) was formulated as: FðWÞ ¼

XNTRAIN XNCLOSEðiÞ i¼1

j¼1

RðSðWÞÞXRAY

ð1Þ

Results and discussion The work to identify mutations that could lead to viable cysteine disulfide bonds was carried out in multiple stages: (i) accumulation of native disulfide-containing protein structures to use as benchmark data; (ii) comparison of molecular mechanics methods to evaluate the structures; (iii) inclusion of additional scoring terms derived from an analysis of cysteine bond geometries in reported PDB structures; (iv) determination of appropriate relative weights for terms in the combined scoring function; (v) combination of the scoring function with a binary decision tree process, which ensures that favorable interactions for predicted geometries are acceptably balanced; and (vi) validation on the non-native-engineered stabilizing disulfide bonds (dataset 2). There is a paucity of public experimental data relating to wild-type crystal structures with engineered mutations to create stabilizing disulfide bonds. For this reason, we chose to first identify a large set (75) of structures that could be used for native disulfide bond prediction (dataset 1) before validating the procedure on the few structures (13) with known stabilizing disulfide bond data (dataset 2). For dataset 1, the two cysteine (Cys) residues that create the disulfide are mutated to alanine (Ala) residues, after which minimization of the resulting structure is performed to reduce biases in the structure toward the native disulfide. Next, all pairs of residues for which the ˚ are considered as disulfide candidates Cb...Cb distance is 5 A (typically 50–100 residue pairs for a protein). For each pair, the residues are both mutated to Cys, a rotamer scan is performed and a disulfide bond is formed with the best rotamer geometries. The mutated structures are then minimized and scored based on a combination of the molecular mechanics energetics and the likelihood of the disulfide conformation, evaluated using a function derived from crystal structure data. The final score for assessing the predictive capabilities of the method is based on

Table I. FRO calculated using gas phase energy function and no cysteine rotamer sampling for the 15 target exploratory set (dataset 1—native disulfides) PDB ID

Minimization extent Cys– Cys

1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R Average Median

X1X2 –Cys – Cys–Y1Y2

˚ Cys –Cys þ 5 A shell

Ranka

FROb

Ranka

FROb

Ranka

FROb

3 22 16 17 17 22 31 8 5 61 14 2 3 12 18

0.11 0.18 0.21 0.30 0.47 0.22 0.42 0.13 0.06 0.56 0.38 0.03 0.07 0.15 0.26 0.237 0.213

1 24 10 5 15 51 43 42 5 8 17 10 28 13 43

0.04 0.20 0.13 0.09 0.42 0.52 0.59 0.66 0.06 0.07 0.46 0.16 0.61 0.16 0.62 0.319 0.198

6 51 15 2 13 33 34 12 11 28 17 14 31 34 54

0.22 0.42 0.20 0.04 0.36 0.33 0.47 0.19 0.13 0.26 0.46 0.23 0.67 0.43 0.78 0.345 0.333

a

Rank begins from 0 for the best ranked residue pair. FRO, fractional rank order; the scored rank order of the native cysteine bond ˚ , using calculation 11, divided by the total number of residue pairs within 5 A Table VI. b

367

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

where R(S(W))XRAY is the predicted rank order (0 is best) of the disulfide that is experimentally observed in the protein, ˚ ) for the relative to all the potential disulfides (Cb  5 A protein i, NCLOSE(i), based on the energy function S(W) (see Equations (2) – (6) and Table VI). The 75 X-ray structures in dataset 1 was randomly divided into 50% training and 50% test data, and the outer summation for the GA was over the 50% of the structures in the training set. Cross-validation was carried out by subsequently applying the S(W) to rank order the proteins in the test set. Full GA optimization of F(W) was repeated 50 times with different random number seeds (changing both the partition of the training and test data and the random path of the GA itself ). For the GA optimization, a population of 200 members and 1000 generations were performed. For all optimizations, the number of chromosome segments was 7, the length of each chromosome segment was 5, the crossover probability was 0.85, the initial mutation rate was 0.005 and the maximum mutation rate was 0.25. A uniform, variable rate mutation mode was employed and a steady-state-replace-worst reproduction plan was chosen.

the rank of the native disulfide divided by the total number of pairs evaluated, which produces a fractional rank order (FRO). ˚ cutoff and the For example, if 100 residue pairs satisfy the 5 A rank order of the native disulfide pair is 6, then the FRO is 0.06. The FRO allows for direct comparisons among the predictions of different proteins with different numbers of potential disulfide candidates. For the first phase of our work, we initially identified a total of 15 protein crystal structures in the PDB with high resolution ˚ ) and only one disulfide bond in the structure (termed (,1.5 A ‘exploratory set’). For these 15 structures, we calculated the change in potential energy upon disulfide formation. At first, creation and minimization of the mutated cysteine bonds were carried out starting with arbitrary conformations for the two Cys side chains that were created and linked ( pre-existing rotamer states for each residue prior to Cys mutation, combined with extended geometries for any missing torsions). The resulting disulfide mutants were scored with the OPLS_2005 force field (Jorgensen et al., 1996; Shivakumar et al., 2010) in the gas phase and using an implicit solvent model (VSGB2.0) (Li et al., 2011). In addition, three minimization protocols were examined: (i) minimization of only the two Cys resides in the disulfide (Cys – Cys); (ii) minimization of the two Cys residues in the disulfide bond plus the adjacent residues to each of the Cys residues (X1X2 – Cys– Cys– Y1Y2); and (iii) minimization of the two Cys residues in the disulfide plus all ˚ of the cysteine disulfide (Cys – Cys þ residues within 5 A ˚ 5 A). Gas phase results are shown in Table I and implicit solvent results are presented in Table II. Comparing the FRO values in these two tables indicate clearly that the implicit solvent approach provides superior predictions (lower FRO) for all minimization protocols. Next, we ran the same sets of calculations, but this time performing conformational pre-sampling before forming the

N.K.Salam et al.

Table II. FRO calculated using implicit solvent energy function and no cysteine rotamer sampling for the 15 target exploratory set (dataset 1—native disulfides)

Table III. FRO calculated using gas phase energy function and enhanced cysteine sampling for the 15 target exploratory set (dataset 1—native disulfides)

PDB ID

PDB ID

Minimization extent Cys– Cys

X1X2 –Cys – Cys–Y1Y2

˚ Cys –Cys þ 5 A shell

Cys– Cys Ranka 5 18 13 6 14 25 37 2 7 27 10 0 3 10 14

Ranka

FROb

Ranka

FROb

Ranka

FROb

2 11 2 2 6 6 14 14 15 94 4 2 2 22 9

0.07 0.09 0.03 0.04 0.17 0.06 0.19 0.22 0.17 0.86 0.11 0.03 0.04 0.28 0.13 0.166 0.108

2 14 5 5 6 5 32 11 13 81 4 4 2 21 11

0.07 0.12 0.07 0.09 0.17 0.05 0.44 0.17 0.15 0.74 0.11 0.07 0.04 0.26 0.16 0.180 0.116

2 18 2 3 10 22 41 8 67 89 4 3 3 37 11

0.07 0.15 0.03 0.05 0.28 0.22 0.56 0.13 0.77 0.82 0.11 0.05 0.07 0.46 0.16 0.261 0.149

a

Rank begins from 0 for the best ranked residue pair. FRO, fractional rank order, the scored rank order of the native cysteine bond ˚ , using calculation 12, divided by the total number of residue pairs within 5 A Table VI. b

disulfide bond. The torsions of the two Cys side chains were sampled using a standard rotamer library (see Supplementary data) to find the pair of Cys side-chain conformations that placed the Cys sulfur atoms closest to the ideal disulfide bond ˚ ). After pre-sampling, subsequent minimizadistance (2.04 A tion and scoring were performed as before. The results reveal that rotamer sampling significantly improves the average FRO in both the gas phase and implicit solvent cases (see Tables III and IV). Focusing on the minimization protocols that yield the best results, the average/median FRO decreases from 0.237/ 0.213 to 0.186/0.173 using a gas phase potential, and from 0.166/0.108 to 0.074/0.074 using the implicit solvent energy function. The above analysis suggests that a protocol combining the implicit solvent energy function, minimization of the X1X2 – Cys – Cys– Y1Y2 sextet and pre-minimization rotamer optimization of the cysteine disulfide yields the best results. As a result of this analysis, we used this protocol for all subsequent work. To further improve the results, we introduced another term to the scoring function, Egeom, to reflect the distribution of cysteine disulfide geometries observed in the PDB. Based on an analysis of the geometries of the internal coordinates of 11 399 disulfide bonds extracted from the PDB (see Supplementary data), we created scoring terms for each of the 31 unique bonds, angles and torsions in the two residues comprising the disulfide. A net geometry score, Egeom, was created from a linear sum of these 31 terms, and combined with the implicit solvent energy terms: S ¼ DEintra þ W1  DEinter þ W2  DEgeomðpreÞ þ W3  DEgeomðpostÞ

ð2Þ

Here, S is the scoring function, DEintra is the change in the implicit solvent energies arising from a combination of internal 368

1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R Average Median

X1X2 –Cys – Cys– Y1Y2

˚ Cys–Cys þ 5 A shell

FROb

Ranka

FROb

Ranka

FROb

0.19 0.15 0.17 0.11 0.39 0.25 0.51 0.03 0.08 0.25 0.27 0.00 0.07 0.13 0.20 0.186 0.173

0 25 16 3 19 36 27 11 11 18 18 9 24 8 35

0.00 0.21 0.21 0.05 0.53 0.36 0.37 0.17 0.13 0.17 0.49 0.15 0.52 0.10 0.51 0.264 0.207

2 48 4 11 21 47 34 8 6 39 11 14 2 14 44

0.07 0.40 0.05 0.19 0.58 0.47 0.47 0.13 0.07 0.36 0.30 0.23 0.04 0.18 0.64 0.278 0.230

a

Rank begins from 0 for the best ranked residue pair. FRO, fractional rank order; the scored rank order of the native cysteine bond ˚ , using calculation 11, divided by the total number of residue pairs within 5 A Table VI. b

Table IV. FRO calculated using implicit solvent energy function and enhanced cysteine sampling for the 15 target exploratory set (dataset 1—native disulfides) PDB ID

Minimization extent Cys– Cys

1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R Average Median

X1X2 –Cys – Cys– Y1Y2

˚ Cys–Cys þ 5 A shell

Ranka

FROb

Ranka

FROb

Ranka

FROb

2 13 2 5 3 6 14 2 11 16 4 4 6 6 10

0.07 0.11 0.03 0.09 0.08 0.06 0.19 0.03 0.13 0.15 0.11 0.07 0.13 0.08 0.14 0.097 0.088

2 8 0 2 3 4 15 2 11 10 3 2 3 7 6

0.07 0.07 0.00 0.04 0.08 0.04 0.21 0.03 0.13 0.09 0.08 0.03 0.07 0.09 0.09 0.074 0.074

3 24 2 2 6 8 24 0 15 20 4 3 5 8 12

0.11 0.20 0.03 0.04 0.17 0.08 0.33 0.00 0.17 0.18 0.11 0.05 0.11 0.10 0.17 0.123 0.109

a

Rank ordering from 0 for the best ranked residue pair. FRO, fractional rank order; the scored rank order of the native cysteine bond ˚ , using calculation 12, divided by the total number of residue pairs within 5 A Table VI. b

coordinates and non-bonded interactions among atoms of the residue pair upon mutation and formation of the disulfide bond. DEinter is the change in the implicit solvent non-bonded interactions between the atoms of the residue pair and the rest of the system upon mutation and formation of the disulfide bond. Egeom(pre) is the geometric term described above, applied to the

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R Average Median

Minimization extent

Disulfide prediction

value calculated using just the implicit solvent energy function (no Egeom term), we find that that value decreases from 0.111 to 0.088 and the median decreases from 0.087 to 0.059, confirming that including Egeom( pre) does improve the predictive ability of S, in a significant way (P ¼ 0.03). Interestingly, the Egeom( post) coefficient (W3) from the GA optimization is small (0.07), so a second set of GA calculations was performed after setting W3 ¼ 0. The results (calculations 3 – 4) demonstrate that the changes in the target ,FRO. function are negligible (0.089 in both cases, using all the data for training), confirming that the gain in predictive ability comes from the geometry term before minimization, Egeom(pre). Examining the coefficients from the previous GA (calculation 4), we find that W1 is 0.94, close to 1.0 (and much different than W2), suggesting that a unitary weight for both DEintra and DEinter may be acceptable. This makes sense, since these terms are derived from the same physics-based energy function and have the same units (kcal/mol). This suggested a new GA set, carried out adjusting only W2 in: S ¼ DEintra þ DEinter þ W2  DEgeomðpreÞ

ð3Þ

The results for this GA are calculations 5 – 6 in Table VI. Comparing the results to those for the previous calculation where separate weight terms were used for DEintra and DEinter shows that ,FRO. is nearly unchanged (0.089 versus 0.090), and the average value of the median FRO improves from 0.059 to 0.054. Therefore, the best net scoring function reflects only a single coefficient for the geometry term: S ¼ DEintra þ DEinter þ 0:27  EgeomðpreÞ

ð4Þ

This produces mean and median FRO values of 0.090 and 0.054, respectively, which are better than the FRO values calculated excluding the Egeom term (0.111 and 0.087, respectively; P-value ¼ 0.04). One limitation of linear scoring functions, such as Equation (4), is that sub-optimal conformations can be favorably scored because the energy function is dominated by a single energy term (either DEintra or DEinter). This arises from the non-linear

Table V. High-resolution crystal structures used in GA parameter optimization studies for the full set (dataset 1—native disulfides) PDB ID

˚) Resolution (A

PDB ID

˚) Resolution (A

PDB ID

˚) Resolution (A

PDB ID

˚) Resolution (A

1ABA 1C7K 1DYQ 1GV9 1KNG 1LF7 1LJU 1M40 1MF7 1MJN 1NKO 1OAL 1OLR 1P3C 1QGV 1QK8 1R26 1RIE 1SHU

1.45 1.00 1.50 1.46 1.14 1.20 1.40 0.85 1.25 1.30 1.45 1.50 1.20 1.50 1.40 1.40 1.40 1.50 1.50

1T2I 1T2J 1UNR 1VHU 1WCU 1XBU 1XT5 1Y9L 1ZK5 2A6Y 2A6Z 2AQM 2CE0 2E0Q 2ERF 2FWG 2HSH 2I1U 2I4A

1.10 1.50 1.25 1.34 1.50 1.20 1.15 1.50 1.40 1.42 1.00 1.10 1.24 1.49 1.45 1.10 1.35 1.30 1.00

2ICC 2NWF 2P39 2P52 2PY0 2QO4 2RKQ 2VYO 2XFD 2YXF 3CB9 3E8T 3EDI 3FSA 3FZ4 3GA4 3GNZ 3GUI 3HNB

1.20 1.10 1.50 1.50 1.35 1.50 1.50 1.50 1.19 1.13 1.31 1.30 1.40 0.98 1.38 1.30 1.35 1.45 1.15

3HZ8 3KFF 3L4R 3M1W 3O22 3RT2 3RXW 3SEB 3SH4 3T0V 3TPK 3VOR 3ZYP 4EQ8 4F0W 4FH4 4FTF 4HWM

1.45 0.96 1.45 1.38 1.40 1.50 1.26 1.48 1.50 1.45 1.30 0.90 1.50 1.39 1.24 1.09 1.48 1.38

369

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

cysteine disulfide after side-chain optimization but before full coordinate minimization. Egeom(post) is the geometric term described above, applied after both side-chain optimization and coordinate minimization. Weighting constants W1, W2 and W3 appear before the individual terms and are necessary to combine the Egeom contributions with the two terms from the traditional energy function. To determine appropriate weights for W1, W2 and W3, we chose to use a GA to optimize the ability of S to minimize the overall ranking of the native disulfides, FRO, over multiple proteins. Our initial set of validation test proteins was too small to appropriately parameterize our function, so we identified an additional 60 high-resolution protein crystal structures ˚ ), each of which contained only one disul(resolution 1.5 A fide bond, bringing the total number of structures for our GA optimization to 75 (termed ‘full set’; see Table V). The GA was carried out using a standard implementation (Davis, 1991; Charbonneau, 1995, 2002). Cross-validation was performed with 50 separate GA optimizations and the results were averaged to ensure that there was no overfitting. For each optimization, a different random number seed was used to separate the 75 PDB structures into 50% ‘training’ and 50% ‘test’ data. For each optimization, the GA minimized the average value of FRO (,FRO.) for all the training sets. At the end of each GA optimization, the refined coefficients were used to calculate ,FRO. for the test data that were set aside. At the end of the 50 separate GA optimizations, the averaged values of ,FRO. train and ,FRO. test were calculated over all 50 independent cycles. If the fitted coefficients are meaningful (not overfit), then we expect ,FRO. train and ,FRO. test to be similar. Assuming they are, a final single GA with all the data used as ‘training’ data is carried out to get a final set of fitted coefficients that reflect all the available data. An initial GA optimization was carried out to optimize all the coefficients in Equation (2), W1, W2 and W3. The results of this optimization are provided in Table VI, calculations 1 –2. As can be seen, the averaged training and test results are very similar (0.088 and 0.096, respectively), indicating that we are not appreciably overfitting the data. Comparing the value of FRO calculated for all 75 proteins using the GA method to the

N.K.Salam et al.

370 Table VI. GA optimization results for the full set (dataset 1—native disulfides) FROa

Coefficients fit to all sets

Train b

Function optimized by GA 1 2 3 4 5 6 7 8 9 10 11 12

S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) þ W3  Egeom(post) S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) þ W3  Egeom(post) S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) S ¼ dEintra þ dEInter þ W2  Egeom(pre) S ¼ dEintra þ dEInter þ W2  Egeom(pre) Bin: dEintra  C1 & dEinter  C2 & Egeom(pre)  C3 & Egeom(post)  C4 S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) þ W3  Egeom(post) If (Bin ¼ true) S ¼ S þ 10 000 Bin: dEintra  C1 & dEinter  C2 & Egeom(pre)  C3 & Egeom(post)  C4 S ¼ dEintra þ W1  dEInter þ W2  Egeom(pre) þ W3  Egeom(post) If (Bin ¼ true) S ¼ S þ 10 000 Bin: dEintra  C1 & dEinter  C2 S ¼ dEintra þ W2  Egeom(pre) If (Bin ¼ true) S ¼ S þ 10 000 Bin: dEintra  C1 & dEinter  C2 S ¼ dEintra þ W2  Egeom(pre) If (Bin ¼ true) S ¼ S þ 10 000 S ¼ dEintra þ dEInter (gas phase energies) S ¼ dEintra þ dEInter (implicit solvent energies)

Test

% Train

% Test

Mean

Median

Mean

Median

W1

W2

W3

C1

C2

C3

C4

50 100 50 100 50 100 50

50 0 50 0 50 0 50

0.088 (0.012) 0.089 0.089 (0.013) 0.089 0.086 (0.011) 0.090 0.047 (0.010)

0.060 (0.011) 0.059 0.059 (0.009) 0.059 0.057 (0.010) 0.054 0.031 (0.007)

0.096 (0.014) – 0.093 (0.013) – 0.097 (0.012) – 0.103 (0.026)

0.064 (0.013) – 0.062 (0.010) – 0.063 (0.015) – 0.043 (0.007)

– 1.01 – 0.94 – – –

– 0.29 – 0.23 – 0.27 –

– 0.07 – – – – –

– – – – – – –

– – – – – – –

– – – – – – –

– – – – – – –

100

0

0.056

0.035





0.00

1.42

0.00

35

5

310

858

50

50

0.052 (0.012)

0.032 (0.007)

0.078 (0.020)

0.034 (0.007)















100

0

0.057

0.030







1.65



35

5





100 100

0 0

0.307 0.111

0.306 0.087

– –

– –

– –

– –

– –

– –

– –

– –

– –

a ˚ . Rank ordering is from 0 for the best ranked residue pair. Mean and median values FRO, fractional rank order; the scored rank order of the native cysteine bond divided by the total number of residue pairs within 5 A are calculated as averages over 50 distinct GA optimization calculations, with a different random partition of the residue pairs into training and test classes. See text. b dEintra is the change in intra-residue pair energy upon formation of disulfide. dEinter is the change in interaction energies between disulfide residues and remainder of system upon formation of disulfide. Egeom(pre) is the geometry scoring function, calculated after rotamer search and bond formation, but before coordinate minimization. Egeom(post) is the geometry scoring function, calculated after both bond formation and coordinate minimization. All energies are calculated using an implicit solvent function, except as noted.

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

Disulfide prediction

inverse power terms in the energy function. Good candidates should generally be acceptably balanced, so that a very favorable value of DEintra or DEinter cannot mask a significant and unfavorable value of the other term. To introduce this idea of avoiding mutations with low scores but unbalanced energy, we incorporated binning to the scoring function. The form of the binned scoring function is: Bin ¼ ½DEintra  C1 &DEinter  C2 &DEgeomðpreÞ  C3 &DEgeomðpostÞ  C4  S ¼ DEintra þ W1  DEinter þ W2  DEgeomðpreÞ

If ðBin ¼ falseÞ S ¼ S þ 10000

ð5Þ

The effect of this function is to add 10 000 to S for any putative disulfide that fails one or more of the comparisons in the bin, which means it will not be ranked among the best candidates (10 000 is a an arbitrarily large value that ensures no candidate that fails the bin will be ranked better than any candidate that passes the bin). A new round of GA optimizations was carried out allowing the seven parameters (C1, C2, C3, C4, W1, W2, W3) to vary. As before, the GA optimized ,FRO. averaged over the training data, and the GA was rerun 50 separate times with different random training/test partitions. The results for this set of GA calculations are shown in Table VI, calculations 7 – 8. Focusing on the values of the bin cutoffs, we obtained refined geometric cutoff terms C3 and C4 of 310 and 858, respectively. C4 is larger than the actual value of Egeom( post) for any pose, and so is not contributing to the score. The C3 cutoff only applies to 7% of the poses, and to none of the native poses, suggesting this term is also not of real significance in improving the scoring function and can be omitted. We also find that the weights of both the DEinter and Egeom( post) terms (W1, W3) are near zero, suggesting they are not appreciably contributing to the predictive nature of the overall function. Based on these observations, another GA set was carried out for the function: Bin ¼ ½DEintra  C1 &DEinter  C2  S ¼ DEintra þ W2  DEgeomðpreÞ If ðBin ¼ falseÞ S ¼ S þ 10000

ð6Þ

Applying a GA series to fitting these three variable functions (C1, C2, W2), we obtain the results in Table VI, calculations 9 – 10. Comparing these results to those from the previous run with additional coefficients (calculations 7 – 8), we see that, indeed, omitting the remaining fitted coefficients from the function has no significant effect on the value of FRO. Omitting C3, C4, W1 and W3 results in mean and median values for FRO of 0.057 and 0.030, versus 0.056 and 0.035 when these coefficients are included. As noted earlier, we carried out cross-validation for the entire set of GA calculations to determine if there is overfitting. Comparing the mean and median values for the training and test sets for the scoring functions that do not include binning (calculations 1 – 6, Table VI), we find the differences are modest, and so we have a good indication that the FRO values we calculate are reflective of the actual predictive power of the function. Adding the binning function with a full

Bin ¼ ½DEintra  35&DEinter  5 S ¼ DEintra þ 1:65  EgeomðpreÞ If ðBin ¼ falseÞ S ¼ S þ 10000

ð7Þ

Despite achieving excellent average and median FRO values, there is a clear case for further improvement when 8 out of the 75 predicted true disulfides have values 0.15, the worst being 1RIE, with a value of 0.62. Further improvement of the initial sampling is one avenue for development, especially given that, for this set of eight outliers, the predicted x 3 dihedrals (C1– S1 – S2– C2) for several of the true disulfides deviate appreciably from the x 3 values in the original X-ray structures, with an average deviation of 55.38. However, when compared with the ‘ideal’ x 3 dihedrals of 2908 or þ1008—based on the observed values from the PDB (see Supplementary data)—the average deviation is considerably lower, 20.48, suggesting that the majority of predicted dihedrals do attain good geometries. Nonetheless, among the worst ranking disulfides (those with FRO  0.15), 5 out of 8 deviated from the ideal angles by .208, while only 6 out of 20 best ranking disulfides (FRO  0.02) deviated from the ideal angles by .208. This suggests a relationship between the FRO and the quality of the predicted disulfide, where better FRO values are likely to originate from disulfides with x 3 dihedrals close to the ideal state. Figure 1 illustrates this with 1ZK5 (Fig. 1A, which has the lowest FRO possible, and small x 3 dihedrals deviations of 1.98 from the X-ray structure, or 2.18 from an ‘ideal’ angle) and 1RIE (Fig. 1B, which has the worst FRO value listed, 0.62, and large x 3 dihedral deviations from the X-ray structure and ‘ideal’ angles of 76.88 and 73.48, respectively). Figure 1B also shows the iron-sulfur cluster adjacent to the disulfide, suggesting its close proximity may have adversely affected the rotamer sampling. Indeed, the ˚ farther a Ca positions of the predicted cysteines are 0.87 A way from the iron-sulfur cluster, suggesting this movement occurred during sampling and minimization. 371

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

þ W3  DEgeomðpostÞ

set of seven bin cutoffs and weight coefficients (C1, C2, C3, C4, W1, W2, W3) (calculations 7 – 8) reveals a different story. In this case, the mean training and test FRO values are appreciably different (0.047 versus 0.103), suggesting that while the bin improves the average fit (training) FRO over the best nonbinned function (,FRO. ¼ 0.086), the improvement is probably not statistically significant, since the average test FRO for the best non-binned function is 0.097 versus 0.103 here. When we prune the binned function down to only three important fitted coefficients (C1, C2, W2) (Table VI, calculations 9 –10), the results change. With this function, the average and median values of FRO for the training data increase, from 0.047 and 0.031 for calculation 7 to 0.052 and 0.032 for calculation 9. But, significantly, the corresponding values for the test data decrease appreciably, from 0.103 and 0.043 for calculation 7 to 0.078 and 0.034 for calculation 9. Most importantly, the test FRO values for the binned scoring function of calculations 9 – 10 are appreciably better than the best FRO value for any non-binned calculation (0.096 and 0.064), indicating that, with a reduced number of fitted coefficients, the binned scoring function is clearly most predictive of all the variants tested. Table VII presents the individual results for calculation 9 in detail. From these results, including cross-correlation analysis, the best scoring function is:

N.K.Salam et al.

Table VII. FRO calculated from calculation 9 (see Table VI) and comparison of x 3 dihedrals (predicted versus X-ray) for the full set (dataset 1—native disulfides) PDB ID

0.00 0.04 0.07 0.09 0.06 0.02 0.10 0.05 0.00 0.10 0.03 0.07 0.07 0.03 0.00 0.00 0.05 0.02 0.17 0.17 0.09 0.03 0.02 0.03 0.00 0.06 0.35 0.02 0.05 0.62 0.01 0.00 0.07 0.00 0.05 0.01 0.18 0.02 0.09

x 3 dihedrals (C1 –S1– S2–C2) (8)

PDB ID

X-ray

Predicted

Difference to X-ray

Difference to ‘Ideal’b

73.6 292.2 83.6 74.1 102.0 87.0 100.0 282.2 104.0 294.1 78.9 94.2 288.9 288.0 283.0 88.9 162.8 77.6 280.2 2106.6 92.1 99.8 281.8 278.7 273.0 292.7 96.5 83.0 66.5 286.6 91.9 297.1 2101.8 80.4 298.5 277.6 297.5 78.4 77.7

73.7 293.2 87.3 272.2 105.5 83.9 2113.4 279.7 102.1 294.1 81.5 93.1 82.0 292.7 285.1 91.1 100.2 293.5 72.2 2155.7 291.3 124.3 2155.9 273.7 268.2 291.8 113.6 2107.6 2106.7 2163.4 77.4 78.6 2104.2 82.8 293.3 271.2 166.4 79.8 63.6

0.1 1.0 3.7 146.3 3.5 3.1 213.4 2.5 1.9 0.0 2.6 1.1 170.9 4.7 2.1 2.2 62.6 171.1 152.4 49.1 183.4 24.5 74.1 5.0 4.8 0.9 17.1 190.6 173.2 76.8 14.5 175.7 2.4 2.4 5.2 6.4 263.9 1.4 14.1

26.3 3.2 12.7 17.8 5.5 16.1 23.4 10.3 2.1 4.1 18.5 6.9 18.0 2.7 4.9 8.9 0.2 3.5 27.8 65.7 1.3 24.3 65.9 16.3 21.8 1.8 13.6 17.6 16.7 73.4 22.6 21.4 14.2 17.2 3.3 18.8 66.4 20.2 36.4

FROa

2FWG 2HSH 2I1U 2ICC 2NWF 2P52 2QO4 2RKQ 2VYO 2XFD 2YXF 3CB9 3E8T 3EDI 3FSA 3FZ4 3GA4 3GNZ 3GUI 3HZ8 3KFF 3M1W 3O22 3RT2 3RXW 3SEB 3SH4 3T0V 3TPK 3VOR 3ZYP 4EQ8 4F0W 4FH4 4FTF 4HWM

0.05 0.03 0.00 0.00 0.15 0.03 0.07 0.01 0.01 0.00 0.06 0.04 0.03 0.01 0.00 0.05 0.02 0.00 0.04 0.04 0.05 0.07 0.03 0.06 0.02 0.19 0.04 0.16 0.00 0.05 0.02 0.03 0.00 0.04 0.00 0.00

Average Median

0.057 0.030

x 3 dihedrals (C1 –S1– S2–C2) (8) X-ray

Predicted

Difference to X-ray

Difference to ‘Ideal’b

74.5 80.9 71.8 282.2 292.3 288.9 2105.7 98.4 298.9 279.2 283.4 75.7 288.2 272.4 290.2 70.3 70.0 274.9 2100.5 75.5 294.7 297.3 290.2 133.4 2110.1 72.8 89.7 77.4 77.6 269.0 280.3 291.1 283.3 2106.7 282.0 277.8

75.2 82.6 2107.7 283.0 282.6 283.9 298.5 176.3 290.1 275.5 281.3 83.3 280.0 87.1 297.9 69.3 74.5 270.2 293.2 2168.3 2151.6 2105.6 282.2 2142.0 2112.9 2115.6 2119.1 279.3 2156.6 271.3 281.0 286.4 283.5 2122.8 274.0 281.1

0.7 1.7 179.5 0.8 9.7 5.0 7.2 77.9 8.8 3.7 2.1 7.6 8.2 159.5 7.7 1.0 4.5 4.7 7.3 243.8 56.9 8.3 8.0 275.4 2.8 188.4 208.8 156.7 234.2 2.3 0.7 4.7 0.2 16.1 8.0 3.3

24.8 17.4 17.7 7.0 7.4 6.1 8.5 76.3 0.1 14.5 8.7 16.7 10.0 12.9 7.9 30.7 25.5 19.8 3.2 78.3 61.6 15.6 7.8 52.0 22.9 25.6 29.1 10.7 66.6 18.7 9.0 3.6 6.5 32.8 16.0 8.9

55.3 7.3

20.4 16.3

a ˚ . Rank ordering is from 0 for FRO, fractional rank order; the scored rank order of the native cysteine bond divided by the total number of residue pairs within 5 A the best ranked residue pair. b Ideal x 3 dihedrals for disulfide bonds are 2908 or þ1008.

In the final stage of this study, the method was validated using the wild-type structures of 13 proteins for which published stability data were available for engineered disulfide mutants (Dani et al., 2003). Among the 15 mutations that improved thermodynamic stability, the average/median predicted FRO was 0.31/0.23, considerably worse than the FROs achieved for native disulfides, with only 6 mutations ranked in the top 13% of predictions (Table VIII). This reveals the challenge in predicting engineered stabilizing disulfides. One difficulty in predicting stabilizing disulfides relates to large variations in volume change that may lead to either steric overlaps or cavity formation (Dani et al., 2003), and, presumable, energetic instabilities. Indeed, this accounts for the poor predictions in cases where creating stabilizing disulfides from bulky residue pairs (e.g. Tyr88(3)–Tyr88(4) in 1LMB or Ile9–Leu64 in 2LZM) results in improved thermodynamic stability despite large volume ˚ 3). Nonetheless, the FRO values for these changes (,2120 A 15 engineered stabilizing mutations are much better than random, which indicates that there is still value in the approach 372

presented here. However, more work is needed to further improve the predictive capabilities of the method. Conclusion The disulfide prediction method described here, along with its optimization and validation on a set of publically available disulfide-bearing proteins, can aid in the rational design of engineered proteins. The method represents a first step towards improved prediction of the structural integrity of proteins, while the list of disulfide-bearing proteins serves as a benchmark set for further improvement and comparison with other methods. The optimal cysteine scoring function derived for the dataset considered here is interesting in two ways. First, our results indicate that it is not ideal to rely either on a physics-based energy function or a knowledgebased scoring function alone. A weighted function of both is required. Second, our results demonstrate that scoring can be improved by a binning process that works to disfavor the

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

1ABA 1GV9 1LF7 1T2J 1UNR 1VHU 1WCU 1XT5 1ZK5 2A6Y 2I4A 2P39 2PY0 3HNB 3L4R 1C7K 1DYQ 1KNG 1LJU 1M40 1MF7 1MJN 1NKO 1OAL 1OLR 1P3C 1QGV 1QK8 1R26 1RIE 1SHU 1T2I 1XBU 1Y9L 2A6Z 2AQM 2CE0 2E0Q 2ERF

FROa

Disulfide prediction

Table VIII. Ranking and FRO calculated from calculation 9 (see Table VI) for the engineered proteins (dataset 2—engineered stabilizing disulfides) PDB ID

Mutation

Ranka

FROb

x 3 dihedrals (C1– S1–S2– C2) (8) Difference to ‘Ideal’c

1FG9 (chain A) 1LMB 1RNB 1RNB 1SNO 1XNB 2CBA 2CI2 2LZM 2RN2 2ST1 3GLY 3GLY 4DFR 9RAT

Glu7A– Ser69A Tyr88(3)– Tyr88(4) Ala43–Ser80 Ser85–His102 Gly79– Asn118 Ser100– Asn148 Leu60– Ser173 Thr22–Val82 Ile9– Leu164 Cys13–Asn44 Thr22–Ser87 Asn20–Ala27 Thr246– Cys320 Pro39A– Cys85A Ala4–Val118

15 32 3 0 22 8 55 0 13 25 44 187 19 16 25

0.71 0.48 0.07 0.00 0.34 0.08 0.47 0.00 0.21 0.33 0.23 0.82 0.08 0.13 0.68

12.3 28 19 4.7 4.9 23.3 44.4 4.4 12 25 3.7 48.8 70 7.9 40.2

a

Rank ordering from 0 for the best ranked residue pair. ˚ , using calculation 12, FRO, fractional rank order; the scored rank order of the native cysteine bond divided by the total number of residue pairs within 5 A Table VI. c Ideal x 3 dihedrals for disulfide bonds are 2908 or þ1008. b

singularities that can arise when using physics-based energy terms, which result from the inverse distance power function in these terms. Future studies will expand on this dataset to include more experimental examples where disulfide engineering has lead to improved thermal stability. Broader application of the tools developed here will likely benefit from more extensive sampling of large-scale conformational changes, which may be required to predict disulfides where at least one of the cysteine residues is in a loop or flexible domain. Integration of these tools with molecular dynamics, for example, may yield considerable benefits. Nonetheless, the method in its current form represents a suitable approach for rationally triaging disulfides that are likely to improve protein structural integrity.

Supplementary data Supplementary data are available at PEDS online. Acknowledgements We thank Tyler Day for his help in identifying and processing disulfide bond geometries in the PDB.

References Berman,H.M., Westbrook,J., Feng,Z., et al. (2000) Nucleic Acids Res., 28, 235–242. Burton,R.E., Hunt,J.A., Fierke,C.A. and Oas,T.G. (2000) Protein Sci., 9, 776–785. Charbonneau,P. (1995) Astrophys. J. Suppl. Ser., 101, 309.

373

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

Fig. 1. Superposition and comparison of the true disulfide bonds and neighboring residues for the predicted (orange carbons) and X-ray (cyan carbons) structures ˚ , and in 1ZK5 and 1RIE. (A) 1ZK5 represents a best-case example, with the true disulfide ranking first out of all possible disulfide-forming residue pairs within 5 A with a predicted x 3 dihedral (C1– S1–S2– C2) deviation of only 1.98 from the X-ray structure. (B) 1RIE represents a worst-case example, with the true disulfide ranking 31 out of 50 (FRO ¼ 0.62), and with a predicted x 3 dihedral deviation of 76.88 from the X-ray structure. The poor prediction of 1RIE is a likely ˚ during minimization. Cysteine consequence of the neighboring iron-sulfur cluster and its close proximity to the cysteines, forcing the Ca positions away by 0.87 A residues are represented as ‘ball-and-sticks’; the iron-sulfur cluster is represented as ‘spheres’; adjacent residues to cysteines are represented as ‘thin-tube’; and all other residues are represented as ‘wire’.

N.K.Salam et al.

374

Downloaded from http://peds.oxfordjournals.org/ at University of California, San Francisco on December 13, 2014

Charbonneau,P. (2002) Release Notes for PIKAIA 1.2, NCAR Technical Note TN-451-STR. National Center for Atmospheric Research, Boulder. Dani,V.S., Ramakrishnan,C. and Varadarajan,R. (2003) Protein Eng. Des. Sel., 16, 187–193. Davis,L. (1991) Handbook of Genetic Algorithms. 1st edn. Van Nostrand Reinhold, New York. Dobson,C.M. (2003) Nature, 426, 884–890. Dombkowski,A.A. (2003) Bioinformatics, 19, 1852–1853. Dombkowski,A.A., Sultana,K.Z. and Craig,D.B. (2014) FEBS Lett., 588, 206–212. Flory,P.J. (1956) J. Am. Chem. Soc., 78, 5222– 5235. Jorgensen,W.L., Maxwell,D.S. and Tirado-Rives,J. (1996) J. Am. Chem. Soc., 118, 11225– 11235. Kim,D.Y., Kandalaft,H., Ding,W., et al. (2012) Protein Eng. Des. Sel., 25, 581–590. Kuroki,R., Inaka,K., Taniyama,Y., Kidokoro,S., Matsushima,M., Kikuchi,M. and Yutani,K. (1992) Biochemistry, 31, 8323– 8328. Le,Q.A.T., Joo,J.C., Yoo,Y.J. and Kim,Y.H. (2011) Biotechnol. Bioeng., 109, 867–876. Li,J., Abel,R., Zhu,K., Cao,Y., Zhao,S. and Friesner,R.A. (2011) Proteins Struct. Funct. Bioinform., 79, 2794–2812. Madhavi Sastry,G., Adzhigirey,M., Day,T., Annabhimoju,R. and Sherman,W. (2013) J. Comput. Aided Mol. Des., 27, 221– 234. Ma˚rtensson,L.-G., Karlsson,M. and Carlsson,U. (2002) Biochemistry, 41, 15867–15875. Matsumura,M., Signor,G. and Matthews,B.W. (1989) Nature, 342, 291–293. Pace,C.N., Grimsley,G.R., Thomson,J.A. and Barnett,B.J. (1988) J. Biol. Chem., 263, 11820– 11825. Shivakumar,D., Williams,J., Wu,Y., Damm,W., Shelley,J. and Sherman,W. (2010) J. Chem. Theory Comput., 6, 1509– 1519. Siadat,O.R., Lougarre,A., Lamouroux,L., Ladurantie,C. and Fournier,D. (2006) BMC Biochem., 7, 12. Vaz,D.C., Rodrigues,J.R., Sebald,W., Dobson,C.M. and Brito,R.M. (2006) Protein Sci., 15, 33– 44. Wahlberg,E. and Ha¨rd,T. (2006) J. Am. Chem. Soc., 128, 7651–7660.

Structure-based approach to the prediction of disulfide bonds in proteins.

Protein engineering remains an area of growing importance in pharmaceutical and biotechnology research. Stabilization of a folded protein conformation...
231KB Sizes 2 Downloads 3 Views