Structure solution with ARCIMBOLDO using fragments derived from distant homology models

Massimo Sammito1,†, Kathrin Meindl1,†, Iñaki M. de Ilarduya1, Claudia Millán1, Cecilia Artola-Recolons2, Juan A. Hermoso2 and Isabel Usón1,3

1 Instituto de Biología Molecular de Barcelona, Barcelona Science Park, Barcelona, Spain
2 Instituto de Química Física Rocasolano, Madrid, Spain
3 Institució Catalana de Recerca i Estudis Avançats, Passeig Lluís Companys 23, Barcelona, Spain

Keywords: ARCIMBOLDO; density modification; fragment search; macromolecular phasing; MltE

Correspondence: I. Usón, Instituto de Biología Molecular de Barcelona, Barcelona Science Park, Baldiri Reixach 15, 08028 Barcelona, Spain. Tel: +34 93 4020198. Fax: +34 93 4034979

†These authors contributed equally to this work.

(Received 1 April 2014, revised 19 June 2014, accepted 25 June 2014)

doi:10.1111/febs.12897

Molecular replacement, one of the general methods used to solve the crystallographic phase problem, relies on the availability of suitable models that can be placed in the unit cell of the unknown structure to provide initial phases. ARCIMBOLDO, originally conceived for ab initio phasing, operates at the limit of this approach, using small, very accurate fragments such as polyalanine α-helices. A distant homolog may contain accurate building blocks, but the degree of conservation alone may not reveal which sub-structure is the most suitable. Trying out all alternative possibilities systematically is effective but computationally expensive. In the present study, we describe the solution of the previously unknown structure of MltE, an outer membrane-anchored endolytic peptidoglycan lytic transglycosylase from Escherichia coli. The asymmetric unit contains a dimer of this 194-amino-acid protein. The closest available homolog was the catalytic domain of Slt70 (PDB code 1QTE). Originally, this template was used by omitting contiguous spans of amino acids and setting up as many ARCIMBOLDO runs as there were models, each aiming to locate two copies sequentially with PHASER. Trimming the fragments against the correlation coefficient before expansion through density modification and autotracing in SHELXE was essential. Analysis of the figures of merit led to a strategy for optimizing the search model against the experimental data, now implemented within ARCIMBOLDO-SHREDDER (http://chango.ibmb.csic.es/SHREDDER). In this strategy, the initial template is systematically shredded, and fragments are scored against each unique solution of the rotation function. The results are combined into a score per residue, and the template is trimmed accordingly.

Abbreviations: CC, correlation coefficient; FOM, figure of merit; LLG, log-likelihood gain; MR, molecular replacement; RMSD, root mean square deviation.

FEBS Journal 281 (2014) 4029–4045 © 2014 FEBS

M. Sammito et al.

ARCIMBOLDO's SHREDDER

Introduction

Crystallography provides unique insight into biological questions. However, the structural model cannot be calculated directly from experimental data because of the phase problem: in the X-ray diffraction experiment, only the diffracted intensities are recorded, not the phases, whereas the phases are essential to compute the structure. A universal method to solve the phase problem efficiently has long been a quest in macromolecular crystallography. Alternative ways to achieve a solution co-exist, but none of them can guarantee success, so phasing often constitutes a bottleneck in structure determination. Direct ab initio phasing of macromolecules without previous knowledge of a related structure or measurement of heavy-atom or anomalous-scatterer derivatives [1], i.e. using only native diffracted intensities, was first achieved by enforcing atomicity as a constraint in real and reciprocal space [2], provided high-quality data to a resolution of 1.2 Å or better were available. To escape from the requirement of atomic-resolution data, extrapolation of non-measured data [3–6] and density modification approaches such as the VLD algorithm [7], hybrid Fourier syntheses [8] and charge flipping [9], as well as sophisticated use of the Patterson function [10], have been effectively exploited. A successful approach for equal-atom structures at resolutions of 2 Å has resulted from enforcing secondary structure rather than atomicity [11]. This is done by multi-solution location of small, extremely accurate models such as polyalanine α-helices with the program PHASER [12], coupled to density modification and autotracing with the program SHELXE [13,14]. Fifty per cent of the crystal structure determinations deposited in the Protein Data Bank [15] reach 2 Å or better. When the quality and resolution limit of the available data are high, yet not necessarily atomic, even very limited starting phase information, derived from barely 10% of the main-chain atoms, may be expanded into a full solution of the structure. The method has been implemented in the program ARCIMBOLDO [11,16] (http://chango.ibmb.csic.es/ARCIMBOLDO). Exploiting large libraries of small local folds, as an approach to enforcing unspecific tertiary structure, has made it possible to solve larger or more complex structures, including all-β ones [17].
This use of massive numbers of local folds as alternative hypotheses blends ab initio and molecular replacement methods, as the knowledge used to define the starting hypotheses may come from structures of related sequence or function. Molecular replacement [18] requires the location of a model of known structure that is close enough to the target structure. This model can subsequently be used to derive starting phases, which are then further improved, leading to the final refined solution. Highly sophisticated approaches for model weighting, enhancement and combination can be found in Sculptor [19], mrtailor [20], SCEDS [21] or Ensembler [22]. Obviously, whenever a close structural model is known, and given enough resolution for model bias not to be a serious concern, molecular replacement is an unbeatable phasing method, but such a model is not always available. Recently, hybrid approaches on the frontier between MR and ab initio methods have succeeded in solving unknown structures. Suitable search models can be produced with RosettaMR [23] or AMPLE [24,25], either through ab initio modeling or starting from structures of distant homologs or NMR models. In this study, we explore the selection, against the experimental data, of fragments generated from low-sequence-identity models, to be used within ARCIMBOLDO at resolutions under 2 Å. Evaluation of the rotation function around its maxima, for sets of fragments systematically shredded from the original model, is exploited as a way to identify and eliminate less accurate portions. A second trimming step is undertaken before expansion by density modification and autotracing. The smaller but more accurate models were successful in phasing, whereas the original search model was not. The proposed method was effective in solving the previously unknown structure of MltE [26,27]. The procedure has been named ARCIMBOLDO-SHREDDER because the starting model is mechanically sliced regardless of structure or sequence, and the results of the rotation function are combined to crop search models. It has been implemented as an automated mode within the BORGES platform [17], and may be downloaded from http://chango.ibmb.csic.es/SHREDDER.

Results and discussion

Application of this approach to the solution of a previously unknown structure is described, along with tests that probe various aspects of the method. Finally, the practical implementation leading to an ARCIMBOLDO mode, the ARCIMBOLDO-SHREDDER application, is outlined.

Unknown structure of MltE

MltE is a bacterial outer membrane-anchored endolytic peptidoglycan lytic transglycosylase [28]. Rod-shaped single crystals of MltE from Escherichia coli were obtained as described previously [29]. Two X-ray diffraction datasets, both extending to 2.0 Å resolution, were collected from different crystals at beamline ID23-2 of the European Synchrotron Radiation Facility (Grenoble, France). Table 1 shows the statistics for the individual datasets and for the resulting merged dataset used in phasing. The crystals belong to space group C2221 and were expected to contain two copies of the 194-amino-acid MltE monomer in the asymmetric unit, corresponding to a solvent content of 45%. Suitable search models were selected with the program Blast, using the blastp algorithm (http://blast.ncbi.nlm.nih.gov) [30] against the Protein Data Bank. No structure with clear homology over a broad span of the sequence was identified. However, Slt70 from E. coli showed similarity of predicted fold and 35% identity over a range of 66 amino acids (34% coverage, E-value 0.034). The next hit was a Gfp-like protein with 26% identity over a range of 53 residues (27% coverage, E-value 4.0). The high E-values of the Gfp-like protein and subsequent hits mark them as very poor in terms of expected structural closeness, and thus as useless working hypotheses for phasing. Therefore, the only sequence-related structure available was that of Slt70 (the 70 kDa soluble lytic transglycosylase from E. coli, determined at a resolution of 1.9 Å in complex with a 1,6-anhydromurotripeptide; PDB code 1QTE) [31]. PDB entry 1QSA represents the same protein in a slightly different context. Molecular replacement with either of these models was unsuccessful, which was not unexpected given their low overall similarity to the target structure. Once the MltE structure had been determined, a root mean square deviation (RMSD) of 3.1 Å could be calculated for the superposition of 160 Cα atoms of both models. However, even a remote homolog is bound to contain geometrically close partial sub-structures in its fold. The resolution of the available data was high enough to attempt phasing by locating small fragments in combination with density modification. An appropriate strategy was therefore to shred the model of the closest homolog into smaller, overlapping fragments and to try each of them in attempts to locate two copies within ARCIMBOLDO. Starting from the partial match found using blastp, the full 1QTE sequence was aligned against the target sequence using CLUSTALW [32]. The matching region was then cut from the full 1QTE model and used as input for SHREDDER to generate polyalanine models.

Table 1. Summary of X-ray data for MltE in C2221. Values in parentheses indicate the figures for the outer resolution shell.

Beamline Wavelength (Å) a, b, c (Å) Resolution range (Å) Unique reflections Completeness (%) Rσ a

Native 1

Native 2

Merged

ID23-2 0.8726

ID23-2 0.8726 123.32, 183.93, 35.29 19.3–2.0 (2.1–2.0) 27 678 99.5 (97.6) 0.060 (0.354) 12.1 (3.0)

123.32, 183.93, 35.29 40.1–2.0 (2.1–2.0) 27 762 99.6 (97.6) 0.062 (0.297) 12.9 (3.6)

40.1–2.0 (2.1–2.0) 20 219 72.5 (47.8) 0.066 (0.350) 12.1 (3.0)

a Rσ = Σ[σ(Fo²)]/Σ(Fo²).
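Truncating a template to polyalanine, as was done here before shredding, amounts to keeping the main-chain atoms plus Cβ of each residue and relabeling every residue as alanine. The sketch below is a minimal illustration under stated assumptions, not the SHREDDER code. It assumes plain-text PDB records with the standard fixed columns, and it can restrict the output to a residue range such as 1QTE 457–596. The function name `to_polyalanine` is hypothetical.

```python
POLYALA_ATOMS = {"N", "CA", "C", "O", "CB"}  # main chain plus C-beta


def to_polyalanine(pdb_lines, first=None, last=None):
    """Return ATOM records truncated to polyalanine (illustrative helper).

    Fixed PDB columns are assumed: atom name in columns 13-16, residue name
    in 18-20 and residue number in 23-26 (slices 12:16, 17:20 and 22:26).
    HETATM and all other record types are dropped; glycines have no C-beta,
    so they become alanines without one.
    """
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        resseq = int(line[22:26])
        if (first is not None and resseq < first) or (last is not None and resseq > last):
            continue  # outside the region matched by the alignment
        if line[12:16].strip() not in POLYALA_ATOMS:
            continue  # side-chain atom beyond C-beta
        kept.append(line[:17] + "ALA" + line[20:])  # relabel residue as ALA
    return kept
```

Keeping Cβ rather than stripping to the bare main chain is the usual polyalanine convention. Models that retain conserved side chains, also tried here, would need an extra per-residue exception.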
Figure 1 shows the alignment of the target sequence to the sequence of part of the 1QTE PDB entry and a cartoon representation of this structure, highlighting the part used to derive search fragments. It also shows a view of the search template color-coded to reflect the degree of similarity. As may be seen, the match between the sequence of the known structure and the one to be determined is poor at best and spread over the entire span. It therefore does not suggest any clear partial structural hypothesis, other than eliminating the first 60 N-terminal residues up to a gap in the alignment. Taking residues 457–596 from 1QTE as an MR search fragment, either truncated to polyalanine or maintaining conserved side chains, did not lead to a solution. Neither did the corresponding loop-trimmed model. By contrast, a 'brute force' approach led to a solution: this model was shredded into all possible pieces, obtained either by omitting a contiguous span of given length or by extracting such sub-structures.

Fig. 1. 1QTE search template for MltE. (A) CLUSTALW sequence alignment of residues 420–616 of Slt70 (1QTE) to the full sequence of MltE. The alignment score is 15.26. (B) Cartoon representation of the full structure of 1QTE shown in gray, with the portion corresponding to the alignment above highlighted in gold. (C) Cartoon representation of the 1QTE template with residues colored according to similarity as in the alignment shown in (A): asterisk, blue; colon, green; full point, yellow; no similarity, orange; gap, red.

First structure solution

The structure was first solved by the following procedure. From the original, non-solving template of 140 amino acids, 113 polyalanine models were extracted by cutting out all possible spans of 26 contiguous amino acids. Each of these models was used within ARCIMBOLDO to search for two copies with PHASER. All resulting sub-structures comprising one or two placed models were subjected to iterative density modification and autotracing with SHELXE. This first attempt failed to solve the structure, although the PHASER translation-function Z-scores of some models indicated that they appeared to be correctly located. Presumably, their differences from the true structure prevented phase improvement and successful main-chain tracing within SHELXE. Optimization of the models was therefore attempted by eliminating incorrect residues before the expansion stage. To this end, one partial solution made up of two fragments was trimmed further: the one whose model had the best translation-function Z-score as evaluated in PHASER [33]. Trimming used iterative peaklist optimization [34] on a residue basis against the E-based correlation coefficient (CC) [35]. Each pair of equivalent amino acids from both monomers was omitted in turn, and a CC was calculated. Whenever this led to an increase in the CC, the pair of residues was eliminated from the model, even though the CC naturally tends to be higher the larger the number of atoms. The CC increased from 8.0 to 10.9%. The untrimmed solution could not be successfully expanded with SHELXE. The remaining model, however, containing 85 amino acids per monomer, rendered a main-chain trace of 247 amino acids. This trace was characterized by a CC of 36.1% after ten cycles of main-chain tracing interspersed with density modification. The resulting electron density map could easily be traced and the side chains assigned. This stepwise residue optimization against the CC is now a standard option within SHELXE [14].

Figure 2 shows the figures of merit (FOMs) characterizing the performance of each of the 114 models in locating the first and second copies and in the phasing attempts. Models are named after the first residue of the omitted region, following the numbering in 1QTE, and the abscissa is labeled accordingly. Panels (i) and (iii) show the number of solutions produced at the rotation and translation stages, respectively. The other panels show the top (green) and average (blue) FOMs associated with the different stages of the ARCIMBOLDO run: LLG after rotation (ii), LLG after translation (iv), Z-score after translation (v), SHELXE CC (vi), mean phase error (MPE) against the final structure (vii) and SHELXE initial CC (viii). These are given after location of the first (Fig. 2A) and the second fragment (Fig. 2B). For example, a high value for the LLG in panel (ii) indicates an improvement in the rotation search when the fragment of 26 amino acids is cut out at the corresponding position of the 1QTE template. As may be seen from the MPE values displayed, no interpretable solution is reached after placement of the first fragment, the best (i.e. lowest) value being 82° (Fig. 2A (vii)). Intermediate stages eventually leading to a solution are marked as black dots. The FOMs follow a pattern related to the sliding window of omitted residues (see e.g. Fig. 2A,B (ii)). Even so, only at the stage of the second fragment do high Z-scores for the translation function (Fig. 2B (v)), and correspondingly low numbers of translation solutions (Fig. 2B (iii)), indicate the potential of a particular model to yield a solution. Calculating a full ARCIMBOLDO run for each of the models generated is computationally demanding. Besides, the results shown correspond to just one of many possible ways to shred the original template, extracting or omitting fragments of arbitrary length, and not all of them led to a solution. Thus, it would be desirable to evaluate models and guide their generation at the earliest possible stage, rather than at the final density modification cycle. Figure 2 also shows that the first and second fragments display the same pattern in the distribution of the LLG values for the rotation function (see panel (ii) in Fig. 2A and Fig. 2B) [36]. Model choice, improvement or even refinement should therefore be exploited at this first, rotation stage. In addition, a model that is good enough to produce a correct rotation may fail at the translation stage, and optimization increases its chances of succeeding. Finally, FOMs derived from clearly different rotation solutions should be analysed separately. Some of these solutions may be incorrect, and several may be correct, corresponding to different monomers in the asymmetric unit. It is also apparent that partial results from different ways of shredding the same model need to be considered globally rather than in isolation, that is, by combining partial solutions related to equivalent rotations.

Fig. 2. Top and average figures of merit (FOMs) characterizing all PHASER and SHELXE stages for both copies of the 114 ARCIMBOLDO runs. The runs use each of the models produced by sequentially omitting a section of 26 amino acids from the 140-residue template. Black dots correspond to stages from an eventually successful solution (in this case, if the SHELXE CC > 18%). Maximum values are shown in green, average values in blue. The eight plots represent (i) the number of rotation solutions produced, (ii) the PHASER rotation LLG, (iii) the number of translation solutions, (iv) the PHASER translation LLG, (v) the PHASER translation Z-score, (vi) the SHELXE CC of the best trace, (vii) the mean phase error of the SHELXE solution, and (viii) the SHELXE initial model CC. The x axis shows the number in the PDB file of the starting residue of the omitted span of 26 contiguous amino acids. (A) First fragment. (B) Second fragment.

FOMs characterizing the 1QTE models versus RMSD to the final MltE structure

Figure 3 shows the RMSD of a Cα superposition for each of the models. It takes into account either all 114 residues or only the geometrically closest 102, which form a core that produces a structurally meaningful superposition. Models are named as in Fig. 2. For the complete models, values vary between 2 and 2.6 Å. The curve shows two regions of large deviations alternating with two regions corresponding to more similar models. These patterns mirror the curves displayed for the LLG of PHASER's rotation function for the same sequence of models (Fig. 2A,B (ii)).
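The two RMSD curves of Fig. 3A differ only in which Cα atoms enter the fit: all of them, or a best-agreeing core. The numpy sketch below computes both quantities with the Kabsch (SVD) superposition. The text does not specify how the 102-residue core was chosen. We therefore assume a simple iterative selection: fit, keep the `n_core` closest atoms, and refit until the selection is stable. `core_rmsd` and the other function names are illustrative helpers.

```python
import numpy as np


def superpose(P, Q):
    """Kabsch fit of P onto Q (both (n, 3)): returns rotation R and centroids."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, p0, q0


def fit_distances(P, Q, idx):
    """Per-atom distances after superposing P on Q using only the atoms in idx."""
    R, p0, q0 = superpose(P[idx], Q[idx])
    fitted = (P - p0) @ R.T + q0
    return np.linalg.norm(fitted - Q, axis=1)


def global_rmsd(P, Q):
    """RMSD over all atoms after a fit on all atoms."""
    dist = fit_distances(P, Q, np.arange(len(P)))
    return float(np.sqrt(np.mean(dist ** 2)))


def core_rmsd(P, Q, n_core, max_iter=20):
    """RMSD over an iteratively selected core of the n_core closest atoms (assumed scheme)."""
    idx = np.arange(len(P))  # start from a fit on all atoms
    for _ in range(max_iter):
        dist = fit_distances(P, Q, idx)
        new_idx = np.sort(np.argsort(dist)[:n_core])
        if np.array_equal(new_idx, idx):
            break  # selection is stable
        idx = new_idx
    dist = fit_distances(P, Q, idx)  # final fit on the chosen core
    return float(np.sqrt(np.mean(dist[idx] ** 2))), idx
```

Applied to the 114 Cα pairs of each omit model with `n_core=102`, this would yield one point of each curve. The core selected this way need not coincide exactly with the one used for Fig. 3.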
However, neither low RMSD nor high rotation-function LLG correlates perfectly with final success in phasing. Success is judged from the SHELXE CC (and confirmed by the MPE) of the best trace from each solution, presented in Fig. 3B. The 19 models leading to a solution are characterized by a CC above 18%, well discriminated from the average of 12% for the traces corresponding to unsolved structures. Four points in these curves are discussed below; the corresponding models are displayed in Fig. 3C, together with a plot showing the superposition of the 1QTE template on the final MltE structure (PDB code 2Y8P). The first case corresponds to a model in which the gap starts at residue 488. It represents an isolated, less marked CC maximum, located in the first descent of the full RMSD curve but before the curve effectively drops. The corresponding cartoon representation shows that the omitted region (in gray) enters an area of high deviation, which is thus removed. This also applies, however, to many other non-solving models with lower global RMSDs. It draws attention to the fact that a spurious success may depend on very small differences in the model and on luck with sampling. The other three selected points are located towards the right part of the curve, where the second area of lower RMSD occurs. Again, there is a discrepancy for model 530, which, although it gives the top CC obtained, displays a rather high RMSD. Models 542 and 554 are characterized by lower RMSDs and are located in the region of the graph where solutions are more frequent. The cartoon shows the large deviations in the removed areas (gray, broad coil). The apparent discrepancies between eventual phasing success and the accuracy of the model and of its rotation are better explained by looking at the core RMSD curve. This curve represents the RMSD calculated for the Cα atoms of the 102 best-agreeing amino acids, thus considering a core of the structure that produces a meaningful structural superposition. In this curve, the second minimum is more pronounced than the first and better reflects the phasing success. The necessary peaklist optimization against the CC is likely to eliminate the most incorrect residues. This core therefore better represents the nature of the structure entering the density modification stage. In summary, the rotation function appears to map the overall RMSD of the model. For phasing success, however, a smaller, more accurate structure is preferable to a more complete one impaired by larger deviations. Given the same overall RMSD, an accurate core with some readily trimmed areas of higher deviation is better than differences spread evenly with no way to improve the model.
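The residue-pair trimming against the CC, used for the first solution and assumed above to remove the most incorrect residues, can be sketched as a greedy loop. The sketch assumes the usual weighted form of the correlation coefficient between observed and calculated normalized amplitudes, which reduces to a Pearson correlation for unit weights. The structure-factor calculation is treated as a caller-supplied function, `calc_e`, returning E_calc for the model built from the listed residue pairs. Both `calc_e` and `trim_against_cc` are hypothetical names, not the SHELXE implementation. The small tolerance `tol` is our own addition; it avoids accepting omissions that change the CC only by rounding noise.

```python
import numpy as np


def weighted_cc(e_obs, e_calc, w=None):
    """Weighted correlation coefficient between observed and calculated E values.

    With unit weights this is the ordinary Pearson correlation.
    """
    e_obs = np.asarray(e_obs, dtype=float)
    e_calc = np.asarray(e_calc, dtype=float)
    w = np.ones_like(e_obs) if w is None else np.asarray(w, dtype=float)
    sw = w.sum()
    num = (w * e_obs * e_calc).sum() * sw - (w * e_obs).sum() * (w * e_calc).sum()
    var_o = (w * e_obs ** 2).sum() * sw - (w * e_obs).sum() ** 2
    var_c = (w * e_calc ** 2).sum() * sw - (w * e_calc).sum() ** 2
    return float(num / np.sqrt(var_o * var_c))


def trim_against_cc(residues, e_obs, calc_e, max_cycles=10, tol=1e-9):
    """Greedily drop residue pairs whose omission raises the CC.

    residues: residue numbers kept in both monomers (one pair per number).
    calc_e(residues) -> E_calc for that model (hypothetical callback).
    """
    kept = list(residues)
    best = weighted_cc(e_obs, calc_e(kept))
    for _ in range(max_cycles):
        removed_any = False
        for res in list(kept):          # omit each pair in turn
            if len(kept) <= 1:
                break
            trial = [r for r in kept if r != res]
            cc = weighted_cc(e_obs, calc_e(trial))
            if cc > best + tol:         # keep the omission only if the CC rises
                kept, best = trial, cc
                removed_any = True
        if not removed_any:
            break                       # a full pass removed nothing
    return kept, best
```

With unit weights, `weighted_cc(a, b)` agrees with `np.corrcoef(a, b)[0, 1]`, which is a quick way to check the formula.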
Fig. 3. Relationship between model correctness and phasing success. (A) Global RMSD of two sets of Cα alignments between the MltE structure and each 1QTE model lacking a gap of 26 residues starting at the residue indicated on the horizontal scale (top, all 114 residues; bottom, the best 102 residues). (B) Final SHELXE CC of the trace; the points discussed in the text are labeled and shown in blue. (C) Plots of the 1QTE template (colored ribbon) superposed on the final MltE structure (PDB ID 2Y8P; gray cartoon) and of the four models highlighted in (B). Models are color-coded after the 140-Cα superposition (red for the largest RMSD, blue for the smallest), with RMSD as in (A) and the omitted region in black. The width of the backbone also reflects the RMSD.

Shredder OMIT against LLG of rotation function

The observation that the rotation function could indicate which parts of the model deviate most from the true structure was worth pursuing. This is especially so because the rotation function is the first stage of the process, where many hypotheses may be tested with little increase in computing time. To allow a more detailed analysis, it appears convenient to cluster equivalent rotation function solutions, i.e. to group rotations that map to a similar orientation in real space. FOMs for two correct rotations corresponding to different copies in the asymmetric unit will differ. At this stage, it is not yet known which rotation solutions are correct, but treating them independently makes it possible to isolate the noise derived from incorrect rotation solutions. We chose to cut models mechanically, regardless of secondary structure or sequence similarity, and to combine their scores in order to retrieve the structural context lost in shredding. The procedure followed using the MltE data and the 1QTE 457–596 template is outlined below. A PHASER rotation search was run with the complete polyalanine template model, using the full 2 Å resolution and setting the RMSD parameter in PHASER to an under-estimated value of 1.2 Å. The resulting 40 peaks within 75% of the top of the rotation function were clustered within a 15° tolerance into 12 groups, taking space group symmetry into account. Clusters were sorted by top LLG. The first and fourth clusters were identified as corresponding to the two MltE monomers in the asymmetric unit, by comparison with the rotation function of a perfect MltE model superimposed on the 1QTE template. A test aiming to identify the less correct parts of the model was set up for each cluster of rotations in turn. For a given cluster, the rotation was locally optimized and the LLG evaluated for each of a set of models created by eliminating all possible consecutive spans of 4, 5, …, 19 amino acids. Figure 4 shows the 16 curves obtained for the cluster around the top LLG value, corresponding to one of the correct rotations. All models within each curve have the same number of residues, so their LLGs can be compared. When a correct span is removed, the LLG should decrease, whereas if the omitted part is incorrect, reducing the model should lead to higher LLG values. Thus, the local maxima in the plotted curves should indicate the most incorrect regions, and their removal should improve the model. As can be seen, the graph for the shortest omit is noisier but already contains all the peaks representing the four areas with the most significant deviations. As the omitted spans grow, the graphs become smoother. Persistence of peaks and consistency of results were expected to help separate correct from spurious peaks. Indeed, the relevant areas become more distinct in the maxima around omit lengths of 8–9 residues. Comparison of the kurtosis of equivalent peaks in different curves is used to decide on the precise span to be eliminated. The 16 omit curves are combined into a graph of the average LLG estimated for each omitted residue. The Shred-LLG for a given residue RES is computed by averaging the LLG over the N_RES residues omitted in each model and over the N_MODELS models from which this residue is omitted (MODEL ∌ RES):

$$\text{Shred-LLG}_{\mathrm{RES}} = \frac{1}{N_{\mathrm{MODELS}\,\not\ni\,\mathrm{RES}}} \sum_{\mathrm{MODEL}\,\not\ni\,\mathrm{RES}} \frac{\mathrm{LLG}_{\mathrm{MODEL}}}{N_{\mathrm{RES}}}$$

Figure 5 shows the Shred-LLG function combining the rotation LLG of all omit shreds for the two correct rotations. Both curves present maxima at the spans whose omission is most favorable, corresponding to the four most incorrect areas in the template. For the green line, derived from the first rotation cluster, our algorithm identified and eliminated residues 502–513, 536–548, 557–575 and 594–596, plus 457.
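The Shred-LLG average defined in this section can be evaluated in a few lines of Python. In this sketch, `rotation_llg` is a hypothetical stand-in for the step that locally refines the rotation of the model lacking the given span and returns its LLG; in the actual procedure, that step is a PHASER calculation. `omit_spans` enumerates every contiguous omission of 4 to 19 residues, i.e. the 16 omit curves.

```python
from collections import defaultdict


def omit_spans(residues, min_len=4, max_len=19):
    """Yield every contiguous span of min_len..max_len residues (one omit model each)."""
    for length in range(min_len, max_len + 1):
        for start in range(len(residues) - length + 1):
            yield tuple(residues[start:start + length])


def shred_llg(residues, rotation_llg, min_len=4, max_len=19):
    """Shred-LLG per residue: mean of LLG_MODEL / N_RES over models omitting it.

    rotation_llg(omitted) is a hypothetical callback returning the rotation
    LLG of the template without the residues in `omitted`.
    """
    total = defaultdict(float)   # running sum of LLG_MODEL / N_RES
    n_models = defaultdict(int)  # number of models from which the residue is omitted
    for omitted in omit_spans(residues, min_len, max_len):
        llg_per_res = rotation_llg(omitted) / len(omitted)
        for res in omitted:
            total[res] += llg_per_res
            n_models[res] += 1
    return {res: total[res] / n_models[res] for res in residues if n_models[res]}
```

For the 140-residue template (457–596), `omit_spans` produces 2072 omit models in total. Residues near the termini are covered by fewer, mostly shorter spans, so their averages rest on less data than those of interior residues.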
The most correct assignment, established from the true RMSD between template and target structure, would be 508–513, 538–543, 563–588 (in this region, two highly deviating spans with a few correct residues in between are found) and 592–596. The pattern is even clearer in the purple line, derived from the second correct rotation cluster, where the shoulder at 522–529 disappears. The peak identifying the last C-terminal residues behaves similarly. Accordingly, the spans eliminated for this second cluster are 500–515, 538–549, 560–573 and 587–596, plus 457–460. Cartoon representations of the 1QTE template, color-coded to match the areas of high RMSD identified in each of the graphs, are displayed in Fig. 5B. In practice, the choice makes little difference. Either model succeeds in solving the MltE structure when it proceeds within ARCIMBOLDO to translation search, packing check, rigid-body refinement and density modification-autotracing. The effect of eliminating individual residues on the rotation function LLG would not be correlated with their RMSD to the true structure. The model derived from the first rotation cluster (Fig. 5B, top) was fitted to the target structure on the optimal rotation for the original template. Its RMSD is reduced from 2.7 to 1.93 Å. By contrast, the RMSDs of 15 000 randomly trimmed models with the same number of residues spread from 2.04 to 3.06 Å. This difference is significant: a Z-score of 4.68 sets the shredded model apart from the distribution, well separated from the best randomly trimmed model, which has a Z-score of 3.95. It appears essential to perform the optimization restrained by the environment of each residue, evaluating the LLG averaged over a main-chain span. This is not surprising, as even in macromolecular refinement of complete models, individual residues would drift from their correct location if unrestrained.

Tests on the solution of MltE with lysozyme

As previously described, no other significant homologs were identified for MltE. Once the structure had been solved, however, the Dali server [37] identified a similar fold in g-type lysozyme (PDB code 3GXR) [38]. Figure 6 shows the characterization of PDB entry 3GXR as a search template for MltE. The CLUSTALW alignment between its sequence and that of MltE results in a very poor score of 6.42. Accordingly, this model shows an even higher RMSD to MltE than 1QTE does: even limiting the comparison to the best 130 residues leaves an RMSD of 2.35 Å. It was interesting to assess whether it would still be possible to probe correctness against the rotation function for this even more different search structure.
To this end, a model was cut from 3GXR spanning residues 49–188, the 140 residues equivalent to the 1QTE model; it is characterized by an RMSD of 5.1 Å. This model was probed against the rotation function, setting it at both correct rotation values and shredding it as described for 1QTE. Figure 6C shows both curves describing the average LLG improvement derived from the elimination of each residue in the 3GXR polyalanine template. From the superposition, the areas of highest structural deviation may be identified as residues 77–93, 101–112, 132–146, 158–172 and 179–188, whereas our method singles out as most incorrect the areas 66–81, 110–124, 132–146, 154–167 and 179–188, plus 49–50. The plots of the 3GXR template as ribbon (Fig. 6D), coded by color and width to indicate the RMSD from 2Y8P and displaying the eliminated areas in gray, illustrate that although cutting the selection improves the overall RMSD and effectively eliminates some areas of high RMSD, other such areas are missed, while some of the removed residues should have been kept. Admittedly, the initial structure would not even be identified from the sequence alignment as an appropriate template, but these results prompted us to implement fragment trimming as a multi-solution approach: a number of combined hypotheses are generated from the suggested omit regions and pursued in parallel from the initial optimization stage.

ARCIMBOLDO’s SHREDDER

M. Sammito et al.

Fig. 4. Rotation function LLG graphs obtained around the peak of a correct rotation by shredding out all possible spans of 4, 5, …, 19 amino acids from the original template. The peaks contributing to the areas selected for trimming (Fig. 5) are marked by blue vertical lines. The green vertical lines highlight the peaks in the four curves that determine the length of the areas to be omitted.

FEBS Journal 281 (2014) 4029–4045 © 2014 FEBS

Fig. 5. Evaluation of model RMSD from the rotation function of omit shreds. (A) Shred-LLG graph estimating the average LLG derived from the elimination of each residue in the 1QTE polyalanine template. The green and purple lines correspond to the two correct rotation clusters. The pattern revealing the four most incorrect areas [508–513, 538–543, 563–588 (comprising two highly differing spans with a few correct residues in between) and 592–596] appears as 502–513, 536–548, 557–575 and 594–596–457 in the green line, and as 500–515, 538–549, 560–573, 587–596 plus 457–460 in the purple line, where the shoulder for 522–529 disappears. Vertical dashed lines mark local maxima. (B) Plots of the 1QTE template (colored ribbon), with color and ribbon width indicating the RMSD from MltE (2Y8P) and omitted residues shown in gray, matching the four areas of high RMSD identified in the graphs in (A). Red color or a wide ribbon indicates large deviations, whereas blue color or a thin ribbon stands for low RMSD.

Tests on models derived from a homolog of NovP, an αβ protein

A mainly helical protein such as MltE may present advantages for the shredder method, as refining a rotation cannot compensate for the higher variability in the main-chain conformation of strands. Thus, we wanted to probe the case of a mixed αβ structure, to see whether the approach could be generalized. We chose a case that is difficult because of its high RMSD, but for which good data to high resolution were available. NovP is a 262 amino acid structure in space group P2, for which data to a maximum resolution of 1.35 Å exist [6]. The asymmetric unit contains one monomer. Nevertheless, for the test described, data were cut to a resolution of 2 Å. NovP was originally solved by experimental SIRAS phasing with a mercury derivative, as MR with the available structure of a homologous catechol O-methyltransferase (PDB entry 1VID) [39] was unsuccessful. Although both structures share a similar fold, the overall difference is large enough to account for the MR failure. Based on the alignment displayed in Fig. 7, the first 42 N-terminal residues up to the first gap were discarded. Thus, tests were performed using a model comprising residues 43–216. This model shows an RMSD of 2.37 Å to the NovP target when only the closest 159 of its 174 residues are considered. Despite the large structural deviation, the correct rotation is located and identified as the first of eight clusters. The results of our shredding and LLG assessment procedure are summarized in the plot displayed in Fig. 7. In this case, the most remarkable feature identified in the function, other than a large maximum centered at residue 129 that led to residues 126–133 being discarded, is the large positive effect of omitting residues in the C-terminal half of the template, from residue 155 onwards. The resulting model comprised the 104 residues spanning 43–125 and 134–154. This model, improved against the rotation function at 2.0 Å (all 104 residues may be superimposed with an RMSD of 1.64 Å), solves the structure provided the resolution is limited to around 3.5 Å at the translation search step.
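As a quick consistency check on the residue bookkeeping of the NovP test, the short Python sketch below removes the two omit regions from the 43–216 template and reports what is left. The helper names are hypothetical and this is plain arithmetic on residue numbers, not ARCIMBOLDO-SHREDDER code; it assumes that "from residue 155 onwards" extends to the last template residue, 216.

```python
def retained_residues(first, last, omit_ranges):
    """Residue numbers kept from the template span first..last (inclusive)
    after removing every inclusive (start, end) range in omit_ranges."""
    omitted = set()
    for start, end in omit_ranges:
        omitted.update(range(start, end + 1))
    return [r for r in range(first, last + 1) if r not in omitted]


def as_spans(residues):
    """Collapse a sorted list of residue numbers into inclusive (start, end) spans."""
    spans = []
    for r in residues:
        if spans and r == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], r)  # extend the current contiguous span
        else:
            spans.append((r, r))  # a gap starts a new span
    return spans


# NovP test case: template residues 43-216; omit the maximum around 129
# (126-133) and everything from residue 155 to the end of the template.
kept = retained_residues(43, 216, [(126, 133), (155, 216)])
print(len(kept))       # 104
print(as_spans(kept))  # [(43, 125), (134, 154)]
```

The same helper gives 174 residues for the untrimmed 43–216 template and 140 for the 49–188 span cut from 3GXR.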

4039

M. Sammito et al.

ARCIMBOLDO’s SHREDDER

A

B

C

D

Implementation within an ARCIMBOLDO-SHREDDER mode

The algorithm implemented in ARCIMBOLDO-SHREDDER is summarized in Fig. 8. Small fragments derived from a distant homolog can, of course, be used as search fragments within ARCIMBOLDO in the same way as any set of coordinates in PDB format. However, because the shredded fragments are originally part of one fold, equivalent rotation and translation operations will map them so as to reconstruct a part of the original template structure. We provide an automated set-up that exploits this by combining the results and FOMs of related fragments: particular rotation clusters are examined, the parts of the structure that are useful are screened, promising results are combined and, whenever possible, validated at later stages.

Fig. 6. 3GXR search template for MltE. (A) CLUSTALW alignment of MltE and g-type lysozyme (PDB ID 3GXR). The alignment score is 6.42. (B) Cartoon representation of the 3GXR template with residues colored based on the similarity in the alignment shown in (A): asterisk, blue; colon, green; full point, yellow; no similarity, orange; gap, red. (C) Shred-LLG graph estimating the average LLG derived from the elimination of each residue in the 3GXR polyalanine template. The green and purple lines are calculated for the two correct rotation angles. The five most disagreeing areas selected from the green and purple lines are 66–81, 110–124, 132–146, 154–167 and 179–188, plus 49–50, whereas according to the structure superposition the most different areas span residues 77–93, 101–112, 132–146, 158–172 and 179–188. (D) Plots of the 3GXR template coded through color and ribbon width to express RMSD from MltE (2Y8P). Eliminated spans, as identified in the graphs, are shown in gray for each of the functions shown in green and purple, respectively.

The ARCIMBOLDO-SHREDDER mode takes the template provided in PDB format and computes the rotation function with PHASER. The resulting rotations are clustered, and each cluster is evaluated independently. For the top rotation of each cluster, a collection of fragments is generated, either by eliminating a span of amino acids at all possible consecutive positions or by extracting such spans from the template. The models may retain side chains or, more frequently given the low sequence identity, be trimmed to polyalanine. Particular side chains may also be retained for ARCIMBOLDO. Usually B-values are set at a common level, but because in this case the fragments are shredded from a structural hypothesis expected to resemble the target structure, the default is to retain the deposited experimental B-values. The program generates all possible partial models omitting a fixed number of consecutive residues, for each number within a pre-determined range. The LLG of each model is optimized with PHASER at full resolution around the rotation angles characterizing the cluster.

Fig. 7. 1VID search template for NovP. (A) CLUSTALW sequence alignment of catechol O-methyltransferase (1VID) to the full sequence of NovP. The alignment score is 11.27. (B) Shred-LLG graph estimating the average LLG derived from the elimination of each residue in the 1VID polyalanine template, with peak selection indicated by green dashed vertical lines and plateau identification indicated by blue dashed vertical lines. The blue horizontal solid line marks the value of the minimum among the peaks. A representation of the 1VID model is shown, in which the color and ribbon thickness reflect the RMSD against the coordinates of the O-methyltransferase NovP (PDB code 2WK1); the N-terminal region prior to the first alignment gap, which is omitted from the initial model, is shown in white, and the two regions excluded on the basis of the shredder analysis are shown in gray.

The program stores the LLG distribution for each sequentially ordered group of models of equal size, obtained by shredding out the same number of residues (as shown in Fig. 4). Each distribution is also normalized by dividing the LLG by the number of residues remaining in the corresponding shredded model. This normalization is a crude approximation, assuming a linear and even contribution of each residue to the total LLG, but it is essential in order to compare FOMs from models of different sizes. A single discrete descriptor (the Shred-LLG function) is then computed to correlate the omitted residue regions with a variation in the LLG. Each point of this function is related to a single residue of the template model. The function is computed by averaging the normalized LLGs of all the models from which that residue was omitted, over the number of such contributing models. This last normalization step compensates for the under-representation of terminal residues. The alternative of shredding the template in a circular way, omitting residues from the N-terminus when the C-terminal end is reached, may result in models that are too disconnected, depending on the structure of the template. Examples of the function described are shown in Figs 5–7.

Fig. 8. ARCIMBOLDO-SHREDDER implementation. The scheme summarizes the steps in the procedure. PHASER operations are printed in blue, and SHELXE operations are shown on a blue background. SHREDDER operations are printed in red. Starting from a model template, a rotation search is calculated at 2 Å. Peaks are clustered geometrically, within a tolerance of 15°. For each cluster, models are produced by omitting or selecting contiguous areas of variable length. Model rotation is locally optimized with PHASER, and the resulting LLGs are combined for each residue. The Shred-LLG descriptor function is used to propose four models, combining peak shape analysis and LLG improvement value. Fragment location, density modification and autotracing are performed for each model.

The Shred-LLG descriptor function is analyzed to find peaks corresponding to local maxima that stand out above the variance of their local environment. Two thresholds are set for residue selection: 75% of the top omit LLG and the lowest LLG among the selected peaks. Residues characterized by higher function values are more likely to be incorrect, since omitting them entails an increase in the LLG. Conversely, residues whose omission corresponds to a decrease in the LLG are expected to be correct. Plateau regions above the graph thresholds are identified, as they may gather spans of incorrect residues. Once a peak is identified in the descriptor function, the shape of the corresponding peaks in all LLG distributions for the various shred sizes is analyzed, and the sharpest peak is selected to identify the precise residue range to be omitted. The final proposed models combine all selected omit ranges. As a conservative approach, the program automatically generates multiple models to be used in a multi-solution frame whenever the descriptor function does not show a clear numerical discrimination or peak retrieval is not obvious. Thus, by default four models are produced, eliminating, respectively, peaks, plateaus, residues with values above 75% of the top value, and residues with values above the minimum peak height.
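The per-residue bookkeeping described above (size normalization, averaging over the models from which each residue was omitted, and the two thresholds) can be sketched in Python. This is a minimal illustration under stated assumptions, not the ARCIMBOLDO-SHREDDER code: `llg_of_model` is a hypothetical stand-in for the PHASER rotation refinement of each shredded model, the toy score simply adds +1 for each correct and −1 for each incorrect residue, the peak finder only looks for interior local maxima instead of testing them against the variance of their environment, and only the two threshold-based selections are shown (peak-shape and plateau analysis are left out).

```python
from statistics import mean


def shred_llg(n_res, llg_of_model, min_span, max_span):
    """Per-residue Shred-LLG descriptor (sketch).

    For each span length in [min_span, max_span] and each start position, the
    contiguous window is omitted and the remaining residues are scored with
    llg_of_model. The score is divided by the number of remaining residues
    (crude linear normalization) and credited to every omitted residue; each
    residue's value is the mean of its credits, which compensates for
    terminal residues being omitted less often.
    """
    credits = [[] for _ in range(n_res)]
    for span in range(min_span, max_span + 1):
        for start in range(n_res - span + 1):
            kept = [i for i in range(n_res) if not start <= i < start + span]
            norm_llg = llg_of_model(kept) / len(kept)
            for i in range(start, start + span):
                credits[i].append(norm_llg)
    return [mean(c) for c in credits]


def find_peaks(f, tol=1e-9):
    """First index of each interior local maximum; a flat top counts once.
    Simplification: no test against the variance of the local environment."""
    found, i = [], 0
    while i < len(f):
        j = i
        while j + 1 < len(f) and abs(f[j + 1] - f[i]) <= tol:
            j += 1  # extend over a flat run of (near-)equal values
        rises = i > 0 and f[i - 1] < f[i] - tol
        falls = j + 1 < len(f) and f[j + 1] < f[j] - tol
        if rises and falls:
            found.append(i)
        i = j + 1
    return found


def omit_candidates(f, peaks, tol=1e-9):
    """Residues at or above 75% of the top value, and at or above the lowest
    selected peak: two of the four default omit selections."""
    top_cut = 0.75 * max(f)
    peak_cut = min(f[i] for i in peaks)
    above_top = [i for i, v in enumerate(f) if v >= top_cut - tol]
    above_peak = [i for i, v in enumerate(f) if v >= peak_cut - tol]
    return above_top, above_peak


# Hypothetical toy template: 20 residues, 8-11 are "wrong" and lower the score.
contrib = [-1.0 if 8 <= i <= 11 else 1.0 for i in range(20)]


def toy_llg(kept):
    return sum(contrib[i] for i in kept)


f = shred_llg(20, toy_llg, min_span=2, max_span=4)
peaks = find_peaks(f)
above_top, above_peak = omit_candidates(f, peaks)
print(peaks)       # [9]
print(above_top)   # [7, 8, 9, 10, 11, 12]
print(above_peak)  # [9, 10]
```

In this toy case the descriptor peaks in the middle of the incorrect stretch, while the 75% threshold also picks up residues 7 and 12: omit windows spread the signal onto neighboring residues, which is the kind of imprecision the published procedure addresses by refining omit boundaries from the sharpest peak in the individual LLG distributions.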

All proposed models are then used sequentially as ARCIMBOLDO models, for which translations are computed, packing is evaluated and solutions are refined before density modification and autotracing are finally applied with SHELXE. Results are presented in combined form as HTML output, and success, identified by a final CC above 30%, makes the program stop the remaining calculations, as the structure has been solved. A tutorial illustrating detailed parameterization and the particular solution of MltE is available from our website at http://chango.ibmb.csic.es/ARCIMBOLDO/
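The stopping rule can be sketched as a short loop in Python. This assumes a hypothetical `phase` callable that would run the translation, packing, refinement and SHELXE expansion steps for one model and return its final CC in percent; it illustrates the control flow only and is not the ARCIMBOLDO implementation.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_until_solved(
    models: Iterable[str],
    phase: Callable[[str], float],
    cc_success: float = 30.0,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Phase candidate models in order and stop at the first whose final
    CC (in %) exceeds cc_success; later models are never run."""
    tried: List[Tuple[str, float]] = []
    for model in models:
        cc = phase(model)  # hypothetical: full ARCIMBOLDO/SHELXE run, final CC
        tried.append((model, cc))
        if cc > cc_success:
            return model, tried
    return None, tried
```

Because the models are tried one after another, the order in which they are supplied determines how much computation is saved when an early model already solves the structure.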

Conclusion

When only models from distant homologs are available, smaller main-chain fragments derived from them may be effective in providing a starting solution with PHASER, which density modification and autotracing with SHELXE may expand to the complete structure, provided data to a resolution of 2 Å or better are available. Identifying the most incorrect parts of the model at the earliest stage, the first rotation function, is used to derive improved models for phasing that are able to produce the correct translation. For the density modification and expansion stage to bootstrap, eliminating incorrect parts of the initial fragments is essential. Thus, the located fragments are optimized by omitting one residue at a time and discarding it from the model if this leads to an increase in the correlation coefficient. This strategy has enabled the solution at 2 Å resolution of the previously unknown structure of MltE, comprising 388 amino acids in the asymmetric unit, using portions of Slt70, which shares no more than one-third identity over 66 residues. The method has been automated in ARCIMBOLDO-SHREDDER. Our procedure internally shreds the template provided, regardless of structure or sequence, and probes the derived fragments against the rotation function LLG calculated by PHASER around each cluster of rotation solutions, in order to evaluate the optimal fragment hypotheses. Subsequently, it runs the corresponding ARCIMBOLDO phasing attempts on a few selected models.
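The residue-by-residue trimming against the correlation coefficient can be sketched as a single greedy pass in Python. This is an illustration, not SHELXE's implementation: `cc_of` is a hypothetical callable that would return the CC for a partial model, and the toy score used here (+1 per correct residue, −1 per wrong one) merely stands in for it.

```python
def trim_by_cc(residues, cc_of):
    """One pass of residue-by-residue trimming: each residue is omitted in
    turn and dropped for good if the score of the remaining model increases."""
    model = list(residues)
    best = cc_of(model)
    for r in list(model):  # iterate over a snapshot; `model` shrinks as we go
        trial = [x for x in model if x != r]
        cc = cc_of(trial)
        if cc > best:
            model, best = trial, cc
    return model, best


# Hypothetical toy score: +1 per correct residue, -1 per wrong one (3 and 7).
wrong = {3, 7}


def toy_cc(model):
    return sum(-1 if r in wrong else 1 for r in model)


model, best = trim_by_cc(range(1, 11), toy_cc)
print(model, best)  # [1, 2, 4, 5, 6, 8, 9, 10] 8
```

A single pass is the simplest reading of the procedure; repeating the pass until no omission improves the score would be a natural variant.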

Experimental procedures

Computing set-up
Structure solution was run on a local grid comprising 120 nodes totaling 175 GFlops, on which the MltE case can be run overnight. Tests were also run on a Condor [40] pool set up at Calendula, at the Foundation of Supercomputing Center of Castile and León, Spain.

Software versions
The original MltE structure solution was run with version 2011 of SHELXE and version 2.1.4 of PHASER. Other tests in this work and the ARCIMBOLDO-SHREDDER implementation used PHASER version 2.5.6 from the PHENIX distribution [41] and SHELXE version 2014. Models and maps were examined with COOT [42]. Figures were prepared with PYMOL [43] and GNUPLOT 4.4 [44]. Differences between models and structures were calculated with MINRMS [45]. Mean phase errors were calculated with SHELXE [46] against the final models deposited in the Protein Data Bank [15].

X-ray data collection for MltE
Both synchrotron datasets were collected with native crystals on a CCD detector at station ID23-2 of the European Synchrotron Radiation Facility (Grenoble, France). Collected images were indexed, integrated and scaled using XDS [47], IMOSFLM [48,49] and SCALA [50] from the Collaborative Computational Project 4 [51]. Preliminary data analysis and scaling were performed using XPREP [52]. The scaled data used in this study lack only ten low-resolution reflections of being 100% complete to a resolution of 2.04 Å. Data statistics are summarized in Table 1.


NovP data
NovP is an O-methyltransferase novobiocin biosynthetic enzyme from Streptomyces spheroides that was solved from experimental phases obtained from a mercury derivative, using data extrapolation [6]. Crystals belong to space group P2, with approximate cell parameters a = 51.8 Å, b = 46.0 Å, c = 61.2 Å and β = 105.0°, and contain one monomer of 262 residues and 40% solvent in the asymmetric unit. Diffraction data to 1.35 Å were available [53]. The closest available homolog at the time was catechol O-methyltransferase from Rattus norvegicus, characterized by an alignment score of 11.27. Its structure is deposited at the Protein Data Bank under accession code 1VID [39].

Acknowledgements
We thank the staff of the European Synchrotron Radiation Facility (Grenoble, France) beamlines for help with data collection. This work was supported by grants BFU2012-35367, IDC-20101173 and BFU2011-25326 from the Spanish Ministry of Economy and Competitiveness, grant 2009SGR-1036 from the Generalitat de Catalunya, and the Government of the Community of Madrid (S2010/BMD-2457). KM thanks the Deutsche Forschungsgemeinschaft and the Spanish Ministerio de Economía y Competitividad (ME 3679/1-1)/Juan de la Cierva Sub-Program for support. We thank Randy Read, Airlie McCoy and George M. Sheldrick for helpful discussion and corrections.

Author contribution
All authors contributed extensively to the work presented in this paper.

References
1 Sheldrick GM, Gilmore CJ, Hauptman HA, Weeks CM, Miller R & Usón I (2011) Ab initio phasing. In International Tables for Crystallography (Arnold E, Himmel DM & Rossmann MG, eds), pp. 413–429. Kluwer Academic Publishers, Dordrecht, The Netherlands.
2 Miller R, DeTitta GT, Jones R, Langs DA, Weeks CM & Hauptman HA (1993) On the application of the minimal principle to solve unknown structures. Science 259, 1430–1433.
3 Caliandro R, Carrozzini B, Cascarano GL, De Caro L, Giacovazzo C & Siliqi D (2005) Phasing at resolution higher than the experimental resolution. Acta Crystallogr D Biol Crystallogr 61, 556–565.


4 Caliandro R, Carrozzini B, Cascarano GL, De Caro L, Giacovazzo C & Siliqi D (2005) Ab initio phasing at resolution higher than experimental resolution. Acta Crystallogr D Biol Crystallogr 61, 1080–1087.
5 Jia-Xing Y, Woolfson MM, Wilson KS & Dodson EJ (2005) A modified ACORN to solve protein structures at resolutions of 1.7 Å or better. Acta Crystallogr D Biol Crystallogr 61, 1465–1475.
6 Usón I, Stevenson CEM, Lawson DM & Sheldrick GM (2007) Structure determination of the O-methyltransferase NovP using the ‘free lunch algorithm’ as implemented in SHELXE. Acta Crystallogr D Biol Crystallogr 63, 1069–1074.
7 Burla MC, Carrozzini B, Cascarano GL, Giacovazzo C & Polidori G (2011) Advances in the VLD algorithm. J Appl Crystallogr 44, 1143–1151.
8 Burla MC, Carrozzini B, Cascarano GL, Giacovazzo C & Polidori G (2012) VLD algorithm and hybrid Fourier syntheses. J Appl Crystallogr 45, 1287–1294.
9 Palatinus L (2013) The charge-flipping algorithm in crystallography. Acta Crystallogr B Struct Sci Cryst Eng Mater 69, 1–16.
10 Caliandro R, Carrozzini B, Cascarano GL, De Caro L, Giacovazzo C, Mazzone A & Siliqi D (2008) Ab initio phasing of proteins with heavy atoms at non-atomic resolution: pushing the size limit of solvable structures up to 7890 non-H atoms in the asymmetric unit. J Appl Crystallogr 41, 548–553.
11 Rodríguez DD, Grosse C, Himmel S, Gonzalez C, Martínez de Ilarduya I, Becker S, Sheldrick GM & Usón I (2009) Crystallographic ab initio protein structure solution below atomic resolution. Nat Methods 6, 651–653.
12 McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC & Read RJ (2007) Phaser crystallographic software. J Appl Crystallogr 40, 658–674.
13 Sheldrick GM (2010) Experimental phasing with SHELXC/D/E: combining chain tracing with density modification. Acta Crystallogr D Biol Crystallogr 66, 479–485.
14 Thorn A & Sheldrick GM (2013) Extending molecular-replacement solutions with SHELXE. Acta Crystallogr D Biol Crystallogr 69, 2251–2256.
15 Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T & Tasumi M (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 112, 535–542.
16 Rodríguez D, Sammito M, Meindl K, Martínez de Ilarduya I, Potratz M, Sheldrick GM & Usón I (2012) Practical structure solution with ARCIMBOLDO. Acta Crystallogr D Biol Crystallogr 68, 336–343.
17 Sammito M, Millan C, Rodríguez DD, de Ilarduya IM, Meindl K, De Marino I, Petrillo G, Buey RM, de Pereda JM, Zeth K et al. (2013) Exploiting tertiary structure through local folds for ab initio phasing. Nat Methods 10, 1099–1101.
18 Rossmann MG (1972) The Molecular Replacement Method. Gordon & Breach, New York.
19 Bunkóczi G & Read RJ (2011) Improvement of molecular-replacement models with Sculptor. Acta Crystallogr D Biol Crystallogr 67, 303–312.
20 Gruene T (2013) mrtailor: a tool for PDB-file preparation for the generation of external restraints. Acta Crystallogr D Biol Crystallogr 69, 1861–1863.
21 McCoy AJ, Nicholls RA & Schneider TR (2013) SCEDS: protein fragments for molecular replacement in Phaser. Acta Crystallogr D Biol Crystallogr 69, 2216–2225.
22 Bunkóczi G, Echols N, McCoy AJ, Oeffner RD, Adams PD & Read RJ (2013) Phaser.MRage: automated molecular replacement. Acta Crystallogr D Biol Crystallogr 69, 2276–2286.
23 DiMaio F, Terwilliger TC, Read RJ, Wlodawer A, Oberdorfer G, Wagner U, Valkov E, Alon A, Fass D, Axelrod HL et al. (2011) Improved molecular replacement by density- and energy-guided protein structure optimization. Nature 473, 540–543.
24 Bibby J, Keegan RM, Mayans O, Winn MD & Rigden DJ (2013) Application of the AMPLE cluster-and-truncate approach to NMR structures for molecular replacement. Acta Crystallogr D Biol Crystallogr 69, 2194–2201.
25 Bibby J, Keegan RM, Mayans O, Winn MD & Rigden DJ (2012) AMPLE: a cluster-and-truncate approach to solve the crystal structures of small proteins using rapidly computed ab initio models. Acta Crystallogr D Biol Crystallogr 68, 1622–1631.
26 Artola-Recolons C, Llarrull LI, Lastochkin E, Mobashery S & Hermoso JA (2011) Crystallization and preliminary X-ray diffraction analysis of the lytic transglycosylase MltE from Escherichia coli. Acta Crystallogr F Struct Biol Cryst Commun 67, 161–163.
27 Fibriansah G, Gliubich FI & Thunnissen A-MWH (2012) On the mechanism of peptidoglycan binding and cleavage by the endo-specific lytic transglycosylase MltE from Escherichia coli. Biochemistry 51, 9164–9177.
28 Kraft AR, Templin MF & Höltje J-V (1998) Membrane-bound lytic endotransglycosylase in Escherichia coli. J Bacteriol 180, 3441–3447.
29 Artola-Recolons C, Carrasco-López C, Llarrull LI, Kumarasiri M, Lastochkin E, Martínez de Ilarduya I, Meindl K, Usón I, Mobashery S & Hermoso JA (2011) High-resolution crystal structure of MltE, an outer membrane-anchored endolytic peptidoglycan lytic transglycosylase from Escherichia coli. Biochemistry 50, 2384–2386.
30 Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W & Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.
31 van Asselt EJ, Thunnissen A-MWH & Dijkstra BW (1999) High resolution crystal structures of the Escherichia coli lytic transglycosylase Slt70 and its complex with a peptidoglycan fragment. J Mol Biol 291, 877–898.
32 Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948.
33 McCoy AJ, Grosse-Kunstleve RW, Storoni LC & Read RJ (2005) Likelihood-enhanced fast translation functions. Acta Crystallogr D Biol Crystallogr 61, 458–464.
34 Sheldrick GM & Gould RO (1995) Structure solution by iterative peaklist optimization and tangent expansion in space group P1. Acta Crystallogr B Struct Sci Cryst Eng Mater 51, 423–431.
35 Fujinaga M & Read RJ (1987) Experiences with a new translation-function program. J Appl Crystallogr 20, 517–521.
36 Storoni LC, McCoy AJ & Read RJ (2004) Likelihood-enhanced fast rotation functions. Acta Crystallogr D Biol Crystallogr 60, 432–438.
37 Holm L & Rosenström P (2010) Dali server: conservation mapping in 3D. Nucleic Acids Res 38, W545–W549.
38 Helland R, Larsen RL, Finstad S, Kyomuhendo P & Larsen AN (2009) Crystal structures of g-type lysozyme from Atlantic cod shed new light on substrate binding and the catalytic mechanism. Cell Mol Life Sci 66, 2585–2598.
39 Vidgren J, Svensson LA & Liljas A (1994) Crystal structure of catechol O-methyltransferase. Nature 368, 354–358.
40 Tannenbaum T, Wright D, Miller K & Livny M (2002) Condor – a distributed job scheduler. In Beowulf Cluster Computing with Linux (Sterling T, Lusk E & Gropp W, eds), pp. 379–426. MIT Press, Cambridge, MA.
41 Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung L-W, Kapral GJ, Grosse-Kunstleve RW et al. (2010) PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66, 213–221.
42 Emsley P & Cowtan K (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60, 2126–2132.
43 Schrödinger LLC (2012) The PyMOL Molecular Graphics System, Version 1.5.0.4.
44 Williams T & Kelley C (2010) Gnuplot 4.5: an interactive plotting program. Available at: http://gnuplot.info.
45 Jewett AI, Huang CC & Ferrin TE (2003) MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance. Bioinformatics 19, 625–634.
46 Sheldrick GM (2002) Macromolecular phasing with SHELXE. Z Kristallogr 217, 644–650.
47 Kabsch W (2010) XDS. Acta Crystallogr D Biol Crystallogr 66, 125–132.
48 Battye TGG, Kontogiannis L, Johnson O, Powell HR & Leslie AGW (2011) iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallogr D Biol Crystallogr 67, 271–281.
49 Leslie AGW & Powell HR (2007) Processing diffraction data with MOSFLM. In Evolving Methods for Macromolecular Crystallography (Read RJ & Sussman JL, eds), pp. 41–51. Springer, Dordrecht, The Netherlands.
50 Evans P (2006) Scaling and assessment of data quality. Acta Crystallogr D Biol Crystallogr 62, 72–82.
51 Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, Keegan RM, Krissinel EB, Leslie AGW, McCoy A et al. (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67, 235–242.
52 Sheldrick GM (2008) A short history of SHELX. Acta Crystallogr A 64, 112–122.
53 Stevenson CEM, Freel Meyers CL, Walsh CT & Lawson DM (2007) Crystallization and preliminary X-ray analysis of the O-methyltransferase NovP from the novobiocin-biosynthetic cluster of Streptomyces spheroides. Acta Crystallogr F Struct Biol Cryst Commun 63, 236–238.
