J. Mol. Biol. (1991) 219, 671-692

Refined Structure of Porcine Pepsinogen at 1.8 Å Resolution

A. R. Sielecki, M. Fujinaga, R. J. Read and M. N. G. James

Medical Research Council Group in Protein Structure and Function, Department of Biochemistry, University of Alberta, Edmonton, Alberta, Canada T6G 2H7

(Received 19 November 1990; accepted 31 January 1991)

The molecular structure of porcine pepsinogen at 1.8 Å resolution has been determined by a combination of molecular replacement and multiple isomorphous phasing techniques. The resulting structure was refined by restrained-parameter least-squares methods. The final R factor ($= \sum ||F_o| - |F_c|| / \sum |F_o|$) is 0.164 for 32,264 reflections with I ≥ σ(I) in the resolution range of 8.0 to 1.8 Å. The model consists of 2785 protein atoms in 370 residues, a phosphoryl group on Ser68 and 238 ordered water molecules. The resulting molecular stereochemistry is consistent with a well-refined crystal structure with co-ordinate accuracy in the range of 0.10 to 0.15 Å for the well-ordered regions of the molecule (B < 15 Å²). For the enzyme portion of the zymogen, the root-mean-square difference in Cα atom co-ordinates with the refined porcine pepsin structure is 0.90 Å (284 common atoms) and with the Cα atoms of penicillopepsin it is 1.63 Å (275 common atoms). The additional 44 N-terminal amino acids of the prosegment (Leu1p to Leu44p, using the letter p after the residue number to distinguish the residues of the prosegment) adopt a relatively compact structure consisting of a long β-strand followed by two approximately orthogonal α-helices and a short 3₁₀-helix. Intimate contacts, both electrostatic and hydrophobic interactions, are made with residues in the pepsin active site. The N-terminal β-strand, Leu1p to Leu6p, forms part of the six-stranded β-sheet common to the aspartic proteinases. In the zymogen the first 13 residues of pepsin, Ile1 to Glu13, adopt a completely different conformation from that of the mature enzyme. The Cα atom of Ile1 must move ~44 Å in going from its position in the inactive zymogen to its observed position in active pepsin. Electrostatic interactions of Lys36p Nζ and hydrogen-bonding interactions of Tyr37p OH and Tyr9 OH with the two catalytic aspartate groups, Asp32 and Asp215, prevent substrate access to the active site of the zymogen.
We have made a detailed comparison of the mammalian pepsinogen fold with the fungal aspartic proteinase fold of penicillopepsin, used for the molecular replacement solution. A structurally derived alignment of the two sequences is presented.

Keywords: crystal structure determination; zymogen structure; electrostatic interactions

1. Introduction

Pepsinogen is the zymogen precursor of gastric pepsin. It has no proteolytic activity at neutral pH values but is converted into active pepsin upon exposure to solutions of acidic pH [...] aspartic proteinases involved in other critical metabolic processes.

A variety of approaches have been used to probe the molecular structure of porcine pepsinogen and how it contrasts with that of pepsin. Among them, spectroscopic techniques such as optical rotation (Perlmann, 1963), ultraviolet absorption (McPhie, 1972) and fluorescence (Bohak, 1973) gave results that have been interpreted in terms of a conformational change in the structure of the zymogen as a result of exposure to urea (Perlmann, 1963), high temperature (Perlmann, 1963) and low pH (McPhie, 1972; Bohak, 1973). The latter two studies are consistent with a rapid, reversible conformational change in the zymogen as a prelude to the appear[...]

[...]ed by Marciniszyn et al. (1975). The nucleotide sequence of the porcine zymogen is also consistent with the des-Ile230 variant that we have studied (Lin et al., 1989). On the other hand, our electron density map is not consistent with their other suggestion that Asp242 of the original sequence is, in fact, a Tyr242.

In an initial description of the three-dimensional structure of porcine pepsinogen (James & Sielecki, 1986) we have discussed the general conformation of the proenzyme segment and the inhibitory interactions it makes with the pepsin portion of the zymogen. The molecular structure also suggested a pathway of activation consistent with many of the observations in solution. In this paper we present the details of the solution of the pepsinogen crystal structure and summarize the wealth of information that can be obtained from the atomic co-ordinates of the fully refined final model. This model corresponds to set 2PSG available from the Brookhaven Protein Data Bank (Bernstein et al., 1977).

2. Experimental Procedures and Structure Solution

(a) Intensity data collection

Crystals of porcine pepsinogen were grown by the hanging-drop, vapor-diffusion method (James & Sielecki, 1986). The unit cell dimensions of the native enzyme crystals and the isomorphous heavy-atom derivatives used are given in Table 1. The intensity data were collected on an Enraf-Nonius CAD4 diffractometer using the supplied operating system for PDP8a control. The incident X-radiation was Ni-filtered CuKα; the tube operated at 40 kV and 26 mA. The crystal to counter distance was 6[?] cm and a He-filled tunnel was coincident with the diffracted beam path. Bragg reflection intensities were integrated with an ω-scan technique (total scan angle 0.75° with a scan speed of 1° min⁻¹). The background values were derived from the counts at the 2 extreme ends of the scan, each being 1/4 of the reflection scan width. Background smoothing was achieved by averaging the measured background counts in regions of reciprocal space. Radiation damage, as a function of time and resolution range, was corrected by the method of Hendrickson (1976). The empirical absorption correction of North et al. (1968) was applied to account for the variation in transmission factor due to the capillary, solvent and crystal shape. Lorentz and polarization corrections were applied to derive the structure factor amplitudes.
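The data reduction steps just described can be illustrated with a minimal sketch. It is not the diffractometer software used by the authors. It assumes a constant scan speed (so counts scale with angular width), Poisson counting statistics, and the standard Lorentz-polarization factor for an unpolarized incident beam; the function names are hypothetical, and the decay and absorption corrections are left out.

```python
import math


def net_intensity(peak, bkg_left, bkg_right, bkg_fraction=0.25):
    """Background-corrected intensity and its Poisson standard deviation.

    peak         -- counts accumulated over the reflection scan
    bkg_left     -- counts in the background region at one end of the scan
    bkg_right    -- counts in the background region at the other end
    bkg_fraction -- width of each background region relative to the
                    reflection scan width (1/4 in the text)
    """
    # Together the two background regions span 2 * bkg_fraction of the
    # reflection scan, so their sum is scaled up to the full scan width.
    scale = 1.0 / (2.0 * bkg_fraction)
    background = bkg_left + bkg_right
    net = peak - scale * background
    sigma = math.sqrt(peak + scale ** 2 * background)
    return net, sigma


def lp_factor(two_theta_deg):
    """Lorentz-polarization factor, assuming an unpolarized incident beam."""
    two_theta = math.radians(two_theta_deg)
    return (1.0 + math.cos(two_theta) ** 2) / (2.0 * math.sin(two_theta))


def amplitude(net, two_theta_deg):
    """Structure factor amplitude on an arbitrary scale.

    Negative net intensities are clamped to zero before the square root.
    """
    return math.sqrt(max(net, 0.0) / lp_factor(two_theta_deg))
```

With each background region a quarter of the scan width, the summed background is simply doubled before subtraction, which is why weak reflections can come out with a negative net intensity.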

The initial estimates of the native crystal structure factor phase angles, α_N, were obtained from the application of the molecular replacement method (Rossmann, 1972). The rotational parameters were derived using the fast rotation function (Crowther, 1972). For the translational parameters a systematic search was carried out with the program BRUTE (Fujinaga & Read, 1987). A search model was constructed based on the 3-dimensional structure of penicillopepsin (James & Sielecki, 1983) and the alignment of its sequence with that of porcine pepsin (Hsu et al., 1977). Minor changes in this published

Table 1
Crystal data for porcine pepsinogen: space group C2

                                 Native     Mercuric       Uranyl     K2PtCl4
                                            chloranilate   acetate
Unit cell parameters
  a (Å)                          105.78     105.45         105.11     105.?3
  b (Å)                          43.41      43.49          43.26      43.47
  c (Å)                          88.58      88.74          88.45      88.61
  β (°)                          91.4       91.2           91.6       91.4
Resolution range (Å)             40.0-1.8   22.0-2.8       22.0-2.8   22.0-2.8
Total unique reflections         37,575     10,196         10,068     10,180
Reflections I > 2σ(I)            27,251     7985           8999       9216
R_merge†                         0.058      0.119          0.048      0.056
R_c†                                        0.189          0.104      0.129
Max. absorption correction       1.37       1.35           1.52       1.40
Max. decay correction (%)        24.70      1?.47          6.25       3.14
Scale‡                           20.3       36.1           35.0       43.7
Overall B (Å²)‡                  15.8       17.0           15.2       16.1

† $R_{merge} = \sum_{hkl} \left( \sum_i |I_i - \langle I \rangle| \right) / \sum_{hkl} \sum_i I_i$, for reflections measured more than once in each data set; $R_c = \sum ||F_{PH}| - |F_P|| / \sum |F_P|$, where $|F_P|$ and $|F_{PH}|$ are the observed structure factor amplitudes for the native and heavy-atom complexed crystals, respectively.
‡ The program ORESTES (Thiessen & Levy, 1973) was used to obtain the scale to absolute units and an overall isotropic temperature factor for each of the data sets.
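The two agreement indices defined in the footnote to Table 1 can be written out as a short sketch. This is an illustration only, with hypothetical function names and made-up inputs; it assumes the repeated measurements have already been grouped by reflection and put on a common scale.

```python
def r_merge(groups):
    """Merging R factor over groups of repeated intensity measurements.

    groups -- iterable of lists; each list holds the intensities measured
              for one unique reflection. Reflections measured only once
              are skipped, as in the Table 1 definition.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups:
        if len(intensities) < 2:
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_amplitude(f_reference, f_other):
    """Sum of absolute amplitude differences over the sum of |f_reference|.

    With (native, derivative) amplitudes this is the fractional isomorphous
    difference of Table 1; with (observed, calculated) amplitudes it is the
    conventional crystallographic R factor.
    """
    numerator = sum(abs(abs(a) - abs(b)) for a, b in zip(f_reference, f_other))
    denominator = sum(abs(a) for a in f_reference)
    return numerator / denominator
```

The same function serves for both amplitude-based indices because they differ only in which data set supplies the denominator.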

sequence alignment were necessary as a result of the later refinement of penicillopepsin and some sequence revisions that resulted therefrom (James & Sielecki, 1983). Similar strategies and computer programs to those described for other modeling carried out in our laboratory (Strynadka & James, 1988; Murphy et al., 1988) were used. No attempt was made to predict the structure of the prosegment or to include it in this search model. In addition, some insertions in the pepsin sequence relative to penicillopepsin were also left out. This led to a model containing 317 residues, representing ~85% of the pepsinogen molecule.

Normalized structure factors in the 10 to 3.5 Å resolution range (1 Å = 0.1 nm) and a radius of integration from 6 to 21 Å were used to orient the search model in the pepsinogen cell. The maximum obtained for the rotation function (Crowther, 1972) was 6.1 σ above the mean (the next highest peak was 4.2 σ above the mean). This orientation was further refined with the program BRUTE (Fujinaga & Read, 1987), once the correct positioning of the model was found using the data in the 5 to 4 Å resolution shell. The correlation coefficient between observed and calculated structure factor amplitudes was 0.2?8, ~5 σ above the mean. The R factor ($= \sum ||F_o| - |F_c|| / \sum |F_o|$, where $|F_o|$ and $|F_c|$ are the observed and calculated structure factor amplitudes) for this oriented and positioned search model was 0.49 for the data in the range 8 to 2.8 Å resolution.

An electron density map computed with amplitudes based on the observed native crystal data (1.8 Å resolution; Table 1) and phases derived from the molecular replacement solution was not considered of sufficient quality to allow for an unbiased reinterpretation of the model. It was decided to concentrate the efforts of the structure solution on the search for heavy-atom derivatives and on the improvement of the molecular replacement search model.

(c) Multiple isomorphous heavy-atom phasing

Three isomorphous heavy-atom derivatives of porcine pepsinogen were used. They were prepared by soaking the crystals in solutions of mother liquor containing the

heavy-atom compounds. The platinum derivative crystal was soaked for 48 h in a solution containing 2 mM-K2PtCl4; the mercury one was soaked for 16 h in a solution with 1 mM-mercuric chloranilate and the uranyl crystal remained in a solution containing 0.5 mM-uranyl acetate, UO2(C2H3O2)2, for 19 h. Intensity data to 2.8 Å resolution (~8840 unique reflections) were collected from each of these crystals with the protocol described in section (a), above. The details on crystal parameters and data collection statistics are included in Table 1.

The heavy-atom sites in each of the isomorphous derivative crystals were determined from difference electron density maps computed from scaled difference amplitudes and the molecular replacement phases. These maps were relatively clean and showed only a few peaks of possible heavy-atom binding sites. The sites so derived were confirmed on the difference Patterson maps computed in the usual way. The positional parameters, isotropic temperature factors and site occupancies of the heavy-atom derivatives were refined by lack of closure error minimization (Dickerson et al., 1968) with a program written originally by M. G. Rossmann and modified in our laboratory. The results are summarized in Table 2. The overall figure of merit was 0.62. Only the uranyl acetate derivative, measured last, was entirely satisfactory. Simultaneously, several additional attempts were made towards improving the model for the molecular replacement.

(d) Hinged model of pepsinogen

Because the 2 domains of the aspartyl proteinases form prominent lobes separated by a deep cleft, it seemed reasonable that penicillopepsin and pepsinogen might differ in the relative orientation of these domains. By adjusting their orientations independently, we hoped to account for any gross conformational differences between the 2 molecules. The adjustments were done using the program BRUTE (Fujinaga & Read, 1987), which allows for a subset of the atoms to be fixed while the relative orientation (rotational and translational) of the rest of the model is optimized. To construct the N- and C-terminal regions, the molecular

replacement model of pepsinogen was divided at the carbonyl carbon atom of Gly169, located in the hinge region connecting the 2 domains of the aspartyl proteinases (James & Sielecki, 1983) and small orientation shifts were explored. After small adjustments of first the N-terminal domain (0.9° rotation) and then the C-terminal domain (2.1° rotation), the correlation coefficient for the 4 to 5 Å data increased from 0.283 to 0.300. Placing a hinge in the molecular replacement model for pepsinogen improved it somewhat, but Fig. 1(a) shows that the improvement in the accuracy of the model phases was marginal.

Table 2
Summary of the multiple isomorphous replacement phasing

Heavy-atom derivatives: K2PtCl4, mercuric chloranilate, uranyl acetate.
[The assignment of sites to derivatives and the remaining columns are not recoverable from the source; the surviving site parameters follow in source order. Only the x/a label survives; y/b and z/c are inferred from the row order.]

Site    x/a       y/b        z/c       Occupancy†
1       0.2591    0.4892     0.8555    25.4
2       0.3031    0.4609     0.8569    31.0
3       0.0765    0.0644     0.0260    56.3
4       0.0728    0.2455     0.6218    46.9
5       0.1368    -0.0036    0.1603    43.7
6       0.1495    0.0539     0.6621    28.0
7       0.3573    0.2655     0.7869    37.7
8       0.3384    0.2778     0.8305    37.0
9       0.3105    0.2141     0.8528    41.8

† Occupancy is given in approximate number of electrons.
‡ $R_c = \sum ||F_{PH} \pm F_P| - |f_H|| / \sum |F_{PH} \pm F_P|$, where $|F_P|$ and $|F_{PH}|$ are the observed structure factor amplitudes for the native and heavy-atom complexed crystals, respectively; $|f_H|$ is the amplitude for the heavy-atom model. The sums are taken over all the centric data (Cullis et al., 1961).
§ $f_H$ is the scattering contribution from the heavy-atom model; E is the lack of closure error and their ratio is indicative of the phasing power of the derivative (Blow & Crick, 1959).

Figure 1. Effect of the manipulations of the structural model on the phase accuracy, indicated as a function of resolution (sin θ/λ) by the mean cosine of the difference from the final refined phase. (The quantity ⟨cos Δα⟩ is comparable to a mean figure of merit.) Model phases come from the rigid-body molecular replacement model (continuous lines), the hinged model (dotted lines), the hinged, auto-edited model (broken lines), or the first manually-refit model (chain dashed lines). Part (a) gives the results for model phases alone, whereas in part (b) the model phases have been combined with the final m.i.r. phases.

† Abbreviations used: m.i.r., multiple isomorphous replacement; n.m.r., nuclear magnetic resonance; c.d., circular dichroism; r.m.s., root-mean-square.

(e) Phase combination and map coefficients

Multiple isomorphous replacement (m.i.r.†) and molecular replacement model phase information were combined by means of Hendrickson-Lattman coefficients (Hendrickson & Lattman, 1970). For the m.i.r. phase probabilities, the coefficients were calculated according to

equations (4) and (6) of Hendrickson (1971) for non-centric and centric reflections, respectively. For phase information from partial structural models, Hendrickson & Lattman (1970) used the non-centric probability distributions derived by Sim (1959). Srinivasan (1966) has derived similar distributions for models with co-ordinate errors. Based on Srinivasan's expressions, the Hendrickson-Lattman coefficients for model phases are:

$$A = X \cos\alpha_c, \qquad B = X \sin\alpha_c$$

for non-centric reflections, and

$$A = (X/2) \cos\alpha_c, \qquad B = (X/2) \sin\alpha_c$$

for centric reflections, where X is defined as:

$$X = \frac{2\sigma_A |E_o| |E_c|}{1 - \sigma_A^2};$$

$|E_o|$ and $|E_c|$ are the normalized amplitudes of the observed and calculated structure factors, respectively, and $\sigma_A$ is a parameter that can be estimated in resolution shells by the method given by Read (1986a).

Electron density maps calculated using combined phases suffer from model bias, though to a lesser extent than maps using only model phase information (Rice, 1981). Stuart & Artymiuk (1984) have suggested that the Fourier coefficients used for combined phase maps should vary for each reflection according to the extent to which the model influenced the combined phase. Our approach is similar in philosophy, but differs in detail (Read, 1986b). For centric reflections, figure-of-merit (m) weighted model phases do not introduce model bias (Main, 1979; Read, 1986a), so the combined phase map coefficients are $m_{COMB} |F_o| \exp(i\alpha_{COMB})$. There is no model bias for non-centric reflections phased by m.i.r. alone and the coefficients are given by $m_{mir} |F_o| \exp(i\alpha_{mir})$. On the other hand, when the partial structure is the sole source of phase information, the map coefficients are given by $(2m_c|F_o| - D|F_c|) \exp(i\alpha_c)$ (Read, 1986a). The situation is less clear-cut when model and m.i.r. phases are combined, but it is reasonable to impose that the coefficients should vary smoothly between these extremes. This requirement is satisfied by the combined phase coefficients:

$$\frac{m_{COMB} |F_o| \exp(i\alpha_{COMB}) - (W/2)\, D |F_c| \exp(i\alpha_c)}{1 - W/2}$$

where $W = H_c/(H_c + H_{mir})$ (Read, 1986b), and H, the variation of information (Guiasu, 1977), is a measure of the information content of the phase probability distribution.

(f) Automatic model editing

Prior to the availability of the uranyl derivative, the m.i.r. phases based on the other two heavy-atom derivatives (Table 2) were quite poor, having an overall mean figure of merit of 0.48. When these phases were combined with those from the hinged model of pepsinogen, the mean figure of merit increased to 0.57 for those reflections with combined phases. The corresponding electron density map was judged not to be of sufficient quality to refit the protein model as yet. Nonetheless, it was thought that this map could give an indication about the worst regions of the search model. Accordingly, a procedure was devised in which those atoms that agree least well with the electron density were discarded from the model. Using a program (ADEPU') of J. Moult, the densities within 1.5 Å of all atomic positions were integrated. The 603 atoms (25.5% of the model) with lowest integrated electron density were discarded. The atoms remaining in the edited model included 81.7% of the main-chain atoms, but only 66.1% of the side-chain atoms of the "hinged" pepsinogen model. This seemed a reasonable ratio, since one would expect that in a family of related enzymes the positions of main-chain atoms will be more strongly conserved.

The increase in phase accuracy obtained by automatic editing was greater than that obtained by the adjustment of each of the domains in the "hinged" model, as can be seen in Fig. 1(a). However, these effects are less pronounced on the combined phases (Fig. 1(b)).

3. Crystallographic Restrained Refinement

The electron density map computed to 1.8 Å resolution with the calculated phases based on the hinged, auto-edited model described in the previous section, in combination with the m.i.r. phases obtained from all three heavy-atom derivatives (Table 2), was used to revise the pepsinogen model. The positions of many side-chain atoms were modified and some alternative main-chain conformations were selected. Nevertheless, none of the 44 amino acids of the proenzyme segment could be traced confidently in this map. Similarly, the electron density associated with the following ~13 residues (corresponding to residues 1 to 13 of the pepsin molecule) was weak and ambiguous. This was also the case for several other regions of the molecule where the electron density map indicated that the search model was not appropriate, but for which an alternative interpretation was not clear. The corresponding residues were removed from the model and a new map with combined coefficients and phases was calculated. In this map it was possible to trace some of the "missing" residues and the corrected model was once again used for the computation of a combined map. After three iterations of this procedure, restrained-parameter least-squares refinement (Hendrickson & Konnert, 1980; Hendrickson, 1985) was initiated with observed structure factors in the 5 to 3 Å resolution range and a pepsinogen model containing 2647 atoms (95% of the non-hydrogen atoms in the pepsinogen molecule). The resolution of the observed data was slowly increased to include, by the end of each stage, all the measured reflections with intensity I ≥ 2σ. Rounds of a few cycles of refinement followed by electron density map reinterpretation had to be repeated many times, as the improvement in the quality of the maps at each stage was rather marginal.
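The combined coefficients used for these maps follow the expressions of section 2(e), which can be sketched numerically as below. This is a hedged illustration, not the authors' program: the function names are hypothetical, phases are taken in radians, and the quantities σ_A, D, m and W are assumed to have been estimated beforehand by the methods cited in section 2(e).

```python
import cmath
import math


def model_x(sigma_a, e_obs, e_calc):
    """X = 2 sigma_A |E_o| |E_c| / (1 - sigma_A^2), as defined in section 2(e)."""
    return 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)


def hl_model_coefficients(x, alpha_c, centric=False):
    """Hendrickson-Lattman A and B for model phase information.

    For centric reflections X is halved, following the text.
    """
    if centric:
        x = x / 2.0
    return x * math.cos(alpha_c), x * math.sin(alpha_c)


def combined_map_coefficient(m_comb, f_obs, alpha_comb, d, f_calc, alpha_c, w):
    """Combined-phase map coefficient for a non-centric reflection.

    w is the fraction of the phase information contributed by the model,
    W = H_c / (H_c + H_mir). With w = 0 the result is the figure-of-merit
    weighted coefficient; with w = 1 (so that the combined phase is the
    model phase) it reduces to (2 m |F_o| - D |F_c|) exp(i alpha_c).
    """
    numerator = (m_comb * f_obs * cmath.exp(1j * alpha_comb)
                 - (w / 2.0) * d * f_calc * cmath.exp(1j * alpha_c))
    return numerator / (1.0 - w / 2.0)
```

Checking the two limiting values of W against the expressions quoted in the text is a convenient way to confirm that a reconstruction of the formula is self-consistent.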
The overall strategy was to improve the model by fitting as meticulously as possible those regions for which the combined phase map was clear and to omit those side-chains or complete residues with poor or ambiguous associated electron density. Revisions of the model were carried out in an MMS-X graphics system (Barry et al., 1976) using the interactive program M3, developed by Colin Broughton (Sielecki et al., 1982). Maps calculated with combined phases and coefficients as discussed in Experimental Procedures section 2(e) were used

for all model interpretations and map refitting until the figure of merit of the combined phases did not significantly improve by the addition of the m.i.r. information. From that point on (cycle 19 of the least-squares refinement; R = 0.262 for the 23,662 reflections in the 6 to 2 Å resolution shell) only model phases were used, with map coefficients as suggested by Read (1986a). Refinement strategies followed the path of other structures solved in this laboratory (Sielecki et al., 1979; Sielecki & James, 1981; Fujinaga et al., 1985). The values of the restraints used are given in Table 3. Solvent molecules were added only after the complete molecule of pepsinogen could be included in the model. Also, according to our experience, weaker data with 2σ(I) > I ≥ 1σ(I) were refined only during the last cycles, as their earlier inclusion does not seem to have an effect on the speed of convergence of the refinement.

Table 3
Refinement statistics for porcine pepsinogen at 1.8 Å resolution

R ($= \sum ||F_o| - |F_c|| / \sum |F_o|$)             0.164
No. of reflections (I ≥ 1σ_I)                         32,264
Resolution range (Å)                                  8.0-1.8
No. of protein atoms                                  2785 (+ 1 phosphate group)
No. of solvent atoms                                  237

r.m.s. deviations from ideal values:
  Bond distances (Å)                                  0.016  (0.024-0.012)†
  Angle distances (Å)                                 0.047  (0.048-0.028)
  Planar 1-4 distances (Å)                            0.048  (0.048-0.028)
  Planar groups (Å)                                   0.008  (0.014-0.009)
  Chiral centres (Å³)                                 0.179  (0.180-0.100)
  Non-bonded contacts
    (a) single torsions (Å)                           0.251  (0.400-0.360)
    (b) multiple torsions (Å)                         0.183  (0.400-0.360)
    (c) hydrogen bonds (Å)                            0.173  (0.400-0.360)‡
  Planar torsion angle (ω) (deg.)                     1.6    (6.0-4.0)

r.m.s. deviations for isotropic temperature factors:
  main-chain bond (Å²)                                2.4    (1.5-2.0)
  main-chain angle (Å²)                               3.6    (2.0-2.5)
  side-chain bond (Å²)                                3.9    (2.0-2.5)
  side-chain angle (Å²)                               5.6    (2.5-3.2)

Torsion angles (no restraints applied) (deg.):
  staggered                                           16
  orthonormal                                         20

† The values in parentheses are the input estimated standard deviations that determine the relative weights of the corresponding restraints. They have been varied in the given range throughout the refinement. Typically, each round of refinement was started with lower resolution data (~2.8 Å) and the weights corresponding to the values on the left. Higher resolution data were added in stages, as the values of the weights were increased (lower input standard deviations), reaching the values on the right towards the end of the corresponding round.
‡ No restraints were applied to possible hydrogen bonding distances longer than the sum of the van der Waals' radii of the atoms involved.

Figure 2. Molecular conformation of the refined pepsinogen molecule as shown by a φ-ψ plot. The symbols correspond to the following residue types: (0) proline, (V) β-branched amino acids, (+) glycine, (0) others. The boundaries of the fully allowed conformational regions are shown with continuous lines; broken lines delimit regions of acceptable van der Waals' contacts for a τ (Cα) angle of 115°.

Table 4
Poorly determined regions in the pepsinogen molecule

Residue(s)                       Atoms                  Comment
Lys10p, Gln14p, Lys18p,          side-chain atoms       [Per-residue atom lists and row assignments are not
Lys27p, Lys30p, Glu40p,                                 recoverable from the source.] Comments given: low density
Ala43p                                                  for atoms listed; two distinct alternative conformations;
                                                        no interpretable density for atoms listed; low density
Leu44p                           side-chain             No interpretable density for side-chains; poor density for
                                                        main-chain
Ile1                                                    Low density
Gly132                                                  Weak density
Ser157 to Ser161                                        Very mobile surface loop. There is no interpretable density
                                                        for Asn158 and Asp159, at the second and third positions of
                                                        the turn
Glu202                           CG, CD, OE1, OE2       No interpretable density for these atoms
Glu239 to Gly243                                        Mobile surface loop. Poor fit to weak density. Ser241, in
                                                        the second position of the turn, has no associated electron
                                                        density
Gln277 to Asp280                                        Very mobile surface loop. No interpretable density for
                                                        Asp279 and Asp280, at the second and third positions of the
                                                        turn
Thr293, Ser294                                          Low density. Mobile loop

Figure 3. The variation of the temperature factor B along the polypeptide chain. Thick lines represent averaged values over the main-chain atoms (N, Cα, C, O); thin lines correspond to the average over the side-chain atoms (set to the main-chain value for glycine residues). (a) B factors for the pepsinogen molecule. (b) Difference between the temperature factors obtained from the refined structure of pepsin (Sielecki et al., 1990) and those shown in (a) for the pepsin component of the pepsinogen molecule.

(a) Refinement results

A total of 80 cycles of restrained-parameter least-squares refinement of the structure of porcine pepsinogen have been done. The final R-factor for the 32,264 reflections with I ≥ σ(I) in the range 8.0 to 1.8 Å resolution is 0.164. Table 3 summarizes the molecular geometry of the refined model. These indicators are consistent with a well refined crystal structure with a co-ordinate accuracy of 0.10 to 0.15 Å in regions of the molecule having B factors of 5 to 15 Å² (Read et al., 1983; Fujinaga et al., 1985).

A Ramachandran plot (Ramakrishnan & Ramachandran, 1965) is presented in Figure 2. Only six non-glycyl residues have φ, ψ angles outside of the accepted values for hard-sphere models of L-amino acids. Among them, Asn158, Asp159, Asp279 (largest deviant at 5°, 106°) and Asp280 are at the tips of the two most mobile loops in the molecule (see Fig. 3 and Table 4). The other two residues are Asp52 (68°, 16°) and Ser196 (180°, 166°). Both are very close to the boundaries of regions of acceptable conformation.

Residues of the pepsinogen molecule that are still poorly ordered at the conclusion of the refinement are listed in Table 4. As described for other members of this family (Sielecki et al., 1989), these regions are all at the tips of loops that protrude from the surface of the molecule and have only very weak or no associated positive electron density in the final maps. Other amino acids have well-ordered

atoms of the main-chain but poorly-ordered side-chains. Four of the lysyl side-chains of the proenzyme segment are in that category (Table 4).
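The φ and ψ values quoted above are backbone dihedral angles. As an illustration of how such angles are obtained from atomic co-ordinates, the following minimal sketch computes a signed dihedral angle from four points and applies it to the backbone atoms of one residue. The function names are hypothetical and the co-ordinates are plain (x, y, z) tuples; this is not the software used in the paper.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue: phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

A residue is then placed on the Ramachandran plot by its (φ, ψ) pair; the first and last residues of a chain lack one of the two angles.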

(b) Thermal vibration parameters

Figure 3(a) shows the variation of the mean isotropic B factors of the residues with position along the polypeptide chain of pepsinogen. For comparison, the differences in average temperature factors for each residue of the pepsin component of this molecule with the corresponding values from the refined structure of the mature enzyme (Sielecki et al., 1990) are depicted in Figure 3(b).

As with all the aspartyl proteinases of known structure, the residues at the active site surrounding Asp32 and Asp215 are among the most highly ordered (low B factors). This region is one of the least mobile regions of the pepsinogen molecule. The electron density of some of the residues in this part of the molecule is shown in Figure 4. Some of the strands of polypeptide chain that will constitute parts of the substrate binding sites in the activated molecule have higher B factors than the molecular average of 18.1 Å² (Fig. 3(a)). These segments span Glu239 to Glu244 and Val291 to Gly296. However, the regions

of the molecule that form the S1† binding site, Thr74 to Ser79 and Ser110 to Tyr114, are relatively well ordered due to the tight packing against residues of the proenzyme segment with which they interact.

The most striking feature evident in Figure 3(b) is the marked difference in B factors between the pepsinogen and pepsin molecules for the residues Ser157 to Ser161. These residues are located at the outermost part of a loop that protrudes from the surface of the molecule (see section (e), below, on packing). In pepsinogen, these residues are fully exposed in a wide solvent channel and, together with the region of Gln277 to Asp280, constitute the most disordered portion of the molecule (Table 4). In the pepsin structure, by contrast, due to the different crystal packing, Asn158 to Asp160 form intimate contacts and hydrogen bonds with residues of the "flap" (Ser72 to Thr74) of a symmetry-related molecule (Sielecki et al., 1990). The stabilizing effect of these intermolecular contacts in the crystalline lattice is very marked.

There are other poorly ordered parts of the molecule with relatively weak electron density distributions in the final electron density map (Table 4). Of particular interest is the region at the junction of the proenzyme segment to pepsin proper, Leu44p-Ile1, shown in Figure 5. The density is very weak at the main-chain of Leu44p and below the noise level for its side-chain. Nonetheless, it is unlikely that the peptide bond would be cleaved at this position, as a single spot on SDS/polyacrylamide gel electrophoresis at a molecular weight ~40,000 indicates that the zymogen is intact in the crystals (unpublished experiments).

† The nomenclature of Schechter & Berger (1967) is used throughout this work to refer to the binding sites in the activated molecule.

Figure 4. The region surrounding the catalytic aspartic residues, Asp32 and Asp215, with the associated electron density of the final $(2|F_o| - |F_c|)$, $\alpha_c$ map. The contour level is 0.50 e/Å³.

Figure 5. The region surrounding the junction between the prosegment and the pepsin component of the molecule with the associated electron density of the $(2|F_o| - |F_c|)$, $\alpha_c$ map computed at the conclusion of the pepsinogen refinement. Ala43p to Gly2 are on the surface of the molecule; Gly76 to Gly78 are in the "flap" region. The contour level depicted is 0.25 e/Å³, much lower than the one used for Fig. 4.

Figure 6. The helical region Ala229 to Ile235, flanking the residue that has been deleted (Ile230) from the published amino acid sequence (Sepulveda et al., 1975; Lin et al., 1989). The final refined model is shown with its associated electron density in a $(2|F_o| - |F_c|)$, $\alpha_c$ map. The contour level is 0.50 e/Å³.
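The per-residue averages plotted in Figure 3 can be reproduced from a co-ordinate list along the following lines. This is a minimal sketch with hypothetical function names and a simple (residue id, atom name, B) input format; main-chain atoms are taken as N, CA, C and O, and residues without side-chain atoms take the main-chain value, as stated in the figure caption.

```python
from collections import defaultdict

MAIN_CHAIN = {"N", "CA", "C", "O"}


def mean_b_by_residue(atoms):
    """Average B factors per residue.

    atoms -- iterable of (residue_id, atom_name, b_factor) tuples
    Returns {residue_id: (main_chain_mean, side_chain_mean)}. Residues with
    no side-chain atoms (glycine) take the main-chain value for both.
    """
    main = defaultdict(list)
    side = defaultdict(list)
    for residue_id, atom_name, b_factor in atoms:
        target = main if atom_name in MAIN_CHAIN else side
        target[residue_id].append(b_factor)
    result = {}
    for residue_id, values in main.items():
        main_mean = sum(values) / len(values)
        side_values = side.get(residue_id)
        side_mean = sum(side_values) / len(side_values) if side_values else main_mean
        result[residue_id] = (main_mean, side_mean)
    return result


def main_chain_b_difference(b_first, b_second):
    """Main-chain B difference (first minus second) for residues in both maps."""
    return {res: b_first[res][0] - b_second[res][0]
            for res in b_first if res in b_second}
```

Restricting the difference to residues present in both models matters here, since the prosegment residues have no counterpart in the mature enzyme.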

(c) Amino acid sequence change

In spite of an early report by Marciniszyn et al. (1975) indicating the existence of a structural variant of porcine pepsin with a deletion of the isoleucine residue at position 230, all the published work and sequence alignments included this residue (see, for example, Shewale et al., 1985; Welinder et al., 1985; Hsu et al., 1977), thus giving a total of 327 amino acids for the pepsin portion of the zymogen. Early on during the refinement and model re-interpretation process it became evident that the electron density associated with the side-chains of residues Asn231 to Asn241 seemed to be out of step by one with the nature of the amino acids at the corresponding positions. In fact, trying to accommodate the carboxyl side-chain of Asp235 into an internal hydrophobic pocket drew our attention to this sequence misalignment. The deletion of the isoleucyl residue from position 230 allowed for a good fit of the other residues to the electron density for the remainder of the helix (see Fig. 6) and the shortening of the poorly ordered surface loop from Glu239 to Glu244 (Table 4). The density for this loop in the first few maps was very weak and the disruption in this region caused by the inclusion of an extra residue at position 230 could not be easily detected. If there is a variant of porcine pepsin that contains an isoleucine at this position it is most likely that the disposition of the helix 224 to 235 would be significantly different from the one we have observed in the present structure. Otherwise such an insertion would alter the amphipathic nature of the helix, exposing hydrophobic residues to the solvent while burying hydrophilic side-chains. It should be pointed out that in the recently reported structures of the active enzyme (Sielecki et al., 1990; Cooper et al., 1990; Abad-Zapatero et al., 1990) there is no isoleucine at position 230 either. This is also the case for the other co-ordinates of pepsinogen (set 1PSG, deposited by J. Hartsuck & S. Remington) that are available in the Brookhaven Data Bank (Bernstein et al., 1977).

(d) Phosphoserine 68

The consistent presence of residual density at the side-chain of Ser68 was interpreted during the later stages of the refinement as a phosphoryl group covalently attached to its Oγ atom. Although the electron density for this phosphoryl group is weak, indicating low occupancy for the site and/or a high degree of mobility, confidence for such an interpretation was based on much earlier work. It has long been known that both pepsin and pepsinogen contain one phosphorus atom per molecule (Northrop, 1939). Flavin (1954) suggested a phosphoserine group and Clement et al. (1970) also favored a phosphate monoester moiety. These results were later confirmed by a ³¹P n.m.r. experiment (Edmondson & James, 1979). An analysis of the present crystallographic results on pepsinogen in conjunction with the results of the titration of the phosphoserine residue using ³¹P n.m.r. has been reported elsewhere (Williams et al., 1986). Both techniques support the presence of a highly mobile phosphoserine group on the surface of


the molecule. It points toward a large solvent cavity in the crystals, with the surrounding molecules making no intermolecular contacts. The situation is very different in the monoclinic pepsin structure, where the crystal packing (space group P2₁) around the phosphoryl group provides close contacts from this group to the side-chains of two aspartic acid residues and an asparagine, Asn57', Asp59' and Asp60', from a neighboring molecule (Sielecki et al., 1990). This unusual environment stabilizes the phosphoserine group (compare Fig. 4 of Williams et al., 1986, and Fig. 10 of Sielecki et al., 1990). In both the pepsin and pepsinogen structures, Ser68 is in a β-bulge/Asx-type turn characterized by the hydrogen-bonding interaction from the Oγ1 atom of Thr67 to Gln69 N. No "extra" electron density was present near Ser68 in the structure of hexagonal pepsin (Cooper et al., 1990); this region was not discussed by Abad-Zapatero et al. (1990).

(e) Packing

The molecule of pepsinogen in the general position (x,y,z) of space group C2 makes intermolecular contacts with eight symmetry-related molecules (Table 5). This central molecule has a total of 228 direct protein atom-protein atom contacts with its nearest neighbors that are less than 4.0 Å in length. Most of these contacts are localized in two areas on the molecular surface. The 2-fold axes at 0,y,0 and 0,y,½ in the unit cell have the largest number of protein-protein contacts (B and C in Table 5; Fig. 7). The region across the 2-fold at 0,y,0 involves hydrogen-bonding and van der Waals'

Table 5
Intermolecular contacts in the crystals of porcine pepsinogen

For each symmetry operation: residues at the interface (molecule at x,y,z and transformed molecule), number of protein-protein contacts† and number of protein-water contacts†.

A. Symmetry operation: x, y+1, z (unit cell translation in y, b = 43.4 Å)
   Molecule at x,y,z:      G296
   Transformed molecule:   D279
   Protein-protein contacts: 1(-)      Protein-water contacts: 4(1)

B. Symmetry operation: x̄, y, z̄ (2-fold axis at 0,y,0)
   Molecule at x,y,z:      K21p, D24p, K27p, T28p, I1, D3, E4, V43, Y44, S46, S47, L48, T81, E105, T106, E107, S110, Y113, Y114
   Transformed molecule:   K21p, D24p, K27p, T28p, I1, D3, E4, V43, Y44, S46, S47, L48, T81, E105, T106, E107, S110, Y113, Y114
   Protein-protein contacts: 94(10)    Protein-water contacts: 36(10)

C. Symmetry operation: x̄, y, z̄+1 (2-fold axis at 0,y,½)
   Molecule at x,y,z:      L1p, V2p, G144, S147, Q148, I170, D171, S172, Y175, S178
   Transformed molecule:   L1p, V2p, G144, S147, Q148, I170, D171, S172, Y175, S178
   Protein-protein contacts: 56(10)    Protein-water contacts: 86(14)

D. Symmetry operation: x+½, y+½, z (unit cell centring operation)
   Molecule at x,y,z:      T203, I204, A205, S207, S226, [> A U230]
   Transformed molecule:   [T24, p13], T63, D59, D60
   Protein-protein contacts: 14(1)     Protein-water contacts: 34(15)

E. Symmetry operation: x-½, y+½, z (unit cell centring operation)
   Molecule at x,y,z:      E65, T67, Y86, N98
   Transformed molecule:   S248, C249, S250, S251
   Protein-protein contacts: 38(5)     Protein-water contacts: 17(6)

† The number of contacts at the interface is given. The numbers in parentheses refer to the number of additional contacts that are possible hydrogen bonds.
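The contact counts in Table 5 rest on a simple operation: generate a symmetry mate, then count atom pairs closer than 4.0 Å. The following is a minimal sketch of that idea, not the authors' program. The function names are hypothetical, and the cell parameters a, c and β in the usage note are placeholders, since only b = 43.4 Å is given here.

```python
import math

def frac_to_cart(frac, cell):
    """Fractional (x, y, z) -> Cartesian Å for a monoclinic cell, unique axis b.

    cell = (a, b, c, beta_in_degrees); a lies along X, b along Y.
    """
    a, b, c, beta_deg = cell
    beta = math.radians(beta_deg)
    x, y, z = frac
    return (a * x + c * z * math.cos(beta), b * y, c * z * math.sin(beta))

def twofold_along_b(frac, shift=(0, 0, 0)):
    """Apply the C2 operation (-x, y, -z), plus an optional lattice shift."""
    x, y, z = frac
    return (-x + shift[0], y + shift[1], -z + shift[2])

def count_contacts(atoms, mates, cell, cutoff=4.0):
    """Count pairs (one atom from each molecule) closer than `cutoff` Å."""
    cart_a = [frac_to_cart(f, cell) for f in atoms]
    cart_m = [frac_to_cart(f, cell) for f in mates]
    return sum(1 for p in cart_a for q in cart_m if math.dist(p, q) < cutoff)
```

For example, with a placeholder cell `(100.0, 43.4, 60.0, 90.0)`, the atoms `[(0.01, 0.2, 0.01), (0.3, 0.3, 0.3)]` and their mates under `twofold_along_b`, only the atom lying next to the 2-fold axis is within 4.0 Å of its own image. A shift of `(0, 0, 1)` gives the mate across the axis at 0,y,½.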

intermolecular contacts for residues Lys21p to Thr28p in the prosegment (helix B), Ile1 to Glu4, Val43 to Leu48, Glu105 to Ser110 and Tyr113-Tyr114. These last three residues (110, 113 and 114) are located on the small helical section that flanks the S1/S3 substrate binding sites. Extensive hydrogen bonding and van der Waals' contacts are also made between the two molecules related by the 2-fold axis at 0,y,½ (Fig. 7(a)). The main interactions involve residues of the central six-stranded β-sheet (Leu1p, Val2p) and residues of the crossover loop (James & Sielecki, 1987), Ile170 to Ser178. The number of protein-protein contacts is smaller at the interface of the two molecules at x,y,z and x̄,y,z̄+1 than at the interface of the two related by the 2-fold axis at 0,y,0 (Table 5). On the other hand, this smaller number of contacts is compensated by a much more extensive network of protein-water interactions (100 versus 46 when including hydrogen bonding interactions; see Table 5). Figure 7(b) shows the interactions between molecules that are related by the unit cell translation in the b axis direction and by the unit cell centring at ±(½,½,0). From this vantage point, it can be seen that the central molecule is making favorable van der Waals' contacts and hydrogen bonding interactions with six surrounding molecules. The more abundant interactions are between molecules at x±½, y+½, z and the central one at x,y,z. Comparison of the two views of Figure 7 shows that the pseudohexagonal layers represented in part (b) are stacked on top of one another with the regions of close contacts distributed around the 2-fold axes of the space group C2 (0,y,0; 0,y,½; ½,y,0; ½,y,½). There are no contacts among the molecules related by the interleaving 2₁ screw axes of the space group (at ¼,y,0 and ¼,y,½).

Figure 7. The molecular packing in the crystals of porcine pepsinogen. (a) A view of the unit cell parallel to the crystallographic b axis showing the central molecule at x,y,z (thick lines between Cα atoms) and the contacts that it makes to the 2-fold related molecules at x̄,y,z̄ (to the right) and x̄,y,z̄+1 (to the left). The details of the intermolecular contacts are given in Table 5. Also shown are contacts made by the central molecule with 2 C-centred molecules at x-½, y+½, z (top) and x+½, y+½, z (bottom). (b) A portion of the pseudo-hexagonal close packed layer viewed approximately in the x,y plane (a axis is vertical, b is to the right and c projects into the plane of the page). The central molecule at x,y,z corresponds to the central molecule in (a). The 4 molecules related by C-face centring to the central molecule are at: x+½, y+½, z; x-½, y+½, z (to the right of the central one) and x+½, y-½, z; x-½, y-½, z (to the left of the central one).

4. Molecular Structure of Porcine Pepsinogen

The overall folding of the entire porcine pepsinogen molecule is shown in Figure 8 in a Cα-atom representation. It is most convenient to separate the description of the molecular conformation of pepsinogen into two sections. The first will involve the residues that constitute the proenzyme segment, Leu1p to Leu44p, plus the first 13 residues of the pepsin portion of the molecule, Ile1 to Glu13. All of these residues undergo large conformational changes during the activation process (James & Sielecki,

1986; A. K. Muir & M. N. G. James, unpublished results). The second section will consider the structure of the remaining residues, Tyr14 to Ala326; this constitutes the majority of what becomes the active pepsin molecule.

Figure 8. A Cα representation of porcine pepsinogen in a stereoview. The prosegment, Leu1p to Leu44p, is rendered with the virtual bonds between Cα atoms filled in; the virtual bonds for the pepsin part are open. The disulfide bridges Cys45 to Cys50, Cys206 to Cys210 and Cys249 to Cys282, as well as the side chains of the 2 catalytic groups Asp32 and Asp215, are shown. Every 10th Cα atom is labeled.

(a) The proenzyme segment

A prominent feature of the pepsinogen structure is the large six-stranded antiparallel β-pleated sheet (Fig. 9), a common secondary structural motif of the aspartic proteinases (James & Sielecki, 1987). It enfolds a central hydrophobic core on one side of the molecule opposite to the active site cleft that contains Asp32 and Asp215 (Fig. 8). The hydrogen bonding observed for this sheet is similar to that we have described for the refined structure of porcine pepsin (Sielecki et al., 1990). In pepsinogen, however, its N-terminal strand is not the N-terminal portion of the active pepsin molecule (Ile1 to Leu6) as described in the above reference. In the zymogen molecule, the N-terminal residues of the prosegment of pepsinogen (Leu1p to Leu6p) are the ones that mimic those interactions. This major conformational difference between the N-terminal residues of pepsin in the active enzyme as compared to those residues in the zymogen form is one of the reasons for the lack of proteolytic activity of the precursor molecule (James & Sielecki, 1986).

Figure 9. A stereoview of β-sheet I of porcine pepsinogen overlapped with the same structural feature of penicillopepsin. A total of 55 residues in each molecule were used for the least-squares fitting. All main-chain atoms were used and the r.m.s. difference for the 220 atoms was 0.83 Å.

The hydrogen bonding contacts involving residues of the proenzyme segment are listed in Table 6. The majority of hydrogen bonds made by the polar side-chains of the proenzyme segment are to main-chain or side-chain atoms of the active enzyme portion of the molecule. Ten of them correspond to ion pair interactions between charged side-chains and their importance has been discussed previously (James & Sielecki, 1986). The short helical segments in this region exhibit a mixed α and 3₁₀ conformation. It is those residues at the N and C termini of the helices that can be best described in terms of the 3₁₀ helical parameters.

The first ten residues of porcine pepsinogen adopt an extended β-sheet conformation (Figs 8 and 9). Residues Val2p, Val4p and Leu6p form hydrogen bonds to Leu167, Val165 and Ser163 (Table 6), respectively, as part of the interactions in the six-stranded β-sheet characteristic of the aspartic proteinases. These same hydrogen-bonded interactions are observed in pepsin (Fig. 6 of Sielecki et al., 1990) and in penicillopepsin (Fig. 10(a) of James & Sielecki, 1983). Val7p and Lys9p form a short β-sheet interaction with Phe15 and Glu13 (Fig. 10, Table 6). Ser11p, in spite of having φ, ψ values characteristic of a β-sheet residue (-68°, 167°), can be considered the cap of the first helix, A, that extends from Leu12p to Lys18p. The Oγ atom of Ser11p forms a hydrogen bond to the main-chain NH of Gln14p (Fig. 10, Table 6). Arg13p is highly conserved among the prosegments of the aspartic proteinases (Foltmann, 1988). It can be seen in Figure 10 that the guanidinium group of its side-chain makes a hydrogen-bonded ion pair interaction with the side-chain of Asp11 (Table 6), a residue also highly conserved in the mature aspartic proteinases. This interaction stabilizes the conformation of the prosegment in this region and is a sensitive point for the acid conversion of pepsinogen to pepsin (A. K. Muir & M. N. G. James, unpublished results). A second pH-sensitive ion pair interaction between Arg8p and Glu13 serves to help tie down the termini of the small inhibitory domain of the zymogen. Disruption of these two interactions could trigger the conformational changes accompanying the conversion process (James & Sielecki, 1986).

A second helix, B, starts with Lys21p and finishes with Thr28p (Fig. 10). The two residues at the C terminus of helix A, Ile17p and Lys18p, are also part of a sharp, type I turn (Table 7) between helices A and B. The interhelix angle is approximately 112°. Helix B ends in a type I turn involving a hydrogen bond between Leu26p O and His29p N. This helical combination has not been observed previously in other proteins. However, it is reminiscent of the two helices present in the cro repressor molecule (Anderson et al., 1982). A short segment of chain with the residues in an extended β-sheet conformation (His29p to Asn32p)

Table 6
Hydrogen bonding in the proenzyme segment

Donor             Acceptor          d(D···A)† (Å)   a(D-H···A)† (°)

(a) β-sheet
Leu167 N          Val2p O           2.9             151
Val4p N           Val165 O          2.7             161
Val165 N          Val4p O           2.9             164
Leu6p N           Ser163 O          2.9             167
Val7p N           Phe15 O           2.8             167
Phe15 N           Val7p O           2.8             157
Lys9p N           Glu13 O           2.7             165

(b) Helix A (α-helix)
Asn15p N          Ser11p O          2.9             143
Leu16p N          Leu12p O          2.9             153
Ile17p N          Arg13p O          2.8             176
Lys18p N          Gln14p O          3.0             155
Asp19p N          Asn15p O          3.0             138
Lys21p N          Leu16p O          3.0             149
Gly20p N          Ile17p O          2.9             154

(c) Helix B (α-helix)
Phe25p N          Lys21p O          3.6             151
Leu26p N          Leu22p O          3.1             174
Lys27p N          Lys23p O          3.0             138
Thr28p N          Asp24p O          3.2             151
(His29p N         Leu26p O          3.6             158)§

(d) Helix C (3₁₀-helix)
Ser35p N          Asn32p O          2.9             166
Lys36p N          Pro33p O          3.0             163
Tyr37p N          Ala34p O          3.1             143
(Phe38p N         Ala34p O          3.2             134)§

(e) Side-chain-main-chain
Lys9p NZ          Pro116 O          2.8             179
Thr28p OG1        Asp24p O          3.5             165
Thr28p OG1        Phe25p O          3.4             176
His29p ND1        Phe25p O          3.0             141
Asn32p ND2        Asp299 O          2.9             167
Ser35p OG         Asn32p O          2.7             170
Asn8 ND2          Tyr75 O           2.7             137
Thr12 OG1         Tyr9 O            3.4             171
Thr12 OG1         Gly217 O          2.6             175
Gln14p N          Ser11p OG         3.3             167
Ala34p N          Asn32p OD1        3.1             150
Ser110 N          Glu4 OE2          3.0             130
Phe111 N          Glu4 OE1          2.9             157
Asp290 N          Asn32p OD1        3.1             173

(f) Side-chain-side-chain
‡Arg8p NE         Glu13 OE1         3.1             148
‡Arg8p NH2        Glu13 OE1         3.1             144
‡Arg13p NE        Asp11 OD2         2.8             172
‡Arg13p NH1       Asp11 OD1         2.8             164
∥Gln14p NE2       Ser11p OG         3.2             151
Lys27p NZ         Asp24p OD1        2.8             179
‡His29p NE2       Asp? OD2          2.6             165
‡His31p ND1       Glu7 OE2          2.7             159
‡Lys36p NZ        Asp32 OD1         3.0             171
‡Lys36p NZ        Asp215 OD2        3.1             141
‡Lys36p NZ        Asp215 OD1        2.7             175
Tyr37p OH         Asp215 OD1        2.6             178
Asn8 ND2          Ser35p OG         2.9             169
Tyr9 OH           Asp32 OD2         2.7             169

† d(D···A): distance of possible hydrogen bonds from D = donor to A = acceptor. a(D-H···A): angle of the possible hydrogen bond at the calculated position of the H atom.
‡ These are ion pair interactions from positively charged groups in the propart to negatively charged groups in the pepsin component of the molecule.
§ Hydrogen bonds between the residues enclosed in parentheses do not conform to the helix type under which they are listed.
∥ The side-chain of Gln14p is observed in alternate conformations. The H-bonds listed here are for the one of higher occupancy.
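The two quantities tabulated in Table 6, the donor-acceptor distance d(D···A) and the angle a(D-H···A) at the hydrogen, can be computed from three atomic positions. The sketch below is illustrative only; the function names are hypothetical, and the hydrogen position is assumed to have been calculated beforehand from standard geometry, as the footnote indicates.

```python
import math

def donor_acceptor_distance(donor, acceptor):
    """d(D···A): distance in Å between donor and acceptor atoms."""
    return math.dist(donor, acceptor)

def angle_at_hydrogen(donor, hydrogen, acceptor):
    """a(D-H···A): angle in degrees at H between the H->D and H->A vectors."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))
```

A perfectly linear arrangement, with D at the origin, H at (1, 0, 0) and A at (2.9, 0, 0), gives d = 2.9 Å and an angle of 180°. The entries in Table 6 range from 130° to 179°.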

is followed by a predominantly 3₁₀ helix from Pro33p to Tyr37p. This helical segment (helix C) seems to be initiated by a hydrogen bond from the main-chain NH of Ala34p to the side-chain amide oxygen of Asn32p (3.1 Å), in the manner described for Asx-type turns (Rees et al., 1983; Baker & Hubbard, 1984; Richardson, 1981). There are two nested 3₁₀-turns forming a single helical segment from Phe38p to Ala42p (Table 7, Fig. 10). As mentioned earlier (Crystallographic Restrained Refinement, section 3(b)), residues Leu44p and Ile1 are on the surface of the molecule and are partially disordered (Fig. 5). The average B factor for the main-chain atoms of this region is 49 Å² (Fig. 3), indicating high conformational lability. The only secondary structural motifs associated with the following 13 residues of the pepsin molecule, Gly2 to Tyr14, are the 3₁₀ turns that approximate a 3₁₀ helix as listed in Table 7.

The hydrophobic core defined by residues of the proenzyme segment and the following 14 N-terminal amino acids of mature pepsin is not very large; it consists of the following residues: Leu12p, Leu16p, Leu22p, Phe25p, Leu26p and Leu10 (Fig. 10). These are highly conserved residues in all of the proenzyme segments of gastric aspartic proteinases (Welinder et al., 1985; Foltmann, 1988). Residue 10 is a hydrophobe in all the active enzymes of known sequence, indicating a possible functional role in substrate binding for its side-chain following the conformational change upon activation of the zymogen.

In the light of the crystallographic structure it is now possible to interpret some of the results of earlier studies. The c.d. measurements of Perlmann (1963) suggested that pepsinogen has a sizeable component of α-helical structure, whereas pepsin does not. Also, this helical content for the zymogen is lost on denaturation (Perlmann, 1963). Indeed, in the crystal structure of porcine pepsinogen nearly half of the residues of the proenzyme segment (21/44) adopt a helical conformation.

Figure 10. The prosegment Leu1p to Leu44p and the first 15 residues of pepsin, Ile1 to Phe15, in the conformation adopted in the zymogen. The main-chain atoms are shown with thick lines and side-chains with thin lines representing the bonds. Hydrogen-bonding interactions are represented by broken lines. Geometrical details of the hydrogen bonds are given in Table 6. Reverse turns of this proenzyme segment are listed in Table 7.

Table 7
Reverse turns in the proenzyme segment

Residues                              φ2, ψ2       φ3, ψ3       Type†   d(D···A)‡ (Å)   a(D-H···A)‡ (°)
Ile17p, Lys18p, Asp19p, Gly20p        -56, -36     -93, 12      I       2.9             154
Leu26p, Lys27p, Thr28p, His29p        -66, -41     -101, 1      I       3.6             158
Phe38p, Pro39p, Glu40p, Ala41p        -51, -47     -56, -32     III     2.9             144
Pro39p, Glu40p, Ala41p, Ala42p        -56, -32     -92, 8       I       3.0             161
Glu4, Pro5, Leu6, Glu7                -54, -18     -107, 16     I       3.0             155
Leu6, Glu7, Asn8, Tyr9                -56, -38     -10, 2       I       3.0             157
Tyr9, Leu10, Asp11, Thr12             -61, -35     -103, 14     I       3.5             169

† Turns have been classified following Crawford et al. (1973).
‡ d(D···A): distance (in Ångstroms) of possible hydrogen bond from D = donor to A = acceptor. a(D-H···A): angle (in degrees) of the possible hydrogen bond at the calculated position of the H atom.
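The turn types in Table 7 follow from the φ, ψ angles of the two central residues. As a rough illustration, a turn can be assigned to whichever ideal type lies nearest in (φ2, ψ2, φ3, ψ3) space. The ideal values below are the commonly quoted ones for types I, II and III; they are an assumption of this sketch and not necessarily the criteria of Crawford et al. (1973). The function names are hypothetical.

```python
# Commonly quoted ideal (phi2, psi2, phi3, psi3) values in degrees (assumed here).
IDEAL_TURNS = {
    "I": (-60.0, -30.0, -90.0, 0.0),
    "II": (-60.0, 120.0, 80.0, 0.0),
    "III": (-60.0, -30.0, -60.0, -30.0),
}

def wrap(delta):
    """Map an angular difference onto the interval [-180, 180) degrees."""
    return (delta + 180.0) % 360.0 - 180.0

def classify_turn(phi2, psi2, phi3, psi3):
    """Return the ideal turn type with the smallest summed squared deviation."""
    observed = (phi2, psi2, phi3, psi3)

    def score(ideal):
        return sum(wrap(o - i) ** 2 for o, i in zip(observed, ideal))

    return min(IDEAL_TURNS, key=lambda name: score(IDEAL_TURNS[name]))
```

Applied to the first and third rows of Table 7, `classify_turn(-56, -36, -93, 12)` gives type I and `classify_turn(-51, -47, -56, -32)` gives type III, in agreement with the tabulated assignments.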


In addition, predictions of extensive α-helical structure on the basis of amino acid sequence analysis (Welinder et al., 1985) have also been largely correct, although the details of the start and endpoints of these secondary structures were not accurate. The β-sheet region at the N terminus of the zymogen is in reality much more extensive than predicted, and the two helices A and B are shorter, with the turn between them at a different position from that deduced on the basis of the amino acid analysis. The 3₁₀-helix at Pro33p to Tyr37p was predicted as a turn region for all of the sequences. The comparison of the pattern of location of hydrophobic and hydrophilic residues between the vertebrate gastric enzymes and penicillopepsin was used to predict two regions where the proparts might interact with the enzymes (Welinder et al., 1985). Indeed, residues Leu12p, Asn15p and Leu16p of the proenzyme segment of porcine pepsinogen interact with residues Phe111 and Tyr114. In addition, they also make contacts with the residues of the N-terminal region of pepsin, Glu4 and Pro5 (Figs 8 and 10). On the other hand, the second predicted contact zone, residues Ser129 to Ala133, has no interactions with residues of the proenzyme segment as shown by the present crystal structure (Fig. 8).

(b) The pepsin conformation in the zymogen

Several refined crystal structures of porcine pepsin have been reported recently (Sielecki et al., 1990; Cooper et al., 1990; Abad-Zapatero et al., 1990). Because the fold of the pepsin portion of the zymogen is closely preserved in the active enzyme, it is not necessary to duplicate these structural descriptions. In addition, a detailed study of the fascinating changes that accompany the zymogen to active enzyme conversion will be presented in a separate paper (A. K. Muir & M. N. G. James, unpublished results). In this section we shall discuss the comparison of the pepsin component of the zymogen with penicillopepsin, the aspartyl proteinase that was used as the basis for the initial modeling and molecular replacement solution. This structural comparison has indicated major revisions to the amino acid sequence alignment between these two molecules that had been originally proposed by Hsu et al. (1977). The new sequence alignment based on tertiary structural comparisons is presented in Table 8. With the current interest in modeling experiments for proteins believed to be homologous (and, in some cases, not necessarily closely related; Pearl & Taylor, 1987) to others of known structure, it is of interest to analyze how the misalignment of portions of the sequences can affect the resulting model.

In spite of the fact that the N terminus of pepsinogen (Leu1p to Lys9p) is actually structurally homologous to the N terminus of penicillopepsin (Ser3 to Thr11), we show in Table 8 the alignment of the N terminus of the active pepsin molecule as it was done before the structural insight. Also the regions discussed below exhibit major revisions to the original alignment (Hsu et al., 1977) as a result of the detailed structural comparison. The N-terminal ten residues are shifted by two residues (Table 8 and Fig. 3 of Hsu et al., 1977). The Thr22-Pro23 dipeptide is inserted at the end of a hairpin loop. This peptide bond adopts a cis configuration. Another dipeptide, Ser62-Thr63, is inserted in the sequence of pepsin relative to penicillopepsin and Gly68A is inserted into the penicillopepsin molecule relative to pepsin. The residue missing in the flap of pepsin is the serine at position 78A rather than Ala80A, as originally proposed (Hsu et al., 1977). A major difference between the two molecules is at position 131, where there is a tripeptide insertion in penicillopepsin, Gln-Pro-Gln. This caused the original alignment to be out of synchrony by two to three residues from position 132 to 144. A two-residue deletion in penicillopepsin at Asp142-Gln143 of pepsin was not detected originally. The alignment from Val184 to Gly188 has been altered and the tripeptide deleted in penicillopepsin near the disulfide bridge at Cys206 (Hsu et al., 1977) was changed. The fact that an isoleucine residue at position 230 has not been observed in the present variant of pepsinogen or any of the porcine pepsin/pepsinogen molecules of known three-dimensional structure (Sielecki et al., 1990; co-ordinate sets 1PSG (Hartsuck & Remington, 1988), 2PEP (Cooper et al., 1990) and 3PEP (Abad-Zapatero et al., 1990) from the Protein Data Bank (Bernstein et al., 1977)) has reassured us that we were correct in deleting this residue from the sequence. Its inclusion had caused the model to be shifted from residue 230 to 235A (Table 8), with concomitant errors distorting the appearance of the electron density maps in this region.

The disulfide-bridged loop, from Cys249 to Cys282, was very hard to reinterpret in the associated electron density maps during the structure solution of pepsinogen due to major problems in the original alignment for this region. There is a two-residue deletion in penicillopepsin relative to pepsin at Ile252-Asp253 (Table 8), which caused the alignment and resulting model to be out of step by two residues up to Cys282. The alignment is again synchronous due, in part, to the tripeptide insertion in penicillopepsin at position 279 (Table 8). In spite of these differences, the original alignment (Hsu et al., 1977) was still acceptable on the basis of sequence identities or similarities. Nonetheless, it caused major problems in trying to sort out the initial electron density maps, which were affected by bias from the incorrect model through the calculated phases and map coefficients. The major insertion of five residues at position 292 of pepsin over penicillopepsin was predicted almost correctly (Hsu et al., 1977). The addition of these residues accentuates the protrusion of the elongated loop of pepsin into the solvent, with a concomitant display of high mobility (see Fig. 3(a)). Thus, in spite of the high degree of expected structural equivalence, the correct sequence alignment

Table 8
Amino acid sequence alignment for pepsinogen (PEP) and penicillopepsin (PEN)

[Alignment rows PEP† / PEN‡ with residue numbering and underline markings: the sequence data are not recoverable from this extraction.]

† Pepsin numbering scheme.
‡ Penicillopepsin sequential numbering scheme (James & Sielecki, 1983).
Underlined residues are part of the 220 Cα atoms that can be optimally superposed (algorithm of Rossmann & Argos, 1975) within 1.9 Å, rendering an r.m.s. difference of 1.01 Å. A subset of the above (double dashes) corresponds to those residues that can be optimally overlapped within 1.0 Å (r.m.s. difference of 0.60 Å for 138 Cα atoms). The 6 regions that are underlined by dots represent residues connecting polypeptide segments that superimpose closely, but these 7 residues depart from superposition to within 1.9 Å.

was very difficult to anticipate. Any comparative molecular models built on the basis of such incorrect sequence alignments will clearly be in error at these positions; the effects of the erroneous sequence alignments can propagate to even larger regions. Also shown in Table 8 is a summary of the structural comparison data that were used to derive the new alignment. A total of 220 α-carbon atoms in each protein can be superimposed simultaneously within 1.9 Å (the r.m.s. difference between these Cα positions is 1.01 Å). These residues are underlined in Table 8. The double-dash underline defines the subset of 138 residues whose α-carbon atoms superimpose within 1.0 Å (r.m.s. deviation for this subset is 0.60 Å). These latter residues form a conserved core of the aspartic proteinase family and, as

expected, the sequence identity between penicillopepsin and pepsin is greatest for these core residues (41% of the 138 residues are identical in sequence as compared to 31% sequence identity for the 326/323 residues in pepsin/penicillopepsin). For the comparison of secondary structural features between penicillopepsin and pepsinogen (Fig. 11), the orientation matrix derived for the 138 structurally equivalent residues has been used to define the global agreement in atomic positions. Using the matrix derived from the superposition of the 220 α-carbon atoms (with a cut-off value of 1.9 Å) makes no significant difference to the results. It should be emphasized, however, that we are referring here to an overall comparison where each of the molecules is treated as a rigid body. This approach is different from the one we have used for a more detailed comparison among the several members of the aspartyl proteinase family (Sielecki et al., 1990), where we analyzed conformational differences by optimizing the pairwise overlaps among the more preserved, extended N-terminal domains and among the smaller, more mobile, C-terminal domains, in separate calculations.

Comparisons of homologous proteins have indicated that local structure is conserved more strongly than global structure (Lesk & Chothia, 1980, 1982; Chothia & Lesk, 1982; Read et al., 1984). The aspartic proteinases are no exception. We have described the sequence alignment presented in Table 8. Whereas the several regions of insertions and/or deletions may have caused shifts in the conformation of helices and sheets, once the superposition of these secondary structural units is optimized locally the agreement is enhanced. Figure 11 shows the globally fitted α-carbon atom representations of penicillopepsin and pepsinogen. Table 9 has the details of the local conservation of several of the secondary structural units of these molecules. Aspartic proteinases are essentially all β-proteins. A summary of the nomenclature commonly used to describe the β-sheet structure in this family is given in Table 10. There are only four helical segments involving 32 residues in the pepsin portion of pepsinogen, Ser110 to Tyr114, Pro135 to Gln143, Pro224 to Ile235 and Asp303 to Gln308. All of these helices are present in penicillopepsin: three of them are

Figure 11. A stereoview of the best fit overlap between Cα atoms in porcine pepsinogen (thick lines) and penicillopepsin (thin lines). The overlap shown is for 275 common Cα atoms that have an r.m.s. difference of 1.63 Å after the least-squares fitting. The 195 residues in the extended N-terminal domain (Sielecki et al., 1990) fit with an r.m.s. deviation of 1.12 Å, whereas 72 Cα atoms of the smaller C-terminal domain fit with an r.m.s. difference of 1.80 Å. Residues of pepsinogen are labeled at every 10th residue.

Table 9
Global fitting and local conservation of structure in pepsinogen and penicillopepsin

Secondary structure†   No. of atoms   r.m.s. global (Å)   r.m.s. local (Å)   Angle (°)   Translation (Å)
β-sheet I              220            0.86                0.83               1.4         0.11
Sheet VI               164            0.84                0.79               1.7         0.16
Sheet II               72             1.30                0.30               6.2         0.55
Sheet VII              156            1.21                1.03               *4          0.25
Sheet III              92             2.31                1.47               14.3        1.12
α-helix                48             2.74                0.60               30.9        0.95

† The residues making up these secondary structural units are defined in Figs 5 to 7 of Sielecki et al. (1990). All of the atoms of the main-chain (N, Cα, C, O) were used in the comparisons and r.m.s. deviations shown in this Table. The r.m.s. deviations in atomic position for the global fit were deduced using the matrix from the superposition of penicillopepsin Cα atoms on the equivalent atoms of pepsinogen (simultaneous superposition to within 1.0 Å). The r.m.s. (local) is the r.m.s. deviation that results from the local superposition of these same main-chain atoms of the individual secondary structural units. Angle and translation refer to the reorientation required to transform the secondary structural units from the local superposition to the global fit.

A. R. Sielecki

688

Table 10 Summary Sheet

I II III l\ v VI

VlI

Strands

of the P-sheet nomenclature comprising

the sheet

GIy2-Leu6, Ser163-Gly168, Asp149-Tyr154, Gln308-Arg315, Asn318-Ala326, Tyr174-Val184 Ile18-Gly21, Gln25-Phe27, Thr88-Va191, Ile94-Asp96 Ser196-Met199, Glu202-Ala205, Ile258-Ile262, Val265-Leu269 Leu71-Tyr75, Gly78-Gly82, Glu105-Thr106 Ala237-Ser241, Gly243-Ile247, Tyr274-Asp278, Ser281-Gly285 GIy16-Thr17, Thr28-Asp32, Asp1 18.Leu123, Asn37-Pro41, Gln99-Ser104, Ile83-Asp87. Glu65-Gln69 Trp190-Asp195, Gly209-Asp215, Leu298-Asp303, Leu220-Thr225, Gly285-Met289

structurally conserved between the two proteins whereas the one at Pro224 to Ile235 undergoes a major shift in position in pepsinogen (see below). The antiparallel P-sheet I (see James & Sielecki, 1987: Sielecki et al., 1990) is a mainstay for the aspartic proteinase fold. The four central strands of this sheet are formed from two hairpin loops each coming from the C-terminal residues that comprise the N-domain and the C-domain of these proteinases.The data in Table 9 and careful examination of Figure 9 show that the local and global fits of this sheet in penicillopepsin to that in pepsinogen are equivalent. Only a small angular rotation (1.4”) and an insignificant translation (0.11 A) exist between the matrices derived by a least-squares superposition of these structural units. Not only is this true for the residues of the sheet itself, but also for those residues that constitute the peptide linking the two domains, Tle170 to Leu179, have virtually identical structures and almost identical sequences (9 out of 10 residues are identical: Table 8. Fig. 9).
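
The angle and translation entries of Table 9 describe the rigid-body move that carries a locally superposed unit onto its globally fitted position. As a sketch of how such numbers can be obtained, assume each superposition is available as a 3×3 rotation matrix R and a translation vector t (x' = Rx + t); the relative rotation angle then follows from the trace of the relative rotation matrix. The function names are hypothetical, and taking the translation as the shift of the unit's centroid is only one possible convention; the paper does not state which definition was used.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(m):
    """Transpose of a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(row) for row in zip(*m)]

def apply(rot, trans, point):
    """Apply x' = R x + t to a single point."""
    return [sum(rot[i][k] * point[k] for k in range(3)) + trans[i] for i in range(3)]

def rotation_angle_deg(rot):
    """Rotation angle of a proper rotation matrix, from trace(R) = 1 + 2 cos(theta)."""
    cos_theta = (rot[0][0] + rot[1][1] + rot[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def local_to_global(rot_local, t_local, rot_global, t_global, centroid):
    """Angle (degrees) and centroid shift (angstroms) between a local and a global fit."""
    relative = matmul(rot_global, transpose(rot_local))  # R_global * R_local^-1
    shift = math.dist(apply(rot_local, t_local, centroid),
                      apply(rot_global, t_global, centroid))
    return rotation_angle_deg(relative), shift

def rot_z(deg):
    """Rotation by deg degrees about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Made-up example: the two fits differ by 1.4 degrees about z and 0.11 angstroms along z.
angle, shift = local_to_global(rot_z(10.0), [0.0, 0.0, 0.0],
                               rot_z(11.4), [0.0, 0.0, 0.11],
                               centroid=[0.0, 0.0, 5.0])
```

A small angle and shift, as for sheet I, mean that the local and global fits are effectively the same transformation; large values, as for the helix at residues 224 to 235, indicate a unit whose internal structure is conserved but whose position in the molecule has changed.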

Similar arguments hold for the residues that comprise sheet VI of the N-terminal domain (Table 10); the local fit is very close to the global orientation. A slightly larger reorientation of the strands of sheet II is required to optimize the local superposition (Table 9). The two sheets, VII and III (Table 10), and the helix (residues 224 to 235) of the C-domain exhibit poor fits locally and even larger r.m.s. deviations on a global basis. Examination of Figure 11 shows that indeed the residues of the C-terminal domains of pepsinogen and penicillopepsin do not superimpose as well as those of the N-terminal domain. This observation can be extended when including in the comparison the other members of the family. This point has been discussed in detail elsewhere (Sielecki et al., 1990).

The junction of sheet I and sheet III in pepsinogen and penicillopepsin is displayed in Figure 12. The amino acid replacements (see structural alignment in Table 8) that are at this junction are Tyr267 (pepsinogen) to alanine, Leu269 to valine, Gln308 to serine and Tyr309 to glutamine. The two replacements that probably have the most effect on the reorganization of sheet III are at positions 267 and 309. These are tyrosyl residues in pepsinogen and their side-chains are in direct contact with one another. The smaller corresponding residues in penicillopepsin, Ala265 and Gln306, result in the reorientation of sheet III relative to sheet I as indicated in Figure 12 and Table 9.

Figure 12. A stereo diagram comparing the junction of the small helix from Asp303 to Gln308 and part of the β-sheet I with part of the β-sheet III. The bulky residues Asn150, Tyr267 and Tyr309 in pepsinogen cause the shift of these strands relative to the equivalent strands in penicillopepsin. Thick lines represent bonds in pepsinogen, thin lines are the bonds in penicillopepsin.

The disulfide bridge at 206 to 210 in pepsinogen and the three-residue insertion at position 205 cause the major reorganization of this loop and the helix (Pro224 to Ile235) that is packed against sheet III (Fig. 13) relative to penicillopepsin. The substitution of isoleucine for tyrosine at position 231 on the helix (Table 8) is compensated by the exactly opposite substitution of tyrosine for isoleucine at position 274. Interestingly, in the same internal core region an analogous phenylalanine/isoleucine exchange at positions 258 and 286 occurs. Thus, the difference in the helix position requires a large angular and translational reorientation and can be directly attributed to the bulge at Ile204, Ala205, Cys206 (Fig. 13). Probably also contributing to this helix difference is the presence of the bulky side-chain of Met199 in pepsin.

Figure 13. A stereo diagram showing the environment of the helical segment Pro224 to Ile235 in pepsinogen (thick lines) and in penicillopepsin (thin lines). The primary cause of the difference in the positions of these 2 helices seems to be associated with the disulfide bridge Cys206 to Cys210 and the insertion Cys206 to Gly208 in pepsinogen over the equivalent region in penicillopepsin. In spite of the large difference in position the local structure is conserved (Table 9).

The reorganization of the four strands of sheet V, comprising Ala237 to Ile247 and Tyr274 to Gly285 (Table 10 and Fig. 7 of Sielecki et al., 1990), is so extensive that it is very difficult to establish cause and effect. Not only are there deletions in the loops of these strands in pepsinogen relative to penicillopepsin, but the helix difference and/or the proximity to residues of the proenzyme segment could also contribute to the many observed conformational differences (Fig. 11).

The positions of residues Gly2 to Asn8 in the pepsin part of pepsinogen have a major effect on the flap conformation when compared to the position of the flap in penicillopepsin (Fig. 14). Residues Glu105 to Gly109 in pepsinogen overlap closely with the equivalent residues in penicillopepsin, even allowing for the insertion of Glu107 (Table 8). The position of the short helical segment, Ser110 to Tyr114 in pepsinogen, is displaced by residues Glu4 to Glu7 as they pass between the helix and the flap (Fig. 14). The sequence homology in the region of the flap is high, even though there is an extra residue in penicillopepsin (Ser78A). Thus, the conformational difference in this region is due mainly to the presence of the residues of the proenzyme segment including the N-terminal residues of the pepsin part of the molecule. The differences in the flap positions induced by the residues Gly2 to Asn8 are greatest at the turn Tyr75 to Gly78 and average 4.8 Å. The flap is a relatively flexible structural unit and moves by a similar extent when the pepstatin analog Iva-Val-Val-StaOEt binds to penicillopepsin (James et al., 1982, 1983; James & Sielecki, 1985). In penicillopepsin the isotropic B factors of the atoms in the flap region are among the highest in the molecule (James & Sielecki, 1983). In pepsinogen, because of the special interactions mentioned above and shown in Figure 14, the B factors of these residues are relatively low, in the range 10 to 18 Å² (generally below the average B factor for all of the atoms of the molecule, 18.1 Å²).

Figure 14. A stereo diagram showing the difference in the environment of the flap region in pepsinogen (thick lines) relative to penicillopepsin (thin lines). A section of the prosegment, Ala42p to Leu44p, and the first 8 residues of the pepsin part of the zymogen insert between a strand of the flap, Thr77 to Met80, and the antiparallel strand Glu107 to Ser110. This causes the separation of the strands and the conformational change in the flap position. Similar differences are seen in the comparison of pepsinogen and the mature enzyme (A. K. Muir and M. N. G. James, unpublished results).

Figure 15. A stereoview of the residues on the prosegment portion of pepsinogen and the residues that line the "flap side" of the active site substrate binding cleft in pepsin. The 2 regions have extensive buried hydrophobic surfaces that are highly complementary. Details of the non-bonded contacts between residues are described in the text.

The region around the two active site carboxyl groups of Asp32 and Asp215 is one where the two molecules are the most structurally similar. The comparison of the co-ordinates of the main-chain atoms neighboring the active site yields a root-mean-square difference of 0.36 Å for 48 common atom pairs (James & Sielecki, 1986). This faithful conservation of the active site architecture is also preserved among the other aspartyl proteinases of known structure (Sielecki et al., 1990).
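
For orientation, the isotropic B factors quoted above for the flap can be converted to r.m.s. atomic displacements through the standard relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom along one direction. The sketch below is illustrative only; the function name is hypothetical and the conversion is not part of the original analysis.

```python
import math

def rms_displacement(b_factor):
    """R.m.s. displacement (angstroms) from an isotropic B factor (angstroms squared),
    using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# B-factor range reported in the text for the flap residues of pepsinogen.
low = rms_displacement(10.0)   # roughly 0.36 angstroms
high = rms_displacement(18.0)  # roughly 0.48 angstroms
```

On this scale the flap residues of pepsinogen correspond to displacements of a few tenths of an ångström, consistent with their description as relatively well ordered.
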

In conclusion, the above analysis clarifies the reasons why, in spite of considerable sequence homology and the striking conservation of the overall folding patterns between pepsinogen and penicillopepsin (Fig. 11), the model constructed did not lead to a straightforward molecular replacement solution. Even though the rotation and translation functions were solved unequivocally, the phases based on that model contained enough errors to confuse the appearance of the electron density maps produced with them. Correct alignment of sequences is the first step in the modeling experiment. Nonetheless, even in regions of great sequence homology, this alignment is not unequivocal prior to the determination of the three-dimensional structure.

(c) The active site cleft and hydrophobic interactions with the prosegment

Electrostatic ion pairs and hydrogen-bonding interactions that involve residues on the proenzyme segment and the two active site carboxylate groups of pepsinogen have been described (James & Sielecki, 1986). The positions of the two p-OH benzyl side-chains of Tyr9 and Tyr37p provide important data regarding the nature of the residues involved in substrate binding in subsites S1 and S1' on pepsin. In addition, the possible mode of binding of a hypothetical heptapeptide substrate to pepsin has been discussed in an earlier publication (Sielecki et al., 1990).

In addition to the electrostatic interactions between residues on the prosegment and oppositely charged residues on the main body of pepsin, the complementarity of the prosegment and the pepsin active site is provided by several regions of hydrophobic non-bonded contacts. The two tyrosine residues, Tyr9 and Tyr37p, that form symmetrical hydrogen-bonding interactions with the two active site aspartates, Asp32 and Asp215, respectively, fit approximately into the S1 and S1' binding pockets in pepsin. Residues that make hydrophobic contacts with the phenolic group of Tyr9 are: Ile30, Tyr75, Phe117 and Ile120 (Fig. 15). Similarly, hydrophobic contacts are made between Tyr37p and residues that line the S1' binding pocket: Ile213, Val291, Leu298 and Ile300. The type I reverse turn from Leu6 to Tyr9 (Table 7) places the side-chain of Leu6 adjacent to the phenolic side-chain of Tyr9 (Figs 10 and 15). Leu6 is also adjacent to the methylene carbons of Pro5 as a result of the type I turn from Glu4 to Glu7. Thus, there is a hydrophobic region on the segment from Pro5 to Tyr9 that makes hydrophobic contacts with the main portion of pepsin. Other residues involved in these contacts are Trp39, Met80, Pro108, Phe111, Leu112 and Phe117. Most of these residues are conserved among the several aspartic proteinases of known sequence and, since Leu6 is also highly conserved (Welinder et al., 1985; Foltmann, 1988), it is likely that similar hydrophobic interactions exist in the proenzymes of these other proteinases as well.

There is an adjacent extensive hydrophobic surface on the prosegment that is composed of residues Leu12p, Leu16p, the methylene carbons of Lys21p and Phe25p (Figs 10 and 15). These residues are also part of the hydrophobic core of the prosegment. The side-chains of Phe111 and of Tyr114 that are on the same side of the short active site helix (Ser110 to Tyr114; Sielecki et al., 1990) make intimate contact with these residues. In addition, the side-chain carboxylate of Glu4 is stacked against the phenyl group of Phe111 at an average distance of 3.5 Å. Hydrogen-bonding from this carboxylate side-chain to the main-chain NH groups of Ser110 and Phe111 provides stabilizing interactions that will also be pH-sensitive (Table 6).

The "flap-side" of the active site binding cleft of pepsin provides the majority of the complementary hydrophobic contacts with the prosegment (Fig. 15). There are more buried and partially buried water molecules that lie between the prosegment and the other "wall" of the cleft (in the region from 284 to 291). This region of the prosegment is much more "open" (Figs 8 and 11). The side-chain of Phe38p makes hydrophobic contacts with the side-chains of Val291, Pro292 and Leu298.

The conformational changes that take place when pepsinogen is converted to pepsin are extensive. The contact areas between the prosegment and the pepsin portion of the molecule that we have discussed above undergo some of the more profound changes, which are most likely important in providing the substrate specificity of pepsin (Sielecki et al., 1990).
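
The non-bonded contacts enumerated in this section are the kind of result that a simple distance search over refined co-ordinates would give. A minimal sketch follows, assuming atoms are held as (residue label, x, y, z) tuples and that a contact means two atoms from different groups lie within a chosen cut-off. The 4.0 Å cut-off, the function name and the toy co-ordinates are all assumptions made for illustration and do not come from the paper.

```python
import math

def residue_contacts(group_a, group_b, cutoff=4.0):
    """Map each contacting residue pair to its shortest interatomic distance.

    Each atom is a (residue_label, x, y, z) tuple; distances are in angstroms.
    The default cut-off is an illustrative assumption.
    """
    contacts = {}
    for res_a, *xyz_a in group_a:
        for res_b, *xyz_b in group_b:
            d = math.dist(xyz_a, xyz_b)
            if d <= cutoff:
                key = (res_a, res_b)
                contacts[key] = min(d, contacts.get(key, d))
    return contacts

# Toy, made-up coordinates; the labels only mimic the residue naming used in the text.
prosegment = [("Tyr37p", 0.0, 0.0, 0.0), ("Phe38p", 5.0, 0.0, 0.0)]
enzyme = [("Ile213", 0.0, 3.5, 0.0), ("Val291", 5.0, 0.0, 3.9), ("Leu298", 20.0, 0.0, 0.0)]

contacts = residue_contacts(prosegment, enzyme)
```

A search of this kind only lists geometric neighbours; deciding which of them are hydrophobic contacts, hydrogen bonds or ion pairs requires the atom types and geometry as well.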

Dr Y. Nakagawa provided a highly purified sample of porcine pepsinogen. Koto Hayakawa grew the crystals of pepsinogen that we have used. We thank Mae Wylie for her expert typing and manuscript organization. This research was supported by the Medical Research Council of Canada through a grant to the Group in Protein Structure and Function at the University of Alberta.

References

... C., Rydel, T. J. & Erickson, ... Proteins: Structure, Function ...
Al-Janabi, J., Hartsuck, ... Chem. 247, 4628-4632.
Anderson, W. F., Takeda, Y., Ohlendorf, D. H. & Matthews, B. W. (1982). J. Mol. Biol. 159, 745-751.
Baker, E. N. & Hubbard, R. E. (1984). Prog. Biophys. Mol. Biol. 44, 97-179.
Barry, C. D., Molnar, C. E. & Rosenberger, F. U. (1976). MMS-X Molecular Modelling System. Technical Memorandum No. 229, Computer Systems Laboratory, Washington University, St Louis, Missouri.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.
Blow, D. M. & Crick, F. H. C. (1959). Acta Crystallogr. 12, 794-802.
Bohak, Z. (1973). Eur. J. Biochem. ...
Bustin, M. & Conway-Jacobs, A. ... 246, 615-620.
Chothia, C. & Lesk, ... 309-323.
Christensen, ... (1977). ...
Clement, G. ... (1970). ...
Cooper, ...
