PROTEINS Structure, Function, and Genetics 12:299-313 (1992)

Recurrent αβ Loop Structures in TIM Barrel Motifs Show a Distinct Pattern of Conserved Structural Features

Jean-Pierre Y. Scheerlinck,1 Ignace Lasters,2 Michel Claessens,3 Marc De Maeyer,2 Frederic Pio,2 Philippe Delhaise,3 and Shoshana J. Wodak4

1Institute of Molecular Biology, Department of Cellular Immunology, Vrije Universiteit Brussel, B-1640 Sint Genesius Rode, Belgium; 2Corvas International N.V., B-9000 Ghent, Belgium; 3Plant Genetic Systems, Brussels Laboratories CP160 P2 U.L.B., B-1050 Brussels, Belgium; and 4Unité de Conformation de Macromolécules Biologiques, CP160 P2 Université Libre de Bruxelles, B-1050 Brussels, Belgium

ABSTRACT A systematic survey of seven parallel α/β barrel protein domains, based on exhaustive structural comparisons, reveals that a sizable proportion of the αβ loops in these proteins (20 out of a total of 49) belong to one of two loop types previously described by Thornton and co-workers. Six loops are of the αβ1 type, with one residue between the α-helix and the β-strand, and 13 are of the αβ3 type, with three residues between the helix and the strand. Protein fragments embedding the identified loops have been analyzed in detail. These fragments are termed αβ connections because they contain parts of the flanking helix and strand. The analysis reveals that each type of connection has a distinct set of conserved structural features. The orientation of the β-strand relative to the helix and loop portions differs owing to a very localized difference in backbone conformation. In αβ1 connections, the chain enters the β-strand via a residue adopting an extended conformation, while in αβ3 connections it does so via a residue in a near α-helical conformation. Other conserved structural features include distinct patterns of side chain orientation relative to the β-sheet surface and of main chain H-bonds in the loop and β-strand moieties. Significant differences also occur in the packing interactions of conserved hydrophobic residues situated in the last turn of the helix. Yet the α-helix surface of both types of connections adopts similar orientations relative to the barrel sheet surface. Our results further suggest that conserved hydrophobic residues along the sequence of the connections may be correlated more with specific patterns of interactions made with neighboring helices and sheet strands than with helix/strand packing within the connection itself. A number of intriguing observations are also made on the distribution of the identified αβ1 and αβ3 loops within the α/β-barrel motifs.
They often occur adjacent to each other. αβ3 loops invariably involve even-numbered β-strands, while αβ1 loops preferentially involve odd-numbered β-strands. All the analyzed proteins contain at least one αβ3 loop in the first half of the eightfold α/β barrel. Possible origins of all these observations, and their relevance to the stability and folding of parallel α/β barrel motifs, are discussed. © 1992 WILEY-LISS, INC.

Key words: α/β-barrels, protein structure, loops, molecular modeling

Received March 21, 1991; revision accepted June 27, 1991. Address reprint requests to Dr. Ignace Lasters, Corvas International nv, Jozef Plateaustraat 22, B-9000 Ghent, Belgium. Dr. Claessens is now at COME" Technical Office, rue Montoyer 14, B-1040 Brussels, Belgium. Dr. Pio is now at the University of California, Department of Medicine, Division of Endocrinology and Metabolism, San Diego, La Jolla, CA 92093. Dr. Delhaise is now at UsrConsult sc, rue du bosquet 10, B-1348 Louvain-la-Neuve, Belgium.

J.-P.Y. SCHEERLINCK ET AL.

INTRODUCTION
The eightfold α/β barrel is a ubiquitous folding motif that occurs in a sizable portion of the enzyme structures known to date. It was first discovered in triose phosphate isomerase (TIM) and was therefore termed the TIM barrel. It has since been found in at least 15 more enzymes.1 Among the TIM barrel motifs known to date, some display homology in amino acid sequence and/or function. In general, however, sequence identity among the motifs is very low, and they perform different enzymatic reactions on a variety of substrates. This suggests that the eightfold α/β barrel could be a rather nonspecific stable motif that is quite tolerant to sequence variations and onto which different functionalities can be designed.1 Understanding the limits of this tolerance, and, for that matter, the role of the amino acid sequence in determining this ubiquitous fold, remains a major challenge.

Other, not unrelated, questions concern the evolution of the TIM barrel motif. Has it evolved from a common ancestor,1 through convergent evolution, or alternatively through a process of cut and paste of structural and/or functional subdomains?3 Experimental approaches that probe these questions have recently been reported. They include the synthesis of circularly permuted homologues of Escherichia coli N-(5'-phosphoribosyl)anthranilate isomerase, which folds as a separate monomer in this organism.4,5 They also include the de novo synthesis of an artificial polypeptide made of repeats of a putative β/α-forming amino acid sequence.6 Theoretical analyses of the structural and energetic requirements for folding of α/β barrels, and of their relation to the amino acid sequence, are a valuable complement to these approaches.

The β-barrel portion of the TIM motifs has been studied in quite some detail. In contrast, the loop structures that connect the alternating α and β secondary structure elements are much more poorly understood. Recently, Claessens et al.10 rebuilt the backbone of TIM using fragments of variable length extracted from a database of known protein structures, guided by the crystallographically determined Cα positions. Their study (Fig. 7 of Claessens et al.10) seemed to suggest that βα loops contain fewer common folding motifs than the rest of the chain. These are the loops that connect the C-termini of β-strands to the N-termini of α-helices. They could be fitted only with short contiguous polypeptide fragments from the database. By contrast, αβ loops, which connect the C-termini of helices to the N-termini of β-strands, could be fitted by longer contiguous segments from the database. These findings agree well with an observation about α/β proteins, and eightfold α/β barrels in particular. In these proteins, βα loops are generally much longer and more variable in structure than αβ loops, and sometimes contain entire structural domains. Another remarkable observation is that active sites in α/β barrel proteins are always located in the C-terminal portion of the β-strands and in βα loops.1
Furthermore, most of the known α/β barrel structures belong to oligomeric proteins. It can readily be verified that βα loops are heavily involved in subunit contacts, while αβ loops are not. All this suggests that βα loops may be shaped mainly to comply with functional requirements, such as enzymatic function and/or specificity of subunit interaction. The αβ loops would then be a good place to look for conserved structural and sequence patterns that may reflect basic structural and folding constraints. In a survey of 70 βαβ units from 17 α/β proteins, Edwards et al.11 identified two types of αβ loop. αβ1 loops have one residue between the α-helix and the β-strand, and αβ3 loops have three. One loop of each type was moreover shown to occur in TIM. The present study reports the first step toward a systematic analysis of αβ loops in TIM barrel motifs.

In particular, we set out to determine whether the αβ1 and αβ3 loop types recur in a sample of seven proteins that contains TIM and six additional α/β barrels from more recently determined structures. We identified five additional αβ1 loops and 12 additional αβ3 loops in this sample. We then systematically analyze the detailed structural features of the main chain segments assigned to each loop type (αβ1 or αβ3), by measuring the rms deviation of atomic coordinates after superposition. These segments can be extended to include more of the secondary structure elements flanking the loops. The extended segments are shown to display a number of conserved structural features specific to each loop type. We analyze how these features relate to the amino acid sequence and to packing interactions, both within the fragments and with surrounding protein atoms. Furthermore, we make specific observations on the positions of the loops within the TIM barrel motifs. These are discussed in light of possible constraints imposed by the overall folding architecture.

METHODS
The seven α/β barrel motifs used in this survey come from six proteins. These are triose phosphate isomerase12 (TIM), taka amylase13 (TAA), xylose isomerase14 (XI), glycolate oxidase15 (GLO), and tryptophan synthase16 (TRPAB). The sixth is the double α/β-barrel enzyme indole-3-glycerol-phosphate synthase/N-(5'-phosphoribosyl)anthranilate isomerase17 (IGPS and PRAI). Atomic coordinates for these proteins were obtained from the Brookhaven databank18 or as a gift from the authors. The atomic coordinates of all seven α/β-barrel motifs were stored in a database.

The database was then surveyed for other αβ1 and αβ3 loops using the model building package BRUGEL.19 The αβ1 and αβ3 connections identified in TIM by Edwards et al.11 served as the reference structures. Both types of loops are characterized by a residue that adopts a positive φ dihedral angle, denoted L1 by Edwards et al.11 This residue can be easily identified, irrespective of inherent ambiguities in the determination of secondary structure boundaries.20 It was therefore chosen as the reference residue for comparing the identified loops in this study (Fig. 1). For example, in TIM the L1 residues are Gly 120 and Gly 87 for the αβ1 and αβ3 loops, respectively.

The survey consisted of searching for fragments of varying length, each encompassing an αβ1 or αβ3 loop. Since longer fragments also contain the secondary structure elements flanking the loops, they have been termed αβ1 and αβ3 connections, respectively. To define fragment boundaries, it was convenient to use a general convention based on a two-digit code. The first digit defines the N-terminal boundary: it gives the number of residues this boundary is removed from L1, counting backward toward the N-terminus. The second digit defines the C-terminal boundary: it gives the number of residues the boundary is removed from L1, counting forward toward the C-terminus. For example, the 43 fragment associated with the L1 residue (Gly 87) of the αβ3 loop in TIM refers to the segment Ile 83-Ala 90 of this protein.

In addition, a two-letter code identifies the location of each αβ connection within the α/β barrel. It indicates which of the eight helices, labeled A-H, and which of the eight β-strands, labeled a-h, are involved in the connection. Thus, the αβ connection between the third helix and the fourth β-strand is designated Cd.

Fig. 1. Cα chain tracings illustrating the αβ1 and αβ3 connections. a. αβ1 connection. b. αβ3 connection. ○, residues that are part of secondary structure elements, either the helix (right) or the strand (left). ●, loop residues. L1 is the first residue in the loop; it adopts a positive φ angle and is often a glycine. It is used as the reference position to compare the two connection types and to label flanking residues (see text). The coordinates used here are those of the αβ1 and αβ3 connections in TIM. Note that for both connections the assignment of the first β-strand residue is ambiguous. In αβ1 connections the first β-strand residue can be either L1+1 or L1+2. In αβ3 connections it is L1+3 if both an extended backbone and a standard H-bonding pattern are required. L1+2 does not qualify as a strand residue, because its backbone conformation is near-helical. However, it could qualify as a sheet residue, because of its H-bonding interactions with adjacent strands (see text).

αβ LOOPS IN PARALLEL α/β-BARREL PROTEINS

The detailed steps of the survey procedure are as follows:

1. Define the minimum-length templates characterizing αβ1 and αβ3 connections, respectively. Two fragments of 7 residues each were extracted from TIM: one for the αβ1 loop (fragment 42 relative to Gly 120) and one for the αβ3 loop (fragment 42 relative to Gly 87). These fragments include the last turn of the α-helix and the entire loop region. In αβ1-type connections they also include the first β-strand residue. Both fragments display the consensus amino acid pattern and hydrogen bonding interactions described by Edwards et al.11 Superimposing the backbone atoms (N, C, Cα, O) of these two fragments yields a rather low rms deviation of 0.56 Å. They therefore do not differ enough to characterize the corresponding loop types. Both fragments were then extended by one residue in the C-terminal direction, into the β-strand, giving fragment 43 relative to Gly 120 and fragment 43 relative to Gly 87. These longer fragments yielded a backbone rms deviation of 1.26 Å, indicating that they differ more significantly in structure. The origin of this difference and its influence on the topology of the backbone are discussed in Results. As expected, extending both fragments by one residue toward the N-terminus did not increase the rms deviation. This extension runs into the much more regular helical structure (52 fragments in our convention), so it did not help differentiate between the two types. The 43 fragments of TIM relative to Gly 120 and to Gly 87 were therefore chosen as the minimum-length templates characterizing the αβ1 and αβ3 loop types, respectively. Longer templates were generated from these by extending them one residue at a time in the C-terminal direction.
2. Each of the two 43 fragments of TIM defined above as minimum-length templates was compared independently with all consecutive overlapping chain fragments from the seven α/β-barrel motifs, including TIM itself. The comparison involved superimposing backbone atoms21 and computing rms deviations. All protein segments with less than 2 Å rms deviation from the templates were retained for further analysis.
3. Step 2 was repeated using longer templates, obtained by extending the minimum-length templates stepwise in the C-terminal direction.
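Steps 1-3 amount to sliding a template over every overlapping backbone fragment and keeping fragments whose rms deviation after optimal superposition is below 2 Å. The sketch below illustrates this in plain Python. It is not the BRUGEL implementation, and the superposition method of ref. 21 is not reproduced here. As a stand-in we assume Horn's quaternion method, which obtains the minimal rms deviation over proper rotations from the largest eigenvalue of a 4×4 symmetric matrix. The residue representation (dicts keyed by atom name) and all function names are hypothetical.

```python
import math

# Backbone atoms superimposed for each residue (N, C-alpha, C, O), as in step 1.
BACKBONE = ("N", "CA", "C", "O")


def fragment_bounds(l1, code):
    """Inclusive residue range for a two-digit fragment code relative to L1.

    First digit: residues counted back from L1 (toward the N-terminus).
    Second digit: residues counted forward from L1 (toward the C-terminus).
    """
    back, forward = int(code[0]), int(code[1])
    return l1 - back, l1 + forward


def _max_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(xs, ys):
    """RMS deviation after optimal rigid-body superposition (Horn's quaternion method)."""
    n = len(xs)
    cx = [sum(p[i] for p in xs) / n for i in range(3)]
    cy = [sum(p[i] for p in ys) / n for i in range(3)]
    x = [[p[i] - cx[i] for i in range(3)] for p in xs]
    y = [[p[i] - cy[i] for i in range(3)] for p in ys]
    # Correlation matrix S[a][b] = sum_k x_k[a] * y_k[b].
    s = [[sum(u[a] * v[b] for u, v in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue(nmat)
    gx = sum(c * c for p in x for c in p)
    gy = sum(c * c for p in y for c in p)
    # Minimal sum of squared deviations is gx + gy - 2 * lambda_max.
    return math.sqrt(max(0.0, (gx + gy - 2.0 * lam) / n))


def scan(template, chain, cutoff=2.0):
    """Compare a template with every overlapping fragment of the same length.

    template, chain: lists of residues, each a dict mapping atom name -> (x, y, z).
    Returns (start_index, rmsd) for fragments below the cutoff (Angstrom).
    """
    width = len(template)
    ref = [res[atom] for res in template for atom in BACKBONE]
    hits = []
    for start in range(len(chain) - width + 1):
        window = [res[atom] for res in chain[start:start + width] for atom in BACKBONE]
        r = superposed_rmsd(ref, window)
        if r < cutoff:
            hits.append((start, r))
    return hits
```

For example, `fragment_bounds(87, "43")` returns `(83, 90)`, the Ile 83-Ala 90 segment quoted above. Step 3 corresponds to calling `scan` again with templates built from codes such as "44", "45" and "46".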


TABLE I. List of All the αβ Connections From Our Sample of Seven α/β-Barrel Proteins That Display rms Deviations of Less Than 2 Å With the αβ1 and αβ3 Templates Taken From TIM*


*The templates correspond to the De and Cd connections of TIM for αβ1 and αβ3, respectively. The listed rms deviation for these connections in TIM is therefore zero, while blanks correspond to rms deviations larger than 2 Å. Column 1 gives the protein abbreviation. The values in column 7 refer to matches of the minimum-length templates of TIM, which are the 43 fragments relative to reference residues 87 (αβ3 loops) and 120 (αβ1 loops). Values in columns 8-10 refer to matches of templates extended in the C-terminal direction one residue at a time. Column 6 lists rms values for matches with templates including only the last turn of the helix, the reference residue L1, and the two residues following L1 (42 segments). Varying the size of the αβ1 and αβ3 templates does not alter the sequence alignment between the templates and the best-matching fragment in each loop. In particular, the L1 residue of each template is matched with the same residue in the loop irrespective of template size. aH refers to the capital letter indicating which of the eight barrel α-helices, denoted A-H, is involved in the αβ connection. bPos indicates the sequence number of the protein residue aligned with L1 in the matched template. cBackbone φ dihedral angle (in degrees) of the residue matching the L1 residue of the template. dResidue type (one-letter code) at the position matching the L1 residue of the template. eTwo-digit code describing the boundaries of the template (see text). Shaded lines in Table IA (left-hand side) indicate αβ connections identified as αβ1. Shaded lines in Table IB (right-hand side) indicate those identified as αβ3.
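Table I reports the backbone φ angle of the residue matched to L1, and L1 itself is identified by its positive φ. The following minimal sketch computes backbone φ,ψ from coordinates, using the standard IUPAC sign convention, and flags residues with positive φ. The residue dictionaries and function names are illustrative assumptions. Positive φ also occurs outside αβ loops, so in practice the candidates would be restricted to helix-strand junctions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """(phi, psi) per residue; None where the neighbouring residue is missing.

    residues: list of dicts with "N", "CA", "C" coordinates (hypothetical layout).
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles


def positive_phi_positions(residues):
    """Indices of residues with phi > 0, i.e. candidate L1 positions."""
    return [i for i, (phi, _) in enumerate(phi_psi(residues))
            if phi is not None and phi > 0.0]
```

Applied to a helix-loop-strand segment, `positive_phi_positions` would be expected to return the loop's L1 index (e.g. the Gly 87 position in TIM), alongside any other positive-φ residues in the segment.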

TABLE II. Values of φ,ψ Backbone Dihedral Angles (in Degrees) of Residues in αβ3 and αβ1 Loops*

*Values of the backbone dihedral angles, rounded to the nearest degree, are given for residues L1 through L1+3 in αβ3 loops (above) and αβ1 loops (below). The first column lists the protein code. The second column, labeled C, specifies the loop connection using the two-letter code explained in the text. The column labeled L1 lists the residue number identifying the reference residue in each protein.

RESULTS
Table I lists all the αβ connections from our sample of seven α/β-barrel proteins that display rms deviations of less than 2 Å with the αβ1 (Table IA) and αβ3 (Table IB) templates taken from TIM. Among these connections, 12 connections in addition to that in TIM match the minimum-length αβ3 template with an rms deviation of less than 1 Å. Only five connections (in addition to the one in TIM) match the minimum-length template characterizing αβ1 loops to the same extent. The results confirm that the chosen minimum-length templates (43 fragments) discriminate well between the αβ1 and αβ3 connections. On the other hand, the shorter templates do not discriminate between the two loop types. These are the 42 fragments of TIM relative to residues 87 (αβ3) and 120 (αβ1), and both match with less than 1 Å rmsd at the same locations. The data also show that templates elongated in the C-terminal direction, into the strand portion, help further discriminate between αβ1- and αβ3-type connections. When the matched region is extended by up to three additional residues in the C-terminal direction (fragments 44-46), low rmsd values for matches of a given minimum-length template (either αβ1 or αβ3) are maintained. High rmsd values, in contrast, keep increasing. As a control, we also performed a survey with templates elongated in the N-terminal direction to include more of the helix portion (results not shown). It did not yield additional information.
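The discrimination criterion above can be written as a small decision rule, which makes the role of template extension explicit. A connection is assigned to a loop type if it matches that type's minimum-length (43) template below 1 Å and that template keeps giving the lower rmsd as both templates are extended into the strand (44-46). This is a hypothetical formalization for illustration, not a procedure stated by the authors. The function name and the rmsd profiles in the example are invented and do not come from Table I.

```python
def classify_connection(rmsd_ab1, rmsd_ab3, match_cutoff=1.0):
    """Assign an alpha-beta connection to a loop type (hypothetical rule).

    rmsd_ab1, rmsd_ab3: rms deviations (Angstrom) from the TIM alpha-beta-1 and
    alpha-beta-3 templates, ordered by template length starting with the
    minimum-length 43 fragment, then 44, 45, 46.
    """
    if len(rmsd_ab1) != len(rmsd_ab3) or not rmsd_ab1:
        raise ValueError("need equal-length, non-empty rmsd profiles")
    for label, own, other in (("ab1", rmsd_ab1, rmsd_ab3), ("ab3", rmsd_ab3, rmsd_ab1)):
        # Close match to the minimum-length template...
        close_match = own[0] < match_cutoff
        # ...and that template stays the better fit as the strand is added.
        stays_better = all(o < x for o, x in zip(own, other))
        if close_match and stays_better:
            return label
    return "unassigned"


# Invented example profiles (fragments 43, 44, 45, 46):
print(classify_connection([0.6, 0.7, 0.8, 0.8], [1.3, 1.6, 1.9, 2.2]))  # ab1
print(classify_connection([1.4, 1.7, 2.0, 2.3], [0.5, 0.6, 0.6, 0.7]))  # ab3
```

Requiring the better fit to persist over the extended templates mirrors the observation that low rmsd values are maintained while high ones keep increasing. A 42-style comparison, where both templates fit equally well, would not pass this test reliably.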

TABLE III. Aligned Amino Acid Sequences of the αβ1 and αβ3 Connections Identified in the Seven α/β Barrel Motifs Analyzed in This Study*

*The sequence stretches shown encompass eight residues preceding the L1 reference position and six residues following L1. They cover the range of fragment lengths used in the structural comparisons. The L1 residue in each connection is shaded. The chain segments that match the minimum-length templates from TIM characterizing αβ1 and αβ3 loops are boxed. The column labeled C specifies the loop connection using the two-letter code explained in the text.

Backbone Conformations and Sequence Features
The φ,ψ backbone dihedral angles of loop residues in all connections identified as αβ1 and αβ3 are listed in Table II. All these loops feature a positive φ value at residue L1, which produces a tight reversal of the main chain. This is the strongest common characteristic of these loops, as previously described.11 Positive φ values fall in regions of the φ,ψ map that are energetically unfavorable for most residues but accessible to Gly. Indeed, all of the identified αβ3 loops have a glycine at position L1 (Tables I and III). The identified αβ1 loops contain two exceptions to this rule: Asp at position L1 in the De loop of IGPS, and Asn in the De loop of TRPA2. Another interesting feature is the striking difference in backbone conformation at residue L1+2. In αβ3 loops the majority (10/13) of the L1+2 residues adopt a φ,ψ
