Molecular Microbiology (1992) 6(3), 283-292

MicroReview

D-E-A-D protein family of putative RNA helicases

S. R. Schmid and P. Linder*
Department of Microbiology, Biozentrum, 70 Klingelbergstrasse, CH-4056 Basel, Switzerland.

Summary

RNA metabolism plays a central role in cell growth. It is essential to regulate RNA synthesis, processing, stability and degradation. Conformational changes in RNA are key elements in regulating cellular processes. Recently, an increasing number of putative RNA helicases have been identified in organisms ranging from Escherichia coli to humans, as well as in viruses. They are involved in diverse cellular functions such as RNA splicing, ribosome assembly, initiation of translation, spermatogenesis, embryogenesis, and cell growth and division. Based on sequence homologies, these proteins were grouped in a family, the D-E-A-D box protein family (D-E-A-D = Asp-Glu-Ala-Asp). Some of the better characterized members have been shown to possess ATP-binding and ATP-hydrolysing activities as well as ATP-dependent RNA helicase activities. Most of the genes encoding such proteins have been isolated from yeast, on which we will focus in this review. From sequence data, three of the members form a subfamily, the D-E-A-H subfamily.

A rapidly growing family

To date, the D-E-A-D protein family includes 29 proteins that originate from a wide range of organisms, from bacteria to humans (Table 1). This family is characterized by a core region of 294 to 359 amino acids (see Fig. 1) that shows strong sequence homology to the mouse eIF-4A protein, a factor involved in translation initiation (eIF stands for eukaryotic initiation factor) (Linder et al., 1989). Individual members of the family have distinct amino- and carboxy-terminal regions that vary in length (21-505 and 34-373 residues, respectively). The D-E-A-D box proteins are related to a superfamily of NTPases involved in RNA and DNA metabolism (Koonin, 1991). Although many viral proteins have been shown to exert helicase

Received 16 September, 1991; revised and accepted 21 October, 1991. *For correspondence. Tel. (61) 267 21 35; Fax (61) 267 21 18; e-mail Linder%[email protected].

activity in vitro, consideration of these proteins lies beyond the scope of this review. Nevertheless, as indicated by the many functions these proteins can have, it is clear that they play a central role in the life of a cell. Because of limitations in length, only key references will be cited.

Conserved elements in the D-E-A-D protein family

The core region found in all the members of the D-E-A-D family contains eight domains of strong peptide sequence conservation (Fig. 1). At present, a function can be attributed to only two of them. The first domain (AXXGXGKT) has been previously described as the A motif of ATPases (Walker et al., 1982) and has been shown to be involved in ATP binding (Rozen et al., 1989). The alanine is typical of D-E-A-D proteins and of helicases in general (Ford et al., 1988). Other ATPases may possess either a glycine or an alanine residue at this position. In Saccharomyces cerevisiae eIF-4A, substitution of this alanine residue by a glycine was shown not to affect cell growth or the in vivo translation rate, whereas substitution by an aspartic acid or a valine diminishes or abolishes in vivo translation, respectively (Schmid and Linder, 1991). A temperature-sensitive Tif1 factor and a dominant Prp16 mutant were shown to have mutations affecting an amino acid six residues carboxy-terminal to the GKT motif (see Fig. 1). The fifth domain (LDEAD) is suggested to be a special form of the previously described ATPase B motif (Walker et al., 1982) and therefore may also be involved in ATP binding and/or ATP hydrolysis. Other ATPases involved in nucleic acid metabolism have at least the residues DE at this site (Gorbalenya et al., 1989). Yeast eIF-4A mutants affecting the aspartic acid residues of this motif (DEAD→EEAD and DEAD→DEAH) were shown to be lethal to the cell (Schmid and Linder, 1991).
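The motif notation used above, in which X stands for any residue, translates directly into a regular expression. The following is a minimal Python sketch of how a sequence could be scanned for the A motif (AXXGXGKT) and the LDEAD motif. The function names and the toy sequence are assumptions made for illustration; the toy sequence is invented and is not a real member of the family.

```python
import re

# Motifs quoted in the text; X denotes any amino acid.
MOTIFS = {
    "ATPase A motif": "AXXGXGKT",
    "ATPase B motif (D-E-A-D box)": "LDEAD",
}


def motif_to_regex(motif):
    """Translate the X-wildcard notation into a regular-expression string."""
    return motif.replace("X", ".")


def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for every occurrence."""
    hits = {}
    for name, motif in motifs.items():
        # A zero-width lookahead also reports overlapping occurrences.
        pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
        hits[name] = [m.start() for m in pattern.finditer(sequence)]
    return hits


# Hypothetical toy sequence, invented for illustration only.
TOY = "MSTAQSGTGKTAAFVLDEADEMLS"
HITS = find_motifs(TOY)
```

With this toy input, the A motif is found at position 3 and LDEAD at position 15 (0-based). A mutant such as DEAD→DEAH would simply no longer match the second pattern.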
The eighth motif is part of a basic domain, which has led to the suggestion that it may be involved in the interaction with RNA (Linder et al., 1989). No function can yet be attributed to the other conserved domains indicated in Fig. 1. Biochemical analysis of the mutants obtained so far within these domains (Fig. 1) should help to elucidate their role in protein function. It is interesting to note that a subgroup of proteins involved in pre-mRNA splicing, the D-E-A-H proteins Prp2, Prp16 and Prp22, show other conserved motifs. They differ in the PTRELA, the IXXXTPGRL, the TXVXARGXD


Table 1. The known members of the D-E-A-D and D-E-A-H protein families.

Protein | kDa/amino acids | Organism | Cellular function | Biochemical activity
eIF-4AI, eIF-4AII | 46/406, 46/407 | M. musculus | Translation initiation; unwinding of mRNA together with eIF-4B | ATP-dependent binding of ssRNA; ATP-dependent RNA helicase; RNA-dependent ATPase
Tif1/Tif2 | 45/395 | S. cerevisiae | Translation initiation; homologous to eIF-4A; unwinding of mRNA together with eIF-4B |
CseIF-4A | 48/423 | C. sonorensis | Unknown; homologous to eIF-4A |
NeIF-4A2, NeIF-4A3 | 47/413, 44/391 | N. plumbaginifolia | Translation initiation (?); homologous to eIF-4A |
Me31B | 52/459 | D. melanogaster | Maternal expression during oogenesis |
Ste13 | 49/436 | S. pombe | Mating |
Spb4 | 69/606 | S. cerevisiae | 25S ribosomal RNA maturation |
SrmB | 50/444 | E. coli | 50S ribosomal assembly (?); suppresses, when overexpressed, a temperature-sensitive mutation in the ribosomal protein L24 | RNA-dependent ATPase; RNA binding
DeaD | 64/571 | E. coli | Unknown; suppresses, when overexpressed, a temperature-sensitive mutation in the ribosomal protein S2 |
RhlB | 47/421 | E. coli | Unknown |
p68 | 69/614 | H. sapiens | Cell growth and division; early step in nucleolar assembly (?) | ATP-dependent RNA helicase; ATPase stimulated by ssRNA
DmRm62 | 62/575 | D. melanogaster | Unknown |
DpRm62 | 63/587 | D. pseudoobscura | Unknown |
ScDbp2 | 61/546 | S. cerevisiae | Unknown |
Spdbp2 | 61/550 | S. pombe | Unknown |
DbpA | 46/432 | E. coli | Unknown |
Dm73D | 64/572 | D. melanogaster | Unknown |
Mss116 | 76/664 | S. cerevisiae | Mitochondrial pre-mRNA splicing |
Prp5 | 96/849 | S. cerevisiae | Spliceosome assembly |
Prp28 | 67/588 | S. cerevisiae | First step of pre-mRNA splicing; unwinding of U4 and U6 snRNA in the context of the U4/U6/U5 snRNP in the spliceosome (?) |
Spp81/Ded1 | 65/604 | S. cerevisiae | Unknown; suppressor of a prp8 mutation |
Dbp1 | 68/618 | S. cerevisiae | Unknown; suppressor of a spp81/ded1 mutation |
Vasa | 72/662 | D. melanogaster | Maternal effect; formation of germ cells and abdominal segments |
An3 | 77/697 | X. laevis | Unknown |
PL10 | 73/660 | M. musculus | Unknown; male germ-line- and stage-specific expression |
Prp2 | 100/876 | S. cerevisiae | First step of pre-mRNA splicing reaction |
Prp16 | 121/1071 | S. cerevisiae | Second step of pre-mRNA splicing reaction; suppressor of a branch-point mutation | RNA-dependent ATPase
Prp22 | 130/1145 | S. cerevisiae | Release of spliced mRNA from spliceosome |
Mle | 144/1293 | D. melanogaster | Male-specific dosage compensation of X chromosome-linked genes |

Their deduced relative molecular masses and lengths (in numbers of amino acids) are indicated. Also shown are the organisms from which they were isolated, the cellular functions (if known) of the proteins, and characteristics of in vitro analyses. (For references, see the legend to Fig. 1.) Organisms: Mus musculus, Saccharomyces cerevisiae, Campoletis sonorensis, Nicotiana plumbaginifolia, Drosophila melanogaster, Schizosaccharomyces pombe, Escherichia coli, Homo sapiens, Drosophila pseudoobscura and Xenopus laevis.

and the YXHRIGRTGR regions. They also possess a DEAH motif instead of the DEAD motif and were therefore grouped in the D-E-A-H subfamily. These proteins show an overall high similarity and are all exceptionally large. The divergence of the D-E-A-H proteins from the D-E-A-D proteins becomes obvious when one attempts to align all these proteins, and is schematically represented in the dendrogram in Fig. 2. Upon comparison of all these proteins it becomes clear that the core region, represented by translation initiation

factor eIF-4A, is flanked in most cases by N-terminal and C-terminal extensions. In general, these sequences have little in common and are probably responsible for the specific function of the individual proteins.

eIF-4A has in vitro unwinding activity

To find out more about the molecular function of these proteins, several polypeptides have been purified and tested in vitro for specific activities. Most of the in vitro

[Fig. 1, panel A: CONSERVED ELEMENTS IN THE DEAD BOX FAMILY. Schematic of the conserved motifs, with distances between motifs and per-protein lengths given in amino acids (aa).]

Fig. 1. Alignment of the D-E-A-D box protein family. This family includes ScDbp2 and Spdbp2 (Iggo et al., 1991), Rm62 (from D. melanogaster; Dorer et al., 1990), p68 (Ford et al., 1988), PL10 (Leroy et al., 1989), An3 (Gururajan et al., 1991), Spp81/Ded1 (Jamieson et al., 1991), Dbp1 (Jamieson and Beggs, 1991), Vasa (Hay et al., 1988; Lasko and Ashburner, 1988), Me31B (De Valoir et al., 1991), Ste13 (C. Shimoda, personal communication), eIF-4AI (Nielsen et al., 1985), eIF-4AII (Nielsen and Trachsel, 1988), NeIF-4A2 and NeIF-4A3 (G. Owttrim, personal communication), Tif1/Tif2 (Linder and Slonimski, 1988), DbpA (Iggo et al., 1990), DeaD (Toone et al., 1991), SrmB (Nishi et al., 1988), RhlB (Kalman et al., 1991), Prp5 (Dalbadie-McFarland and Abelson, 1990), Prp28 (Strauss and Guthrie, 1991), Mss116 (Seraphin et al., 1989), Spb4 (Sachs and Davis, 1990), Dm73D (P. F. Lasko, personal communication), Prp16 (Burgess et al., 1990), Prp2 (Chen and Lin, 1990), Prp22 (Company et al., 1991), and Mle (Kuroda et al., 1991). Four partial sequences of putative genes encoding RNA helicases in S. cerevisiae (Chang et al., 1990), two putative DEAH-encoding polypeptide fragments from S. cerevisiae (Company et al., 1991), a partial sequence from E. coli (Kalman et al., 1991), Rm62 from D. pseudoobscura and CseIF-4A are not included in the alignment. A schematic representation of the eight highly conserved motifs is shown in panel A. The distances between the motifs are indicated (in numbers of residues) above the schematic representation. The alignment shown in panel B was performed with the PILEUP program of the GCG package (version 7), and was completed by visual editing. The protein sequences of the similar regions are shown. A consensus sequence is shown at the top of the alignment. The residues that are identical in all the sequences are marked in bold letters. A residue was considered conserved when 19 or more of the 29 sequences showed the same amino acid.
The encircled amino acids indicate the residues of the tif1 mutants (A→D,V,G / A→V (temperature-sensitive) / G→D / G→D / G→D,S / D→E / D→H,T=t--E.G,l,S.T), the dominant mutation in prp16-1 (Y→D), the cold-sensitive mutation in prp28 (G→E), the mutation in prp22 (G→D), and the temperature-sensitive mutation in mle (T→S).
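The conservation rule stated in the legend (a residue counts as conserved when at least 19 of the 29 aligned sequences share it, and residues identical in all sequences are marked separately) can be sketched in code. This is an illustrative Python sketch only, not the PILEUP procedure used for Fig. 1. The function name `consensus`, the `X` placeholder for non-conserved columns, and the five-sequence toy alignment with a threshold of 4 are assumptions made for the example.

```python
from collections import Counter


def consensus(alignment, min_count):
    """Return (consensus string, per-column 'identical in all' flags).

    alignment: equal-length aligned sequences; '-' marks a gap.
    min_count: minimum number of sequences that must share a residue
               (19 for the 29 sequences in the legend's rule).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    cons, invariant = [], []
    for column in zip(*alignment):
        # Gaps never count as a residue.
        counts = Counter(res for res in column if res != "-")
        residue, n = counts.most_common(1)[0] if counts else ("X", 0)
        # Below the threshold the column is reported as X (not conserved).
        cons.append(residue if n >= min_count else "X")
        # True only if every sequence carries the same residue.
        invariant.append(n == len(alignment))
    return "".join(cons), invariant


# Hypothetical toy alignment (5 sequences, threshold 4), for illustration.
TOY_ALIGNMENT = ["LDEAD", "LDEAD", "VDEAD", "LDEAH", "MDEAD"]
CONS, INVARIANT = consensus(TOY_ALIGNMENT, min_count=4)
```

For the toy alignment the consensus is XDEAD: the first column falls below the threshold, the last column is conserved but not invariant, and the three middle columns are identical in all sequences. For the rule in the legend one would call the function with 29 aligned sequences and `min_count=19`.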


Fig. 2. The relatedness of protein sequences of the D-E-A-D and D-E-A-H families is shown. The dendrogram was obtained with the program of the GCG package (version 7).

studies have been done with the mammalian translation initiation factor eIF-4A. This factor has been suggested to bind to the 5' end of mRNA in conjunction with the eIF-4F complex, composed of the cap-binding protein (eIF-4E) and a 220 kDa protein. This complex, with the help of another initiation factor, eIF-4B, may then unwind RNA secondary structure to allow the 43S pre-initiation complex to bind and scan for the initiator AUG codon. ATP hydrolysis has been measured for eIF-4A and has been shown to depend on the presence of RNA and to be stimulated by eIF-4B, although eIF-4B does not possess ATPase activity by itself (Grifo et al., 1984). Furthermore, eIF-4A was shown to have an ATP-dependent mRNA helicase activity which renders the RNA susceptible to nucleases (Ray et al., 1985). This activity is suggested to help the ribosome to bind and to initiate scanning. When only eIF-4A is added to the reaction, unwinding activity is low, whereas it is stimulated when eIF-4B is present (Goss et al., 1987). Yet another in vitro system has been developed to quantify the RNA helicase activity of eIF-4A and eIF-4B using partially double-stranded RNA molecules (Rozen et al., 1990). These studies revealed that the unwinding activity of eIF-4A functions in the 3'-to-5' and 5'-to-3' directions and depends in both cases on the factor eIF-4B and ATP hydrolysis. The human eIF-4B gene

has been cloned and sequenced (Milburn et al., 1990). Although eIF-4B does not bind RNA independently (Abramson et al., 1987), the deduced amino acid sequence contains an RNA recognition motif. Since it appears that eIF-4A alone binds mRNA with low affinity and does not have such an RNA-binding motif, it was postulated that eIF-4B mediates efficient binding of eIF-4A to the mRNA, in addition to stimulating the unwinding activity of eIF-4A (Abramson et al., 1987; Goss et al., 1987; Linder and Prat, 1990). This postulated RNA-binding activity of eIF-4B may need a conformational change in protein structure, which could be facilitated by a specific interaction with eIF-4A. A gene corresponding to the human eIF-4B has not been isolated from yeast. As shown in the dendrogram in Fig. 2, there are several proteins that are highly homologous to mouse eIF-4A. The only proteins which have been shown to be functional homologues of mouse eIF-4A and to be required for translation are the Tif1 and Tif2 gene products from S. cerevisiae (Schmid and Linder, 1991). A mutational analysis of the TIF1 and TIF2 genes has demonstrated that mutations in residues conserved among the D-E-A-D protein family often affect protein function in vivo, and therefore the conserved motifs must have a role in protein structure/function. At present, some mutant tif1/2 factors are being analysed for their in vitro ATP-binding and -hydrolysis, RNA helicase and translation activities. The genes encoding the NeIF-4A2 and NeIF-4A3 proteins from Nicotiana plumbaginifolia have been isolated by hybridization with the yeast TIF1 gene (G. Owttrim, personal communication). Interestingly, NeIF-4A3, which is 65% identical to Tif1, possesses a DESD motif instead of DEAD. However, it has not been shown to be functional in vivo. Except for its homology to Tif1/2 (70%), little is known about the CseIF-4A protein isolated from the endoparasite Campoletis sonorensis (B. Webb, S. Dib-Hajj, and M. Summers, personal communication).
In the absence of a name for this protein, we provisionally called it CseIF-4A (for Campoletis sonorensis eIF-4A-like protein). The proteins most similar to the eIF-4A group are the Me31B and Ste13 proteins from Drosophila melanogaster and the yeast Schizosaccharomyces pombe, respectively (De Valoir et al., 1991; C. Shimoda, personal communication). The gene encoding the Ste13 protein has been isolated by complementation of a sterility mutant. Analysis of physiological properties has shown that a defect in the Ste13 protein impairs agglutination in fission yeast, which in turn abolishes mating. Moreover, diploid strains homozygous for the ste13 mutation are unable to undergo meiosis. The ste13 mutant is also highly sensitive to starvation and to heat-shock treatment (Kitamura et al., 1990). It is clear that the putative helicase Ste13 plays an important role in the physiology of fission yeast, as

demonstrated by these many defects. The Me31B-encoding gene is only expressed at an early stage of oogenesis; the transcripts remain uniformly distributed in the early embryo and disappear by germ-band extension. The function and distribution of the protein encoded by this gene are not known. The Ste13 and Me31B proteins are probably not functional homologues of eIF-4A, which is a universally required translation initiation factor, because they are either needed for a specific process (mating) or only expressed in a stage-specific pattern during development.

Proteins involved in ribosomal biogenesis

Another protein presumably involved in translation is Spb4 from S. cerevisiae (Sachs and Davis, 1990). The SPB4 gene was cloned by complementation of a cold-sensitive spb4-1 allele that suppressed a temperature-sensitive mutation in the poly(A)-binding protein (pab1-F364L). The Spb4 protein is highly basic and the carboxy-terminal part has a pI of 10.0. This protein is only distantly related to the rest of the family, as seen in Fig. 2; it also shows differences in two of the highly conserved motifs (SXXGXGKT and HRCGR) and lacks the two contiguous glycine residues. Mutations in the essential SPB4 gene decrease the amount of 60S ribosomal subunits. The rate of maturation of the 25S rRNA relative to the 18S rRNA was also significantly decreased in the spb4-1 mutant strain, and the 25S rRNA was 2.5-fold less abundant in the mutant, suggesting that much of the inefficiently processed rRNA was degraded. These results implied that the protein could be part of the pre-ribosomal particle in the nucleus that is involved in ribosomal biogenesis. Interestingly, more than 75 independent complementation groups were found in which lethal mutations can be suppressed by the spb4-1 allele (Sachs, 1990).
It is intriguing that 75% of these mutations can also be suppressed by deletion of the spb2 gene encoding the large ribosomal subunit protein L46. These alterations do not lead to a general suppression of amber or ochre codons, since such mutations in synthetic β-galactosidase genes or highly suppressible auxotrophic markers are not suppressed. Protein analysis from wild-type SPB4 and mutant spb4-1 strains revealed considerable differences in the amounts of individual proteins, which could indicate different efficiencies of translation initiation or RNA stability. A protein presumably involved in ribosomal biogenesis is the SrmB protein from E. coli (Nishi et al., 1988). The srmB gene, when expressed at high copy number, can suppress the effect of a temperature-sensitive mutation in L24, a protein which is necessary for the assembly of the large ribosomal subunit. It was suggested that SrmB binds to regions of the 23S rRNA that are not protected by L24


during 50S ribosome assembly and thereby prevents degradation of an unstable assembly precursor molecule. However, the role of the normally regulated protein is not known. The SrmB protein has an ATPase activity that depends on the presence of RNA. It is able to bind RNA efficiently without the help of other proteins, in contrast to eIF-4A, which binds RNA inefficiently unless ATP and eIF-4B are added (Abramson et al., 1987; Rhoads, 1988). This could be due to the longer carboxy-terminal sequence present in SrmB, which has several basic residues. This basic stretch could be sufficient for RNA binding, although no characteristic nucleotide-binding sequences are present. A protein that is also presumably involved in translation is the DeaD protein from E. coli (Toone et al., 1991). The gene encoding this polypeptide was isolated as a dosage-dependent suppressor of a temperature-sensitive mutation in the rpsB gene encoding the ribosomal protein S2. Overexpression of the DeaD protein does not suppress the mutation in L24 and SrmB does not suppress the rpsB mutations, indicating clearly that these two proteins have different functions, as can also be deduced from the sequence data. The DeaD protein can suppress several different mutations in the rpsB gene. However, the mechanism of suppression and the function of the non-overexpressed DeaD protein remain unknown. Another E. coli RNA helicase-like protein is encoded by the rhlB gene (Kalman et al., 1991). Its function is unknown and it is not essential for cell growth except in some genetic backgrounds. However, the reason for this lethality is not understood. Five independent partial RNA helicase-like genes were isolated from E. coli by the polymerase chain reaction (PCR). Three encode the previously described SrmB, DbpA and DeaD proteins and the fourth is RhlB. From the fifth gene, named rhlE, only a fragment has been sequenced, and therefore its sequence has not been included in the figures.
The p68 group

One of the well characterized proteins of the D-E-A-D family is p68 (Ford et al., 1988). This is a major nuclear antigen that was identified because of its specific immunological cross-reaction with a monoclonal antibody (DL3C4, now called PAb204) against SV40 large T antigen. This protein is longer than eIF-4A and has stretches of basic residues at the amino-terminus which could promote RNA binding. It is expressed in dividing cells from a wide range of vertebrate species, but not in quiescent cells (Lane and Hoeffler, 1980). It shows a distinct granular distribution in the cell nucleus. This protein undergoes dramatic changes in nuclear location during the cell cycle (Iggo et al., 1991). It is found in the nucleoplasm during


interphase and transiently enters the nucleoli during telophase. Anti-p68 antibodies, however, do not stain the nucleoli of interphase cells. This suggests that p68 may be required for an early step during nucleolar assembly, and also that it is probably not directly involved in rRNA processing. The protein p68 possesses an RNA-dependent ATPase activity (Iggo and Lane, 1989) and RNA helicase activity (Hirling et al., 1989). The rate of ATP hydrolysis is highly stimulated by single-stranded RNA, and to some extent by double-stranded RNA, whereas single-stranded as well as double-stranded DNAs do not stimulate this activity. RNA binding is independent of the presence of ATP and Mg2+ and is resistant to high salt concentrations (Hirling et al., 1989). In contrast to p68, the RNA-binding activity of SrmB is decreased by the addition of ATP, which could be explained by the dissociation of the protein from the RNA after ATP hydrolysis. The in vitro RNA helicase activity of purified p68 depends on the presence of ATP and Mg2+ (Hirling et al., 1989). Interestingly, the SV40 large T antigen (not a D-E-A-D protein) shows RNA-unwinding activity (Scheffner et al., 1989) but, in contrast to p68, also has DNA helicase activity (Stahl et al., 1986). Despite this RNA-unwinding activity, the SV40 large T antigen does not match the sequences of the D-E-A-D box family. Therefore, it is not surprising that the epitope recognized by the antibody is located outside the region of homology with the D-E-A-D family (Ford et al., 1988). The Drosophila gene RM62 and the genes DBP2 and dbp2 (dbp for DEAD box protein) from S. cerevisiae and S. pombe, respectively, encode proteins similar to p68. The yeast genes were isolated by low-stringency hybridization with p68 DNA (Iggo et al., 1991). Gene disruption analyses showed that they are both essential, but their functions remain unknown. The two proteins (ScDbp2 and Spdbp2) encoded by these two genes differ most from p68 in the carboxy-terminal region.
They do not have the epitope sequence recognized by the anti-SV40 large T-antigen antibody. Glycine-rich stretches are found in both the amino- and carboxy-terminal regions of these two proteins. The D. melanogaster RM62 (DmRM62) gene was identified during the analysis of the Triplo-lethal locus, which results in non-viability when present in one or three copies per diploid cell (Dorer et al., 1990). However, the DmRM62 gene does not correspond to the Triplo-lethal locus. A highly similar gene was found in Drosophila pseudoobscura (DpRM62; not included in Figs 1 and 2), which shows 91% identity with DmRm62 at the amino acid level. The amino terminus of the DmRm62 protein is glycine-rich, with seven glycines in a row, and highly charged with acidic and basic residues. The carboxy-terminal end also has repeats of up to eight glycines and is charged, although less so than the amino terminus. These differences from the p68 protein suggest that the proteins

have different cellular functions and that the DmRm62/DpRm62 proteins are not the p68 homologues. Interestingly, five genes (p68, ScDBP2, Spdbp2, DmRM62 and DpRM62) each have an intervening sequence at exactly the same position, which is also found in the Drosophila gene vasa (see below). Together, these characteristics suggest that these genes evolved from a common ancestor but may now carry out different functions. A gene cloned by hybridization with an S. pombe dbp2 DNA probe is the E. coli dbpA gene (Iggo et al., 1990). Overexpression of the DbpA protein resulted in no phenotype, although this gene is normally expressed at low levels. Protein sequence analysis has shown that this protein contains all the conserved motifs of the D-E-A-D family, which suggests that it functions as an RNA helicase. It is most similar to the SrmB and DeaD proteins (Fig. 2).

Helicases involved in splicing

The Prp28, Prp5 and Mss116 proteins from S. cerevisiae are involved in nuclear and mitochondrial pre-RNA splicing, respectively. The MSS116 gene was cloned by complementation of a pet mutation affecting splicing of mitochondrial cytochrome b and cytochrome c oxidase subunit I mRNAs (Seraphin et al., 1989). Disruption of the MSS116 gene resulted in respiratory deficiency, as expected for genes whose products are essential for mitochondrial function. The Mss116 protein might be required for optimal mRNA conformation in in vivo splicing reactions, even for introns which can be self-spliced in vitro. The Mss116 protein has several serine, threonine, and basic residues at its amino-terminus; the latter were suggested to mediate mitochondrial import. The PRP5 (prp for pre-RNA processing) gene of S. cerevisiae was cloned by complementation of a temperature-sensitive mutant for splicing (Dalbadie-McFarland and Abelson, 1990). The Prp5 protein is involved in spliceosome assembly. There is a potential nuclear localization signal in the amino-terminal domain, and two domains which could be related to RNA-binding motifs are present at the carboxyl end. Interestingly, the HRIGR motif is not completely conserved but is HTTGR, and the ATPase A motif has a serine instead of the conserved alanine. The PRP28 gene was cloned by complementation of a cold-sensitive mutant for the first step of mRNA splicing (Strauss and Guthrie, 1991). A suppressor of the cold-sensitive prp28-1 allele is encoded by the spp281-1 mutant allele of the PRP8 gene, suggesting an interaction between Prp28 and Prp8 (a component of the U5 snRNP). The finding that prp28-1/prp24-1 double mutants are not viable indicates an interaction of Prp28 with Prp24 in

U4/U6 snRNA unwinding. A model was postulated in which Prp8 would stabilize the base-paired form of U4/U6 snRNP, Prp24 would stabilize the unwound form by binding to U6 snRNA, and Prp28 would catalyse the unwinding of U4/U6 snRNAs. In accordance with this model, the Prp28 polypeptide sequence matches the domains conserved within the D-E-A-D protein family, and the Prp24 sequence contains three regions homologous to the RNP consensus domain conserved among many RNA-binding proteins and has been shown to associate with U6 snRNA (Shannon and Guthrie, 1991). The mutation in prp28 was found to change a conserved glycine residue to a glutamic acid (see Fig. 1). Interestingly, some non-viable mutants in the TIF1/TIF2 genes were isolated and shown to affect these glycine residues (Schmid and Linder, 1991). No precise function can yet be attributed to these residues, but they may play a role in protein structure. In an approach designed to clone other members of the D-E-A-D family, gene fragments of five new putative RNA helicases were isolated by PCR (Chang et al., 1990). They encode potential D-E-A-D box proteins and contain all the motifs between the DEAD and the HRIGR sequences. One of these protein fragments (CA8) was then identified as being part of the Prp28 gene product (Strauss and Guthrie, 1991). From a screen for suppressors of the yeast temperature-sensitive prp8-1 allele, a cold-sensitive mutation (spp81-1) was isolated, and molecular genetic methods have shown that the spp81 mutation is an allele of DED1, a gene of previously unknown function (Jamieson et al., 1991). The suppressor activity mediated by a putative ATP-dependent RNA helicase that in some way interacts with Prp8 may suggest that the assembly of the U5/U4/U6 snRNP necessitates unwinding of the snRNAs. However, the exact role of Spp81/Ded1 in splicing, if any, has not been determined.
Interestingly, the SPP81/DED1 gene has also been isolated as a gene dosage-dependent suppressor of a mutation in RPC31, coding for a subunit of RNA polymerase C (S. Stettler and P. Thuriaux, personal communication). While screening for the SPP81/DED1 gene that could complement the cold-sensitive spp81-1 allele, another gene was isolated that could suppress the conditional lethal phenotype, DBP1 (Jamieson and Beggs, 1991). The suppressor activity is seen when DBP1 is present on a low copy-number plasmid. Complementation of a spp81 null mutant is only detected when DBP1 is present on a multicopy 2µm plasmid. The deduced protein coding sequence revealed that Dbp1 is 72% identical to Spp81/Ded1 and has all the elements characteristic of the family. Therefore, and because the physiological role of this protein is not known, it was called Dbp1 (for DEAD box protein). Since DBP1 can complement the SPP81/DED1 gene, it was postulated that the two genes code for the same function (or a closely related one). The fact that SPP81/DED1 has approximately 10-fold increased expression relative to DBP1 and is essential for cell viability, whereas DBP1 is not, suggests that the latter gene might only encode an accessory protein with a slightly different substrate specificity which is needed under certain growth conditions.

Proteins involved in development

The Spp81/Ded1 and Dbp1 proteins are closely related to Vasa, An3 and PL10 (Fig. 2). However, the latter three may have a role in stage- or tissue-specific expression of a subset of genes, by acting on the conformation of the mRNAs of these genes or as factors involved directly or indirectly in the translation machinery, and are probably not involved in the splicing mechanism. vasa is a maternal-effect gene which has a role in growth and maturation of the oocyte and is required for the formation of germ cells and abdominal segments (Hay et al., 1988; Lasko and Ashburner, 1988). The Vasa protein is synthesized in nurse cells and transferred to the oocyte, where it becomes localized at the posterior pole. The mechanism of its action is unknown, although it is possible that it interacts directly with the mRNA for its genetically defined target, nanos (Lasko and Ashburner, 1990). The protein has a glycine-rich and very hydrophilic amino terminus, containing five tandem repeats of the heptad sequence (F/S)RGG(E/Q)GG, which is thought to bind RNA (Hay et al., 1988). It is possible, however, that RNA binding is mediated by the negatively charged carboxy end, since acidic carboxy-termini have been found in other RNA-binding or ssDNA-binding proteins, or by both terminal sequences. Whether Vasa really binds RNA remains to be seen. The Xenopus laevis An3 mRNA is distinctly localized to the animal pole in oocytes and early embryos, is uniformly distributed in the embryo, and is also found (to a lesser extent) in adult tissues (Gururajan et al., 1991). It will be interesting to see whether the An3 protein shows a similar distribution and what function it has. The PL10 transcript has been shown to be expressed only in the male germ line in a developmentally regulated manner (Leroy et al., 1989). High levels of expression were detected during meiotic and haploid stages of spermatogenesis.
From its similarity to eIF-4A and the fact that the PL10 protein has a longer carboxy-terminal sequence, it was proposed that the PL10 protein might regulate translation of selected cytoplasmic transcripts, although a nuclear function in RNA processing could not be excluded. Its role in spermatogenesis and its mode of action in the cell remain unknown. One of the D-E-A-D box proteins least related to eIF-4A is the Dm73D protein from D. melanogaster (P. F. Lasko, personal communication; see Fig. 2); its function in the cell and in the organism is not known.

The D-E-A-H subfamily

A subgroup of the putative RNA helicase family is formed by the three yeast splicing proteins, Prp2, Prp16 and Prp22, and the Drosophila Mle protein involved in X chromosome dosage compensation. It is interesting to note that they do not have a DEAD sequence but instead possess a DEAH motif. These four proteins are exceptionally large and show other differences from the D-E-A-D protein family, such as the lack of a PTRELA, an ARGXD and an HRIGR sequence (Fig. 1). Therefore they have been tentatively grouped into the D-E-A-H protein family (Gorbalenya et al., 1989). These proteins are highly conserved all along their polypeptide sequences and have other motifs in common, such as the TQPRRVAA and the QRXGRAGR sequences, which could represent equivalents of the PTRELA and HRIGRTGR domains, respectively. These features suggest that these proteins belong to a somewhat different, but still related, family and that their mechanism of action may be distinct. A mutant tif1 protein containing the sequence DEAH instead of DEAD cannot support viability in S. cerevisiae, arguing for the existence of a subfamily (Schmid and Linder, 1991). Two putative DEAH-encoding polypeptide fragments were isolated by PCR (Company et al., 1991), leading to the assumption that this subfamily is larger than it seemed. The Prp2 protein is required for the first step of the splicing reaction, which consists of the 5' splice-site cleavage and formation of the 2'-5' branched lariat intermediate (Chen and Lin, 1990; King and Beggs, 1990). It acts after spliceosome assembly, and was found to be associated with spliceosomes prior to, and throughout, the first step in pre-mRNA processing. It has a putative zinc-finger motif in the carboxy-terminal region, which may be involved in RNA binding.
The protein Prp22 is necessary for the release of the spliced mRNA product from the spliceosome (Company et al., 1991), a step which was not expected to require ATP hydrolysis and the activity of a factor. The mutation in prp22 was localized to the ARGXD motif, where the glycine was changed to an aspartic acid (see Fig. 1). The Prp22 protein has an amino-terminal peptide sequence similar to a motif present in bacterial ribosomal protein S1 and polynucleotide phosphorylase. Although the function of this motif has not been characterized, it has been suggested to have RNA-binding activity.
A mutation in PRP16 was isolated as a dominant suppressor of a lariat branch point mutation (Couto et al., 1987). The mutant protein affects the fidelity of branch point recognition. The mechanism for fidelity of splicing is not clear, but it was shown that the Prp16 protein transiently associates with the pre-formed spliceosome in vitro but is not an snRNP protein (Schwer and Guthrie, 1991). It promotes the second step of the splicing reaction, the cleavage of the 3' splice site and ligation of the two neighbouring exons. The purified protein was shown to possess ATPase activity which was significantly stimulated by RNA but not by single-stranded DNA (Schwer and Guthrie, 1991). The prp16-1 mutant contains a tyrosine to aspartic acid substitution affecting a residue close to the ATPase A motif (Fig. 1). How this mutation may modulate the accuracy of the branch point recognition is not understood. Interestingly, a temperature-sensitive mutation in the translation initiation factor gene TIF1 was found to lie in exactly the same position, changing the alanine residue to a valine (Schmid and Linder, 1991).
A fourth D-E-A-H box protein, named Mle for maleless, has been isolated recently from D. melanogaster (Kuroda et al., 1991). Mle is required in males for dosage compensation of X chromosome-linked genes, together with at least three other gene products. Loss of function of one of these gene products leads to male lethality. The Mle protein is found associated with the X chromosome in males but not in females, although it is expressed in both sexes. Yet the role of Mle in dosage compensation and the mechanism of male-specific association with the X chromosome are not understood. Sequence data reveal that the Mle protein is most similar to the D-E-A-H subfamily. Interestingly, a temperature-sensitive mutation has been reported to be located close to the ATPase A motif, changing a threonine to a serine residue (see Fig. 1; Kernan et al., 1991). The distance between the GKT and the QRXGRAGR motifs is the largest of all the proteins isolated so far. These two domains span 371 residues in Mle (compared to 359 residues in Dm73D, and 294 to 326 in other proteins).
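
The motif-based distinctions drawn in this section (a DEAD versus a DEAH tetrapeptide, and the spacing between the GKT motif and the HRIGRTGR or QRXGRAGR motif) can be illustrated by a simple sequence scan. The Python sketch below is illustrative only: the function names and the toy sequence are hypothetical, the regular expressions are assumptions built from the consensus strings quoted above, and the span is counted here from the first residue of GKT to the last residue of the second motif, which may not be the convention used for Fig. 1.

```python
import re

# Consensus strings quoted in the text; X is treated as "any residue".
BOX_MOTIFS = {"D-E-A-D": "DEAD", "D-E-A-H": "DEAH"}
SECOND_MOTIFS = {"D-E-A-D": "HRIGRTGR", "D-E-A-H": "QR.GRAGR"}


def classify_box(seq):
    """Return 'D-E-A-D', 'D-E-A-H' or None, using the first box found after GKT."""
    start = seq.find("GKT")
    if start < 0:
        return None
    hits = []
    for family, motif in BOX_MOTIFS.items():
        pos = seq.find(motif, start)
        if pos >= 0:
            hits.append((pos, family))
    return min(hits)[1] if hits else None


def core_span(seq):
    """Residues from the start of GKT to the end of the family's second motif."""
    family = classify_box(seq)
    if family is None:
        return None
    start = seq.find("GKT")
    match = re.compile(SECOND_MOTIFS[family]).search(seq, start)
    return None if match is None else match.end() - start


# Hypothetical toy sequence, not a real protein.
toy = "MA" + "GKT" + "A" * 10 + "DEAH" + "S" * 20 + "QRAGRAGR" + "K"
print(classify_box(toy), core_span(toy))
```

Applied to real sequences, a scan of this kind would only flag candidates; assigning a protein to either family still rests on the full set of conserved motifs shown in Fig. 1.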

Conclusion

In summary, analysis of sequences conserved within the D-E-A-D and D-E-A-H protein families demonstrates that these proteins, isolated from a wide spectrum of organisms ranging from E. coli to mammals, are highly related and possibly have a common ancestor. Probably they all have RNA helicase activity that depends on ATP hydrolysis, and perform these reactions by the same (or a very similar) mechanism. Although for most members of the family an RNA helicase activity has not yet been demonstrated, the fact that p68, eIF-4A/eIF-4B, and viral helicases (Lain et al., 1990) can unwind double-stranded RNA argues strongly for such a function in the other family members. The proteins characterized so far interact with RNA in spliceosome assembly, nuclear and mitochondrial splicing, ribosomal RNA maturation, ribosome assembly, translation, regulation of maternally expressed mRNAs, and regulation of mRNA expression during development. This wide spectrum of functions could be explained by the fact that these proteins differ in size and that additional amino- and carboxy-terminal sequences could determine the specificity of each polypeptide. Such sequences would include nuclear or mitochondrial targeting signals, RNA-binding motifs (possibly sequence-specific), and regions required for the interaction with accessory proteins.
Sequence comparison of the D-E-A-D box proteins revealed several subgroups of proteins whose members are more related to each other than to the rest of the family. It becomes clear from Fig. 2 that proteins ScDbp2, Spdbp2, DmRm62/DpRm62 and p68 form a subgroup; PL10, An3, Spp81/Ded1, Dbp1 and Vasa; Me31B and Ste13; and DbpA, DeaD, SrmB and RhlB form other subgroups; and Prp22, Prp2, Prp16 and Mle form the least related subgroup. In contrast, eIF-4AI/eIF-4AII, CseIF-4A, NeIF-4A2/NeIF-4A3 and Tif1/Tif2 are probably the functionally analogous products in different organisms, and have extremely high sequence conservation.

Note added in proof

Since the submission of the manuscript, a new sequence from Saccharomyces cerevisiae has been included in the EMBL databank by M. Johnston and I. Nogae. The sequence has the accession number M80437. The function of the corresponding gene is unknown.

Acknowledgements

We would like to thank S. Dib-Hajj, P. F. Lasko, G. Owttrim, C. Shimoda, S. Stettler, M. Summers, P. Thuriaux and B. Webb for communication of unpublished results. We are grateful to C. Guthrie and those in her laboratory, R. D. Iggo and P. F. Lasko, for critical reading of the manuscript, and R. Dolz for maintenance of computer facilities. The work in the authors' laboratory was supported by grants from the Swiss National Science Foundation.

References

Abramson, R.D. et al. (1987) J Biol Chem 262: 3826-3832.
Burgess, S. et al. (1990) Cell 60: 705-717.
Chang, T.-H. et al. (1990) Proc Natl Acad Sci USA 87: 1571-1575.
Chen, J.-H., and Lin, R.-J. (1990) Nucl Acids Res 18: 6447.
Company, M. et al. (1991) Nature 349: 487-493.
Couto, J.R. et al. (1987) Genes Dev 1: 445-455.
Dalbadie-McFarland, G., and Abelson, J. (1990) Proc Natl Acad Sci USA 87: 4236-4240.
De Valoir, T. et al. (1991) Proc Natl Acad Sci USA 88: 2113-2117.
Dorer, D.R. et al. (1990) Nucl Acids Res 18: 5489-5494.
Ford, M.J. et al. (1988) Nature 332: 736-738.
Gorbalenya, A.E. et al. (1989) Nucl Acids Res 17: 4713-4730.
Goss, D.J. et al. (1987) Biochem 26: 1551-1556.
Grifo, J.A. et al. (1984) J Biol Chem 259: 8648-8654.
Gururajan, R. et al. (1991) Nature 349: 717-719.
Hay, B. et al. (1988) Cell 55: 577-587.
Hirling, H. et al. (1989) Nature 339: 562-564.
Iggo, R.D. et al. (1990) Nucl Acids Res 18: 5413-5417.
Iggo, R.D. et al. (1991) Mol Cell Biol 11: 1326-1333.
Iggo, R.D., and Lane, D.P. (1989) EMBO J 8: 1827-1831.
Jamieson, D.J., and Beggs, J.D. (1991) Mol Microbiol 5: 805-812.
Jamieson, D.J. et al. (1991) Nature 349: 715-717.
Kalman, M. et al. (1991) New Biol 3: 886-895.
Kernan, M.J. et al. (1991) Cell 66: 949-959.
King, D.S., and Beggs, J.D. (1990) Nucl Acids Res 18: 6559-6564.
Kitamura, K. et al. (1990) Curr Genet 18: 315-321.
Koonin, E.V. (1991) Nature 352: 290.
Kuroda, M.I. et al. (1991) Cell 66: 935-947.
Lain, S. et al. (1990) Nucl Acids Res 18: 7003-7006.
Lane, D.P., and Hoeffler, W.K. (1980) Nature 288: 167-170.
Lasko, P.F., and Ashburner, M. (1988) Nature 335: 611-617.
Lasko, P.F., and Ashburner, M. (1990) Genes Dev 4: 905-921.
Leroy, P. et al. (1989) Cell 57: 549-559.
Linder, P. et al. (1989) Nature 337: 121-122.
Linder, P., and Prat, A. (1990) BioEssays 12: 519-526.
Linder, P., and Slonimski, P.P. (1988) Nucl Acids Res 16: 10359.
Milburn, S.C. et al. (1990) EMBO J 9: 2783-2790.
Nielsen, P.J., and Trachsel, H. (1988) EMBO J 7: 2097-2105.
Nielsen, P.J. et al. (1985) Nucl Acids Res 13: 6867-6880.
Nishi, K. et al. (1988) Nature 336: 496-498, and correction, Nature 340: 246.
Ray, B.K. et al. (1985) J Biol Chem 260: 7651-7658.
Rhoads, R.E. (1988) Trends Biochem Sci 13: 52-56.
Rozen, F. et al. (1989) Mol Cell Biol 9: 4061-4063.
Rozen, F. et al. (1990) Mol Cell Biol 10: 1134-1144.
Sachs, A.B. (1990) In Isolation of the Isd Mutations in Saccharomyces cerevisiae. NATO ASI Series. McCarthy, J.E.G., and Tuite, M.F. (eds). Berlin: Springer-Verlag, pp. 549-555.
Sachs, A.B., and Davis, R.W. (1990) Science 247: 1077-1079.
Scheffner, M. et al. (1989) Cell 57: 955-963.
Schmid, S.R., and Linder, P. (1991) Mol Cell Biol 11: 3463-3471.
Schwer, B., and Guthrie, C. (1991) Nature 349: 494-499.
Seraphin, B. et al. (1989) Nature 337: 84-87.
Shannon, K., and Guthrie, C. (1991) Genes Dev 5: 773-785.
Stahl, H. et al. (1986) EMBO J 5: 1939-1944.
Strauss, E.J., and Guthrie, C. (1991) Genes Dev 5: 629-641.
Toone, W.M. et al. (1991) J Bacteriol 173: 3291-3302.
Walker, J.E. et al. (1982) EMBO J 1: 945-951.
