Proc. Natl. Acad. Sci. USA Vol. 89, pp. 1105-1108, February 1992 Biochemistry

Intron-exon organization of the gene for the multifunctional animal fatty acid synthase (domain boundaries/evolution/gene fusion/domain shuffling/intron phasing)

CHRISTOPHER M. AMY, BRENDA WILLIAMS-AHLF, JURGEN NAGGERT*, AND STUART SMITH†

Children's Hospital Oakland Research Institute, 747 Fifty-second Street, Oakland, CA 94609

Communicated by Paul K. Stumpf, November 12, 1991 (received for review August 23, 1991)

directly with the T7 sequencing kit (Pharmacia) and adenosine 5'-[α-[35S]thio]triphosphate (3000 Ci/mmol; 1 Ci = 37 GBq; Amersham) according to the manufacturer's instructions. Location of Exon-Intron Boundaries. All of the introns were sequenced and their boundaries were established by comparison with the cDNA sequence (12).

ABSTRACT The complete intron-exon organization of the gene encoding a multifunctional mammalian fatty acid synthase has been elucidated, and specific exons have been assigned to coding sequences for the component domains of the protein. The rat gene is interrupted by 42 introns and the sequences bordering the splice-site junctions universally follow the GT/AG rule. However, of the 41 introns that interrupt the coding region of the gene, 23 split the reading frame in phase I, 14 split the reading frame in phase 0, and only 4 split the reading frame in phase II. Remarkably, 46% of the introns interrupt codons for glycine. With only one exception, boundaries between the constituent enzymes of the multifunctional polypeptide coincide with the location of introns in the gene. The significance of the predominance of phase I introns, the almost uniformly short length of the 42 introns and the overall small size of the gene, is discussed in relation to the evolution of multifunctional proteins.
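The phase counts quoted above can be turned into proportions with a few lines of arithmetic. The sketch below is illustrative only and is not an analysis performed in the paper: it assumes, purely for comparison, a null model in which the three phases are equally likely, and uses the fact that the chi-square survival function with 2 degrees of freedom is exactly exp(-x/2).

```python
import math

# Phase counts for the 41 coding-region introns, as stated in the text.
counts = {"0": 14, "I": 23, "II": 4}
total = sum(counts.values())

# Fraction of coding-region introns in each phase.
fractions = {phase: n / total for phase, n in counts.items()}

# Assumption (not from the paper): under the null model each phase is equally likely.
expected = total / len(counts)
chi_square = sum((n - expected) ** 2 / expected for n in counts.values())

# With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

for phase, n in counts.items():
    print(f"phase {phase}: {n}/{total} = {fractions[phase]:.1%}")
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Under that assumed null model, the excess of phase I introns and the scarcity of phase II introns would be unlikely to arise by chance, which is consistent with the emphasis the text places on phase I predominance.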

RESULTS AND DISCUSSION The rat fatty acid synthase gene was sequenced in its entirety and the location of introns was established (Fig. 1). The gene is interrupted by 42 introns, and the sequences bordering the splice-site junctions universally follow the GT/AG rule (14). Of the 41 introns that interrupt the coding region of the gene, 23 split the reading frame in phase I, 14 split the reading frame in phase 0, and only 4 split the reading frame in phase II. Remarkably, 46% of the introns interrupt codons for glycine. It has been pointed out that many mosaic proteins, such as extracellular matrix proteins, blood coagulation proteins, and membrane-associated proteins, are composed of modules that appear to have fused in various combinations (15). These modules, which encode discrete functional units such as epidermal growth factor domains, fibronectin domains, and kringle domains, are composed of exons flanked at both ends by introns in the same phase, and it has been suggested that this identical phasing could have promoted the coupling of functional domains through the process of intronic recombination by preserving a continuous open reading frame (15). Although from the sequence data available at present it is not possible to say whether the exons that constitute the multifunctional fatty acid synthase gene represent modules that have been shuffled in various combinations to produce other unique mosaic proteins, the exons flanked by phase I introns clearly cluster together in continuous groups: exons 3-6, 17-21, 26-27, 30-31, and 37-39 (Figs. 1 and 2). Similarly, exons flanked by phase 0 introns cluster together at exons 13-14, 23-24, and 41-42. These observations are consistent with common phasing having played a role in the fusing of some of the domains through intronic recombination. As is typical for a vertebrate gene (17), the first intron, which separates the single 5'-noncoding exon from the first coding exon, and the last exon are relatively long. 
Indeed, the entire 3'-noncoding region of the mRNA is represented in a single exon, the largest in the entire gene, showing that the 8.3- and 9.1-kilobase mRNA species transcribed in the rat arise not from alternative splicing but from differential utilization of two polyadenylylation signals (12). This conclusion has also been confirmed by amplification of rat spleen DNA by using primers flanking the polyadenylylation signal for the

The de novo synthesis of fatty acids from malonyl-CoA requires seven enzyme activities that in most bacteria and plants exist as discrete monofunctional polypeptides (1). However, in fungi these activities are distributed between two nonidentical polypeptides encoded in two unlinked genes (2-5), and in animals they are integrated into a single polypeptide chain (6, 7). The hypothesis that the multifunctional fatty acid synthase evolved by fusion of the genes for ancestral monofunctional enzymes is supported by the finding that several of its components can be proteolytically cleaved from the complex as discrete catalytically competent monofunctional proteins that exhibit amino acid sequence similarity with present-day monofunctional analogs (7-10). It has been anticipated that structural features of the animal fatty acid synthase gene such as the location of introns may provide clues as to how the gene fusion events occurred (9). The objective of this study therefore was to determine the nucleotide sequence of the rat fatty acid synthase gene and map the location of the introns, particularly in relation to the location of the various catalytic domains of the multifunctional protein.‡

EXPERIMENTAL PROCEDURES Genomic Clone Isolation and Sequencing. Genomic libraries, prepared from partial EcoRI or Hae III digests of Sprague-Dawley rat DNA cloned into the EcoRI site of the λ vector Charon 4A (11), were screened with probes containing parts of rat fatty acid synthase cDNA clones obtained from a lactating rat mammary gland library (12). Four overlapping clones from the two rat genomic libraries were identified that spanned the entire gene (13). Double-stranded genomic DNAs recloned in pUC vectors were sequenced

Abbreviation: nt, nucleotides. *Present address: The Jackson Laboratory, Bar Harbor, ME 04609. †To whom reprint requests should be addressed. ‡The sequence reported in this paper has been deposited in the GenBank data base (accession no. M84761).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

[Fig. 1 table: row data not recoverable from this extraction. Column headings: exon number, location, and length; 5'-splice donor; intron length; 3'-splice acceptor; codon phase; amino acid.]

FIG. 1. Location and characteristics of introns in the rat fatty acid synthase gene. The position of introns between codons is indicated by phase 0, interruption after the first base is indicated by phase I, and interruption after the second base is indicated by phase II; the amino acid(s) encoded at the splice site is indicated. The locations of the splice junctions in the cDNA sequence are indicated as superscripts. Invariant nucleotides at the splice sites are in boldface type.
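The phase convention in this legend reduces to a remainder modulo 3. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, and the input is assumed to be the number of coding nucleotides that precede the intron, counted from the first base of the initiation codon (so any 5'-noncoding offset in the cDNA numbering must be subtracted first).

```python
PHASE_NAMES = {0: "0", 1: "I", 2: "II"}


def intron_phase(coding_nt_before_intron: int) -> str:
    """Return the intron phase given the coding nucleotides upstream of the intron.

    Phase 0: the intron lies between two codons.
    Phase I: the intron follows the first base of a codon.
    Phase II: the intron follows the second base of a codon.
    """
    if coding_nt_before_intron < 0:
        raise ValueError("nucleotide count must be non-negative")
    return PHASE_NAMES[coding_nt_before_intron % 3]


def codon_at_junction(coding_nt_before_intron: int) -> int:
    """Return the 1-based codon number at the splice junction.

    For phase I and II introns this is the interrupted codon; for phase 0
    introns it is the last complete codon upstream of the intron.
    """
    return (coding_nt_before_intron + 2) // 3


# Hypothetical positions chosen only to exercise each case.
for n in (3, 4, 5):
    print(n, intron_phase(n), codon_at_junction(n))
```

Reporting the upstream codon for phase 0 introns mirrors the way the figure lists a pair of amino acids for introns that fall between codons and a single amino acid for introns that interrupt a codon.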

smaller mRNA species; only fragments with the lengths predicted by the cDNA and genomic clone sequences were produced (details not shown). Two elements of the short interspersed repetitive DNA sequence type (16, 18) are associated with the fatty acid synthase gene, one between the enoyl reductase and ketoreductase coding sequences [nucleotides (nt) 13154-13254], the other, in the opposite orientation, in the 5' flanking region (nt -3724 to -3833). The elements exhibit 91% sequence identity, both contain A-rich 3' ends, and both are flanked by direct DNA repeats (AAACCTGATGACCC and AGACATT, respectively). Earlier we proposed precise locations for the functional components of the animal fatty acid synthase based on the hypothesis that the individual catalytic domains should exhibit significant sequence similarity with monofunctional analogs and might be expected to be joined by mutable surface loops, which would be susceptible to proteolytic attack (10). Indeed, similar features have been found at the boundaries of other multifunctional proteins (19), and it seems likely that fusion of the genes for the primordial monofunctional proteins in a way that joins the polypeptides via a hydrophilic surface loop region would preserve the three-dimensional structure and biological function of each of the fused components. Of the seven functional components of the animal fatty acid synthase, only the dehydrase could not be precisely located, although we suggested that it was likely to be positioned somewhere between the transferase and enoyl reductase domains (10). More recently, S. Donadio and L. Katz (personal communication) and P. F. Leadlay and colleagues (personal communication) have attempted to locate the interdomain boundaries of the multifunctional polyketide synthase of Saccharopolyspora erythraea by similar criteria. The predicted amino acid sequence, the size, and the organization of the individual domains of this protein are strikingly similar to those we have elucidated for the animal fatty acid synthase, except that the polyketide synthase likely contains not one but multiple fatty acid synthase-like modules, and several of these modules appear to lack specific functional domains. 
The conclusions reached by these workers concerning the predicted size and location of the functional domains of the polyketide synthase agree remarkably well with the model we advanced earlier for the animal fatty acid synthase. In addition, since in the case of the polyketide synthase only one of the modules contains a dehydrase function, S. Donadio and L. Katz (personal communication) were able to deduce that this enzymatic function is encoded within 140-170 amino acids immediately downstream of the transferase domain. A stretch of 100 amino acids within the corresponding region of the animal fatty acid synthase (residues 875-975) exhibits marginally significant sequence similarity with both the putative dehydrase region of the polyketide synthase and the 170-residue monofunctional β-hydroxydecanoyl thioester dehydrase that provides the branch point to unsaturated fatty acids in Escherichia coli (ALIGN scores 3-4 SD). This evidence supports our original suggestion that the dehydrase domain of the animal fatty acid synthase is located between the transferase and enoyl reductase and suggests that the most likely location is close to the C terminus of the transferase in a region encoded by exons 16-19. Again, the boundaries of the encoded putative dehydrase domain are marked by a predicted surface region at the N-terminal end and by a highly mutable predicted surface region at the C-terminal end (details not shown). Because of the tentative nature of the evidence relating to the location of the dehydrase domain, it is not possible to pinpoint precisely the location of the associated boundaries. However, for the remaining functional components of the complex their boundaries can be predicted with confidence and, with only one exception, appear to coincide with the location of introns in the gene (Fig. 2). The 3' ends of exons 28 and 37 and the 5' ends of exons 33 and 39 encode mutable amino acid sequences, which, we have deduced, define interdomain boundaries (10). 
Thus, exons 29-32 appear to encode the enoyl reductase, exons 33-37 encode the ketoreductase, exon 38 encodes the acyl carrier protein, and exons 39-42 encode the thioesterase; exon 42 also includes the

[Fig. 2 graphic: not recoverable from this extraction. Surviving labels: gene map (18 kb, 43 exons, SINE); polypeptide map scaled in amino acid residues (500-2500) with domains labeled ketoacyl synthase, transferase, dehydrase, enoyl reductase, ketoreductase, ACP, and thioesterase.]

FIG. 2. Map of the rat fatty acid synthase gene and the encoded multifunctional polypeptide. Exons are shown as boxes separated by introns depicted as lines. Exons flanked at both 5' and 3' ends by phase I introns are solid boxes, those flanked at both ends by phase 0 introns are darkly shaded boxes, and those flanked by introns of mixed phasing are lightly shaded boxes. SINE, short interspersed repetitive DNA sequence (16). The location of interdomain boundaries in the multifunctional polypeptide (10) is shown in relation to the position of introns in the gene. The locations of active site residues for the ketoacyl synthase, transferase, and thioesterase; the nucleotide-binding site motifs; and the site of 4'-phosphopantetheine attachment in the acyl carrier protein (ACP) domain are also marked; the location of the dehydrase coding region is tentative (see text). Exons, introns, and protein domains are drawn to scale. kb, Kilobases.
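The shading scheme in this legend amounts to comparing the phases of the two introns that flank each exon. The sketch below illustrates that classification and the grouping of consecutive same-phase exons into runs. The function names and the input list are hypothetical toy data, not the phases of the rat gene, and `None` is used here as an assumed marker for an intron that lies outside the coding region and so has no phase.

```python
def classify_exons(intron_phases):
    """Classify internal exons by the phases of their flanking introns.

    intron_phases[k - 1] is the phase of intron k, which separates exon k
    from exon k + 1. Returns {exon_number: "0" | "I" | "II" | "mixed"}.
    """
    classes = {}
    for exon in range(2, len(intron_phases) + 1):
        upstream = intron_phases[exon - 2]    # intron on the 5' side
        downstream = intron_phases[exon - 1]  # intron on the 3' side
        if upstream is not None and upstream == downstream:
            classes[exon] = upstream
        else:
            classes[exon] = "mixed"
    return classes


def same_phase_runs(classes, phase):
    """Return (first_exon, last_exon) for each run of consecutive exons in a class."""
    runs = []
    for exon in sorted(classes):
        if classes[exon] != phase:
            continue
        if runs and runs[-1][1] == exon - 1:
            runs[-1] = (runs[-1][0], exon)  # extend the current run
        else:
            runs.append((exon, exon))       # start a new run
    return runs


# Toy data only: eight introns, hence nine exons.
toy_phases = [None, "I", "I", "I", "0", "0", "II", "I"]
toy_classes = classify_exons(toy_phases)
print(toy_classes)
print(same_phase_runs(toy_classes, "I"))
print(same_phase_runs(toy_classes, "0"))
```

Applied to the phases listed in Fig. 1, the same grouping would be expected to yield the clusters of phase I and phase 0 exons described in the text.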

entire 3'-noncoding region. A mutable amino acid sequence corresponding to the C-terminal end of the malonyl/acetyl transferase is also encoded in a region of the gene interrupted by an intron (intron 15). However, there is no intron located in the region of the gene encoding the mutable amino acid sequence marking the boundary between the transferase and the ketoacyl synthase; this boundary is apparently encoded approximately 200 nt from the 5' end of exon 9. Exon 9 is, however, the largest in the coding sequence, and it is possible that an intron marking this interdomain boundary has been lost. The central region of the fatty acid synthase, encoded by exons 20-28, has not yet been assigned a functional role. The region is the least conserved in the multifunctional polypeptide and may play a role in stabilizing subunit interactions (10). An extensive, poorly conserved region of unknown function is also present in the corresponding location of each of the fatty acid synthase-like modules that constitute the polyketide synthase of S. erythraea (20). The two readily identifiable reductases contain, in their first exons, the Gly-Xaa-Gly-Xaa-Xaa-Gly motif characteristic of nucleotide-binding domains (10). The coding sequences for the motif in the enoyl reductase and ketoreductase domains begin 108 nt into exon 29 and 109 nt into exon 33, respectively. The two nucleotide-binding domains exhibit significant sequence homology with the corresponding domains of other pyridine nucleotide-requiring enzymes, and secondary structure analysis predicts the presence of α/β subdomains typical of a mononucleotide-binding fold (10). In common with other pyridine nucleotide enzymes, the nucleotide-binding structural modules of the multifunctional fatty acid synthase are not encoded by single exons. However, the introns within the reductase domains of the fatty acid synthase are considerably smaller than those found, for example, in other vertebrate dehydrogenases (21). 
If the nucleotide-binding domains of the multifunctional fatty acid synthase are descendants of the same primitive mononucleotide-binding fold, encoded within several exons, that has fused with a variety of substrate-binding domains to give rise to many different enzymes with different specificities (22), then it would appear that the introns within the reductase coding regions must subsequently have been shortened. Indeed, perhaps the most remarkable feature of the fatty acid synthase gene is the overall relatively short length of the introns. The mean length of introns that interrupt the coding regions of vertebrate genes has been estimated as 1127 nt (17); in contrast, the mean length of introns within the coding region of the fatty acid synthase gene is only 191 nt. The significance of the prevalence of shortened introns within the gene is not immediately clear. On fusion of the seven genes

encoding the component activities of the fatty acid synthase it was probably advantageous that individual promoter and enhancer regions be removed to avoid deleterious effects resulting from the presence of multiple regulatory sequences positioned throughout the gene. This process could have resulted in a shortening of a number of the intronic regions. However, the fatty acid synthase introns are almost uniformly short, whether they are located at the 5' end of individual enzyme coding regions or whether they interrupt the internal coding regions of the enzymes, suggesting that perhaps there has been additional selective pressure, either on the size of individual introns or on the upper size limit for the gene. Other possible pressures might have been related to the chromosomal location of the gene or the need to have a compact gene that could be transcribed quickly and efficiently. Packaging of a highly fragmented gene, which encodes a long mRNA, into a relatively modest stretch of DNA could avoid the extensive chromatin decondensation that would otherwise be required for activation of a gigantic gene and might allow more rapid transcription in response to regulatory stimuli. The fatty acid synthase gene is highly regulated in a tissue-specific manner according to the nutritional, hormonal, and developmental status of the animal, and both its mRNA and the encoded protein are highly abundant in tissues active in lipogenesis (13). Fatty acid synthases have been found in eukaryotes in two multifunctional forms, the α2 animal form and the α6β6 yeast form. The component enzymes of the two types of complex exhibit little sequence similarity and are connected structurally in completely different ways [in yeast: α, acyl carrier protein-ketoreductase-ketoacyl synthase; β, acetyl transferase-enoyl reductase-dehydrase-malonyl/palmitoyl transferase (23)], indicating that they have most likely evolved by separate series of gene fusion events. 
Until recently, it had seemed likely that the two independent series of gene fusions took place after divergence of the yeast and animal lines. However, the finding that the multifunctional proteins required for polyketide synthesis in S. erythraea and fatty acid synthesis in animals are composed of functional domains of similar size and amino acid sequence linked in the same order [ketoacyl synthase-transferase-dehydrase-enoyl reductase-ketoreductase-acyl carrier protein (20, 24)] directly challenges this view and raises the distinct possibility that the gene fusion events leading to a common polyketide synthase/fatty acid synthase module took place prior to the divergence of prokaryotes and eukaryotes. Most of the clearest examples of exon shuffling involve genes that are unique to vertebrates and that presumably appeared relatively late in eukaryotic evolution (15, 25). However, if the introns located at the


boundaries of the coding regions for the constituent enzymes of the fatty acid synthase are indeed relics of the process of assembly of the multifunctional polypeptide, then it seems likely that this assembly process occurred early in evolution, before separation of the prokaryote and eukaryote lines. Of course, the possibility that the animal-type fatty acid synthase module was acquired by prokaryotes by a relatively late "horizontal" gene transfer mechanism cannot be entirely ruled out at present but could be evaluated once additional sequences of related polyketide and fatty acid synthases become available.

We thank Drs. R. F. Doolittle (University of California, San Diego) and K. Paigen (The Jackson Laboratory, Bar Harbor, ME) for their helpful comments. We also are very grateful to Drs. S. Donadio and L. Katz (Abbott Laboratories, IL) and Dr. P. F. Leadlay (University of Cambridge, Cambridge, U.K.) for generously sharing with us their unpublished results. The work was supported by Grant DK 16073 from the National Institutes of Health.

1. Alberts, A. W. & Greenspan, M. D. (1984) in Fatty Acid Metabolism and Its Regulation, ed. Numa, S. (Elsevier, Amsterdam), pp. 29-58.
2. Schweizer, E., Müller, G., Roberts, L. M., Schweizer, M., Rösch, J., Wiesner, P., Beck, J., Stratmann, D. & Zauner, I. (1987) Fat Sci. Technol. 89, 570-577.
3. Schweizer, M., Roberts, L. M., Holtke, J., Takabayashi, K., Hollerer, E., Hoffmann, B., Müller, G., Kottig, H. & Schweizer, E. (1986) Mol. Gen. Genet. 203, 479-486.
4. Chirala, S. S., Kuziora, M. A., Spector, D. M. & Wakil, S. J. (1987) J. Biol. Chem. 262, 4231-4240.
5. Mohamed, A. H., Chirala, S., Mody, N. H., Huang, W.-Y. & Wakil, S. (1988) J. Biol. Chem. 263, 12315-12325.
6. Smith, S. & Stern, A. (1979) Arch. Biochem. Biophys. 197, 379-387.
7. Tsukamoto, Y., Wong, H., Mattick, J. S. & Wakil, S. J. (1983) J. Biol. Chem. 258, 15312-15322.
8. Smith, S., Agradi, E., Libertini, L. & Dileepan, K. N. (1976) Proc. Natl. Acad. Sci. USA 73, 1184-1188.
9. Hardie, D. G. & Coggins, J. R., eds. (1986) Multidomain Proteins-Structure and Evolution (Elsevier, Amsterdam).
10. Witkowski, A., Rangan, V. S., Randhawa, Z. I., Amy, C. M. & Smith, S. (1991) Eur. J. Biochem. 271, 571-578.
11. Sargent, T., Wu, J., Sala-Trepat, J. M., Wallace, R. B., Reyes, A. A. & Bonner, J. (1979) Proc. Natl. Acad. Sci. USA 76, 3256-3260.
12. Amy, C., Witkowski, A., Naggert, J., Williams, B., Randhawa, Z. & Smith, S. (1989) Proc. Natl. Acad. Sci. USA 86, 3114-3118.
13. Amy, C. M., Williams-Ahlf, B., Naggert, J. & Smith, S. (1990) Biochem. J. 271, 675-679.
14. Senapathy, P., Shapiro, M. B. & Harris, N. L. (1990) Methods Enzymol. 183, 252-278.
15. Patthy, L. (1991) Curr. Opin. Struct. Biol. 1, 351-361.
16. Daniels, G. R. & Deininger, P. L. (1985) Nature (London) 317, 819-822.
17. Hawkins, J. D. (1988) Nucleic Acids Res. 16, 9893-9908.
18. Jelinek, W. R. & Schmid, C. W. (1982) Annu. Rev. Biochem. 51, 813-844.
19. Crawford, I. P., Clarke, M., Cleemput, M. V. & Yanofsky, C. (1987) J. Biol. Chem. 262, 239-244.
20. Donadio, S., Staver, M. J., McAlpine, J. B., Swanson, S. J. & Katz, L. (1991) Science 252, 675-679.
21. Fukasawa, K. M. & Li, S. S.-L. (1987) Genetics 116, 99-105.
22. Rossmann, M. G., Powers, D. A. & Sofer, W. (1977) in Pyridine-nucleotide-dependent Dehydrogenases, ed. Sund, H. (de Gruyter, Berlin), pp. 3-30.
23. Schweizer, E., Müller, G., Roberts, L. M., Schweizer, M., Rösch, J., Wiesner, P., Beck, J., Stratmann, D. & Zauner, I. (1987) Fat Sci. Technol. 89, 570-577.
24. Cortes, J., Haydock, S. F., Roberts, G. A., Bevitt, D. J. & Leadlay, P. F. (1990) Nature (London) 348, 176-178.
25. Doolittle, R. F. (1987) Trends Biochem. Sci. 10, 233-237.
