YEAST VOL. 8: 749-759 (1992)

Yeast Sequencing Reports

DNA Sequencing and Analysis of a 24.7 kb Segment Encompassing Centromere CEN11 of Saccharomyces cerevisiae Reveals Nine Previously Unknown Open Reading Frames

ANDREAS DÜSTERHÖFT† AND PETER PHILIPPSEN*

Institut für Mikrobiologie und Molekularbiologie, Justus-Liebig-Universität, Frankfurter Str. 107, D-6300 Giessen, Germany
*Institut für Angewandte Mikrobiologie, Biozentrum der Universität Basel, Klingelbergstr. 70, CH-4056 Basel, Switzerland

Received 20 March 1992; accepted 8 April 1992

A 24.7 kb segment of the cosmid clone pUKG047 containing a Sau3AI-partial fragment from the centromere region of Saccharomyces cerevisiae chromosome XI was sequenced and analysed. A mixed strategy of directed methods including exonuclease III nested deletion, restriction fragment subcloning and oligonucleotide-directed sequencing was carried out. Exclusive use was made of the Applied Biosystems Taq DyeDeoxy™ Terminator Cycle technology and a laser-based ABI373A sequencing system for reactions, gel electrophoresis and automated reading. A total of 12 open reading frames (ORFs) was found. Nine new ORFs (YK102 to YK110) were identified, three of which (YK102, YK107, YK108) showed homologies to proteins of known function from other organisms. In addition, sequence analysis revealed three recently functionally characterized genes (MET14, VPS1/SPO15, PAP1), which could be joined to the earlier published CEN11 region.

KEY WORDS - Yeast genome project; DNA sequencing; chromosome XI; centromere CEN11

†To whom correspondence should be addressed.
0749-503X/92/090749-11 $10.50 © 1992 by John Wiley & Sons Ltd

INTRODUCTION

As a contribution to the European BRIDGE programme to sequence and analyse the genome of the yeast Saccharomyces cerevisiae, we have determined the sequence of a DNA segment in the direct neighbourhood of the centromere CEN11 (Mortimer et al., 1989; Fitzgerald-Hayes et al., 1982). The segment was part of a cosmid clone insert from a contiguous library of yeast chromosome XI constructed by B. Dujon (Institut Pasteur, Paris; unpublished).

MATERIALS AND METHODS

Strains and plasmids

Cosmid pUKG047 was provided by B. Dujon (Institut Pasteur, Paris). It contained a partial Sau3AI fragment of about 31 kb from the centromere CEN11 region of yeast strain S288C, which had been inserted into pOU61cos vector arms (Knott et al., 1988). For all standard cloning procedures in Escherichia coli, as described by Ausubel et al. (1987), plasmid pBluescriptII SK(+) (Stratagene) and strain HB101 (Boyer and Roulland-Dussoix, 1969) were used.

General sequencing strategy

Overlapping, large (1.7-12 kb) subfragments of the insert of cosmid pUKG047 were cloned into pBluescriptII SK(+) using four rare-cutting restriction enzymes (BamHI, EcoRI, HindIII, XbaI). For the construction of sequenceable subclones, a mixed strategy of directed methods was carried out on a selection of these subfragment clones.

Nested deletions using exonuclease III, as well as restriction site-directed deletions, were performed to cover nearly 90% of the sequence of both strands. To determine the sequence of remaining gaps or to resequence ambiguous positions, specific oligonucleotide primers were synthesized.

Nested deletion of plasmid DNA

Plasmid DNA cleaved by suitable restriction enzymes was directionally deleted by exonuclease III/S1 nuclease treatment followed by a Klenow polymerase fill-in reaction according to the protocol of Henikoff (1984), except that the truncated fragments were excised from an agarose gel and rapidly purified by a glassmilk procedure (GENECLEAN kit, BIO101) prior to ligation with T4 DNA ligase and transformation into E. coli HB101.

Oligonucleotide-directed sequencing

Specific 18mer oligonucleotides were kindly synthesized by H. Hegemann in our laboratory and used directly for sequencing after a simple ethanol precipitation step.

Sequencing method

Sequence reactions were performed on 1 µg double-stranded Qiagen affinity column-purified DNA (DIAGEN, Düsseldorf) using a modified chain-termination protocol. Because of the high flexibility and the handling advantages of the method, Applied Biosystems fluorescence-dye terminators (Prober et al., 1987) were used as labels in a linear polymerase chain reaction cycle procedure (Perkin Elmer-Cetus). Unincorporated dye terminators could be removed quantitatively via phenol/chloroform extraction (high-grade; ABI, Weiterstadt). Gel electrophoresis and gel reading were carried out using the automated laser-based ABI373A system (ABI, Weiterstadt).

Computer analysis

Sequence data were transferred to IBM-AT-compatible computers and analysed using the DNASIS/PROSIS software packages (Pharmacia/LKB, Freiburg). Homology searches were carried out against the EMBL, GenBank, NBRF-PIR and SWISS-PROT databases using DNASIS/PROSIS as well as the HUSAR software package (DKFZ, Heidelberg). Data acquisition and additional computer analysis for the BRIDGE yeast genome project were carried out by M. Haasemann at MIPS (Martinsried).

RESULTS AND DISCUSSION

DNA sequence determination

A DNA segment of 24 743 bp in the neighbourhood of CEN11 has been subcloned, mapped and sequenced on both strands (schematic overview in Figure 1; sequence submitted to MIPS on November 19th 1991 and to the EMBL database; accession number X65124). The described technical configuration (see Materials and Methods) allowed an average resolution of 350-400 bases per individual sample (counted from the 3'-end of the primer), showing only 2% ambiguities ('N' positions, which can normally be resolved by manual editing) and less than 1% error (i.e. strand/complementary strand discrepancies). Table 1 indicates the statistics of the mixed directed sequencing strategy for the complete segment as well as for the most recently determined subfragment of 5.5 kb (marked in Figure 1). Utilization of 350 bases for each of the 230 useful sequencing samples resulted in 3.3 readings per base pair on average. The development of an optimized strategy in the case of the 5.5 kb subfragment, as well as some improvements in terminator sequencing reactions and sequencer handling as recommended by ABI (Weiterstadt), not only allowed a slight reduction of this value (3.0), but mainly increased the ratio of useful to total numbers of sequenced samples (47 of 60).

DNA sequence characterization

Computer analysis revealed the centromere region of chromosome XI asymmetrically located near one end of the cosmid insert (Figure 2). In addition, the chromosomal segment turned out to encompass three recently characterized genes, MET14 (Korch et al., 1991), VPS1/SPO15 (independently isolated: Rothman et al., 1990; Yeh et al., 1991) and PAP1 (Lingner et al., 1991), and nine previously unknown open reading frames (ORFs) longer than 100 codons (YK102 to YK110). Altogether, the ORFs occupy 68% of the complete sequence, so the segment is not as densely packed as has been described for some other yeast genome segments (e.g. Yoshikawa and Isono, 1990; Oliver et al., 1992). Orientation and localization of the three known genes relative to CEN11 are in agreement with the published mapping and sequencing data (Lingner

Figure 1. Subcloning, mapping and sequencing strategy for 24.7 kb of the insert of pUKG047. A contig of ordered overlapping restriction fragments used for further preparation of sequenceable subclones is shown above a BamHI and EcoRI restriction map of the segment. A map of reduced size demonstrates the segment's position relative to the complete cosmid insert. Below, arrows of different form indicate individual sample preparation method, direction and length of the resulting DNA sequence. The 5.5 kb fragment determined most recently using an optimized sequencing strategy is marked.

Table 1. Statistical analysis of the directed strategy to sequence 24.7 kb in the vicinity of CEN11

                                     Complete segment   5.5 kb subfragment (optimized strategy)
Total number of sequenced samples    409                60
Minimal number of useful samples     230                47
Average bases read per sample        350                350
Final sequence [bp]                  24 743 (a)         5496 (b)
Average readings per bp              3.3                3.0

Distribution of methods: (a) 49% via exonuclease III deletion clones; 37% via restriction fragments; 13% via synthetic primers. (b) 79% via exonuclease III deletion clones; 13% via restriction fragments; 9% via synthetic primers.
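The redundancy values in Table 1 can be reproduced from the other table entries. The following minimal sketch assumes that "average readings per bp" is simply useful samples times bases used per sample, divided by the final sequence length; the function name is ours and purely illustrative.

```python
def readings_per_bp(useful_samples: int, bases_per_sample: int, final_length_bp: int) -> float:
    """Average number of times each base pair was read (assumed definition)."""
    return useful_samples * bases_per_sample / final_length_bp


# Values taken from Table 1.
complete = readings_per_bp(230, 350, 24743)
subfragment = readings_per_bp(47, 350, 5496)

print(f"complete segment:   {complete:.1f} readings per bp")
print(f"5.5 kb subfragment: {subfragment:.1f} readings per bp")
print(f"useful/total samples: {230 / 409:.0%} (complete) vs {47 / 60:.0%} (subfragment)")
```

Under this assumption the calculation gives the tabulated 3.3 and 3.0 readings per bp, and it shows the gain of the optimized strategy mainly in the fraction of useful samples.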

et al., 1991; Korch et al., 1991). The orientation of the whole segment relative to the chromosome is presently based on genetic mapping data (Mortimer et al., 1989; Yeh et al., 1986) and will later be assigned to the corrected map of chromosome XI (B. Dujon, personal communication).

Six of the new ORFs (YK103, 293 codons; YK104, 448 codons; YK105, 276 codons; YK108, 901 codons; YK109, 772 codons; and the N-terminally incomplete YK110, 214 codons) are transcribed towards the centromere, while YK102 (449 codons), YK106 (185 codons) and YK107 (626 codons) are transcribed in the opposite orientation (Figure 2).

Detailed comparisons of our sequence data with the published sequences revealed a number of differences (Table 2). In the case of MET14 and PAP1, open reading frame compositions were not affected. For VPS1/SPO15, three amino acid exchanges had been described before (Yeh et al., 1991), indicating a natural polymorphism of yeast strains, and our version of the gene turned out to be a mixture of both types [VPS1: Asn33 (AAT), Glu111 (GAA), Lys141 (AAG); SPO15: Thr33 (ACT), Asn111 (AAC), Gln141 (CAG); YKVPS1/SPO15: Thr33 (ACT), Asn111 (AAC), Lys141 (AAG)]. Outside the consensus elements of the centromere, three base pair changes were observed when compared with the published sequence of the CEN11 region (Fitzgerald-Hayes et al., 1982). While the functionally most important centromere elements CDEI and CDEIII turned out to be conserved, one base substitution was seen within the less important A+T-rich CDEII area.

The segment sequenced shows a high A+T content of 63%, slightly higher than the S. cerevisiae average. This value turned out to be more or less conserved among the ORFs, with the extremes ranging from 67% A+T (YK107) to 55% A+T (YK106).

The long gap separating ORF YK109 from YK108 (Figure 2; nt 3355-5223) contains a leucine-tRNA(CAA) gene. Bases 3721 to 3608 (compare Figure 2) show 100% identity to both the two exons and the intron sequence of yeast Leu-tRNA-3 (Andreadis et al., 1982). A delta element (Genbauffe et al., 1984) is present 81 bp upstream of this gene. This has been found to be a common location for delta sequences (Gafner et al., 1983). The highest score of 94% identity over 332 bp was obtained with the solo delta-31C-GF3 sequence located at a hot spot of Ty transposition on chromosome III (Warmington et al., 1987). Outside of the tRNA gene and the delta element, no significant DNA homologies have been found to either of the known tRNA loci. However, this region upstream of ORF YK108 (ATG at position 5224; compare Figure 2) turned out to contain a high number of specific regulatory or functional DNA sequences. Two possible heat-shock response elements (TTCNNGAA; nt 4681-4688 and nt 4322-4329; Amin et al., 1988; Perisic et al., 1989), a 'pre' element (TGAAACA; nt 3961-3967) and the central consensus of a GCN4 trans regulator binding site (TGACTC; nt 4196-4201; Hope and Struhl, 1985) have been identified. At position 3605 to 3595 (Figure 2), a DNA stretch showing a perfect match with the core consensus sequence of potential ARS elements (WTTTATRTTTW; Newlon, 1988) occurs.

Additional DNA sequence elements could be identified within some other putatively non-coding areas. A DNA sequence resembling parts of the α2

Figure 2. Genetic organization of the 24.7 kb segment in the vicinity of CEN11. Localization of centromere CEN11, open reading frames (ORFs), a leucine-tRNA gene, a delta element and two ARS consensus elements is indicated as found during sequence characterization. Numbering of start and stop codon positions is given for ORFs longer than 100 codons according to the data submitted to EMBL (X65124). Areas of previously published DNA sequence data (a-d) including the genes PAP1 (a), VPS1/SPO15 (b), MET14 (d) and the centromere region of chromosome XI (c) are marked above the 'right' end of the segment. Our sequence closes the gap between PAP1/VPS1/SPO15 (Lingner et al., 1991; Rothman et al., 1990; Yeh et al., 1991) and CEN11-MET14 (Fitzgerald-Hayes et al., 1982; Korch et al., 1991). Discrepancies to our data are summarized in Table 2 and discussed in the text.
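The criterion "ORFs longer than 100 codons" can be illustrated with a short six-frame scan. This is a minimal sketch and not the DNASIS procedure actually used. It assumes that an ORF runs from the first in-frame ATG to the next in-frame stop codon, and that the codon count includes the stop codon, as the codon and amino acid counts in the text suggest (e.g. 293 codons versus 292 residues for YK103). All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, longer_than: int = 100):
    """Scan all six frames for ATG...stop ORFs with more than `longer_than` codons.

    Returns tuples (strand, first_nt, last_nt, codons) with 1-based coordinates
    on the forward strand; `codons` includes the stop codon (an assumption).
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # 0-based index of the current in-frame ATG, if any
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    codons = (i - start) // 3 + 1
                    if codons > longer_than:
                        end = i + 3  # exclusive end on strand s
                        if strand == "+":
                            orfs.append((strand, start + 1, end, codons))
                        else:
                            # map reverse-strand indices back to the forward strand
                            orfs.append((strand, n - end + 1, n - start, codons))
                    start = None
    return orfs


# Toy input (hypothetical): a 122-codon ORF on the forward strand.
toy = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(toy))
```

An N-terminally incomplete frame such as YK110, which runs off the end of the segment without a start codon, would not be reported by this sketch and needs separate handling.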

Table 2. DNA sequence discrepancies with available data base sequences

Comparison with genes

Gene     5'-upstream   ORF        ORF             3'-downstream
         total nt      total nt   substitutions   total nt
MET14    418           609        1 (a)           547
VPS1     317           2115       3 (b)           33
SPO15    621           2115       1 (c)           33
PAP1     238           1707       5 (d)           340

Comparison with CEN sequence

         5'-upstream   CDEI/II/III   CDEI/II/III     3'-downstream
         total nt      total nt      substitutions   total nt
CEN11    213           114           1 (e)           530

(a) 'Silent' third position exchange, eliminating a SalI restriction site. (b) Two amino acid alterations, see text. (c) One amino acid alteration, see text. (d) 'Silent' third position exchanges. (e) A-T exchange in CDEII.

[Counts of insertions/deletions and of substitutions in the flanking regions are not legible in the source.]

Table 3. Characterization of open reading frames within 24.7 kb near CEN11

Name of ORF   Length [aa]   MW [kD]   Homologous protein   Reference                Identity [%]   Overlap [aa]   FASTP score (optimized)
MET14         202           23        NODQ-rhme            Cervantes et al., 1989   53             192            486
VPS1/SPO15    704           79        Dynamin-rat          Obar et al., 1990        47             423            983
                                      Mx-mu                Staehli et al., 1986     38             251            523
PAP1          568           65        PAP-bov              Raabe et al., 1991       46             420            1159
YK102         448           52        OSBP-hu              Levanon et al., 1990     25             231            186
YK103         292           35
YK104         447           50
YK105         275           32
YK106         184           20
YK107         625           72        CCG1-hu              Sekiguchi et al., 1988   24             252            157
YK108         900           99        HDE-ct               Nuttley et al., 1988     53             919            2340
YK109         771           86
(YK110)       (213)         (25)

Figure 3. Dot matrix comparison between YK108 amino acid sequence (vertical) and peroxisomal trifunctional enzyme hydratase-dehydrogenase-epimerase (HDE) from C. tropicalis (horizontal). Parameters in the PROSIS software are set as follows: window size 30, matching amino acids 15.
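The dot matrix principle behind Figure 3 can be sketched in a few lines. The sketch assumes that a dot is placed wherever a window of the given size contains at least the given number of identical residues at corresponding positions. The exact scoring used by PROSIS is not described here, so this is an illustration of the technique rather than a reimplementation. The function name and the toy sequences are hypothetical.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 30, min_matches: int = 15):
    """Return (i, j) window start positions (0-based) that would receive a dot.

    A dot is set when the windows seq_a[i:i+window] and seq_b[j:j+window]
    share at least `min_matches` identical residues at the same offsets.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in zip(win_a, seq_b[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Toy example with a small window; the defaults correspond to the Figure 3 settings.
a = "MKTAYIAKQR"
b = "GGG" + "MKTAYIAKQR" + "GGG"
for i, j in dot_matrix(a, b, window=5, min_matches=5):
    print(i, j)
```

Homologous stretches appear as runs of dots along a diagonal, and an internal duplication shows up as an additional diagonal parallel to the main one.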

binding site (TTACATG; Johnson and Herskowitz, 1985; Miller et al., 1985) is located 66 bp upstream of ORF YK102 and a single cell cycle box (CACGAAA; Miller et al., 1985) precedes ORF YK109 by 239 bp. At nt 14082 to 14092, a second ARS consensus sequence is present between ORFs YK104 and YK103. Such regulatory sequences may help in the functional analysis of the products of these new genes.
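Consensus elements such as TTCNNGAA or WTTTATRTTTW are written with IUPAC ambiguity codes and can be located on both strands with a simple pattern search. The following is a minimal sketch, not the DNASIS routine used for this work, and its function name is hypothetical. Reverse-strand hits are reported with descending coordinates, following the convention used in the text (e.g. "position 3605 to 3595").

```python
import re

# IUPAC nucleotide codes used by the consensus sequences (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif(seq: str, consensus: str):
    """Return (strand, first_nt, last_nt) for every match, 1-based forward coordinates.

    Matches on the reverse strand are given with first_nt > last_nt.
    """
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus.upper()) + "))")
    seq = seq.upper()
    n, length = len(seq), len(consensus)
    hits = [("+", m.start() + 1, m.start() + length) for m in pattern.finditer(seq)]
    rev = seq.translate(_COMPLEMENT)[::-1]
    for m in pattern.finditer(rev):
        hits.append(("-", n - m.start(), n - m.start() - length + 1))
    return hits


# Toy sequences (hypothetical), not taken from the 24.7 kb segment.
print(find_motif("GGGTTCAAGAACCC", "TTCNNGAA"))      # heat-shock element consensus
print(find_motif("CCATTTATGTTTACC", "WTTTATRTTTW"))  # ARS core consensus
```

Because TTCNNGAA is its own reverse complement, a palindromic match is reported once per strand; such duplicates can be collapsed if only positions matter.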

Analysis of ORF-encoded amino acid sequences

All genes, including the nine potential ORFs YK102 to YK110, were translated to their amino acid sequences for further investigations. To obtain hints on their possible functions, database homology searches using the FASTA and FASTP algorithms (Devereux et al., 1984; Lipman and Pearson, 1985) were carried out. Table 3 summarizes the results of this analysis. The most obvious similarities were observed for YK108 with a trifunctional peroxisomal protein from the yeast Candida tropicalis (hydratase-dehydrogenase-epimerase (HDE); Nuttley et al., 1988). A dot-plot analysis (Figure 3) demonstrates long stretches of homology over the complete length of the proteins and hints at an internal duplication event within both sequences. The detailed amino acid comparison (Figure 4A) additionally exhibits an acidic region in HDE (position 602-619) missing in the S. cerevisiae product (position 610). It will be of interest whether the loss of this structural feature in YK108 has any effect on its enzymatic activities. During the final preparation of this paper we learned that W. H. Kunau and co-workers have independently cloned and characterized a gene called FOX2 encoding a multifunctional peroxisomal S. cerevisiae enzyme, which seems to be identical, at least in the coding region, with ORF YK108 (Hiltunen et al., in press). Among the other previously unknown ORFs, two could code for peptides showing less extensive, but still significant similarities to other proteins. A

Figure 4. Amino acid sequence comparisons. Identical amino acids are indicated by (I); related residues are marked (:) according to the rules of the PROSIS software. (A) Comparison of the translated product of new ORF YK108 with the peroxisomal trifunctional HDE enzyme from C. tropicalis revealing the highest similarity (53% identity over the whole length of the proteins). An internal duplication within both enzymes ranges from amino acids 5 to 282 and amino acids 319 to 591, respectively (according to the numbering of YK108). Another feature of HDE is missing in YK108: the stretch of highly acidic residues (compare position 602 in HDE, position 610 in YK108). (B) Comparison of the MET14 gene product and the C-terminal part of the NODQ regulator protein of R. meliloti (53% identity in an overlap of 192 amino acids). (C) Homologous area between a central part of YK102 and the C-terminus of human oxysterol binding protein OSBP (25% identity over 231 amino acids). (D) Region of similarity shared between YK107 and the C-terminus of the human cell cycle gene 1-encoded protein CCG1 (24% identity over 252 amino acids).
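The identity values quoted for these comparisons can be derived from a pairwise alignment by counting identical residues over the aligned overlap. The sketch below assumes that gap positions are excluded from the overlap. FASTP may define the overlap differently, so this convention and the function name are assumptions. The 12-residue example is a short ungapped stretch from the MET14/NODQ comparison, as read from the scanned figure.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identical, overlap, percent) for two aligned, equal-length strings.

    Columns containing a gap in either sequence are not counted as overlap
    (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = overlap = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        raise ValueError("no aligned residue pairs")
    return identical, overlap, 100.0 * identical / overlap


# Stretch around the GXXXXGK motif of MET14 (top) and NODQ (bottom).
identical, overlap, pct = percent_identity("TGLSASGKSTIA", "TGLSGSGKSTIA")
print(f"{identical}/{overlap} identical ({pct:.0f}%)")
```

Applied to a complete alignment, the same count yields an "identity over overlap" pair of the kind listed in Table 3.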

central segment of 231 amino acids within YK102 is 25% identical in its composition with a C-terminal portion of human oxysterol-binding protein OSBP (compare Figure 4C), which is suggested to be involved in sterol regulatory processes (Levanon et al., 1990). The putative translation product of ORF YK107 has 24% identity with the C-terminal region of the human cell cycle gene 1 product (see Figure 4D), which is able to complement certain G1-arrested mutants of a mammalian cell line (Sekiguchi et al., 1988). No additional structural features, like nucleotide-, DNA- or protein-binding motifs, could be found.

The three known proteins PAP1, VPS1/SPO15 and MET14 were reanalysed for homologies in the same way. The similarities of VPS1/SPO15 with the anti-viral Mx proteins of higher eukaryotes and the rat dynamin have been presented earlier (Rothman et al., 1990) or in parallel to our investigations (Yeh et al., 1991), as also in the case of yeast PAP1 and bovine poly(A) polymerase (Raabe et al., 1991; Lingner et al., 1991). For MET14 we describe here a striking homology (53% identity over 192 amino acids; see Figure 4B) of the yeast APS kinase with the C-terminal part of the NODQ regulator protein from Rhizobium meliloti (Cervantes et al., 1989), which has not been mentioned by Korch et al. (1991). Within this region of homology, two of the consensus motifs putatively responsible for nucleotide binding could be identified (GXXXXGK, DXXG; Dever et al., 1987).

ACKNOWLEDGEMENTS

We thank Dr B. Dujon for providing cosmid pUKG047 and for helpful discussions, Dr W. Keller and J. Lingner, as well as Dr W.-H. Kunau and A. Beyer for exchange of data prior to publication, and Martina Haasemann for rechecking our analysed sequence data. This work was supported by the Commission of the European Communities under the BRIDGE programme of the division of Biotechnology.

REFERENCES

Amin, J., Ananthan, J. and Voellmy, R. (1988). Key features of heat shock regulatory elements. Mol. Cell. Biol. 8, 3761-3769.
Andreadis, A., Hsu, Y. P., Kohlhaw, G. B. and Schimmel, P. R. (1982). Nucleotide sequence of yeast LEU2 shows 5'-noncoding region has sequences cognate to leucine. Cell 31, 319-325.

Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A. and Struhl, K. (Eds) (1987). Current Protocols in Molecular Biology. John Wiley & Sons Ltd, New York.
Boyer, H. W. and Roulland-Dussoix, D. (1969). A complementation analysis of the restriction and modification of DNA in Escherichia coli. J. Mol. Biol. 41, 459-472.
Cervantes, E., Sharma, S. B., Maillet, F., Vasse, J., Truchet, G. and Rosenberg, C. (1989). The Rhizobium meliloti host range nodQ gene encodes a protein which shares homology with translation elongation and initiation factors. Mol. Microbiol. 3, 745-755.
Dever, T. E., Glynias, M. J. and Merrick, W. C. (1987). GTP-binding domain: three consensus sequence elements with distinct spacing. Proc. Natl. Acad. Sci. USA 84, 1814-1818.
Devereux, J., Haeberli, P. and Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
Fitzgerald-Hayes, M., Clarke, L. and Carbon, J. (1982). Nucleotide sequence comparisons and functional analysis of yeast centromere DNAs. Cell 29, 235-244.
Gafner, J., De Robertis, E. M. and Philippsen, P. (1983). Delta sequences in the 5' non-coding region of yeast tRNA genes. EMBO J. 2, 583-591.
Genbauffe, F. S., Chisholm, G. E. and Cooper, T. G. (1984). Tau, sigma, and delta. A family of repeated elements in yeast. J. Biol. Chem. 259, 10518-10525.
Henikoff, S. (1984). Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28, 351-359.
Hope, I. A. and Struhl, K. (1985). GCN4 protein synthesized in vitro binds HIS3 regulatory sequences. Implications for general control of amino acid biosynthesis genes in yeast. Cell 43, 177-188.
Johnson, A. D. and Herskowitz, I. (1985). A repressor (MAT α2 product) and its operator control expression of cell type specific genes in yeast. Cell 42, 237-247.
Knott, V., Rees, D. J. G., Cheng, Z. and Brownlee, G. G. (1988). Randomly picked cosmid clones overlap the pyrB and oriC gap in the physical map of the E. coli chromosome. Nucleic Acids Res. 16, 2601-2612.
Korch, C., Mountain, H. A. and Bystrom, A. S. (1991). Cloning, nucleotide sequence, and regulation of MET14, the gene encoding the APS kinase of Saccharomyces cerevisiae. Mol. Gen. Genet. 229, 96-108.
Levanon, D., Hsieh, C.-L., Francke, U., Dawson, P. A., Ridgway, N. D., Brown, M. S. and Goldstein, J. L. (1990). cDNA cloning of human oxysterol-binding protein and localization of the gene to human chromosome 11 and mouse chromosome 19. Genomics 7, 65-74.
Lingner, J., Kellerman, J. and Keller, W. (1991). Cloning and expression of the essential gene for poly(A) polymerase from S. cerevisiae. Nature 354, 496-498.

Lipman, D. J. and Pearson, W. R. (1985). Rapid and sensitive protein similarity searches. Science 227, 1435-1441.
Miller, A. M., MacKay, V. L. and Nasmyth, K. A. (1985). Identification and comparison of two sequence elements that confer cell-type specific transcription in yeast. Nature 314, 598-603.
Mortimer, R. K., Schild, D., Contopoulou, C. R. and Kans, J. A. (1989). Genetic map of Saccharomyces cerevisiae, edition 10. Yeast 5, 321-403.
Newlon, C. S. (1988). Yeast chromosome replication and segregation. Microbiol. Rev. 52, 568-601.
Nuttley, W. M., Aitchison, J. D. and Rachubinski, R. A. (1988). cDNA cloning and primary structure determination of the peroxisomal trifunctional enzyme hydratase-dehydrogenase-epimerase from the yeast Candida tropicalis pK233. Gene 69, 171-180.
Obar, R. A., Collins, C. A., Hammarback, J. A., Shpetner, H. S. and Vallee, R. B. (1990). Molecular cloning of the microtubule-associated mechanochemical enzyme dynamin reveals homology with a new family of GTP-binding proteins. Nature 347, 256-261.
Oliver, S. G. et al. (1992). The complete DNA sequence of yeast chromosome III. Nature 357, 38-46.
Perisic, O., Xiao, H. and Lis, J. T. (1989). Stable binding of Drosophila heat shock factor to head-to-head and tail-to-tail repeats of a conserved 5 bp recognition unit. Cell 59, 797-806.
Prober, J. M., Trainor, G. L., Dam, R. J., Hobbs, F. W., Robertson, C. W., Zagursky, R. J., Cocuzza, A. J., Jensen, M. A. and Baumeister, K. (1987). A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238, 336-341.

Raabe, T., Bollum, F. J. and Manley, J. L. (1991). Primary structure and expression of bovine poly(A) polymerase. Nature 353, 229-234.
Rothman, J. H., Raymond, C. K., Gilbert, T., O'Hara, P. J. and Stevens, T. H. (1990). A putative GTP binding protein homologous to interferon-inducible Mx proteins performs an essential function in yeast protein sorting. Cell 61, 1063-1074.
Sekiguchi, T., Miyata, T. and Nishimoto, T. (1988). Molecular cloning of the cDNA of human X chromosomal gene (CCG1) which complements the temperature-sensitive G1 mutants, tsBN462 and ts13, of the BHK cell line. EMBO J. 7, 1683-1687.
Staehli, P., Haller, O., Boll, W., Lindemann, J. and Weissmann, C. (1986). Mx protein: constitutive expression in 3T3 cells transformed with cloned Mx cDNA confers selective resistance to influenza virus. Cell 44, 147-158.
Warmington, J. R., Green, R. P., Newlon, C. S. and Oliver, S. G. (1987). Polymorphisms on the right arm of yeast chromosome III associated with Ty transposition and recombination events. Nucleic Acids Res. 15, 8963-8982.
Yeh, E., Carbon, J. and Bloom, K. (1986). Tightly centromere-linked gene (SPO15) essential for meiosis in the yeast Saccharomyces cerevisiae. Mol. Cell. Biol. 6, 158-167.
Yeh, E., Driscoll, R., Coltrera, M. and Bloom, K. (1991). A dynamin-like protein encoded by the yeast sporulation gene SPO15. Nature 349, 713-715.
Yoshikawa, A. and Isono, K. (1990). Chromosome III of Saccharomyces cerevisiae: an ordered clone bank, a detailed restriction map and analysis of transcripts suggest the presence of 160 genes. Yeast 6, 383-401.
