YEAST


VOL.8: 977-986 (1992)


Yeast Sequencing Reports


The Sequence of a 12 kb Fragment on the Left Arm of Yeast Chromosome XI Reveals Five New Open Reading Frames, Including a Zinc Finger Protein and a Homolog of the UDP-Glucose Pyrophosphorylase from Potato

BENEDICTE PURNELLE, JACEK SKALA*, LUC VAN DYCK AND ANDRE GOFFEAU†

Unité de Biochimie Physiologique, Université Catholique de Louvain, Place Croix du Sud, 2-20, 1348 Louvain-la-Neuve, Belgium

Received 2 July 1992; accepted 6 August 1992

We report the sequence of a 12 399 bp DNA fragment from the left arm of Saccharomyces cerevisiae chromosome XI. This fragment is located between the genetic loci mif2 and mak11. We have detected five new open reading frames (ORFs) longer than 300 bp, provisionally called YKL248 to YKL252. One of them, YKL248, encodes a homolog of the UDP-glucose pyrophosphorylase from potato. The product of YKL251 contains the consensus for zinc binding proteins, similar to those of a number of fungal transcriptional activators. The three other ORFs do not show significant homology to known proteins.

KEY WORDS - Saccharomyces cerevisiae; chromosome XI; zinc finger motif; UDP-glucose pyrophosphorylase.

INTRODUCTION

During the systematic sequencing of chromosome XI sponsored by the European BRIDGE project, we have sequenced a DNA segment of cosmid pEKG119 derived from the left arm of the chromosome, between the genetic loci mif2 and mak11. The sequence of this 12 399 bp DNA fragment reveals five new open reading frames (ORFs) provisionally called YKL248 to YKL252.

MATERIALS AND METHODS

Plasmid
The cosmid pEKG119 was provided by A. Thierry and B. Dujon (Thierry et al., manuscript in preparation). This cosmid contains a 37 kb fragment from chromosome XI of the strain αS288C inserted into the vector pWE15.

*Permanent address: Institute of Microbiology, Wroclaw University, Przybyszewskiego 63, 51-148 Wroclaw, Poland.
†To whom all correspondence should be sent.
0749-503X/92/011977-10 $10.00 © 1992 by John Wiley & Sons Ltd

Sequencing strategy
We have used the random sequencing method of Sambrook et al. (1989) modified by J.-C. Jauniaux (ULB-Brussels). After sonication of the whole pEKG119 cosmid, the termini of the DNA fragments were repaired using the Klenow fragment and the T4 DNA polymerase. Fragments ranging from 800 to 1200 bp were cut out from a 1% agarose gel, purified by centrifugation through glass wool (Heery et al., 1990) and ligated into the SmaI site of the sequencing vector pSEQ1. Escherichia coli strain XL1 Blue was transformed with the complete ligation mixture following the calcium chloride method (Mandel and Higa, 1970). Transformants were selected by hybridization with a 12 kb BamHI DNA fragment from the pEKG119 cosmid. Double-stranded DNAs were sequenced on both strands by the method of Sanger et al. (1977), using the T7 DNA polymerase from Pharmacia. A total of 360 sequencing reactions were carried out to obtain a final sequence of 12 399 bp. Junctions between


B. PURNELLE ET AL.

[Figure 1: nucleotide sequence listing of the 12 399 bp fragment, numbered in rows of 60 bp, with the start and stop positions of YKL247 to YKL253, the presumed TATA boxes, the polyadenylation sites, the EcoRI and BamHI sites and the yATF, ABF1 and ARGRII binding sites marked above the sequence.]

Figure 1. The complete sequence of the 12 399 bp BamHI fragment from the pEKG119 cosmid. The ORFs are boxed from the first possible ATG codon to the first encountered stop codon. The BamHI and EcoRI sites are doubly underlined. The arrows indicate the orientation of the ORFs. The putative binding sites for transcription factors are indicated above the sequence. They are underlined once, as are the presumed TATA boxes, the starting ATG and the stop codon. The Zaret and Sherman termination consensus is shown in bold characters.
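As an illustration of how motif positions such as those marked in Figure 1 can be located, the following Python sketch scans both strands of a sequence for exact matches and reports 1-based coordinates on the top strand. It is a minimal sketch, not the procedure used by the authors: the function names and the short demonstration sequence are hypothetical, and only the motif string AATAAA is taken from the text.

```python
# Illustrative exact-match motif scan on both strands.
# Function names and the demo sequence are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(seq, motif):
    """Return sorted (position, strand) hits for an exact motif.

    position is the 1-based leftmost coordinate of the hit on the top
    strand; strand is "+" for the motif itself and "-" for its
    reverse complement.
    """
    seq = seq.upper()
    motif = motif.upper()
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.add((start + 1, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    demo = "CCAATAAAGGTTTATTCC"  # hypothetical sequence
    for position, strand in find_motif(demo, "AATAAA"):
        print(position, strand)
```

Exact matching is the simplest choice; degenerate consensus sites would need a pattern with ambiguity codes instead.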

non-overlapping clones were determined using synthetic oligonucleotides as primers. Ambiguities due to compressions were solved by running the acrylamide gels in the presence of 7 M-urea plus 40% formamide (U.S. Biochemical, 1990).

Sequence analysis
Manual introduction of the data as well as assembling of the sequences were performed using the PCGene software, release 6.50 (IntelliGenetics Inc.). Search for ORFs was carried out following Fickett's method (1982). For each ORF we assume that the first ATG is the initiation codon. The search for homologies using the FASTA score (Pearson and Lipman, 1988) was carried out in the database MIPSX, release 30-2, Martinsried.

RESULTS AND DISCUSSION

The complete nucleotide sequence as well as the deduced amino acid sequence are presented in


THE SEQUENCE OF A 12KB FRAGMENT

[Figure 2: panel A, restriction map with BamHI (B) and EcoRI (E) sites; panel B, positions of YKL247 to YKL253 on the two strands (5' to 3'); panel C, ORF map in the six reading frames. Horizontal axes run from 0 to 12 000 bp.]

Figure 2. (A) Partial restriction map deduced from the 12 kb sequence. Abbreviations: B: BamHI; E: EcoRI. (B) Position of the ORFs. (C) ORF map in the six possible reading frames obtained using DNA Strider 1.2 software (Marck, 1988).
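A six-frame ORF map of the kind shown in panel C can be approximated with a short scan that applies the convention used in Figure 1 (first ATG to first in-frame stop codon) and the 300 bp length threshold. The Python sketch below is only an illustration under those assumptions; it is neither Fickett's method nor the DNA Strider implementation, and the function names and demonstration sequence are hypothetical.

```python
# Illustrative six-frame ORF scan: first ATG to first in-frame stop.
# Not Fickett's method and not DNA Strider; names are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def orfs_in_strand(seq, min_len):
    """Yield 0-based half-open (start, end) ORFs, stop codon included."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_len=300):
    """Return sorted (first, last, strand) ORFs in 1-based top-strand
    coordinates, scanning the three frames of each strand."""
    seq = seq.upper()
    n = len(seq)
    found = [(s + 1, e, "+") for s, e in orfs_in_strand(seq, min_len)]
    for s, e in orfs_in_strand(reverse_complement(seq), min_len):
        # Map reverse-strand indices back onto the top strand.
        found.append((n - e + 1, n - s, "-"))
    return sorted(found)


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # hypothetical
    print(find_orfs(demo, min_len=21))
```

Lowering `min_len` in the usage example is only there to make the toy sequence produce a hit; the threshold stated in the text is 300 bp.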

Figure 1 and Figure 3 respectively. The restriction map is shown in Figure 2A. No tRNA genes, ARS, Ty elements or consensus sequences for the splicing of introns were detected in the 12 kb fragment. A search for coding regions by Fickett's method (1982) identified seven ORFs longer than 300 bp, temporarily named YKL247 to YKL253 (Figures 2B and 2C). Two of them, YKL247 and YKL253, will be described in another report because they overlap the neighbouring fragments. The best scores for homologies with proteins in the database MIPSX, release 30-2, are given in Table 1.

The product of YKL248 is a protein of 499 amino acids (Figure 3A) which shows 53% identity in a 471 amino acid overlap with the UDP-glucose pyrophosphorylase (UTP: glucose-1-P uridylyltransferase) from potato (Katsube et al., 1990). This protein is important for glucosyl donation in the synthesis of various carbohydrates (sucrose, glycogen, β1-3 glucan). The alignment of both proteins shown in Figure 4 suggests that the YKL248 gene product is the homolog of this enzyme in yeast. A putative TATA box (TATATAAA) is present at position 8575 in the 5' non-coding region. A putative polyadenylation site (AATAAA) is present at position 10522. The tripartite consensus TAG..TATGT..TTT proposed by Zaret and Sherman (1982) for transcriptional termination is located 48 to 201 bp downstream from the translation stop site (Figure 1).

YKL249 and YKL250 are translated into polypeptides of 130 and 118 amino acids respectively (Figures 3B and 3C). They do not have significant homology, as revealed by the FASTA score, with any known protein. Hydropathic analysis predicts three integral transmembrane spans in YKL250 (Klein et al., 1985).

The product of YKL251 is a 1170 amino acid integral protein (Figure 3D) with three transmembrane segments according to the method of Klein et al. (1985), two transmembrane helices as predicted by Rao and Argos (1986), or only one as predicted by Eisenberg et al. (1984). Search in the

YKL248

A.

1 M S T K K H T K T H S T Y A F E S N T N S V A A S Q M R N A 3 1 L N K L A D S S K L D D A A R A K F E N E L D S F F T L F R 6 1 R Y L V E K S S R T T L E W D K I K S P N P D E V V K Y E I

9 1 1 S Q Q P E N V S N L S K L A V L K L N G G L G T S M G C V 1 2 l G P K S V I E V R E G N T F L D L S V R Q I E Y L N R Q Y D 1 5 1 S D V P L L L M N S F N T D K D T E H L I K K Y S A N R I R

1 8 1 I R S F N Q S R F P R V Y K D S L L P V P T E Y D S P L D A 2 1 1 W Y P P G H G D L F E S L H V S G E L D A L I A Q G R E I L 2 4 1 F V S N G D N L G A T V D L K I L N H M I E T G A E Y I M E

2 7 1 L T D K T R A D V K G G T L I S Y D G Q V R L L E V A Q V P 3 0 1 K E H I D E F K N I R K F T N F N T N N L W I N L K A V K R

3 3 1 L I E S S N L E M E I I P N Q K T I T R D G H E I N V L Q L 3 6 1 E T A C G A A I R H F D G A H G V V V P R S R F L P V K T C 3 9 1 S D L L L V K S D L F R L E H G S L K L D P S R F G P N P L

4 2 1 I K L G S H F K K V S G F N A R I P H I P K I V E L D H L T 4 5 1 I T G N V F L G K D V T L R G T V I I V C S D G H K I D I P

4 8 1 N G S I L E N V V V T G N L Q I L E H

B.

YKL249 I M C F G V L L S G H Y Y V M V H L L W K L Y Y I I G R G L G 3 1 L L T L L M W K E P E I E S P T P K T V T S I S W Y L L E E 6 1 L C C L F R F I Y S I Q N R E R K C K N G P S P N K R G S A 9 1 S G C S R R G G G R G S G Y K T L Y R D L C E N K P T P E L

121 W T L Q A P E L N L

C.

YKL250

I M Q T M G G E H L L L S O L K G S F F L L L L A Y F F R G R 3 1 S P Y Y A R C Y R R L A V T P G A I T I A I A I A T D S I P

6 1 e K S K V L V S V C S H T D P C T A S C N L I P F P R P 9 1 F S N S L T R F L F C L G S A R F C I S F P C F G L S I

D.

YKL251

1 M N E L N T V S T N S S D S T K N G G T S N S P D D M D S R 3 1 A A A S H A I K K R T K A S R A C D O C R K K K I K C D Y K 6 1 D E K G V C S N a C S F D R V P L K R G P S K G 9 1 Y T R S T S H P R T N E I Q D H N N S R S Y N T F D N S N N

1 2 1 T L N N N T G N S G D N G I N S N T V P S T P S R S N S V L 1 5 1 L P P L T Q Y I P Q A G G I P P S F Q N P A I Q S T M P A G

1 8 l N I G Q Q Q F W K V P Y H E F Q H Q R K G S I D S L Q S D I 2 1 1 S V R T L N P N E Q L S Y N T V Q Q S P I T N K H T N D S G 2 4 1 N A N G S V T G S G S A S G S G G Y W S F I R T S G L L A P 2 7 1 T D D H N G E Q T R R S S S I P S L L R N T S N S L L L G G 3 0 l Q P Q L P P P Q Q Q S Q P Q A H Q Q K L Q Q G Q N L Y S Y S 3 3 l Q F S Q Q Q P Y N P S I S S F G Q F A A N G F H S R Q G S V 3 6 1 A S E A M S P S A P A M F T S T S T N P V N V A Q Q T Q R P 3 9 1 Q G Q Q V P Q F S S E L D G N K R R Q S A P V S V T L S T D

4 2 1 R L N G N E N N N G E I N N N N G S N N S G S S K D T S Q H 4 5 1 S O E S V T T P A A ~ E I S S P G S T P Q R S T K K R R K S

4 8 1 Y V S K K T K P K R D S S I S I T S K D S A H P H T T S S T 5 1 1 1 A Y G Q I S D V D L I D T Y Y E F I H V G F P I I P L N K 5 4 1 T T L T S D L L L V N T Q P I S N I H E V N S Y V I L W F R

5 7 1 N S L E L L V R V A L K Q K P G G K F F D N I V G V A L S P 6 0 1 S N D N N K A G F T T A T A R D D A E K T R R D S H N E V Q 6 3 1 D T L E V Q S V F I A A L N E C F Q K I V D I H P K F R E N 6 6 1 N D Q I S P K I K V I Y L S T F I L L N Y I L A F V G Y D N 6 9 1 S F V L G M S V T I F N E F K L Y K L L L F P E P D I N D V 7 2 1 K P P V D E E V S T G N G N T K T S E F E I G S E S A G H M 7 5 1 N P S N S P N S M D E N I S H Y S V L F K R L Y V L L S V F

7 8 1 D S L Q S C A F G G P K L L N I S I Q G S T E R F F S N D L 8 1 1 G S K W C L E Q S Q L R L K S V L Q S L K L G E L H S E L T 8 4 1 R N R I S M N G N R K P G F D I T N S S S L L S E Y V E T Q 8 7 1 P L S V A Q L F C K L L I G K H N F I N C L L S L Y D S E A

9 0 1 G V Y S D L T L D L S S K I A D S L C S L I S I I L O V L T 9 3 1 u R L N P T N S I D F N Y R P P N P P A N N P T V Q E G 9 6 1 P S A M G S S P V A G N L S A A P P S E G N P D F Y K K L L

9 9 1 G L K Q D T G T I L S D L C R G I I S P F A I A I L H E V Y

1021 N I T E L V K Q M P T S L I S I M M T A T T T Q N T Q D T K 1 0 5 1 K S Q D L V H K L S N S M N E V V Q I T S V L T H I K P F K

1081 I F E H E L N K P I M S L T G G L S S T T R N D V M W P K S 1 l l l G Q G L R E S S V H K T L L D E R R T S G T Q P T T A P V A 1 1 4 1 A E E P R L E N V A L E N F V S I G W K L L D D S E L G W Y

E.

YKL252

I M R V Y Q F C R P F O L F T C F L C Y L L V F V K A N K E K 3 1 I S Q K N Y Q V C A G H Y S X E D W K G K I D P F I S F N L 6 1 K K I S G L S D E S D P G L V V A I Y D F Q D F E H L G V Q 9 1 L P D E E M Y Y I C D D Y A I D I G I C E E E N R D E F I V 1 2 l Q D V V Y D P Y T S T N R S L A N P I H T F S Q N E V G L H 1 5 1 D T R Y P I K E T G F Y C V T A F R S S T S T K F N A V V N

1 8 l F R N A Y G Q L A G T E I N K L P L Y G L L A V A Y V V A H 2 1 1 A L Y S F A F W K H K H E L L P L Q K Y L L A F F V F L T A 2 4 1 E T I F V W A Y Y D L K N E K G D T A G I K V Y H V F L S I

2 7 1 L T A G K V T F S F F L L L I I A L G Y G I V Y P K L N K T 3 0 1 L M R R C Q M Y G A L T Y A I C I G F L I Q S Y L T D M E A 3 3 1 P S P L I L I T L I P M A L A L I I F Y Y M 1 1 i 7 S M T K T

3 6 1 V I Y L K E Q R Q I V K L N H Y K K L L Y I I Y A S F L S V 3 9 1 L A G S I V S S F I Y V G M N T I D H I Z K N W R S R F F V 4 2 1 T D F W P T L V Y P I V F V T I A F L W R P T D T S Y M L A 4 5 1 A S Q Q L P T D P E N V A D F D L G D L Q S F D D Q D D A S 4 8 1 1 1 T G E R G I D E D D L N L N F T D D E E G H D N V N N H

S I I S Q A T G Q C L P L Q Q N N A T S S R F D

Figure 3. Putative proteins encoded by (A) YKL248; (B) YKL249; (C) YKL250; (D) YKL251; (E) YKL252. The transmembrane helices predicted by the method of Klein et al. (1985) are underlined. The zinc-binding finger structure is underlined twice in YKL251.
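To give a feel for how hydrophobic stretches such as the underlined segments stand out in a sequence, the sketch below computes a sliding-window average hydropathy. It deliberately uses the Kyte-Doolittle scale rather than the discriminant method of Klein et al. (1985) cited in the text, so it is a stand-in illustration only; the function name, the default window length and the choice of scale are assumptions, and the example peptide is an excerpt of the YKL252 product read from panel E.

```python
# Sliding-window hydropathy with the Kyte-Doolittle scale.
# This is NOT the Klein et al. (1985) method used in the paper;
# it is a simpler stand-in, and the names here are hypothetical.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(protein, window=19):
    """Return the mean hydropathy of every window along the protein.

    The result has len(protein) - window + 1 values; it is empty when
    the window is longer than the sequence.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        return []
    scores = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    segment = "LILITLIPMALALIIFYY"  # excerpt of YKL252 (Figure 3E)
    print(windowed_hydropathy(segment, window=len(segment)))
```

A window average well above zero marks a hydrophobic stretch; deciding whether it spans the membrane needs a calibrated method such as the ones cited in the text.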


Table 1. Best FASTA scores for homology with proteins in the database MIPSX, release 30-2

ORF      Number of amino acids   Best homology with                                    FASTA opt score*
YKL248   499                     UDP-glucose pyrophosphorylase (potato)                1249
YKL249   130                     No significant homology
YKL250   188                     No significant homology
YKL251   1170                    Regulatory protein QUTA (Aspergillus nidulans)        163
YKL252   531                     Hypothetical protein 1 (mitochondrion, Leishmania)    116

*The FASTA opt score takes into account conservative replacements and gaps (Pearson and Lipman, 1988).

[Figure 4: pairwise alignment of the YKL248 product with potato UDP-GP, printed in blocks of 60 residues with residue numbering for both sequences.]

Figure 4. Homology between the predicted product of YKL248 and the UDP-glucose pyrophosphorylase (UDP-GP) from potato. ':' indicates identical amino acids and '.' conservative substitutions.
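The identity figure quoted for an alignment of this kind is a simple ratio of identical columns to aligned columns. The Python sketch below shows one way to compute it from two aligned rows; it is a hedged illustration rather than the FASTA program's own calculation. The function name is hypothetical, counting every non-empty column in the denominator is an assumption, and the two rows are a short ungapped excerpt of the YKL248 and UDP-GP rows, not the full 471-residue overlap.

```python
# Percent identity between two rows of a pairwise alignment.
# Hypothetical helper; the denominator convention is an assumption.


def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, columns, percent) for two aligned rows.

    Columns where both rows carry a gap are skipped; a column with a
    gap in only one row counts as aligned but not identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    columns = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b and a != gap:
            identical += 1
    percent = 100.0 * identical / columns if columns else 0.0
    return identical, columns, percent


if __name__ == "__main__":
    ykl248 = "WYPPGHGDLFESLHVSGELDAL"  # excerpt of the YKL248 row
    udp_gp = "WYPPGHGDVFPSLMNSGKLDAL"  # excerpt of the UDP-GP row
    print(percent_identity(ykl248, udp_gp))
```

This excerpt is one of the best conserved stretches, so its ratio is higher than the 53% reported over the whole overlap.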

databases reveals homology with several regulatory proteins. The best FASTA score (163) is obtained with the QUTA gene product, a positively acting regulator from Aspergillus nidulans (Beri et al., 1987) involved in quinic acid utilization. The product of YKL251 shows 20% identity with


[Figure 5: aligned N-terminal cysteine-rich regions of YKL251, QUTA, GAL4, CYP1, PDR1, PPR1 and UGA35.]

Figure 5. Alignment of the N-terminal region of the predicted YKL251 protein with some yeast regulatory proteins. The alignment respects the consensus for the cysteine-rich region characteristic of the zinc-finger motif.

QUTA on an overlap of 250 amino acids at the N-terminal end. This region contains a zinc finger consensus with a number of other conserved residues in the loop regions between the cysteines. This consensus pattern, [GAS]-C-X2-C-[RKH]-X2-[RK]-X-[RK]-C-Xn-C-X2-C-Xm-C (Figure 5), is characteristic of fungal transcription activators such as CYP1 (Creusot et al., 1988), GAL4 (Laughon and Gesteland, 1984), PDR1 (Balzi et al., 1987), PPR1 (Kammerer et al., 1984), ARGRII (Messenguy et al., 1986) and UGA35 (Coornaert et al., 1991). Analysis of the 5' non-coding nucleotide sequence reveals a putative TATA box at position 4348 (Figure 1), a perfect consensus GTGACGT for the regulatory DNA-binding protein yATF (Lin and Green, 1989) at position 3967, as well as a consensus for ABF1 (Dorsman et al., 1990; Della Seta et al., 1990) at position 3981 and for ARGRII (Crabeel et al., 1985) at position 4026.

YKL252 encodes a polypeptide of 531 amino acids which has no significant homology with proteins of the databases. This protein is classified as integral, with seven transmembrane spans predicted according to Klein et al. (1985), eight membrane-associated helices according to Eisenberg et al. (1984) and only six transmembrane helices according to Rao and Argos (1986) (Figure 3E). The 5'-flanking region contains a putative TATA box and, at position 837 to 849 (Figure 1), an ATCACTTGATACG sequence, consensus for the ABF1 factor binding site (Dorsman et al., 1990; Della Seta et al., 1990). This might suggest that the gene is controlled by multiple transcriptional regulators. The 3'-flanking region bears a putative polyadenylation site (AATAAA) at position 3206 (Figure 1).
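A protein consensus of this kind translates directly into a regular expression. The Python sketch below encodes only the first half of the zinc finger pattern, [GAS]-C-X2-C-[RKH]-X2-[RK]-X-[RK]-C, reading the two short spacers as X2 (an assumption that agrees with the YKL251 residues in Figure 3D); the longer spacers between the later cysteines are left out because their lengths cannot be read with confidence here. The function name is hypothetical, and the test peptide is the YKL251 stretch from Figure 3D with one unclear residue read as Q.

```python
# Regular-expression scan for the first half of the zinc finger
# consensus: [GAS]-C-X2-C-[RKH]-X2-[RK]-X-[RK]-C.
# Spacer lengths of X2 are an assumption; the name is hypothetical.
import re

ZINC_FINGER_FIRST_HALF = re.compile(r"[GAS]C.{2}C[RKH].{2}[RK].[RK]C")


def find_zinc_finger_starts(protein):
    """Return (position, matched_text) for each non-overlapping match.

    position is the 1-based index of the first matched residue.
    """
    return [
        (match.start() + 1, match.group())
        for match in ZINC_FINGER_FIRST_HALF.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # YKL251 excerpt (Figure 3D); the residue after "ACD" is read as Q.
    peptide = "KASRACDQCRKKKIKCDYKDEKGVC"
    print(find_zinc_finger_starts(peptide))
```

A hit on this half-pattern alone is only a candidate; the remaining cysteines of the full consensus still need to be checked by eye or with a completed pattern.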

ACKNOWLEDGEMENTS

We gratefully acknowledge Dr Martina Haasemann (MIPS, Martinsried, Germany) for help in sequence analysis. This work was supported by the European Communities Commission programme BRIDGE and by matching funds from the Region Wallonne, Belgium.

REFERENCES

Balzi, E., Chen, W., Ulaszewski, S., Capieaux, E. and Goffeau, A. (1987). The Multidrug Resistance Gene PDR1 from Saccharomyces cerevisiae. J. Biol. Chem. 262, 16871-16879.
Beri, R. K., Whittington, H., Roberts, C. F. and Hawkins, A. R. (1987). Isolation and characterization of the positively acting regulatory gene QUTA from Aspergillus nidulans. Nucl. Acids Res. 15, 7991-8002.
Coornaert, D., Vissers, S. and Andre, B. (1991). The pleiotropic UGA35 (DURL) regulatory gene of Saccharomyces cerevisiae: cloning, sequence and identity with the DAL81 gene. Gene 97, 163-171.
Crabeel, M., Huygen, R., Verschueren, K., Messenguy, F., Tinel, K., Cunin, R. and Glansdorff, N. (1985). General amino acid control and specific arginine repression in Saccharomyces cerevisiae: physical study of the bifunctional regulatory protein of the ARG3 gene. Mol. Cell. Biol. 5, 3139-3148.
Creusot, F., Verdière, J., Gaisne, M. and Slonimski, P. P. (1988). CYP1 (HAP1) regulator of oxygen-dependent gene expression in yeast. I. Overall organization of the protein sequence displays several novel structural domains. J. Mol. Biol. 204, 263-276.
Della Seta, F., Ciafre, S. A., Marck, C., Santoro, B., Presutti, C., Sentenac, A. and Bozzoni, I. (1990). The ABF1 factor is the transcriptional activator of the L2 ribosomal protein genes in Saccharomyces cerevisiae. Mol. Cell. Biol. 10, 2436-2441.
Dorsman, J. C., van Heeswijck, W. C. and Grivell, L. A. (1990). Yeast general transcription factor GFI: sequence requirements for binding to DNA and evolutionary conservation. Nucl. Acids Res. 18, 2769-2776.
Eisenberg, D., Schwarz, E., Komaromy, M. and Wall, R. (1984). Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179, 125-142.
Fickett, J. W. (1982). Recognition of protein coding regions in DNA sequences. Nucl. Acids Res. 10, 5303-5318.
Heery, D. M., Gannon, F. and Powell, R. (1990). A simple method for subcloning DNA fragments from gel slices. Trends Genet. 6, 173.
Kammerer, B., Guyonvarch, A. and Hubert, J. C. (1984). Yeast regulatory gene PPR1. I. Nucleotide sequence, restriction map and codon usage. J. Mol. Biol. 180, 239-250.
Katsube, T., Kazuta, Y., Mori, H., Nakano, K., Tanizawa, K. and Fukui, T. (1990). UDP-Glucose Pyrophosphorylase from Potato Tuber: cDNA Cloning and Sequencing. J. Biochem. 108, 321-326.
Klein, P., Kanehisa, M. and DeLisi, C. (1985). The detection and classification of membrane-spanning proteins. Biochim. Biophys. Acta 815, 468-476.
Laughon, A. and Gesteland, R. F. (1984). Primary structure of the Saccharomyces cerevisiae GAL4 gene. Mol. Cell. Biol. 4, 260-267.
Lin, Y. S. and Green, M. R. (1989). Identification and purification of a Saccharomyces cerevisiae protein with the DNA binding specificity of mammalian activating transcription factor. Proc. Natl. Acad. Sci. USA 86, 109-113.
Mandel, M. and Higa, A. (1970). Calcium dependent bacteriophage DNA infection. J. Mol. Biol. 53, 154.
Marck, C. (1988). 'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucl. Acids Res. 16, 1829-1836.
Messenguy, F., Dubois, E. and Descamps, F. (1986). Nucleotide sequence of the ARGRII regulatory gene and amino acid sequence homologies between ARGRII, PPR1 and GAL4 regulatory proteins. Eur. J. Biochem. 157, 77-81.
Pearson, W. R. and Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.
Rao, J. K. M. and Argos, P. (1986). A conformational preference parameter to predict helices in integral membrane proteins. Biochim. Biophys. Acta 869, 197-214.
Sambrook, J., Fritsch, E. F. and Maniatis, T. (Eds) (1989). Molecular Cloning: A Laboratory Manual. Second edition. Cold Spring Harbor Laboratories, Cold Spring Harbor, New York.
Sanger, F., Nicklen, S. and Coulson, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
U.S. Biochemical. (1990). Formamide gels (40%) for sequencing DNA. Comments 17, 31.
Zaret, K. S. and Sherman, F. (1982). DNA sequence required for efficient transcription termination in yeast. Cell 28, 563-573.
