Nucleic Acids Research, Vol. 18, No. 11 3387

A sequence pattern that occurs at the transcription initiation region of yeast RNA polymerase II promoters

Emmanuel Maicas and James D. Friesen

Department of Genetics, The Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8 and Department of Medical Genetics, University of Toronto, Toronto, Canada

Received October 6, 1989; Revised and Accepted January 29, 1990

ABSTRACT Saccharomyces cerevisiae mRNA 5'-ends map at a variable distance from the TATA element. The mechanism for the choice of the transcription Initiation Region (IR) over other neighbouring sequences is not clearly understood. Sequences on the coding strand flanking the IR of 95 yeast RNA polymerase II promoters have been compared. They indicate the following pattern: statistically, a preponderance of T residues beginning as far as 30 nucleotides upstream and ending approximately 10 nucleotides upstream of the IR, and a preponderance of A residues from approximately 8 nucleotides upstream of the transcription initiation-site onward. The switch in base composition thus occurs over a short region that is typically centered at position -9 with respect to the major transcription start-site. We call this overall sequence pattern the locator. It is more evident among strong promoters than weak ones, suggesting a role in transcription initiation. The promoter of the TCM1 gene (coding for ribosomal protein L3) has a typical locator in the region of its IR. In an attempt to confirm the role of this sequence motif in defining the IR, deletions were introduced between the TATA element and the IR of the TCM1 gene. In most deletions, the new transcription start-sites are found within a recognizable locator, supporting the suggestion that this sequence pattern is important in defining the IR. These data appear to indicate that in yeast the IR is defined by a pattern of base composition situated at a suitable distance from the TATA element.

INTRODUCTION In the yeast Saccharomyces cerevisiae, the transcription of most genes is initiated at several positions clustered in a region called here the Initiation Region (IR). Both the position of the TATA element and sequences within the IR itself have been proposed as important factors in the choice of transcription start-sites by RNA polymerase II (1, 2, 3, 4, 5, 6).
Transcription start-site selection, however, cannot be explained completely by the sequence and position of these two elements. Spacing between the TATA element and the IR is highly variable in yeast, ranging from 30 to 120 base-pairs (bp). Moreover, the IR sequence is short and the homology (pyrimidine-A) among different IR sequences is not striking (1, 2, 3, 4, 5, 6). Additional properties of yeast promoters must therefore contribute to the choice of a given IR over other comparable sequences positioned at an appropriate distance from the TATA element. The yeast TCM1 gene codes for ribosomal protein (r-protein) L3, a moderately abundant basic protein of the large ribosomal subunit. Like other r-proteins, L3 is synthesized under all growth conditions, albeit at a higher rate on good carbon sources (7, 8). Transcription of the TCM1 gene is under the control of a single upstream activation sequence recognized by a multi-functional transcription factor (9, 10). In this study we describe a DNA sequence pattern, which we call the locator, spanning the IR of most yeast promoters. Using deletions within the TCM1 promoter, we show that this pattern is important in defining the IR position relative to the TATA element.

MATERIALS AND METHODS Yeast strains and plasmids S. cerevisiae strain SC252 (ho, MATa, ade1, leu2-3, 2-112, ura3-52), obtained from James Hopper, was used as the recipient for all the experiments in this study. Strain YF585 was constructed by replacing the chromosomal TCM1 promoter and activating sequences of strain SC252 with the GAL10 promoter and activating sequences. This strain is therefore dependent on galactose for growth and displays a GAL10-like pattern of transcription start-sites at the TCM1 gene. All mutant alleles of the TCM1 gene used here were inserted as a 3.3 kb BamHI-AvaI fragment (13) into the BamHI site of the yeast multi-copy shuttle vector YEp13 (LEU2, 2 µm) described by Broach (11). These plasmids are called pEM39 followed by the name of the TCM1 allele. The unmutated allele is called wild type or w.t. in the text. Growth media were prepared as described by Sherman et al. (12).

Construction of the deletions Deletions were introduced using nucleases Bal31 or S1, starting from the XbaI site in the normal locator (11 bp upstream of the normal transcription start-site). Some deletions have also been introduced by oligonucleotide-directed mutagenesis as described by Kunkel (14). Five classes of mutations were constructed: (a) deletions of the A-rich region (Δ16, Δ21, Δ27, Δ32, Δ39); (b) deletions of the T-rich region (Δ17, Δ40, Δ54); (c) deletions of both A- and T-rich regions (ΔXS, Δ56, Δ61, Δ67, Δ71, Δ75, Δ100); (d) deletions of the TATA element (ΔT, Δ67T); (e) an insertion at the XbaI site (ΔXK). Note that most deletions create new locator-like sequences because of the artificial juxtaposition of T-rich sequences normally found upstream of the transcription start-site and A-rich sequences normally found in the transcribed region. Mutation Δ100 removes all A- and T-rich regions between the TATA element and the initiation codon and is therefore the most drastic of all deletions. Deletions ΔT, Δ17, Δ40 and Δ54 preserve the original transcription start-site while changing the flanking sequences.

[Figure 1: aligned coding-strand sequences, positions -30 to +10, of 95 yeast RNA polymerase II promoters grouped by function, with references for start-site mapping]

Figure 1. Sequences on the coding strand in the region of the transcription start-sites of 95 yeast RNA polymerase II promoters. The promoter sequences have been aligned at the major transcription start-sites; if more than one start of equal strength was present, the sequence was aligned at the most upstream one. The first bases of major transcripts are underlined. For practical reasons, only 41 bp of the 151 bp compiled in Fig. 2 are shown here. For convenience, promoters have been somewhat arbitrarily divided into seven groups: (i) sugar metabolism enzymes, (ii) translational machinery, (iii) structural proteins, (iv) mitochondrial proteins, (v) general metabolism, (vi) regulatory proteins, and (vii) all others. The references are for the mapping of the start sites.

Mapping the 5'-ends by primer extension Total RNA was extracted from yeast strain YF585 that had been transformed with pEM39 derivatives and grown on galactose as a carbon source. Five µg of RNA were analysed by primer extension as described (15). The synthetic primer was complementary to nucleotides 142 to 161 downstream of the translation initiation codon.

[Figure 2 panels: A, C, G and T frequencies for all 95 promoters (left); A and T frequencies for 51 strong and 34 weak promoters (right). Axes: position (-100 to +40) versus frequency of occurrence.]

Figure 2. Frequencies of occurrence of the A, G, C, or T bases at each position on the coding strand in the region of the transcription start-site. The major transcription start-site has been designated as +1. Black dots indicate actual frequencies at each position; the line is a mathematical 'smoothing' of the frequencies obtained by averaging within a window of three points. On the left-hand side, the four bases were compiled for all promoters shown in Fig. 1. On the right-hand side, A and T bases have been compiled separately for 'strong' promoters ((i) sugar metabolism enzymes, (ii) translational machinery, (iii) structural proteins, (iv) mitochondrial proteins) and 'weak' promoters ((v) general metabolism, (vi) regulatory proteins), as grouped in Fig. 1.

[Figure 3: primer-extension autoradiogram with sequencing-ladder size standards]

Figure 3. Primer-extension analysis of transcripts from the wild-type and 18 deletion alleles of the TCM1 promoter. The sequencing reactions used as size standards were obtained using a wild-type DNA template primed with the same oligonucleotide that was used for the primer extensions. The sequence obtained is complementary to the coding strand as shown in Fig. 4. Note that the numbering here differs from that in Figs. 1, 2 and 4 because the primer extensions are all measured starting from the oligonucleotide primer, not the start site; +1 is here the first base of the initiation codon. The transcripts present in the vector lane and all others come from the GAL10-TCM1 promoter fusion at the TCM1 chromosomal locus of strain YF585. These transcription start-sites have been ignored in the interpretation of the start patterns, as shown in Fig. 4.

Sequence comparison of yeast promoters Yeast promoters for which a transcription start-site was indicated in the sequence database GenBank (IntelliGenetics, Mountain View, CA; release 59, March 1989) represent the majority of the sequences compared. For each sequence, the original publication was consulted for the quality of the 5'-end mapping and the relative intensity of the several starts. The sequences were aligned at the predominant transcription start-site; when several transcription starts of equal strength were found, the most upstream one was chosen. Once aligned, the frequency of occurrence of each base on the coding strand (or mRNA-like strand) at each position was compiled. Mathematical 'smoothing' of the frequencies was obtained by averaging the values within a window of three points.
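The compilation and smoothing just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used in the study: the function names are hypothetical, the sequences are assumed to be pre-aligned strings of equal length with the major start site at index `offset`, positions are numbered without a zero (+1 = start site, -1 = the base before it), and the three-point smoothing is taken to be a centered moving average that averages only the available neighbours at the two ends.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(aligned, offset):
    """Percentage of A, C, G and T at each aligned position.

    aligned: equal-length coding-strand sequences, all aligned so that
             index `offset` is the major transcription start site (+1).
    Returns {position: {base: percent}}, numbered with +1 = start site
    and no position 0 (an assumed convention).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    table = {}
    for i in range(length):
        # Index -> position: offset -> +1, offset - 1 -> -1 (no zero).
        pos = i - offset + 1 if i >= offset else i - offset
        counts = Counter(s[i].upper() for s in aligned)
        total = sum(counts[b] for b in BASES)  # N or gap characters are ignored
        table[pos] = {b: (100.0 * counts[b] / total if total else 0.0) for b in BASES}
    return table


def smooth3(values):
    """Centered three-point moving average; end points use the neighbours available."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out
```

Calling `base_frequencies` separately on the strong and weak groups, then passing the A and T columns (in position order) to `smooth3`, would give curves of the kind shown on the right-hand side of Fig. 2.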

RESULTS Sequences in the region of the IR of 95 yeast RNA polymerase II promoters Figure 1 shows sequences of the coding strand flanking the transcription start-sites of 95 yeast RNA polymerase II promoters. Figure 2 shows the frequency of occurrence of the four bases on the coding strand of these promoters. Statistically, T residues are over-represented between positions -30 and -10 and under-represented between positions -8 and +15; A residues follow the opposite tendency. The switch in base composition occurs over a remarkably short region centered at approximately position -9 with respect to the major transcription start-site. Put another way, transcription tends to start approximately 10 nucleotides downstream from the end of a T-rich region and within an A-rich region. We call this overall sequence pattern the locator. It should be emphasized that the locator is not a fixed sequence but rather a trend in base composition. Figure 2 shows that the locator as described above is more evident among strong promoters than weak ones, suggesting a role in transcription initiation.

[Figure 4: coding-strand sequences, numbered from -90 to +40, of the wild-type (W.T.) TCM1 promoter and of alleles ΔT, ΔXK, ΔXS, Δ16, Δ21, Δ27, Δ32, Δ39, Δ17, Δ40, Δ54, Δ56, Δ61, Δ67, Δ67T, Δ71, Δ75 and Δ100, with transcription start-sites marked]

Figure 4. Sequence and transcription start-sites for the wild-type and 18 deletion alleles of the TCM1 promoter. The -91 TATA element, the XbaI site and the initiation codon are boxed when present. The first nucleotide of each transcript is indicated by an inverted black triangle whose height is approximately proportional to the abundance of the corresponding transcript. The sequences have been aligned and numbered as in Fig. 1.
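Because the locator is a trend in base composition rather than a fixed sequence, it cannot be found by string matching. One simple way to score it, sketched below as a hypothetical illustration rather than a method used in this study, is to test every possible switch point and add the fraction of T residues in a window immediately upstream to the fraction of A residues in a window starting at the switch point. The 10-nt window, the scoring rule and the 9-nt offset used to predict a start-site (an approximation based on the switch being centered near -9) are all assumptions.

```python
def locator_scores(seq, window=10):
    """Score every possible T-to-A switch point in a coding-strand sequence.

    For boundary index b, the score is the fraction of T in seq[b-window:b]
    plus the fraction of A in seq[b:b+window] (maximum 2.0).
    The scoring rule and window size are illustrative assumptions.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    scores = {}
    for b in range(window, len(seq) - window + 1):
        upstream = seq[b - window:b]
        downstream = seq[b:b + window]
        scores[b] = upstream.count("T") / window + downstream.count("A") / window
    return scores


def best_locator(seq, window=10, start_offset=9):
    """Return (boundary index, score, predicted start index).

    start_offset=9 is an assumption based on the switch being centered
    near -9 relative to the major start site.
    """
    scores = locator_scores(seq, window)
    if not scores:
        raise ValueError("sequence shorter than two windows")
    b = max(scores, key=scores.get)  # first boundary among equal scores
    return b, scores[b], b + start_offset
```

On a toy sequence of 5 G, 10 T, 10 A and 5 G residues, the best boundary is the first A (index 15) with the maximal score of 2.0. The predicted start index may fall past the end of a short input, so in practice the scan would be run on a sequence extending well into the transcribed region.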

Deletions in the region of the TCM1 IR The TCM1 promoter was used in this study in an attempt to assess experimentally the role of the locator: this promoter has T- and A-rich regions located typically with respect to the single major transcription start-site. It also has an XbaI site in the locator, which was used as a convenient location for the introduction of deletions. Eighteen deletion alleles were constructed; their sequences are shown in Fig. 4. The strength of the TCM1 promoter is relatively unchanged by the deletions; promoter strength as quantitated by RNA-blot hybridization varies between 15% (Δ21, Δ15) and 91% (Δ61) of wild type (data not shown). These differences in efficiency are not always evident upon primer-extension analysis (Fig. 3), since differences in plasmid copy number between transformed strains are significant and must be corrected for (data not shown).
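The copy-number correction mentioned above amounts to dividing each hybridization signal by the relative plasmid copy number before comparing a mutant with wild type. The text does not say how copy number was measured, so the inputs below (an mRNA signal and a plasmid DNA signal for each strain, in arbitrary units) and the function name are hypothetical; this is a minimal sketch of the arithmetic only.

```python
def relative_strength(mrna_signal, plasmid_signal, wt_mrna_signal, wt_plasmid_signal):
    """Promoter strength as percent of wild type, corrected for plasmid copy number.

    Each mRNA signal is divided by the matching plasmid DNA signal (assumed
    here to be a proxy for copy number) before the mutant/wild-type ratio
    is taken. All inputs are hypothetical, in arbitrary units.
    """
    if plasmid_signal <= 0 or wt_plasmid_signal <= 0:
        raise ValueError("plasmid signals must be positive")
    mutant = mrna_signal / plasmid_signal
    wild_type = wt_mrna_signal / wt_plasmid_signal
    return 100.0 * mutant / wild_type
```

For example, a mutant whose raw mRNA signal is 60% of wild type but whose plasmid is present at twice the wild-type copy number, `relative_strength(30, 2, 50, 1)`, scores 30%; without the correction its strength would be overstated twofold.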

Mapping the TCM1 mRNA 5'-ends by primer extension Figure 3 shows a typical autoradiogram of primer-extension analysis of the wild-type and 18 deletion alleles of the TCM1 promoter. The pattern of transcription start-sites present at approximately -80 nucleotides in the vector (control) lane (strain YF585 transformed with the plasmid vector YEp13 alone) and in all the other lanes is derived from the GAL10 promoter-driven transcription of the chromosomal TCM1 gene. These transcripts have not been included in our analysis. Figure 4 is a schematic representation of the data presented in Fig. 3. Many of the primer-extension products are accompanied by a second band that is one nucleotide longer. Similar pairs of bands were obtained by Shelness & Williams (16) and were attributed to a second enzymatic activity of reverse transcriptase, namely a template-independent nucleotide transferase activity. Where double bands were present, the 5'-end shown in Fig. 4 corresponds to the position of the faster (lower) band. Some 15 different major 5'-ends were mapped in this study. A preference for an adenine residue as the first nucleotide of the transcript, preceded by a pyrimidine in the DNA sequence, can be noted in an otherwise A-rich environment. The same consensus has been noted among CYC1 starts (4). It is homologous to its eukaryotic (CAnyy) and bacterial (YR) analogues (12, 17). We do not understand why this consensus is not evident in our promoter compilation (Figs. 1 and 2). It could be limited to a minority of promoters. Alternatively, the use of nuclease S1 to map many of these 5'-ends might have resulted in the 'nibbling' of a typically A+T-rich duplex. The ease with which sequences other than the normal TCM1 IR can serve as start-sites (see Δ16, Δ21, Δ27, Δ32, Δ39, Δ54, Δ56, Δ61, Δ67, Δ71, Δ75, Δ100; Figs. 3 and 4) confirms the lack of more stringent sequence requirements for initiation.

TCM1 IRs and the locator Because the locator is not a fixed sequence but rather a trend in base composition, interpreting the pattern of transcription start-sites that results from removing part of a locator by deletion or mutagenesis is not entirely straightforward, especially in a typical A+T-rich promoter. In the TCM1 promoter, several deletions in the vicinity of the XbaI site artificially bring together A-rich and T-rich regions to create new locator-like sequences. The results shown in Figs. 3 and 4 confirm that these new sequences can indeed act as locators, since the new IRs are most often found within recognizable locator sequences. Most IRs within the deleted promoters are found at the appropriate site in a recognizable locator but at various distances from a fixed upstream sequence. This suggests that the locator, not the distance from an upstream TATA element alone, defines the IR. We note that some transcription start-sites tend to lie further downstream of the boundary of the T-rich region of the locator when the distance between the TATA consensus and the locator is decreased (Δ40, Δ54, Δ56, Δ61, Δ67, Δ71, Δ75). This suggests that a minimum distance is required between these two sequences. Complete deletion of both the T- and A-rich sequences between the TATA element and the initiation codon (Δ100) results in a broad IR whose major transcription start-sites are within a short locator-like sequence normally found within the coding region (Figs. 3 and 4).

DISCUSSION This study describes a sequence pattern that occurs at the transcription start-sites of most yeast RNA polymerase II promoters. Statistically, this pattern consists of the following: (1) a T-rich region on the coding DNA strand that can begin approximately 30 nucleotides upstream of the initiation region (IR) and typically extends to within approximately 10 nucleotides of it; (2) an A-rich region extending from 8 nucleotides upstream of the IR onward; (3) a sharp switch in base composition, typically centered 9 nucleotides upstream of the major transcription start-site. The role of this pattern in the precise location of transcription initiation is inferred from its position with respect to the IR of many promoters, especially strong ones, and is supported by a study of deletions within the TCM1 promoter. We call this sequence pattern the locator.

The mechanism of transcription initiation in yeast can be seen as involving at least three blocks of DNA sequence: the enhancer (or UAS), the TATA element, and the Initiation Region (IR). Several lines of evidence have suggested that events occurring at the first two regions are comparable between yeast and mammalian promoters (18, 19). The involvement of the third region, however, differs between yeast and mammalian cells: in the latter, the IR is defined principally by its distance from the TATA element (when present), whereas yeast shows a greater variability in spacing between these two regions. In this study we mutated the TATAAA sequence located 91 nucleotides upstream of the normal transcription start-site in order to determine its role in the positioning and intensity of start-sites in the TCM1 promoter (see ΔT, Δ67T). Our findings suggest that this sequence is important but not essential for transcription initiation at this promoter. Yeast TATA elements have in general been shown to be important for transcription initiation (19). However, several reports have suggested that functional TATA elements might not be as easily recognizable in yeast as was previously thought (20, 21, 22). The relatively mild phenotype of our ΔT mutation is therefore probably due to the presence of alternative, less canonical TATA elements that can partially replace the -91 sequence. We will assume in this discussion that the -91 TATAAA is the main TATA element in the TCM1 promoter. The present study, along with many others (1, 2, 3, 4, 5, 6), shows that the role of yeast TATA elements is to define a minimum and a maximum distance (40 to 120 bp) within which a separate element sets the position of the IR more accurately. Several sequences have been proposed as candidates for such an element (1, 2, 3, 4, 5, 6). All of them have been sequences within the IR itself. However, as the number of mapped mRNA 5'-ends increases, it appears that sequence homology between IRs must be reduced to an adenine residue, or at best the dinucleotide YA, in order to accommodate most genes (4; see Figs. 1, 2 and 4). This homology, which resembles its analogues in mammals and prokaryotes (12, 17), seems too short to explain the choice of a transcription start-site among all other A residues within an acceptable distance from the TATA element.
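The argument that a YA dinucleotide is too short a signal can be made concrete by listing every YA site within the 40 to 120 bp window downstream of a TATA element. This is a minimal sketch: the window bounds come from the text above, while the function name, measuring distance from the first base of the TATA element, and the toy sequence are illustrative assumptions.

```python
def ya_candidates(seq, tata_index, min_dist=40, max_dist=120):
    """Indices of A residues preceded by a pyrimidine (C or T) that lie
    min_dist..max_dist nucleotides downstream of the TATA element.

    tata_index: index of the first base of the TATA element in seq
    (an assumed reference point). Distance is measured from that base
    to the candidate A.
    """
    seq = seq.upper()
    hits = []
    lo = max(tata_index + min_dist, 1)  # the candidate A needs a preceding base
    hi = min(tata_index + max_dist, len(seq) - 1)
    for i in range(lo, hi + 1):
        if seq[i] == "A" and seq[i - 1] in "CT":
            hits.append(i)
    return hits
```

In a toy sequence with a TATAAA at index 0 followed by G residues, a CA at indices 40-41 and a TA at indices 62-63, the scan returns `[41, 63]`; the YA steps inside TATAAA itself are excluded because they lie too close to the TATA element. Applied to a real A+T-rich yeast promoter, such a scan would be expected to return many candidates, which is the point made above.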

The locator described here is a different kind of sequence: it is a pattern found at a relatively fixed location with respect to the IR of most strong yeast promoters, and it seems to play an important role in the positioning of the TCM1 IR, in addition to the TATA element and, to a minor extent, the IR sequence itself. We have been unable to discern a homologue in other eukaryotes or in E. coli (12, 17). We suggest several mechanisms to account for the characteristics of the locator. First, it could act as a recognition site for a DNA-binding factor(s) that is important for successful transcription initiation by RNA polymerase II. Depending on the succession of steps in the formation of the open transcription complex, the locator could be recognized either as a double-stranded or as a single-stranded structure. If binding by the putative factor occurs prior to local DNA unwinding, the factor might recognize the base-composition switch from T- to A-preponderance on either or both DNA strands of the locator. If binding occurs following DNA unwinding, the differing base composition of the two DNA strands could provide the basis for its specific interaction with one of them. Alternatively, the locator might not be recognized specifically by the transcription machinery but rather act through its effect on DNA or chromatin structure. It is important to note that the A+T content of the locator region is not higher than that of the surrounding promoter sequences. Its role is therefore not merely that of an easily meltable region, but must depend on additional characteristics of the sequence itself. Finally, pyrimidine-rich (and T-rich) stretches have been noted immediately upstream of the IR in several yeast promoters (23). Where present, these pyrimidine-rich blocks correspond to the T-rich half of a recognizable locator. The CYC1 promoter contains such a pyrimidine-rich sequence between -53 and -17 from the most upstream major start (see Fig. 1). McNeil (24) showed that this sequence acts as a transcription terminator, aborting transcripts that contain it. Although the relationship between pyrimidine-rich and T-rich sequences is unclear, this finding raises the intriguing possibility that the T-rich region could be an RNA rather than a DNA element, possibly with a role in the termination or processing of transcripts starting upstream of it.

ACKNOWLEDGEMENTS We thank our colleagues, especially Jacques Archambault, for discussions. This work was supported by the Natural Sciences and Engineering Research Council. Emmanuel Maicas was supported by an Ontario Graduate Scholarship.

REFERENCES
1. Chen,W. and Struhl,K. (1985) EMBO J. 4, 3273-3280.
2. Hahn,S., Hoar,E.T. and Guarente,L. (1985) Proc. Natl. Acad. Sci. USA 82, 8562-8566.
3. Healy,A.M., Helser,T.L. and Zitomer,R.S. (1987) Mol. Cell. Biol. 7, 3785-3791.
4. McNeil,J.B. and Smith,M. (1986) J. Mol. Biol. 187, 363-378.
5. Nagawa,F. and Fink,G.R. (1985) Proc. Natl. Acad. Sci. USA 82, 8557-8561.
6. Rudolph,H. and Hinnen,A. (1987) Proc. Natl. Acad. Sci. USA 84, 1340-1344.
7. Donovan,D.M. and Pearson,N.J. (1986) Mol. Cell. Biol. 6, 2429-2435.
8. Kief,D.R. and Warner,J.R. (1981) Mol. Cell. Biol. 1, 1007-1015.
9. Dorsman,J.C., Doorenbosch,M.M., Maurer,C.T.C., deWinde,J.H., Mager,W.H., Planta,R.J. and Grivell,L.A. (1989) Nucl. Acids Res. 17, 4917-4923.
10. Hamil,K.G., Nam,H.G. and Fried,H.M. (1988) Mol. Cell. Biol. 8, 4328-4341.
11. Broach,J.R. (1983) Methods Enzymol. 101C, 307-325.
12. Bucher,P. and Trifonov,E.N. (1986) Nucl. Acids Res. 14, 10009-10026.
13. Schultz,L.D. and Friesen,J.D. (1983) J. Bacteriol. 155, 8-14.
14. Kunkel,T.A. (1985) Proc. Natl. Acad. Sci. USA 82, 488-492.
15. Fouser,L.A. and Friesen,J.D. (1986) Cell 45, 81-93.
16. Shelness,G.S. and Williams,D.L. (1985) J. Biol. Chem. 260, 8637-8646.
17. Hawley,D.K. and McClure,W.R. (1983) Nucl. Acids Res. 11, 2237-2255.
18. Johnson,P.F. and McKnight,S.L. (1989) Annu. Rev. Biochem. 58, 799-839.
19. Struhl,K. (1989) Annu. Rev. Biochem. 58, 1051-1057.
20. Ogden,J.E., Stanway,C., Kim,S., Mellor,J., Kingsman,A.J. and Kingsman,S.M. (1986) Mol. Cell. Biol. 6, 4335-4343.
21. Chen,W. and Struhl,K. (1989) EMBO J. 8, 261-268.
22. Hahn,S., Buratowski,S., Sharp,P.A. and Guarente,L. (1989) Proc. Natl. Acad. Sci. USA 86, 5718-5722.
23. Dobson,M.J., Tuite,M.F., Roberts,N.A., Kingsman,A.J., Perkins,R.E., Conroy,S.C., Dunbar,B. and Fothergill,L.A. (1982) Nucl. Acids Res. 10, 2625-2637.
24. McNeil,J.B. (1988) Mol. Cell. Biol. 8, 1045-1054.
