Nucleic Acids Research, Vol. 20, No. 16 4129-4136

Identification of regulatory elements within the minimal promoter region of the human endogenous ERV9 proviruses: accurate transcription initiation is controlled by an lnr-like element Girolama La Mantia, Barbara Majello, Antonio Di Cristofano, Maria Strazzullo, Gabriella Minchiotti1 and Luigi Lania Department of Genetics, General and Molecular Biology, University of Naples, Via Mezzocannone 8, 80134 Naples and 'International Institute of Genetics and Biophysics, CNR, Naples, Italy Received July 2, 1992; Accepted July 23, 1992

ABSTRACT ERV9 is a low repeated family of human endogenous retroviral elements whose expression is mainly detectable in undifferentiated embryonal carcinoma NT2/D1 cells. In this report we have analyzed the minimal promoter region located within the ERV9 LTR. Using the transient CAT expression assay we have identified the minimal promoter region, which includes sequences spanning from - 70 to + 6 relative to the major transcription start site. Deletion analysis, primer extension mapping of the transcription start sites and DNA-protein interactions assays have allowed us to define two important regions within the ERV9 minimal promoter. One region located between - 70 to - 39 acts as a transcriptional activating sequence and contains an Spl binding site. The second region from - 7 to + 6, which resembles an initiator element (Inr), was necessary for the correct transcription start site utilization, and binds to a regulatory protein. Crosscompetition experiments using various Inr elements have indicated that the protein that binds to the ERV9 lnr element can be competed by the HIV-1 and TdT Inr sequences.

INTRODUCTION Endogenous retroviruses and retroviral elements have been detected in the DNA of many vertebrate species, including primates. In the human genome several families of endogenous retroviral sequences (ERV) resembling either full-length or truncated proviruses have been identified serendipitously or by their relationship with mammalian endogenous and exogenous retroviruses (1-8). Endogenous elements most likely entered the germ line of their hosts through retroviral infection of germ cells or germ cells progenitors and since then have persisted as stably integrated, vertically transmitted, proviruses, but they have largely diverged from each other as well as from their infectious progenitors. Irrespective of their evolutionary origins, the high copy number and dispersion of closely related endogenous

retroviral sequences to various human chromosomes should have occurred primarily through repeated chromosomal insertions of reverse transcripts. The mobility of retrotransposons requires the integrity of certain retroviral functions. Some ERV families do not contain a full complement of retroviral genes, with many of them lacking env gene and, in the majority of cases, elements are defective in their capacity to encode functional proteins (9). However, it has been suggested that defective elements can be complemented by homologous as well by heterologous trans-acting proteins (10). Transcriptional activation of retrotransposons is a prerequisite for retroposition and recombination, and, although all human endogenous retroviral elements examined so far seem to be replication defective, many of them have been shown to be transcriptionally active in human tissues and cell lines (9,11 - 13). Moreover, the use of cross-reactive antibodies indicates that ERV sequences may be expressed also at the protein level (14-16). LTR sequences are considered to be the primary determinants of the regulation of ERV expression. Moreover the nature of the LTR transcriptional regulatory elements influences to a great extent the ability of retrotransposons to affect expression of nearby cellular genes. In several different cases, in mouse, insertions of LAP elements near or within normally regulated genes resulted in their constitutive expression (17-19). Conversely, the androgen responsiveness of the mouse Slp gene seems to be conferred by a provirus-like LTR inserted 2kb upstream of Slp (20). It is worth noting that in this latter case the phenotype is imposed by a highly defective element that has maintained functional LTR sequences. These considerations underline the importance of studying various ERV LTRs for their performance as transcriptional promoters. We have recently described a low repeated family of human endogenous retroviral elements (ERV9), whose expression is mainly detectable in undifferentiated embryonal carcinoma cells NT2/D1 (12, 21). Whereas about 50 different ERV9 loci containing pol-related sequences are present in the human genome, we and other (22) have estimated that ERV9 LTR sequences are at least 4000 copies. Moreover, we have previously

4130 Nucleic Acids Research, Vol. 20, No. 16 shown hat ERV9 LTRs derived from randomly selected repeated units are capable to drive the expression of a reporter gene in a transient transfection assay (12). It seems therefore that ERV9 LTRs can represent an important reservoir of potentially functional promoter sequences that could affect the pattern of expression of a number of cellular genes. In beginning to analyze the expression of the ERV9 sequences, we were interested in defining the promoter element present in ERV9 LTRs. In this paper we utilized transient expression assays to define the LTR minimal promoter. We have examined the cis and trans-acting elements involved in the accurate transcription initiation driven by ERV9 LTR elements in NT2/D1) cells, and we found that both a Spi-dependent element and sequences around the cap site (Inrlike sequences) are important for the regulation of transcription.

MATERIAL AND METHODS Plasmids construction The pERV9-CATa2 was constructed by subcloning a 624 bp Pvull-HindmI fragment derived from a 5' genomic ERV9 LTR in the promoterless p8CAT-0 (12). The pERV9-CATa3 was isolated after subcloning the 220 bp Hincll-Hindm fragment isolated from the pERV9-CATa2 in the SmaI-HindIl sites of p8CAT-0. The pERV-CATa6, pERV-CATa7, pERV-CATa9 were constructed using standard PCR methodology (23). The relevant regions were amplified from the pERV9-CATa3 DNA using as primers oligonucletides containing both a suitable restriction site and the complementary bases. The pERV-CATa6 and the pERV-CATat9 were constructed after amplification of the appropriate regions using oligonucleotides primers containing the SmaI site at the 5' and the HindI site at the 3', respectively, then subcloning the amplified products in the Smal-Hindi sites of p8CAT-0. The pERV-CATcv7 was constructed after amplification of the region -39 to +40 using oligonucleotides primers containing a BamHI site at the 5' and the HindIl site at the 3', respectively. The PCR product was subcloned in the appropriate sites of p8CAT-0. The pERV-CATaI 1 was constructed by first isolating the Smal-AvaI (-70 to -7) fragment from pERV-CATa6 DNA, then the isolated DNA fragment was cloned via blunt ligation in the SmaI site of pGEM-3. The resulting Smal-HindM fragment was then cloned in the SmaI-Hindu sites of p8CAT-0 All the plasmid DNA constructs were analyzed by dideoxy sequencing. CAT assay DNA transfections were performed using subconfluent embryonal carcinoma NT2/D1 cell cultures (24) by calcium phosphate precipitation, with 15tg of CAT reporter plasmid together with 2.ug of pSV-j3-gal (Promega Corp., Madison, Wis.) (the f3galactosidase expression plasmid was used as internal control to normalize for variation in transfection efficiency) in Dulbecco medium containing 10% foetal calf serum. Cell extracts were isolated 44 hours after transfection and protein concentrations were determined by using a commercial available dye-binding assay (Bio-Rad Laboratories, Richmond, Calif.) and 40sg of total protein extracs were assayed for (l-galactosidase activity. Equal amounts of (-gaactosidase activity (routinely 1pM corresponding to about 100,tg of total protein extract) were used for the determination of CAT activity (25). The CAT activity was quantified by counting the amount of ['4C]chloramphenicol converted to the mono-acetylated forms.

Primer extension Forty-four hours after transfection, cells were harvested, and total RNA was isolated (26) and analyzed by primer extension using a CAT primer as previously described (27). Reaction products were resolved on 6% polyacrylamide-8 M urea gels. Sequencing ladders were generated by dideoxy-chain termination method using double strand DNAs as template and the Sequenase kit from U. S. Biochemical Corp. (Cleveland Ohio).

DNAse footprint Nuclear extracts were prepared from NT2/D1 cells as previously described (28). DNAse footprinting assay was adapted from the published method (28). The probe used was purified from pERV9-CATa3 after subcloning the 220 bp long HincIl-HindIll fragment (-179/+40) in pGEM-3 vector (Promega). This plasmid DNA was then digested with EcoRI and Hindll and labelled with Klenow polymerase. The end labelled DNA fragment (approximately 1 ng) was added to a 20 IL reaction mix containing 10 mM Hepes pH 7.9, 2.5% glycerol, 0.05 mM EDTA, 25 mM NaCl, 10 mM MgCl2, 10 mM Spermidine and 1 jtg double stranded poly (dI-dC) competitor DNA. Protein

_

m

~1 j

:

-

F4r-

5' -TGTGGGTGGGGCCRGRTARGAGRATARRRGCRGGCTGCCCRGCOGCRGALGGCftRCCCGC TCAGTC-3' *1

Figure 1. Identification of the ERV9 transcription start sites. pERV9-CATa2 plasmid DNA was transfected in NT2/D1 cells, and total RNA was isolated 44 hours after transfection. Primer extension analysis was performed using a primer complementary to the CAT coding region (33), the end-labeled primer was annealed to 30 itg of RNA (lane 2) and to 30/Lg of mock-transfected cell RNA (lane 1). The lengths of the extended products were estimated by comparison with the sequencing ladders generated by dideoxy-chain termination mxthod using the same primer on double strand DNAs as template. On the bottom is reported the DNA sequences around the transcription start sites indicated by the arrows.

Nucleic Acids Research, Vol. 20, No. 16 4131 extracts (30 Atg) were added last and the binding reaction incubated for 20 mins on ice. DNAse (2/d), freshly diluted at 100-400 ytg/ml, was added to the reaction and digestion of DNA-protein complexes was allowed to proceed for 90 sec at 200. Digestions were stopped by the addition of 30 Al of a solution containing 50 mM EDTA, 0.1% SDS and 200 itg/ml Proteinase K. Samples were incubated at 420 for 30 mins, extracted with phenol chlorophorm and ethanol precipitated. DNA pellets were resuspended in formamide dye and loaded on a 6% polyacrylamide-7M urea sequencing gel.

deletion mutants and analyzing their capacities to promote expression of the CAT reporter gene in NT21/D1 cell line. The construction of the CAT plasmids is described in the Materials and Methods section. We used for this analysis the pERV-CATa2 plasmid, which contains a 624 bp long Pvufl-HindII fragment derived from a 5' genomic LTR. First, we attempted to identify the transcription start site(s) of the CAT construct. For this purpose the pERV-CATat2 DNA was transfected in NT2/D1 cells and after 44 hours cells extracts were assayed for CAT expression and RNA was isolated and utilized as template in a primer extension experiment using a primer complementary to the CAT mRNA sequences. The extended products were analyzed by comparison to the dideoxy sequencing reactions performed on the pERV-CATa2 DNA by using the same primer. The results (Fig. 1) clearly showed the presence of two extension products of different intensity, indicating that two start sites are used, corresponding respectively to the T nucleotide located 40bp upstream the HindLII site and to the A nucleotide 4 bp downstream the major start site. The longest and most intense extension product corresponding to the T nucleotide has been labeled as + 1, hence the pERV-CATa2 plasmid bears sequences surrounding the LTR transcription initiation sites from -584 to +40 respective to the major start site. We next generated a set of 5' deletion mutants containing various lengths of the LTR promoter sequences and maintaining 40bp 3' from the major initiation site (Fig 2). CAT transfection assays were performed and the results of these series of experiments clearly indicate that sequences up to 70 bp upstream of the major transcription start site could be removed without any significant loss of CAT activity, whereas deletion of further sequences to -40 almost completely abolished CAT activity (see pERV-CATa7). The 5' deletion analysis suggests that LTR sequences from -70 to +40 are sufficient to drive efficient expression of a reporter gene. In order to determine whether there where any regulatory sequences downstream and/or around the start sites, two deletion mutants from the 3' side, pERV-CATct9 and pERV-CATa 11 were constructed. Both the -70/+6 and the -70/-7 mutants displayed efficient promoter activity (Fig.2).

Gel retardation assay Binding reactions for bandshift assay used a 20 IL volume of reaction mix consisting of 10 mM Hepes pH 7.9, 25 mM NaCl, 0.05 mM EDTA, 2.5% glycerol, 1 mM Spermidine, 2 mM MgCl2 and 3 Ag poly (dI-dC). Proteins (1-5 jg) were added to the reaction mix and preincubated 10 min on ice. Labelled oligonucleotide (0.001 pmol) was then added and the reaction incubated for 20 min at 200. DNA-protein complexes were loaded on a 5% polyacrylamide gels containing 0.5 x TBE and electrophoresed at 200 V for 2-3 hours. The gels were then dried and subjected to autoradiography. For the gel retardation competition assays various amounts of cold competitor DNA were added during the preincubation period. The recombinant vaccinia Spl (29), the bacterial hTFIID (30) and the oligonucleotides containing the binding sites for Spl, API and TFIID have been purchased from Promega Corp. (Madison. Wis.). All oligodeoxynucleotides used in this work were synthesized on an Applied Biosystems model 381 DNA synthesizer. After synthesis the oligodeoxynucleotides were deprotected and gel purified.

RESULTS Delimitation of the ERV9 LTR minimal promoter region We have recenfly demonstrated that different ERV9 LTRs are capable to drive the expression of a reporter CAT gene in NT2/D1 cells (12). We now have sought to identify basal control elements within one of these ERV9 LTRs generating a set of U3

1+1>1 PVU 11

R

U5 1N*lj

.1

HWit N

Ft

pERV-CTa2

pERV-CTa3

III* II*I

.179

HInd

I

4J0

pERV-CATrA .0

-70

pERV-CATa7 .70

.70

.6

.7

37

39 38

~~~~.0

-W

pERV-CATu9 pERV-CATa11

CAT conversion

38

39

Figure 2. Expression of the deletion mutants of the ERV9 promoter. On the top is shown the structure of a 5' genomic ERV9 LTR. The arrows indicate the 41 bp long and the 72 bp long repeats present respectively in the U3 and U5 regions. A schematic representation of the 5' and 3' deletion mutants described in the text is indicated. The CAT activities were always normalized for the 3-galactosidase activity used as internal control for transfection efficiencies. The values indicating the percentage of CAT activities are the average of 4 to 6 independent transfections.

4132 Nucleic Acids Research, Vol. 20, No. 16 As we knew that the pERV-CATa2 plasmid utilizes at different levels two start sites, at + 1 and +5 positions, we decided to verify if 3' deletions could have any effect on the start sites utilization. We performed primer extension assays, using RNA from transfected cells and a primer DNA complementary to the CAT mRNA sequences. In Fig. 3 are shown the results obtained with RNA extracted after 44 hours from the transfection of the pERV-CATa9, (-70/+6) and the pERV-CATcxl 1 (-70/-7) plasmid DNAs. The pERV-CATa9 (-70/+ 6) synthesized transcripts initiating at the same positions as seen by using the pERV-CATa2 vector.(fig.3, B). Conversely, multiple extension products were present when the -71/-6 mutant (pERVCATcl1 1) was used (fig.3, A), hence the removal of sequences around the start sites determines a change in the start site utilization. From the results of our CAT assays and primer extension analysis it appears that the ERV9 minimal promoter is located between -70 to +6 nucleotides, and a strictly required cis-acting element is located between -70 and -39, whereas an additional regulatory element is located between -7 to +6 region, the latter appears to be necessary only for the correct transcriptional start sites utilization.

Proteins bind to two sites in the LTR minimal promoter To identify proteins capable of interacting with the ERV9 LTR minimal promoter, the 219bp long Hincll-Hindil DNA fragmnt was used in DNase I protection assay. As a source of nuclear extracts we used NT2/D1 cells. Two regions of this probe were protected from DNase I cleavage. A strong protection extends from nucleotides -54 through -33 (A box), whereas a second protected region spanning the transcription initiation site, from -10 nt to +7 (B box), is detectable at a higher nuclear proteins concentration ((Fig. 4, lane 1). Identical protected regions were A

observed by footprint analysis of the same fragment labelled on the other strand (data not shown). The two protected areas defined by footprints analysis correspond to regions required for correct promoter activity as determined in our CAT and primer extension analysis of the ERV-CAT deletion mutants. In fact, deletion to -39 that removes the A box almost inactivates the promoter activity, whereas a deletion from +7 to -6 abolishes the correctly positioned transcription initiation. Inspection of sequences in the A box revealed the presence of the sequence TGGGTGGGGC showing a good match with the transcription activator factor Spl consensus sequence G/TGGGCGGPuPuPy. Moreover, the footprint protected region extends about 20 bp, according to the observation that binding of Spl to a single site is able to protect 18-20 bp (31). In the attempt to ensure that the A box binding factor was indeed SpI we performed electrophoretic mobility shift assays using a -54/-33 double stranded oligonucleotide (Fig 5). DNA protein complexes were detected that were specifically inhibited by either the -54/-33 oligo or by a 22mer oligo containing the canonical Spl binding site (Fig. 5A). In addition the -54/-33 oligo is capable to inhibit efficiently the complex formed by the Spl oligo. Furthermore, using a vaccinia produced recombinant Spl protein

B

33

.54

4-0 4-0 '1

.t _k

Figure 3.

Transcription

start

sites

of

ERV9

LTR

3'

deletion

mutants.

pERV9-CATa 1I (A) and pERV9-CATa9 (B) plamids DNA were transfected in NT2/D1 cells. Primer extensions were carried out as described in Figure 1. The position of the transcription start sites are indicated by arrows.

Figure 4. DNase I footprint analysis of the ERV9 promoter. Binding reaction and footprint analysis were carTied out as described in Materials and Methods. Numbers refer to the positions relative to the major start site of trnsciption. Alongside the autoradiograph the position of the DNase I protected regions are indicated. The non coding stand was digested with DNase I in the absence (lane -) and in presence of 30 jig (lane 2) or 90 Ag (lane 1) of NT2/DI nuclear extracts.

Nucleic Acids Research, Vol. 20, No. 16 4133 both -54/-33 and SpI oligos bind the recombinant SpI protein with a similar affinity (Fig. 5B). Moreover, methylation interference assays demonstrated that both the NT2/Dl nuclear extracts and the recombinant human SpI protein interact strongly with the same guanine residues (data not shown). The results A -

54133

zip

reported in fig. 5 strongly suggest that SpI is the bonafide transcriptional activator protein that binds to the LTR minimal promoter at -54/-33 position, and it is required for efficient promoter activity. Identification of B-box binding factor We have shown in Fig. 4 that a region spanning the start sites from -10 to + 7 is protected from DNase I cleavage and, most notably, this region is important for the determination of transcription initiation (Fig. 3). Recently it has been shown that in some cases the region encompassing the start of transcription is necessary to locate the initiation site (Inr) (32). Moreover specific binding of nuclear proteins at some Inr elements has been demonstrated (33 -35). We decided, therefore, to further characterize the binding activity observed around the start site in the DNase I protection experiment. To confirm that a nuclear factor binds to sequences located around the ERV9 transcription start sites we performed mobility shift experiments using as a probe a double-stranded oligonucleotide from -21 to +9 relative to the major start site. The results (fig.6, panel A) showed that the formation of the complex is specifically inhibited by the addition of 50 and 100 molar excess of the cold oligonucleotide, suggesting that a yet uncharacterized factor binds specifically to sequences spanning the start site, thus defining an Inr-like element. It has been recently described that the Inr-elements of the terminal deoxy-nucleotidyl transferase gene (TdT), human immunodeficiency virus-I (HIV-1) and adeno-associated virus type 2 P5 (P5 + 1) appear to contain recognition sites for nuclear binding factors (32, 34, 35). To establish a relationship between nuclear factors binding the various Inr elements we carried out gel-retardation competitions experiments using the -21/ +9 LTRInr oligo as a probe and varying concentrations of doubledstranded oligonucleotides containing the HIV-1, TdT and P5+1 Inrs, respectively. The results are shown in fig 6, panel B. The

B

-

+

- 54/ - 3 3

_

_

5D

W"

probe

-54/-33

probe

Sp I

- 5 4/ - 33

Sp I

Figure 5. The bonafide SpI transcription factor binds to the ERV9 promoter. (A) The -54/-33 double stranded oligonucleotide (5'-AGGATGTGGGTGGGGCCAGATA-3' and his complement) was used in gel retardation experiments. Specific protein/DNA complexes were detected, and the complexes were specifically inhibited by the lOOx molar excess of unlabeled competitors (the -54/-33 oligo or by an Spl oligonucleotide 5'-ATTCGATCGGGGCGGGGCGAGC-3' and his complement). Similar results were observed when the Spl ologonucleotide was used as probe. (B) The vaccinia produced recombinant Spl protein binds to both -54/-33 and Spl oligos with similar efficiency.

Competitor Fold excess

HIV-I

P5+1

Is 8 I | 5 I N~~~~v

TdT |

Is

API

8

I

*

A

B

Figure 6. Gel shift cross-competition experiments using different Inr elements. (A) The ERV9 oligonucleotide -21/+9 described in Figure 6 was end-labeled and the specific binding complex was competed by 50 and 100 molar excess of unlabelled oligonucleotide. (B) The -21/+9 labelled oligonucleotide was used as probe in cross-competition experiments using the HIV-1, the P5 + 1 and the TdT Inr double stranded oligonucleotides, (HIV-1, 5'-GATCCTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCA-3'; P5 + 1, 5'-GATCCAGGGTCTCCATITTTGAAGCGGGA-3'; TdT, 5'-GATCCGAGCCCTCATTCTGGAGACACCA-3' and their complements to produce BamHI and HindIII ends). Asterisks in both A and B panels indicate unspecific binding complexes.

4134 Nucleic Acids Research, Vol. 20, No. 16 P5+1

GTCTCC,TTTTGAAG

TdT

GCCCTC&TTCTGGAG

HIV-1

GCCTGTA--CTGQGT

ERV9

GCCAGCA--G2GGCA

Flgure 7. Alignment of sequences aromd the transcriptional start sites of the Inr elements used in cross competition experiments reported in Fig. 6. The transcriptional start sites are underlined, and the conserved bases are outlined. For ERV9 and HWV-I two bases gap was necessary to maintain sequence alignrment.

GATGTGGGTGGGGCCAGATAAOAGAATAAAAGCAGCTGCCCGAGCCAGCAe-s:.;-.w.A

-

-

39 3..

9

_

_-

but specific start sites and in some cases the region encompassing the start of transcription has been demonstrated to locate the initiation site (Inr) (32). No conventional TATA sequence is present in the ERV9 LTR promoter, but an AATAAAA at nt. -28. On the other hand the lack of a canonical TATA-box located 30bp upstream of the transcription start site may not to be sufficient criteria to define unambiguously the LTR promoter as a TATA-less promoter, as the TATA-binding protein (TBP) and factors in the TFIID fraction have the potential to recognize the promoter also through a number of sequences unrelated to the TATA motif (36, 37). We carried out mobility shift assays using as a probe a fragment containing sequences starting immediately downstream the Spl recognition site from -39 to +9 (Fig. 8). The results indicated that the retarded band (lane 1) is specifically competed by 50x molar excess of the cold -39/+9 fragment (lanes 2) and by a -21/+9 oligonucleotide (lane 3). Conversely, formation of the complex is not inhibited by the -39/ - 14 oligo (lane 4). Lack of inhibition is also seen when it is used as competitor a 25mer oligo containing the canonical TATA sequence (lane 5). These results clearly confirm that a protein binds to sequences between -21 to +9 , and suggest that no TATA binding activity is implicated in the formation of the observed retarded complex. To further confirm these observations we sought to verify if a bacterial recombinant hTBP is able to specifically bind to any site in the -39/+6 sequences. Mobility shift assays using the recombinant hTBP showed that, using protein concentration that efficiently bind to a probe containing the canonical TATA consensus sequence, the hTBP did not recognize the LTRpromoter (data not shown). Nevertheless, from these results we cannot exclude that the AT rich region at -28 could be a very weak binding site for TBP.

DISCUSSION Regulatory elements in the ERV9 minimal promoter Fgu 8. Gel el phoresis mobility shift analysis of the ERV9 -39/+9 region. A fragment from -39 to +9 was used as probe. The specific complex was eliminatd by compeation with 50x molar excess of both -39/+9 oligonucleotide and by the -21/+9 oligonucleotide (5'-GCAGGCTGCCCGAGCCAGCAGTGGCAACCC-3' and his complement). No competition was observed in the presence of 200x molar excess of the -39/- 14 oligo (5'-CCAGATAAGAGAATAAAAGCAGGCTG -3' and his complement) or by 200x molar excess of a canonical TATA oligonucleotide (5'-GCAGAGCATATAAGGTGAGGTAGGA-3' and his complement).

specific retarded complex formd by the -21/+9 oligonucleotide is competed by addition of excess HIV-1 and TdT Inrs. Conversely, the P5 + 1 Inr does not efficiendy inhibit the complex formation. These results suggest that the ERV9-initiator-binding factor has different affinity at least for the P5 Inr in the conditions we used. The alignment of sequences encompassing the start sites of the Inrs elements that we used in competition assays (fig. 7) did not reveal a strict conservation of the Inr sequences. However we noticed the presence of a perfect conservation of two elements corresponding to the sequences GCC and TGG in all the Inrs except the P5 + 1 Inr. These two boxes are equally spaced in both the ERV9 and HIV-1 Inr sequences and are separated by 7 nucleotides in the TdT Inr. It is known that a large subclass of PolII promoters that lack TATA box and sometimes contain multiple SpI binding sites still initiate transcription at a unique start site, or display multiple

Many thousand ERV9 LTR sequences are present in the human genome (12, 21, 22). As the transcriptional specificities housed in the LTRs represent the key factors underlying possible effects on cellular gene expression and on the element mobility, in this paper we have focused our attention on the identification of the DNA sequences necessary for basal and accurate transcription of the ERV9 LTR. Using transient CAT assays we have demonstrated that ERV9 minimal promoter is composed by sequences from -70 to +6 relative to the major transcriptional start site. We show also that the LTR-minimal promoter is regulated by the interaction of one Spl-binding site and an Initiator element. DNase I protection experiments and band shift assays have allowed us to define that a sequence from -54 to -33 binds to a bonafide Spl. Analysis of deletion mutants in transient expression experiments indicated that the efficiency of CAT expression drops drmatically when sequences containing the Spl binding site are removed. On the other hand deletion of 3' sequences up to -7 determines a change in the start site utilization (see primer extension in fig. 3), thus defining that an Inr-like element is present around the major start site. Moreover footprint experiments and band shift assays indicate that sequences at the ERV9-LTR Ir-like bind a protein. The 3' deletion mutant pERV9-CATal 1 retains a CAT activity similar to that shown by the pERV9-CATa9, although the transcription start sites are different. These results indicated that the sequences from -7 to

Nucleic Acids Research, Vol. 20, No. 16 4135 +6 are mainly required for a correct positioning of the transcription start sites. However, it is pertinent to note that the deletion mutant pERV9-CATa7, that lacks the SpI site, still retains a small but detectable CAT activity, suggesting that the Inr-like element is also capable to support basal promoter activity. Further investigations using in vitro transcription assays are necessary to address this point. Our unpublished results confirmed that no other binding factors, except SpI and the Inr-binding protein, interact with the ERV9 minimal promoter, as the retarded complexes obtained by the binding of nuclear factors to sequences from -70 to +6 are not competed by various oligonucleotides containing the consensus sequences for transcription factors such as API, AP2, AP3, Oct-i, NF-kB, NFl and CREB. Two lines of evidences seem to suggest that the ERV9 promoter falls in the category of the TATA-less promoters. First, inspection of sequences revealed no TATA consensus sequences. In addition, DNA sequences of the minimal promoter are not able to bind a recombinant hTBP, and the retarded complex formed by the -39/+9 probe as shown in fig.8, is not competed by a TATA containing oligonucleotide. Nevertheless, we cannot exclude that the ERV9 promoter could contain a very weak TBP binding site. Moreover, it has recently been reported a direct involvement of TBP also at TATA-less promoters in vivo (38, 39, 40). Pugh and Tjian (39) have proposed that the TFIID complex is recruited by activators such as Spl, via a tethering factor physically associated with TBP. Alternatively, one possible model for transcription from TATA-less promoters is that TBP recognizes weak consensus sites (38). A more definitive conclusion about whether a tethering factor is required for the ERV9 promoter activity, might be drawn only from an in vitro transcription analysis, using partially purified transcription factors. Many reports have been accumulating showing that nuclear proteins specifically recognize Inr elements (41, 33-35). Recently, it has been described the cloning of the human factor, named YY1, that binds the adeno-associated virus PS Inr, (42). In addition, three different laboratories have cloned genes identical to the human YY1. The human NF-E1 (43) binds a regulatory element in the 3' region of the immunoglobulin x gene and the ftE1 site of the immunoglobulin heavy-chain gene and is capable to down regulate transcription mediated by a itEI site. A clone called 6 (44) was shown to bind to downstream elements of ribosomal protein genes, activating the transcription. Finally, Flanagan et al. (45) have cloned a factor termed UCRBP and determined that it binds in the long terminal repeat of Moloney leukaemia virus where it inhibits the transcription. Thus, these findings expand the number of genes known to be regulated by YY1 and underline the diversity of its functional roles. While YY1 seems to recognize similar sequences in the P5 promoter and in the MuLV LTR (42, 45), it is striking the dissimilarity of the binding motifs recognized in the other genes (43, 44). Band shift competitions experiments between different Inrs performed by us (see fig 6) and others (35), suggest that, while some of these proteins could have similar binding affinity, not all of them belong to the same class. On the other hand a strict sequence conservation is missing in many of the known Inr elements, although some sequence similarities between few of them has been observed (34, 35). In fig.7 we reported the alignment between sequences around the start site of the Inrs elements that we used in competition assay. The most interesting observation is the perfect conservation of two boxes that are

separated by 5 nucleotides in both the ERV9 and HIV-1 Inr sequences and by 7 in the TdT Inr. However, the most divergent sequence appears to be that of the P5 Inr, which is not able to compete for the formation of the retarded complex generated by the ERV-Inr element. Site specific mutation analysis of the ERV9 Inr element is under way in order to evaluate the role of the conserved boxes within the various Inr elements. ERV9 expression We have previously shown that ERV9 sequences are preferentially expressed in embryonal carcinoma NT2/D1 cells, as no detectable expression of ERV9 sequences was found in a number of cell lines of different embryological derivation (12, 21). Here we have shown that the minimal promoter is regulated by the presence of the SpI activating element, and by a yet uncharacterized Inr regulatory protein. In addition, the ERV9 Inr element is capable to form similar DNA/protein complexes with both NT2/D1 and HeLa nuclear extracts (our unpublished results). Finally the ERV9-CAT constructs described in this work display promoter activity, albeit at lower levels, when transfected in Hela cells. It appears therefore that the minimal promoter region does not contain the tissue specific regulated control elements. Since we did not detect any ERV9 RNA expression on HeLa cell (12, 21) we have to hypothesize either the presence of a negative transcriptional control element, which has to be located outside the minimal promoter, or that other mechanisms such as methylation and/or specific chromatin structures are responsible for the regulation of expression of the ERV9 sequences. In mouse the transcriptional activation of the endogenous IAP sequences appears to be under the control of a complex series of regulatory proteins (46-48) and is influenced by such variables as DNA methylation (49) and the state of of cellular differentiation (40) or neoplastic transformation (51). We are currently analyzing how variables as DNA methylation and neoplastic transformation could influence the ERV9 expression.

ACKNOWLEDGEMENTS We would like to thank Miss R.Terracciano for skilful technical help and A.Furia for the artworks and photographs. We also thank Dr. M.Persico for the synthesis of oligonucleotides. This work was paid for by grants from the Italian Association for Cancer Research (AIRC) and from 'Progetto di ricerche sull'AIDS' (Istituto Superiore di SanitA) to L.L., and MURST to G.L.M.

REFERENCES 1. Callahan, R., Chiu, I.-M., Wong, J.F.H., Tronick, S.R., Roe, B.A., Aaronson, S.A. and Schlom, J. (1985) Science, 228, 1208-1211. 2. Kroger,G., and Horak, I. (1987) J. Virol., 61, 2071-2075. 3. Leib-Mosch, C., Brack, R., Werner,T., Erfle, V. and Hehlmann, R. (1986)

Virology, 155, 666-677.

4. Mager, D. L., and Hentorn, P. S. (1984) Proc. Natl. Acad. Sci. USA., 81, 7510-7514. 5. Martin, M. A., Bryan, T., Rasheed, S. and Khan, A. S. (1981) Proc. Natl. Acad. Sci. USA., 78, 4892-4896. 6. Ono, M., Yasunaga, T., Miyata, T. and Ushikubo, H. (1986) J. Virol., 60, 589-598. 7. Perl, A., Rosenblatt, J.D., Chen, I.S.I., Di Vincenzo, J.P., Bever, R., Poiesz, J. and Abraham, G. N. (1989) Nucleic Acids Res., 17, 6841-6854. 8. Repaske, R., Steele, P.E., O'Neill, R.R., Rabson, A.B. and Martin, M. A. (1985) J. Virol., 54, 764-772. 9. Leib-Mosch, C., Brack-werner, R., Werner, T., Bachmann, M., Faff, O., Erfle, V. and Hehlmann, R. (1990) Cancer Res. (suppl.), 50, 5636-5642.

4136 Nucleic Acids Research, Vol. 20, No. 16 10. Keshet, E., Schiff, R. and Itin, A. (1991) Adv. Cancer Res., 56, 215-251. 11. Kato,N., Pfeifer-Ohlsson, S., Kato, M., Larsson, E., Rydnert, J., Ohlsson, R. and Cohen, M. J. (1987). J. Virol., 61, 2182-2191. 12. La Mantia, G., Maglione, D., Pengue, G., Di Cristofano, A., Simeone, A., Lanfrancone, L., and Lania, L. (1991) Nucleic Acids Res., 19, 1513-1520. 13. Rabson, A. B., Hamagishi, Y., Steele, P.E., Tykocinski, M. and Martin, M. A. (1985) J. Virol., 56, 176- 182. 14. Derks,J.P.A., Hofmans,L., Bnming, H.W. and Van Rood, J.J. (1982) Cancer Res., 42, 681-686. 15. Jerabek,L.B., Mellors, R.C., Elkon, K.B. and Mellors, J. W. (1984) Proc. Nat. Acad. Sci. USA, 81, 6501-6505. 16. Suni, J., Narvanen, A., Wahlstrom, T., Aho, M., Pakkanen, R., Vaheri, A., Copeland, T., Cohen, M. and Oroszlan, S. (1984) Proc. Acad. Natl. Sci. USA., 81, 6197-6201. 17. Kongsuwan, K., Allen, J. and Adams, J. M. (1989) Nucleic Acids Res., 17, 1881-1892. 18. Leslie, K.B., Lee, F. and Schrader, J. W. (1991) Mol. Cell. Biol., 11, 5562-5570. 19. Ymer, S., Tucker, W.Q.J., Sanderson, C.J., Hapel, A.J., Champbell, H.D. and Young, I. G. (1985) Nature, 317, 225-258. 20. Stavenhagen,J.B., and Robins, D.M. (1988) Cell, 55, 247-254. 21. La Mantia, G., Pengue, G., Maglione, D., Pannuti, A., Pascucci, A. and Lania L. (1989) Nucleic Acids Res., 17, 5913-5922. 22. Zucchi, I., and Schlessinger, D. (1992) Genomics, 12, 264-275. 23. Innis, M.A., Gelfand, D.H., Sninsky, J.J. and White, T.J. (1990) ed. PRC protocols, a guide to methods and applications. Academic Press Inc., San Diego, California. 24. Andrews P.W. (1984) Dev. Biol., 103, 285-293. 25. Gorman, C.M., Moffat, L.F., and Howard, B. H. (1982) Mol. Cell. Biol., 2, 1044-1051. 26. Maniatis T., Fritsch, E.F. and Sambrook, J. (1982) Molecular cloning: A laboratory manual. Cold Spring Harbour University Press, Cold Spring Harbour. 27. Minchiotti, G., and Di Nocera P.P. (1991) Mol. Cell. Biol., 11, 5175-5180. 28. Majello, B., Arcone, R., Toniatti, C. and Ciliberto, G. (1990) EMBO J., 9, 457-465. 29. Kadonaga,J.T., Carmer, K.R., Masiarz, F.R. and Tjian, R. (1987) Cell, 51, 1079-1090. 30. Peterson, G.P., Tanese, N., Pugh, B.F. and Tjian, R. (1990) Science, 248, 1625-1630. 31. Kadonaga,J.T., Jones, K.A. and Tjian, R. (1986) Trends Biochem. Sci.,11, 20-23. 32. Smale, S.T., and Baltimore, D. (1989) Cell, 57, 103-113. 33. Means, A.L., and Famham P.J. (1990) Mol. Cell. Biol,. 10, 653-661. 34. Roy, A.L., Meisteremst, M., Pognonec, P. and Roeder, R. G. (1991) Nature, 354, 245-248. 35. Seto, E., Shi, Y. and Shenk, T. (1991) Nature, 354, 241-254. 36. Hahn, S., Buratowski, S., Sharp, P.A. and Guarente, L. (1989) Proc. Natl. Acad. Sci. USA, 86, 5718-5722. 37. Singer,V.L., Wobbe, C.R. and Struhl, K. (1990) Genes and Dev., 4, 636-645. 38. Cormack, B.P. and Struhl, K. (1992) Cell, 69, 685-696 39. Pugh, B. F., and Tjian, R. (1991) Genes and Dev., 5, 1935-1945. 40. Smale, S.T., Schmidt, M.C., Berk, A.J. and Baltimore, D. (1990) Proc. Natl. Acad. Sci. USA, 87, 4509-4513. 41. Beaupain, D., Eleouet, J.F., and Romeo, P.H. (1990) Nucl. Acids Res., 18, 6509-6516. 42. Shi,Y., Seto, E., Chang, L.S. and Shenk, T. (1991) Cell, 67, 377-388. 43. Park, K., and Atchison, M.L. (1991) Proc. Natl. Acad. Sci. USA, 88,

9804-9808. 44. Hariharan, N., Kelley, D.E. and Perry, R.P. (1991) Proc. Natl. Acad. Sci. USA, 88, 9799-9803. 45. Flanagan, J.R., Becker, K.G., Ennist, D.L., Glerason, S.L., Driggers, P.H., Levi, B.-Z., Appella, E. and Ozato, K. (1992) Mol. Cell. Biol., 12, 38-44. 46. Falzon,M., and Kuff E.L. (1988) J. Virol., 62, 4070-4077. 47. Falzon,M., and Kuff E.L. (1989) J. Biol. Chem., 264, 21915-21922. 48. Falzon,M., and Kuff E.L. (1990) J. Biol. Chem., 265, 13084-13090. 49. Mietz, J.A., and Kuff, E.L. (1990) Proc. Natl. Acad. Sci. USA, 87, 2269-2273. 50. Howe, C.C., and Overton, G.C. (1986) Mol. Cell. Biol., 6, 150-157. 51. Augenlicht, L.H., Kobrin, D., Pavlovec, A. and Royston, M.E. (1984) J. Biol. Chem., 259, 1842-1847.

Identification of regulatory elements within the minimal promoter region of the human endogenous ERV9 proviruses: accurate transcription initiation is controlled by an Inr-like element.

ERV9 is a low repeated family of human endogenous retroviral elements whose expression is mainly detectable in undifferentiated embryonal carcinoma NT...
2MB Sizes 0 Downloads 0 Views