The EMBO Journal vol. 10 no. 1 pp. 59-69, 1991

Multiple isoforms of the mouse retinoic acid receptor α are generated by alternative splicing and differential induction by retinoic acid

Pierre Leroy, Andrée Krust, Arthur Zelent, Cathy Mendelsohn, Jean-Marie Garnier, Philippe Kastner, Andrée Dierich and Pierre Chambon

Laboratoire de Génétique Moléculaire des Eucaryotes du CNRS, Unité 184 de Biologie Moléculaire et de Génie Génétique de l'INSERM, Institut de Chimie Biologique, Faculté de Médecine, 11 Rue Humann, 67085 Strasbourg Cedex, France

Communicated by P.Chambon

Together with the previously described mouse retinoic acid receptor alpha-1 (mRAR-α1, formerly mRAR-α0), we have isolated and characterized here a total of seven mRAR-α cDNA isoforms (mRAR-α1 to α7). These isoforms are generated from mRAR-α primary transcript(s) of a single gene by alternative splicing of at least eight different exons with the exon which encodes the amino acid sequence of their common B region. All of these isoforms differ in their 5'-untranslated regions (5'-UTRs) and, in the case of mRAR-α1 and α2, also in the sequences encoding the N-terminal A region, which is known to be important for differential trans-activation by other members of the nuclear receptor superfamily. In addition, the open reading frames (ORFs) of the mRAR-α3 and α4 cDNA isoforms remain open to their very 5' ends, which suggests that these two isoforms may also encode RAR-αs with unique A region amino acid sequences. The two predominant isoforms, mRAR-α1 and α2, were found to be differentially expressed in mouse adult and fetal tissues, as well as in P19 and F9 embryonal carcinoma (EC) cell lines. Interestingly, the expression of mRAR-α2, in contrast to that of the mRAR-α1 isoform, was induced by retinoic acid (RA) in EC cells, thus suggesting the presence of two promoters in the 5' region of the mRAR-α gene, which differ in their response to RA. The conservation between mouse and human RAR-α1 and α2 cDNA isoform sequences, as seen by cross-hybridization in Southern blots or by DNA sequence analysis, together with their differential patterns of expression, strongly suggests that they perform specific functions during embryogenesis and in the adult.

Key words: EC cells/mouse embryo/mouse RAR-α cDNA isoforms/promoter/5'-UTR

Introduction

Retinoic acid (RA), a vitamin A metabolite thought to be a natural morphogen (Maden, 1982; Tickle et al., 1982; Slack, 1987a,b; Thaller and Eichele, 1987), plays a crucial role in the development and homeostasis of vertebrates (for reviews see Maden, 1985; Brockes, 1989, 1990; Eichele, 1989; Summerbell and Maden, 1990; and references


Oxford University Press

therein). RA exerts a broad spectrum of effects on various cells in culture, during development of an organism and in the adult. These effects have often been associated with changes in the expression patterns of specific genes (Wang and Gudas, 1983; LaRosa and Gudas, 1988; Murphy et al., 1988; Vasios et al., 1989; Okamoto et al., 1990; Simeone et al., 1990), indicating that RA may directly affect gene expression [for additional reviews and references, see Zelent et al. (1991) (accompanying paper)]. Recently, three highly related nuclear retinoic acid receptors (RAR-α, -β and -γ) have been identified in both human and mouse (Giguere et al., 1987; Petkovich et al., 1987; Benbrook et al., 1988; Brand et al., 1988; Krust et al., 1989; Zelent et al., 1989), and shown to belong to the superfamily of steroid/thyroid hormone nuclear receptors which act as ligand-inducible transcriptional enhancer factors (for reviews see Evans, 1988; Green and Chambon, 1988; Beato, 1989). Like the other members of the nuclear receptor superfamily, RARs have a modular structure comprising six regions designated A to F (Krust et al., 1986; Green and Chambon, 1988). In the case of steroid receptors, it was shown that regions C and E are responsible for DNA and ligand binding, respectively (Evans, 1988; Green and Chambon, 1988), and that the A/B and E regions contain distinct trans-activation domains which are both cell type- and promoter-specific (Tora et al., 1988a,b; Tasset et al., 1990). The precise roles of regions D and F are unknown. For each RAR subtype (α, β or γ), the sequences of regions A-F are highly conserved from mouse to human (Krust et al., 1989; Zelent et al., 1989). In contrast, in a given species only the sequences of regions B, C and E are well conserved between the three RARs, whereas those of regions A, D and F are less or not conserved.
These observations suggest that the three RARs may be functionally distinct, and thus may regulate the expression of different sets of RA-responsive genes. This view was further supported by analyses of the distribution of RAR transcripts using Northern blotting (Krust et al., 1989; Zelent et al., 1989) and in situ hybridization (Dolle et al., 1989, 1990; Ruberte et al., 1990, 1991), which showed that each mouse RAR (mRAR) subtype exhibits a specific pattern of expression either in adult tissues or in the developing embryo. Recently, multiple human and mouse RAR-γ cDNA isoforms have been characterized (Krust et al., 1989; Giguere et al., 1990; Kastner et al., 1990). Using the anchored PCR approach (Loh et al., 1989) and also standard cDNA library screening procedures, we have now isolated a total of six novel mRAR-α cDNA isoforms (mRAR-α2 to α7). Together with the initially characterized mouse and human RAR-α1 isoform (formerly termed RAR-α0, see Giguere et al., 1987; Petkovich et al., 1987; Brand et al., 1988; Zelent et al., 1989), these RAR-α isoforms share a common B-F region, but their sequences diverge upstream of the A/B region junction (Brand et al., 1988; Zelent et al., 1989).


Interestingly, the two most abundant isoforms, mRAR-α1 and mRAR-α2, which differ in both their 5'-untranslated region (5'-UTR) and A region sequences, were found to be differentially expressed in the mouse. Furthermore, the expression of the mRAR-α2 isoform, in contrast to that of mRAR-α1, is induced by RA in embryonal carcinoma (EC) F9 and P19 cells.

Results

5'-UTR and putative promoter sequences of the mouse RAR-α1 isoform

The sequence of the mRAR-α1 cDNA isoform that we previously isolated and designated mRAR-α0 (Zelent et al., 1989) was incomplete in its 5' region. In order to obtain mRAR-α1 cDNA clones extending further in the 5' direction, a λZapII cDNA library was constructed from poly(A)+ RNA of RA-treated F9 EC cells and screened with 32P-labelled probes derived from the above mRAR-α1 (mRAR-α0) cDNA. Two cDNA clones (F9.1 and F9.2) were isolated. Sequence analysis of clone F9.1 showed that it was colinear with and highly homologous to the previously characterized mouse (Zelent et al., 1989) and human (hRAR-α, see Petkovich et al., 1987; Brand et al., 1990) RAR-α1 cDNAs. An alignment of the DNA sequences of the human and mouse RAR-α1 promoter proximal region, 5'-UTR, A region (A1) and the beginning of the B region is shown in Figure 1. The 5' end of the F9.1 cDNA clone sequence corresponds to nucleotide (nt) 309 in Figure 1 (indicated by a vertical arrow); the mRAR-α1 DNA sequences which are located further upstream in Figure 1 (nt -177 to +308), and also in Figure 2A and B, were obtained by sequencing the mouse genomic DNA clone λG1 mRAR-α (see below and Figure 4B). It is clear that not only the A region (A1 in Figure 1), but also the entire 5'-UTR and the promoter proximal (nt -1 to -177 in Figure 1) sequences of RAR-α1 are highly conserved from mouse to human. The amino acid sequences of the human and mouse A1 regions are nearly identical, exhibiting an overall sequence similarity of 98%, with a single conservative amino acid change (Ser43 to Thr43). In the 5'-UTR, the overall sequence similarity between the mouse and human sequences is 80% (Figures 1 and 2A). In both cases there are two conserved in-frame termination codons (underlined in Figures 1 and 2A) located upstream of the open reading frame (ORF) which encodes the A1 and B-F regions of mRAR-α1.
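The 98% figure quoted for the A1 region can be reproduced with simple arithmetic: one difference over 59 residues. The sketch below is illustrative only. The mouse A1 sequence is transcribed here from the amino acid line of Figure 1, and the human sequence is not taken from the figure but modelled, as an assumption, as the mouse sequence carrying the single reported Ser43 to Thr43 change. The function names are hypothetical.

```python
# Mouse A1 region (59 residues), transcribed from the amino acid line of Figure 1.
MOUSE_A1 = (
    "MASNSSSCPTPGG"      # residues 1-13
    "GHLNGYPVPPYAFFF"    # residues 14-28
    "PPMLGGLSPPGALTS"    # residues 29-43
    "LQHQLPVSGYSTPSP"    # residues 44-58
    "A"                  # residue 59
)


def substitute(seq, position, residue):
    """Return seq with the residue at the 1-based position replaced."""
    i = position - 1
    return seq[:i] + residue + seq[i + 1:]


def percent_identity(a, b):
    """Percentage of identical positions in two ungapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Assumption: human A1 = mouse A1 with the single reported change at position 43.
HUMAN_A1 = substitute(MOUSE_A1, 43, "T")

print(len(MOUSE_A1))                                    # 59
print(MOUSE_A1[42], HUMAN_A1[42])                       # S T
print(round(percent_identity(MOUSE_A1, HUMAN_A1), 1))   # 98.3
```

A value of 58/59 (about 98.3%) is consistent with the 98% stated in the text; because the single change is conservative, an identity-based count slightly understates similarity.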
Furthermore, there are a number of upstream short ORFs (sORFs) in both mouse and human 5'-UTR sequences; however, only two of them (human and mouse sORF1; human sORF4 and mouse sORF3) are conserved (see Figure 2A). Finally, using a secondary structure prediction program (Zuker and Stiegler, 1981; Jacobson et al., 1984), both mouse and human 5'-UTR sequences were found to fold into similar and stable secondary structures (ΔG = -200 kcal/mol, not shown). The recent characterization of a functional hRAR-α1 promoter (Brand et al., 1990) identified a major transcription start site for hRAR-α1 mRNA (indicated as +1 in Figures 1 and 2A). Approximately 500 nt of the 5'-flanking human genomic DNA sequence, which included nt -1 to -176 in Figure 1, exhibited transcriptional promoter activity when tested in cultured cells (Brand et al., 1990). The transcription start site for mRAR-α1 mRNA has not yet been


mapped. However, in view of the very high DNA sequence similarity (80%) between the mouse and human 5'-UTRs and promoter proximal sequences, it is most probable that the mRAR-α1 promoter is located at the same position as in the human. It is worth noting that both human and putative mouse RAR-α1 promoter sequences lack obvious TATA or CCAAT box sequences, but possess conserved, multiple putative binding sites for the transcription factor Sp1 (boxed in Figure 1; Mitchell and Tjian, 1989). Additionally, the DNA sequence surrounding the major human and the putative mouse RAR-α1 mRNA transcription initiation site (underlined with a striped rectangle in Figure 1) closely resembles the consensus sequence 5'-CTCANTCT-3' thought to be an important transcription initiator core sequence in RNA polymerase B(II)-transcribed promoters devoid of a TATA box (Smale and Baltimore, 1989; Smale et al., 1990).

A novel mRAR-α cDNA isoform, mRAR-α2, differs from mRAR-α1 in sequences encoding both the A region and the 5'-UTR

The 5'-terminal DNA sequence of mRAR-α2, which includes 5'-untranslated sequences, the A region (A2) and the first 14 amino acids of the B region, is shown in Figure 3A. It was derived from sequencing the F9.2 cDNA clone (see previous section) which, similarly to the F9.1 cDNA clone (mRAR-α1), extended further 3' and contained the entire protein coding sequences of mRAR-α (B-F regions, data not shown). The nucleic acid sequences of α1 and α2 cDNAs were identical over their cloned 3'-UTRs (396 nucleotides), as well as in the B-F regions of the receptor, indicating that both cDNAs were derived from the same mRAR-α gene. The two cDNAs diverged only upstream of the A/B region junction (indicated by a filled arrowhead in Figures 1, 3A and 4A). Thus, the putative RAR-α2 protein which is encoded by the F9.2 clone cDNA differs from the mRAR-α1 protein only in its N-terminal A2 region amino acid sequences.
The methionine initiator codon is preceded by two in-frame termination codons (underlined in Figure 3A). Both the 5'-UTR and the A2 region (encoding 56 amino acids) have no sequence similarity with the 5'-UTR and A1 region (encoding 59 amino acids) of mRAR-α1. As is the case for mRAR-α1, the 5'-UTR sequences of the mRAR-α2 cDNA isoform can also be folded into an extensive and stable hairpin structure (ΔG = -190 kcal/mol, data not shown).
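As an illustration of how the A2 region and the start of the B region relate to the cDNA sequence, the following sketch translates the mRAR-α2 coding sequence as transcribed here from Figure 3A (initiator ATG through the first 14 codons of the B region). It uses the standard genetic code; the variable and function names are hypothetical, and the codon string is only as reliable as the transcription from the figure.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)

# mRAR-alpha2 ORF as transcribed from Figure 3A: A2 region plus first 14 B-region codons.
A2_ORF_CODONS = """
ATG TAC GAG AGT GTG GAA GTC GGG GGC CTT
ACC CCC GCC CCT AAC CCC TTC CTA GTG GTG GAC TTT TAT AAC CAG
AAC CGG GCC TGT TTG CTC CAG GAG AAG GGG CTC CCT GCC CCG GGT
CCC TAC TCC ACC CCA CTC CGG ACT CCG CTT TGG AAT GGC TCA AAC
CAC TCC ATC GAG ACC CAG AGC AGC AGT TCC GAA GAG ATA GTA CCC
""".split()


def translate(codons):
    """Translate a list of DNA codons into a one-letter protein string."""
    return "".join(CODON_TABLE[codon] for codon in codons)


protein = translate(A2_ORF_CODONS)
a2_region = protein[:56]   # A2 region: 56 amino acids, as stated in the text
b_start = protein[56:]     # first 14 amino acids of the common B region

print(len(protein))   # 70
print(a2_region)
print(b_start)        # SIETQSSSSEEIVP
```

Note that the A/B junction falls inside the codon for residue 57 (TCC): the 41 nt of common B region sequence begin with the last two bases of that codon, so the split at residue 56 is a protein-level convenience rather than the exact exon boundary.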

Five additional mRAR-α cDNA isoforms

Using the anchored PCR technique (Materials and methods), incomplete cDNAs of five additional mRAR-α isoforms (mRAR-α3 to mRAR-α7) were cloned, together with those of mRAR-α1 and α2, which were most frequently isolated. All of these cDNAs contained the same B region-coding DNA sequence (underlined in Figures 1, 3A-F and 4), and diverged from each other immediately upstream from the A/B region junction. An additional clone, mRAR-αNS (Figure 3G), was isolated whose DNA sequence upstream from the A/B region junction corresponded to the intron sequence lying immediately upstream of the B region-coding exon (exon 9) in the genomic DNA (see below). The sequences of mRAR-α3 to α7 (as shown in Figure 3B-F) were initially all derived from cDNA clones isolated using RA-treated F9 EC cell mRNA; however, cDNAs with identical sequences were subsequently isolated from a number of other mRNA sources, including adult mouse skin, lung and brain, untreated F9 EC cells, as well as RA-treated and untreated P19 EC cells. The characterization of mRAR-α3 to α7 cDNAs is preliminary and their sequences most likely extend further in the 5' direction. Note that none of the clones described have any of their 5' sequences in common, except for the first 29 nt of mRAR-α6 which are identical to the 29 nt located immediately upstream of the A/B region junction in mRAR-α5, and therefore belong to the same exonic sequence (E6, see Figures 3D and E and Figure 4). The ORFs of mRAR-α3 and α4 remain open to the very 5' ends of their sequences, leaving open the possibility that these isoforms (analogously to mRAR-α1 and α2) also encode N-terminally different mRAR-αs. Note that the putative A region-coding sequences of mRAR-α3 and α4 contain a number of in-frame (with respect to the ORF encoding the B-F regions of RAR-α) CTG codons (boxed in Figure 3B and C) which could possibly function as initiators of translation (see e.g. Bernards and de la Monte, 1990; Lemaire et al., 1990). In mRAR-α5, α6 and α7, the sequences upstream of the A/B region junction contain in-frame termination codons and

[Figure 1: aligned human and mouse RAR-α1 sequences covering the proximal promoter, the 5'-UTR, the A1 region and the beginning of the B region, with exon 1, exon 2 and exon 9 marked. The sequence panel is not recoverable from the extracted text.]

Fig. 1. Alignment of the human (h) and mouse (m) RAR-α1 proximal promoter sequences and cDNA 5' regions. Gaps have been introduced for optimal alignment. In the hRAR-α1 sequence, only the nucleotides and amino acids differing from those of mRAR-α1 are given; the conserved bases are indicated by dashes. The 5'-UTR and mRAR-α1 promoter sequences are denoted by brackets, region A1 is boxed and two in-frame stop codons as well as the beginning of the sequence corresponding to the B region (exon 9) are underlined. An open vertical arrowhead and a filled triangle designate E1/E2 and E2/E9 boundaries, respectively (see text). A broken arrow (+1) points to the transcription start site determined for the human RAR-α1 mRNA (Brand et al., 1990). A filled vertical arrow indicates the 5' end of the mRAR-α1 (F9.1) cDNA clone. The consensus-like transcription initiator core sequence surrounding the transcription start site is underlined with a striped rectangle. Four possible binding sites for the transcription factor Sp1 (based on the consensus core sequence 5'-GGGCGG-3', Mitchell and Tjian, 1989) are boxed. Nucleotide and amino acid numbers are given on the left side of the figure.


[Figure 2: (A) aligned human and mouse RAR-α1 5'-UTR sequences with the predicted sORF peptides (hsORF1 to 4, msORF1 to 3); (B) human and mouse exon 1 donor site and exon 2 acceptor site sequences. The sequence panels are not recoverable from the extracted text.]

Fig. 2. (A) Alignment of the 5'-UTR sequences of human (h) and mouse (m) RAR-α1 cDNAs, nucleotides -12 and -13 to 575 and 587, respectively, as shown in Figure 1. Four human and three mouse sORFs (hsORF1 to 4 and msORF1 to 3, respectively) are shown together with their predicted amino acid sequences. All other symbols are as in Figure 1. (B) Alignment of human and mouse exon 1 (donor site as shown in a) and exon 2 (acceptor site as shown in b) nucleotide sequences and immediately adjacent intronic sequences. Exon and intron DNA sequences are represented by upper and lower case letters, respectively. Exon/intron junctions are indicated in a and b by broken arrows. Nucleotides are numbered as in Figure 1.
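The sORFs annotated in Figure 2A can be located by scanning for an ATG followed by an in-frame stop codon. The sketch below is a minimal illustration, not the procedure used in the paper. It scans a fragment of the mouse 5'-UTR (nt 11 to 128 in the numbering of Figure 1, as transcribed here from the figure) and recovers the peptide annotated as msORF1. The function name and the choice of fragment are assumptions made for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)

# Mouse RAR-alpha1 5'-UTR fragment, nt 11-128, as transcribed from Figure 2A.
UTR_START_NT = 11
UTR_FRAGMENT = (
    "TCCGGGTAAAGTTTCAGCCTCTGCACGTGACTCGCCATGGCCGCTGCCGTCGCCCCGCG"
    "CCCCTGAGCTGCGGCCCCCCGGACGGCTCCTCTCGGAGGACCCCACACCCCTGACGCCG"
)


def find_sorfs(seq):
    """Return (0-based start, peptide) for each ATG that reaches an in-frame stop."""
    found = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        peptide = ""
        for i in range(start, len(seq) - 2, 3):
            residue = CODON_TABLE[seq[i:i + 3]]
            if residue == "*":          # in-frame stop codon closes the sORF
                found.append((start, peptide))
                break
            peptide += residue
    return found


sorfs = find_sorfs(UTR_FRAGMENT)
print(sorfs)                            # [(36, 'MAAAVAPRP')]
print(UTR_START_NT + sorfs[0][0])       # 47, the nt position of the ATG
```

The recovered peptide corresponds to the MetAlaAlaAlaValAlaProArgPro sequence shown for msORF1; an ATG that runs off the end of the fragment without meeting a stop is deliberately not reported.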

there are no ATGs. However, mRAR-α6 contains a possible CTG initiation codon located immediately upstream of the A/B region junction (Figure 3E). In order to determine whether the mRAR-α3 to α7 cDNA isoforms shown in Figure 3B-F extend further 3' and contain sequences encoding the common B-F regions of mRAR-α, additional PCR analysis was carried out on cDNAs synthesized [using an oligo(dT) primer] from poly(A)+ and total RNAs of various sources. For each isoform (mRAR-α3 to α7), the 5' primer used for cDNA amplification was specific [i.e. it corresponded to a sequence (available upon request) located upstream from the A/B junction], whereas the 3' primer (5'-GGTACCCAGTTCTGTCTGAGAGGACA-3') was chosen from the mRAR-α


[Figure 3: 5' sequences of the mRAR-α cDNA isoforms upstream of the common B region: (A) mRAR-α2, (B) mRAR-α3, (C) mRAR-α4, (D) mRAR-α5, (E) mRAR-α6, (F) mRAR-α7, (G) mRAR-αNS, with exons 3 to 9 and intron 1 marked. The sequence panels are not recoverable from the extracted text.]

Fig. 3. Divergent sequences of mRAR-α2 to α7 (A-F) cDNA isoforms as indicated. The first 41 nt of the B region sequence (underlined) are also shown. Sequence of a non-spliced (NS) mRAR-α cDNA clone is shown in G. An arrow marks the intron/exon junction. In A-G, the nucleotides and amino acids are numbered on the left side of each sequence and in-frame termination codons upstream from the A/B junction are underlined. Exons 3-9 are indicated on the right. The filled triangle in A-F indicates the A/B region junction. (A) The 5' sequence of mRAR-α2 cDNA. Region A2 is boxed and the 5'-UTR and the beginning of the B region are as indicated on the right side of the figure. (B) and (C) 5' sequences of mRAR-α3 and α4 cDNAs. The dots preceding the 5' limit of both sequences show that the encoded ORFs remain open. Putative in-frame non-AUG translation initiation codons (CTG) are boxed. (D-F) 5' sequences of mRAR-α5 to α7 cDNAs. In (E) an open triangle indicates the putative E6/E7 exon boundary and a CTG triplet (see above) is boxed.
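The observation that mRAR-α5 and α6 share exonic sequence (E6) can be checked by comparing the sequence immediately upstream of the A/B junction in one isoform with the 5' end of the other. The sketch below is illustrative only. The sequences are transcribed here from panels D and E of Figure 3 and from the common 41 nt B region, so they are only as reliable as that transcription, and the helper names are hypothetical.

```python
# First 41 nt of the common B region (exon 9), as shown in Figure 3.
B_REGION = "CCATCGAGACCCAGAGCAGCAGTTCCGAAGAGATAGTACCC"

# mRAR-alpha5: sequence adjacent to the A/B junction (panel D), followed by the B region.
ALPHA5 = (
    "GGGCCTCCCGAGAAGGAAGTGAGGTGAAAACTGGGGCAGGGCAGTTGTGTGTAGGCATG"
    "AGCAGCAGATTGCAGCAG" + B_REGION
)

# mRAR-alpha6: the most 5' nucleotides of the clone (panel E).
ALPHA6_5PRIME = "GTGTAGGCATGAGCAGCAGATTGCAGCAGCCGCAGG"


def upstream_of_junction(cdna):
    """Return the part of a cDNA that precedes the common B region."""
    return cdna[:cdna.index(B_REGION)]


def shared_length(upstream, other_5prime):
    """Largest k such that the last k nt of upstream equal the first k nt of other_5prime."""
    for k in range(min(len(upstream), len(other_5prime)), 0, -1):
        if upstream.endswith(other_5prime[:k]):
            return k
    return 0


print(len(B_REGION))                                               # 41
print(shared_length(upstream_of_junction(ALPHA5), ALPHA6_5PRIME))  # 29
```

The 29 nt overlap matches the figure given in the text. In mRAR-α6 the shared block is followed by different sequence, which is what would be expected if exon 6 is spliced to exon 7 in that isoform rather than directly to exon 9.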

3'-UTR sequence which was shown to be common to mRAR-α1 and α2 isoforms (see above). In each case, irrespective of the initial poly(A)+ mRNA source used, specific PCR-amplified cDNAs of the expected sizes were obtained. After Southern blotting these cDNAs hybridized with 32P-labelled isoform specific oligonucleotide probes,

[Figure 4: (A) schematic of the mRAR-α1 to α7 cDNA isoforms showing their exons (E1 to E9), the 5'-UTR, A and B regions and the positions of ATG and TGA codons; (B) restriction map of the mouse genomic clones. The graphic and its caption are not recoverable from the extracted text.]

Southern blot analysis and partial sequencing of the three mouse mRAR-α genomic DNA clones λG1, λG2 and λG3 mRAR-α led to the establishment of the restriction map shown in Figure 4B. The mRAR-α1 to α7 specific sequences were localized upstream from the exon which encodes the B region (E9), and were contained within ~30 kb of genomic DNA sequence (data not shown). The 5'-UTR and A region sequences of hRAR-α1, the human counterpart of mRAR-α1, are encoded in two exons (E1 and E2) separated by 12 kb of intronic sequences (Brand et al., 1990; see also Figure 1). The first exon (E1) was shown to be contiguous with a functional hRAR-α1 promoter (Brand et al., 1990; see also Figure 1). Human sequences lying within this first exon were used as a 32P-labelled probe (~200 bp EcoRI-BstEII DNA fragment from the very 5' end of the human T2/T20 cDNA, see Brand et al., 1990) to isolate a 2 kb SalI-NotI DNA fragment corresponding to the 5' end region of the mouse genomic DNA clone λG1 mRAR-α (Figure 4B). Subcloning and partial sequencing of this DNA fragment revealed sequences homologous to the proximal human promoter and to the first exon of hRAR-α1 cDNA (Figures 1 and 2B, nt -177 to +205). Moreover, mouse genomic DNA sequences found immediately downstream from the nucleotide corresponding to the exact 3' end of the human exon 1 (nt 204/205, open arrowhead in Figure 1) corresponded to a splice donor site sequence (Figure 2B, a) (Shapiro and Senapathy, 1987). Two oligonucleotides from either the 5'-UTR or the A1 region of the mRAR-α1 cDNA sequence (nt 401-425 and nt 589-629, respectively; see Figure 1) were end-labelled with 32P and used to isolate a 2.8 kb EcoRI-SalI DNA fragment from the 3' end of the genomic DNA clone λG1 mRAR-α (not shown). This fragment was subcloned and partially sequenced. It contained contiguously all of the A1 region and the 5'-UTR region (Figure 1, nt 309-755) of the mRAR-α1 isoform cDNA clones isolated here (data not shown).
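The splice site assignments for mouse exon 1 and exon 2 rest on the familiar GT...AG intron boundaries. As a minimal sketch, the mouse donor and acceptor sequences transcribed here from Figure 2B (exon in upper case, intron in lower case, alignment gaps removed) can be checked as follows. The helper name, the 15 nt window used for the pyrimidine tract and the short cDNA fragment taken from Figure 1 are choices made for this example, not part of the original analysis.

```python
# Mouse sequences transcribed from Figure 2B; exon = upper case, intron = lower case.
MOUSE_EXON1_DONOR = "CCCGCAGACACGCAGACGGGTTGCATCCCCTCGCTTCGAGgtaacctctcccct"
MOUSE_EXON2_ACCEPTOR = (
    "tcagtgcccacttacctgtgtctttcctgtttctgcag"
    "CACCACAGGACATGCCTCCCCCCAGCCACCTACCTGGGGCC"
)
# Mouse cDNA across the E1/E2 boundary, transcribed from Figure 1.
MOUSE_CDNA_FRAGMENT = "ACGGGTTGCATCCCCTCGCTTCGAGCACCACAGGACATGCCT"


def split_at_case_change(seq):
    """Split a sequence at the first switch between upper and lower case."""
    for i in range(1, len(seq)):
        if seq[i].isupper() != seq[i - 1].isupper():
            return seq[:i], seq[i:]
    raise ValueError("no exon/intron boundary found")


exon1, intron_5prime = split_at_case_change(MOUSE_EXON1_DONOR)
intron_3prime, exon2 = split_at_case_change(MOUSE_EXON2_ACCEPTOR)

print(intron_5prime[:2])    # gt  (donor dinucleotide)
print(intron_3prime[-2:])   # ag  (acceptor dinucleotide)

# Fraction of pyrimidines in the 15 nt preceding the acceptor 'ag'.
tract = intron_3prime[-17:-2]
pyrimidine_fraction = sum(base in "ct" for base in tract) / len(tract)
print(round(pyrimidine_fraction, 2))

# Joining the two exon ends should reproduce the spliced cDNA sequence.
print((exon1[-10:] + exon2[:10]) in MOUSE_CDNA_FRAGMENT)   # True
```

Both boundaries carry the expected dinucleotides, the acceptor is preceded by a pyrimidine-rich stretch, and the joined exon ends match the cDNA, which is consistent with the conclusion that E1 and E2 are spliced together in the mRAR-α1 mRNA.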
Moreover, the genomic DNA sequences found in this fragment, upstream from the nucleotide corresponding to the very 5' end of the mRAR-α1 cDNA sequence (arrow in Figure 1; nt 206-308), were highly similar to the most 5' sequences of human exon 2 (Brand et al., 1990) (Figure 1, nt 205-306). Thus the mouse genomic DNA clone (λG1 mRAR-α) appears to contain both exon 2 (E2) and exon 1 (E1) of mRAR-α1. This conclusion was further supported by the finding that the mouse genomic 5' sequences adjacent to the nucleotide numbered 206 (Figure 1) correspond to a consensus splice acceptor sequence (Figure 2B, b). At the 3' end of exon 2 (i.e. downstream of the sequence encoding region A1), the DNA sequence of the genomic clone λG1 mRAR-α diverges from the mRAR-α1 cDNA sequence within the codon for Thr60 (Figure 1), and displays a potential splice donor site (data not shown). A full-length mRAR-α1 cDNA containing the entire exon E1 sequences remains to be isolated. However, taken together, the present data indicate that the genomic organization of the 5' region of mouse and human RAR-α genes is similar. Furthermore, using PCR (data not shown), we were able to show that both exons E1 and E2 are indeed spliced in the mRAR-α1 mRNA as indicated in Figures 2 and 4A. Thus, the mouse A1 region and adjacent 5'-UTR (Figures 1 and 4) are contiguously encoded in two exons -


90% of them corresponded to mRAR-α1 or mRAR-α2 sequences, regardless of the origin of the RNA. Note that the sequences of the mRAR-α3 to α7 cDNAs reported here were obtained at least twice, by sequencing the DNAs of independently isolated PCR clones.

Cloning of genomic clones

Three λ EMBL3 mouse genomic DNA libraries [Sau3A fragments (partial digest) of mouse genomic DNA from either LMTK- cells, C57BL/6 or Sv/129 mouse strains, cloned into the BamHI site of the λ EMBL3 vector] were screened following standard methods (Ausubel et al., 1987), using either the entire insert or a part (nt 401-502 in the mRAR-α1 sequence, see Figure 1) of the 5'-UTR of the cloned mRAR-α cDNA (Zelent et al., 1989) as probes, 32P-labelled by random priming (Ausubel et al., 1987). Ten positive clones were obtained and further mapped (see Results). Three overlapping genomic clones (λG1, λG2, λG3 mRAR-α) were shown to lie within the genomic region of interest (Figure 4B). DNA fragments (see Results) derived from either λG1 or λG2 mRAR-α were subcloned using standard techniques into the Bluescript SKII(+) vector (Stratagene) for sequencing.

Sequencing of genomic DNA and cDNA clones

Sequences shown here were obtained on both strands using Sequenase (Stratagene) and standard dideoxy sequencing methodology (Ausubel et al., 1987). Sequencing gels containing 8% acrylamide-bisacrylamide/7 M urea/40% formamide were run in order to resolve GC-rich DNA sequences.

mRAR-α1 and mRAR-α2 specific probes

Specific probes were prepared by PCR (Schowalter and Sommer, 1989) using mRAR-α1 or α2 cDNAs subcloned into Bluescript II SK+ vector (Stratagene) as templates. The 5'-UTR region sense oligonucleotides (5'-GTCTCCCAGGACCGCT-3' and 5'-TCCCTGCGAGGGAGCGCCGA-3' for mRAR-α1 and α2, respectively) and the A-region antisense oligonucleotides (5'-GGCTGGGGACGGTGTGCTGT-3' and 5'-GTGGTTTGAGCCATTCCAAA-3' for mRAR-α1 and α2, respectively) were used as primers in the PCR reactions. All other conditions were as described (Zelent et al., 1991). Unincorporated nucleotides were removed by Sephadex G-50 filtration (Ausubel et al., 1987). The specific activities of these two 32P-labelled probes, which both correspond to a DNA fragment of ~450 bp, were in each case ~10^8 c.p.m./µg of DNA. In either Northern or Southern blot (see below) analyses, probes were used at 10^6 c.p.m./ml.

RNA isolation and Northern blot analysis

RNA was isolated using the GnSCN-CsCl procedure (Chirgwin et al., 1974). Poly(A)+ RNA (Aviv and Leder, 1972) was electrophoresed on 1% agarose-1.1 M formaldehyde gels (Lehrach et al., 1977) and transferred to BAS 85 nitrocellulose filters as recommended by the supplier (S & S). Hybridization with either the mRAR-α1 or α2 specific 32P-labelled probe (see above) was carried out at 42°C for 14 h in 50% formamide, 5 x SSPE (Ausubel et al., 1987), 0.2 mg/ml sheared and denatured salmon sperm DNA and 1 x Denhardt's reagent (Ausubel et al., 1987). Blots were washed at 42°C in 0.2 x SSPE, and exposed to Kodak XAR-5 film at -80°C with two intensifying screens for 36 h. Following the exposure, the filters were rehybridized with an actin 32P-labelled cDNA probe as previously described (Zelent et al., 1989). Hybridization signals were quantified using a GS300 scanning densitometer (Hoefer Scientific Instruments, San Francisco).

Southern blotting

Human and mouse genomic DNAs (20 µg), isolated using standard methods (Ausubel et al., 1987) from lymphocytes and from the C57BL/6 mouse strain, respectively, were digested with BamHI, BglII, KpnI and EcoRI restriction endonucleases under conditions suggested by the supplier (New England Biolabs). Digested DNAs were electrophoresed through a 0.8% agarose gel in 1 x TAE (Ausubel et al., 1987) and transferred to Hybond-N membranes using the manufacturer's conditions (Amersham). Hybridization with the 32P-labelled mRAR-α2 probe (see above) was done at 42°C in 35% formamide, 5 x SSPE, 0.2 mg/ml salmon sperm DNA and 0.5 x Denhardt's reagent. Filters were washed for 3 x 10 min at 60°C in 2 x SSPE and exposed for 6 days to Kodak XAR-5 film at -80°C with two intensifying screens.
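The relationship between the mRAR-α2 probe primers and the probe size quoted under "mRAR-α1 and mRAR-α2 specific probes" can be illustrated by locating both primers on the mRAR-α2 5' sequence. The sketch below is an illustration under stated assumptions: the template is the sequence transcribed here from Figure 3A (so homopolymer runs may be off by a base), the two primers are those given in the text, and the helper names are hypothetical.

```python
# mRAR-alpha2 5' cDNA sequence as transcribed from Figure 3A (5'-UTR, A2, start of B).
ALPHA2_TEMPLATE = "".join([
    "GACTCTGCCTGGGACACCGGAGCCTCGAGTTCTTGGGCCACCGGGCGTTCCCTGCGAG",
    "GGAGCGCCGAGGGACCTCCTCCTCCAACCTGCACAACTTCCCGCTTTTCACCTGGGACC",
    "GGTGGAGTGACCTGCAGACTTAGGCGCGGGCAATTCTGGGCTGGGCGCCCCGCTGAGGG",
    "GCTGCCCGTCGGGGCGAAGGGACTTGTCCCCCCCCCCCCCTGCCCCTAGCCGATTTGCA",
    "CGGATGGCGCCTCCAGCAGAGTGACGAGATCCCCGTGCAGGACTTCCCCTCCTGGATCT",
    "CCGGCCTTCGTGGGGCGAGATGTACGAGAGTGTGGAAGTCGGGGGCCTT",
    "ACCCCCGCCCCTAACCCCTTCCTAGTGGTGGACTTTTATAACCAG",
    "AACCGGGCCTGTTTGCTCCAGGAGAAGGGGCTCCCTGCCCCGGGT",
    "CCCTACTCCACCCCACTCCGGACTCCGCTTTGGAATGGCTCAAAC",
    "CACTCCATCGAGACCCAGAGCAGCAGTTCCGAAGAGATAGTACCC",
])

SENSE_PRIMER = "TCCCTGCGAGGGAGCGCCGA"        # 5'-UTR sense primer (from the text)
ANTISENSE_PRIMER = "GTGGTTTGAGCCATTCCAAA"    # A-region antisense primer (from the text)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template, sense, antisense):
    """Return the template segment bounded by the two primers, inclusive."""
    start = template.find(sense)
    site = template.find(reverse_complement(antisense))
    if start == -1 or site == -1 or site < start:
        raise ValueError("primers do not bracket a product on this template")
    return template[start:site + len(antisense)]


product = amplicon(ALPHA2_TEMPLATE, SENSE_PRIMER, ANTISENSE_PRIMER)
print(reverse_complement(ANTISENSE_PRIMER))   # TTTGGAATGGCTCAAACCAC
print(len(product))                           # expected to be in the 400-470 bp range
```

On this transcription the predicted product spans most of the 5'-UTR and the whole A2 coding region and comes out at roughly 430 bp, in line with the ~450 bp stated for the probes. The antisense primer ends at the last codon of the A2 region, so the probe does not extend into the B region shared by all isoforms.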

Acknowledgements

We thank Professor K.Shudo and Dr Y.Hashimoto for the kind gift of Am80, F.Ruffenach and A.Staub for oligonucleotide synthesis, the cell group for cells, C.Reibel for technical assistance, C.Werle and B.Boulay for illustrations and the secretarial staff for typing the manuscript. This work was supported by the Institut National de la Santé et de la Recherche Médicale (Grant CNAMTS), the Centre National de la Recherche Scientifique, the Fondation pour la Recherche Médicale Française and the Association pour la Recherche contre le Cancer. A.Z. was the recipient of a fellowship from the American Cancer Society. C.M. was the recipient of a fellowship from NIH (GM 13597).


References

Acland,P., Dixon,M., Peters,G. and Dickson,C. (1990) Nature, 343, 662-665.
Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (1987) Current Protocols in Molecular Biology. Wiley, New York.
Aviv,H. and Leder,P. (1972) Proc. Natl. Acad. Sci. USA, 69, 1408-1412.
Beato,M. (1989) Cell, 56, 335-344.
Benbrook,D., Lernhardt,E. and Pfahl,M. (1988) Nature, 333, 669-672.
Bernards,A. and de la Monte,S.M. (1990) EMBO J., 9, 2279-2287.
Brand,N.J., Petkovich,M., Krust,A., Chambon,P., de Thé,H., Marchio,A., Tiollais,P. and Dejean,A. (1988) Nature, 332, 850-853.
Brand,N.J., Petkovich,M. and Chambon,P. (1990) Nucleic Acids Res., 18, 6799-6806.
Brockes,J.P. (1989) Neuron, 2, 1285-1294.
Brockes,J.P. (1990) Nature, 345, 766-768.
Chirgwin,J.M., Przybyla,A.E., MacDonald,R.J. and Rutter,W.J. (1974) Biochemistry, 18, 5294-5299.
Clemens,M.J. (1989) Curr. Opinion Cell Biol., 1, 1160-1167.
Cleveland,D.W. and Yen,T.J. (1989) New Biol., 1, 121-126.
Deng,G. and Wu,R. (1984) Methods Enzymol., 100, 96-116.
de Thé,H., del Mar Vivanco-Ruiz,M., Tiollais,P., Stunnenberg,H. and Dejean,A. (1990) Nature, 343, 177-180.
Dollé,P., Ruberte,E., Kastner,P., Petkovich,M., Stoner,C.M., Gudas,L. and Chambon,P. (1989) Nature, 342, 702-705.
Dollé,P., Ruberte,E., Leroy,P., Morriss-Kay,G. and Chambon,P. (1990) Development, 110, in press.
Eichele,G. (1989) Trends Genet., 5, 246-251.
Erba,H.P., Eddy,R., Shows,T., Kedes,L. and Gunning,P. (1988) Mol. Cell. Biol., 8, 1775-1789.
Evans,R.M. (1988) Science, 240, 889-895.
Giguère,V., Ong,E.S., Segui,P. and Evans,R.M. (1987) Nature, 330, 624-629.
Giguère,V., Shago,M., Zirngibl,R., Tate,R., Rossant,J. and Varmuza,S. (1990) Mol. Cell. Biol., 10, 2335-2340.
Green,S. and Chambon,P. (1988) Trends Genet., 4, 309-314.
Hassouna,N., Michot,B. and Bachellerie,J.P. (1984) Nucleic Acids Res., 12, 3563-3583.
Hinnebusch,A.G. (1990) Trends Biochem. Sci., 15, 148-152.
Horiuchi,T., Macon,K.J., Kidd,V.J. and Volanakis,J.E. (1990) J. Biol. Chem., 265, 6521-6524.
Jackson,R.J. and Standart,N. (1990) Cell, 62, 15-24.
Jacobson,A.B., Good,L., Simonetti,J. and Zuker,M. (1984) Nucleic Acids Res., 12, 45-52.
Kagechika,H., Kawachi,E., Hashimoto,Y., Himi,T. and Shudo,K. (1988) J. Med. Chem., 31, 2192-2199.
Kastner,P., Krust,A., Mendelsohn,C., Garnier,J.M., Zelent,A., Leroy,P., Staub,A. and Chambon,P. (1990) Proc. Natl. Acad. Sci. USA, 87, 2700-2704.
Kozak,M. (1989) Mol. Cell. Biol., 9, 5134-5142.
Krust,A., Green,S., Argos,P., Kumar,V., Walter,Ph., Bornert,J.M. and Chambon,P. (1986) EMBO J., 5, 891-897.
Krust,A., Kastner,P., Petkovich,M., Zelent,A. and Chambon,P. (1989) Proc. Natl. Acad. Sci. USA, 86, 5310-5314.
LaRosa,G.J. and Gudas,L.J. (1988) Proc. Natl. Acad. Sci. USA, 85, 329-333.
Lehrach,H., Diamond,D., Wozney,J.M. and Boedtker,H. (1977) Biochemistry, 16, 4743-4751.
Lemaire,P., Vesque,C., Schmitt,J., Stunnenberg,H., Frank,R. and Charnay,P. (1990) Mol. Cell. Biol., 10, 3456-3467.
Loh,E.Y., Elliott,J.F., Cwirla,S., Lanier,L.L. and Davis,M.M. (1989) Science, 243, 217-220.
Maden,M. (1982) Nature, 295, 672-675.
Maden,M. (1985) Trends Genet., 1, 103-104.
Mitchell,P.J. and Tjian,R. (1989) Science, 245, 371-378.
Murphy,S.P., Garbern,J., Odenwald,W.F., Lazzarini,R.A. and Linney,E. (1988) Proc. Natl. Acad. Sci. USA, 85, 5587-5591.
Okamoto,K., Okazawa,H., Okuda,A., Sakai,M., Muramatsu,M. and Hamada,H. (1990) Cell, 60, 461-472.
Petkovich,M., Brand,N.J., Krust,A. and Chambon,P. (1987) Nature, 330, 444-450.
Raynal,F., Michot,B. and Bachellerie,J.P. (1984) FEBS Lett., 167, 263-268.
Ruberte,E., Dollé,P., Krust,A., Zelent,A., Morriss-Kay,G. and Chambon,P. (1990) Development, 108, 213-222.
Ruberte,E., Dollé,P., Chambon,P. and Morriss-Kay,G. (1991) Development, 111, in press.
Schowalter,D.B. and Sommer,S.S. (1989) Anal. Biochem., 177, 90-94.
Segraves,W.A. and Hogness,D.S. (1990) Genes Dev., 4, 204-219.
Shapiro,M.B. and Senapathy,P. (1987) Nucleic Acids Res., 15, 7155-7174.
Simeone,A., Acampora,D., Arcioni,L., Andrews,P.W., Boncinelli,E. and Mavilio,F. (1990) Nature, 346, 763-766.
Slack,J.M.W. (1987a) Nature, 327, 553-554.
Slack,J.M.W. (1987b) Trends Biochem. Sci., 12, 200-204.
Smale,S.T. and Baltimore,D. (1989) Cell, 57, 103-113.
Smale,S.T., Schmidt,M.C., Berk,A.J. and Baltimore,D. (1990) Proc. Natl. Acad. Sci. USA, 87, 4509-4513.
Sucov,H.M., Murakami,K.K. and Evans,R.M. (1990) Proc. Natl. Acad. Sci. USA, 87, 5392-5396.
Summerbell,D. and Maden,M. (1990) Trends Neurosci., 13, 142-147.
Tasset,D., Tora,L., Fromental,C., Scheer,E. and Chambon,P. (1990) Cell, 62, 1177-1187.
Thaller,C. and Eichele,G. (1987) Nature, 327, 625-628.
Tickle,C., Alberts,B., Wolpert,L. and Lee,J. (1982) Nature, 296, 564-566.
Tora,L., Gronemeyer,H., Turcotte,B., Gaub,M.-P. and Chambon,P. (1988a) Nature, 333, 185-188.
Tora,L., Gaub,M.-P., Mader,S., Dierich,A., Bellard,M. and Chambon,P. (1988b) EMBO J., 7, 3771-3778.
Tora,L., White,J., Brou,C., Tasset,D., Webster,N., Scheer,E. and Chambon,P. (1989) Cell, 59, 477-487.
Vasios,G.W., Gold,J.D., Petkovich,M., Chambon,P. and Gudas,L.J. (1989) Proc. Natl. Acad. Sci. USA, 86, 9099-9103.
Wang,S.Y. and Gudas,L.J. (1983) Proc. Natl. Acad. Sci. USA, 80, 5880-5884.
Zelent,A., Krust,A., Petkovich,M., Kastner,P. and Chambon,P. (1989) Nature, 339, 714-717.
Zelent,A., Mendelsohn,C., Kastner,P., Krust,A., Garnier,J.-M., Ruffenach,F., Leroy,P. and Chambon,P. (1991) EMBO J., 10, 71-81.
Zuker,M. and Stiegler,P. (1981) Nucleic Acids Res., 9, 133-148.

Received on September 17, 1990

Note added in proof

The sequence data in this paper are available from EMBL under the following accession numbers: X56572 (RAR-α1), X56565 (RAR-α2), X56571 (RAR-α3), X56570 (RAR-α4), X56568 (RAR-α5), X56567 (RAR-α6), X56566 (RAR-α7).
