Nucleic Acids Research, Vol. 19, No. 2 297

A single nucleotide polymorphism in an exon dictates allele dependent differential splicing of episialin mRNA

Marjolijn J.L.Ligtenberg, Annemiek M.C.Gennissen, Hans L.Vos and John Hilkens

Division of Tumor Biology, The Netherlands Cancer Institute (Antoni van Leeuwenhoekhuis), Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

Received October 22, 1990; Revised and Accepted November 26, 1990

ABSTRACT
The episialin gene (MUC1) encodes an epithelial mucin containing a variable number of repeats with a length of twenty amino acids, resulting in many different alleles that can be subdivided into two size classes. The episialin pre-mRNA uses either one of two neighbouring splice acceptor sites for exon 2, which mainly encodes the repeats. Using the genetic polymorphism of the episialin gene to identify different alleles, we show here that the splice site recognition is allele dependent and is based on a single A/G nucleotide difference in exon 2 eight nucleotides downstream of the second splice acceptor site. Transfection experiments confirm that this polymorphic nucleotide regulates the splice site selection. The identity of this nucleotide is in most cases correlated with one of the size classes of the alleles, indicating that mutations altering the number of repeats seldom arise by unequal cross-over between the repeat regions.

INTRODUCTION
The mechanism by which introns in pre-mRNA are accurately recognized and processed is incompletely characterized. Although it is clear that the nucleotide sequences of the intron borders and the branch site are of great importance for the splicing process, it is as yet impossible to accurately predict the location of an intron in higher eukaryotic DNA, indicating that additional sequences determine splice site selection. Especially in the case of alternative splicing, the mechanism by which the actual splice site is selected is in most cases completely unknown. Here we show that exon sequences are of utmost importance for the selection of the alternative splice acceptor sites in the episialin gene. In fact, a single nucleotide difference can determine which of the two sites is selected. Episialin cDNA has recently been cloned by several groups (1,2,3,4). Episialin is encoded by a single-copy gene (MUC1) that is highly polymorphic due to a variable number of 60 bp repeats in its coding region, resulting in different alleles (1,2,5,6). These alleles can be divided into two size classes: one with a relatively small number of repeats (between ±30 and ±45) and the other with a large number (between ±60 and ±90). Two splice variants have been described that differ in the alternative use of the splice acceptor site for exon 2 (1,2). Splice variant A uses a splice acceptor site located 27 bp upstream of the one used by variant B (figure 1A). The protein products differ by 9 amino acids near their N-terminal ends and have different putative cleavage sites for their signal peptides (1). Episialin is a transmembrane mucin-type glycoprotein with a high content of serine, threonine and proline residues. Many of these serine and threonine residues are modified by O-linked glycosylation (7,8). Episialin is expressed in glandular epithelial cells. Its expression is elevated in carcinomas (9), where it possibly affects cellular interactions. In this study the relative use of the alternative splice sites was analyzed using the polymerase chain reaction. We show that the splice site selection is allele dependent and is due to a single nucleotide polymorphism just downstream of the second splice acceptor site of the first intron.
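The arithmetic behind these figures can be sketched in a few lines of Python. The 60 bp repeat length and the 27 bp spacing of the two acceptor sites come from the text; the cut-off of 52 repeats separating the two size classes is a hypothetical midpoint chosen for illustration, not a value reported in this study.

```python
# Values from the text: 60 bp per repeat, 27 bp between the two exon 2 acceptor sites.
REPEAT_BP = 60
AA_PER_REPEAT = REPEAT_BP // 3          # 20 amino acids per repeat
ACCEPTOR_OFFSET_BP = 27                  # variant A acceptor lies 27 bp upstream of variant B's

# Hypothetical boundary: the text reports about 30-45 repeats (small class)
# and about 60-90 repeats (large class); 52 is an illustrative midpoint only.
SIZE_CLASS_CUTOFF = 52


def repeat_region_bp(n_repeats: int) -> int:
    """Length in bp of a region made of n_repeats tandem 60 bp repeats."""
    return n_repeats * REPEAT_BP


def size_class(n_repeats: int) -> str:
    """Return 'L' (large) or 'S' (small) using the hypothetical cut-off."""
    return "L" if n_repeats > SIZE_CLASS_CUTOFF else "S"


# 27 bp is a whole number of codons, consistent with the 9 amino acid difference.
assert ACCEPTOR_OFFSET_BP % 3 == 0
extra_aa = ACCEPTOR_OFFSET_BP // 3       # 9 extra residues in variant A

for n in (30, 45, 60, 90):
    # repeats, repeat-region length (bp), repeat-region length (aa), size class
    print(n, repeat_region_bp(n), n * AA_PER_REPEAT, size_class(n))
print("extra residues in variant A:", extra_aa)
```

Under these assumptions a small allele of 30 repeats carries an 1800 bp repeat region and a large allele of 90 repeats a 5400 bp region, which is why the two classes are readily separated on Northern and Southern blots.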

MATERIAL AND METHODS
Southern and Northern blotting
DNA and RNA isolations, Southern and Northern blotting were performed as described before (1). The blots were hybridized with either a randomly primed episialin cDNA probe located outside the repeat region or a variant B specific antisense oligonucleotide probe, 5' GAACCTGTAACAACTGTAAGCACTG 3' (complementary to the last 12 nucleotides of exon 1 and the 13 nucleotides immediately downstream of the splice acceptor site used by variant B), that had been end-labeled using T4 polynucleotide kinase. Hybridization and washing were performed under stringent conditions.

Reverse transcription and polymerase chain reaction (PCR)
To synthesize single-stranded cDNA, either 1.5 µg of poly A+ RNA or 5 µg of total RNA was reverse transcribed for 60 min at 37°C with 25 units of avian myeloblastosis virus reverse transcriptase (Boehringer, Mannheim) in cDNA buffer (50 mM Tris, pH 8.3; 40 mM KCl; 6 mM MgCl2) in the presence of 1 mM each of dATP, dCTP, dGTP and dTTP, and 200 ng of random primers. Two oligonucleotides were used in the polymerase chain reaction: 5' (GCCGGGA)TCCTTTCTTCCTGCTGCTGC 3', homologous to the sequence located 38 to 19 nucleotides upstream of the exon 1 splice donor site, and 5' (GCCGGAATTC)GGGTAGAGCTTGCATGAC 3', complementary to the sequence located 33 to 16 nucleotides downstream of the exon 2 splice acceptor site used by variant B. The nucleotides between parentheses were added to the oligonucleotides to create restriction sites that facilitate cloning of the PCR products. Amplification reactions were carried out with one-tenth of the cDNA or 100 ng of genomic DNA, 5 µM of each of the specific primers and 2 units of Taq polymerase (Perkin Elmer-Cetus, Berkeley, CA) in PCR buffer (10 mM Tris, pH 8.4; 25 mM KCl; 1 mM MgCl2) for 30 cycles. Each cycle included denaturation at 92°C for 2 min, annealing of the primers at 55°C for 1.5 min, and extension at 72°C for 1.5 min. After the last cycle the samples were incubated at 72°C for 5 min to generate blunt ends. Aliquots of the PCR products were analyzed on agarose gels.

Cloning of the PCR fragments
For cloning, the PCR products were digested with BamHI and EcoRI prior to electrophoresis. The region of the gel containing the products was excised and melted at 65°C. Ligation reactions were performed at 15°C in the presence of at most 0.3% agarose.

Transfection assay
Plasmid pJ3Ω (10), with or without the region containing intron 1 of either variant A or B, was purified by banding in CsCl gradients. For transfection, log-phase COS cells were cultured in 6-well plates until subconfluent. Transfection was mediated by liposomes (DOTMA, Boehringer, Mannheim) according to the method recommended by the manufacturer. RNA isolation, reverse transcription of total RNA and PCR were performed as described above.
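As a consistency check on the oligonucleotides listed above, the Python sketch below confirms that the added 5' tails create BamHI and EcoRI sites, that the gene-specific parts have the lengths implied by the stated positions, and recovers the mRNA junction recognized by the antisense variant B probe. The helper `revcomp` is introduced here for illustration only.

```python
# Primer, tail and probe sequences as given in Materials and Methods.
FWD_TAIL, FWD_SPECIFIC = "GCCGGGA", "TCCTTTCTTCCTGCTGCTGC"
REV_TAIL, REV_SPECIFIC = "GCCGGAATTC", "GGGTAGAGCTTGCATGAC"
PROBE_B = "GAACCTGTAACAACTGTAAGCACTG"

BAMHI_SITE = "GGATCC"
ECORI_SITE = "GAATTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


fwd_primer = FWD_TAIL + FWD_SPECIFIC
rev_primer = REV_TAIL + REV_SPECIFIC

# The added 5' nucleotides complete a BamHI site (forward) and an EcoRI site (reverse).
print(BAMHI_SITE in fwd_primer, ECORI_SITE in rev_primer)   # True True

# Gene-specific parts match the stated spans (positions 38-19 = 20 nt, 33-16 = 18 nt).
print(len(FWD_SPECIFIC), len(REV_SPECIFIC))                 # 20 18

# The probe is antisense, so its reverse complement is the mRNA sequence it detects:
# the last 12 nt of exon 1 followed by the first 13 nt of the variant B exon 2.
target = revcomp(PROBE_B)
exon1_end, exon2_start = target[:12], target[12:]
print(exon1_end, exon2_start)                               # CAGTGCTTACAG TTGTTACAGGTTC
```

The recovered exon 2 start, TTGTTACAGGTTC, carries an A at position +8, the nucleotide that the Results associate with variant B.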


Figure 1. (A) Schematic representation of the alternative splicing of episialin, the position of the primers used for the PCR (arrows) and the variant A and B specific oligonucleotide probes. (B) Autoradiogram of a Northern blot carrying poly A+ RNA of different episialin-expressing cell lines, hybridized with a non-repetitive episialin probe, showing the genetic polymorphism of the episialin gene. The additional bands present in lane 2 represent incompletely spliced messengers. Lane 1, MCF-7; lane 2, ZR-75-1; lane 3, T47D; lane 4, MDA MB 157; lane 5, BT20; lane 6, CAMA-1. (C) Ethidium bromide-stained 4% agarose gel containing PCR products of the region surrounding the alternative splice sites, amplified from cDNA of the cell lines shown in figure 1B. (D) Autoradiogram of a Northern blot carrying the same RNAs as in figure 1B, hybridized with the oligonucleotide specific for variant B.

RESULTS
Expression of splice variants A and B in cell lines and tissues
We analyzed episialin mRNA of six mammary carcinoma cell lines on Northern blots. The length of these RNAs varied as a result of the different numbers of repeats (figure 1B). The 27 bp length difference caused by the alternative splicing cannot be detected on Northern blots, but it can be visualized after amplification of episialin cDNA by the polymerase chain reaction (PCR) using primers located on either side of the splice junction between exon 1 and exon 2. The amplified fragments derived from variant A and variant B RNAs can be distinguished from each other by their length, and both differ from the fragment amplified from genomic DNA (figure 1A). The relative amounts of the two splice products varied considerably among the six mammary carcinoma cell lines (figure 1C). In the cell lines MDA MB 157 and BT20 only variant B products could be detected (lanes 4 and 5), whereas the other cell lines contained PCR products of both splice variants in different relative amounts. The presence of splice variant A corresponded strikingly with the expression of mRNAs of the large size class, whereas the presence of variant B corresponded with the expression of small alleles (compare figures 1B and 1C). Moreover, in cell lines that contain the two alleles in unequal ratios as a result of aneuploidy, the relative amounts of the splice variants differed accordingly (for example CAMA-1, lane 6). This correlation was confirmed by hybridizing a Northern blot carrying RNA of these cell lines with a variant A specific probe (located within the additional 27 bp) and with a variant B specific probe (spanning the junction between exon 1 and exon 2) (figure 1A). The variant A specific oligonucleotide probe hybridized predominantly to the large mRNA species (results not shown), whereas the variant B specific oligonucleotide probe hybridized almost exclusively to the shorter RNA molecules containing fewer repeats (figure 1D).

The use of both splice sites was also investigated in 15 episialin-expressing breast or colon tumor samples from different individuals (compare figures 2A and 2B). Again, the correlation was very strong, although not absolute. Six out of seven tissues containing only large alleles mainly express splice variant A, whereas five out of six tissues containing only small alleles predominantly express variant B. The two tissues that contain both a large and a small allele express equivalent amounts of both splice variants. These results strongly suggest that the selection of the splice acceptor site is allele dependent and is thus encoded by the episialin gene itself.

Figure 2. (A) Autoradiogram of a Southern blot of HinfI-digested DNA of colon (lanes 1 to 6) or breast tissue samples (lanes 7 to 15) from different tumor patients, hybridized with a non-repetitive episialin probe located on the restriction fragments containing the repeats. (B) Ethidium bromide-stained 4% agarose gel containing PCR products of the region surrounding the alternative splice sites in cDNA prepared from RNA of the tissues used in (A).
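The expected sizes of the spliced PCR products can be estimated from the primer positions given in Materials and Methods. This is a sketch under an assumed coordinate convention: position -1 is the last nucleotide of exon 1, +1 is the first exon 2 nucleotide used by variant B, and both added 5' tails are included. The resulting sizes are calculated, not measured values from this study. The genomic product additionally contains intron 1, whose length is not stated here, so it is not computed.

```python
FWD_TAIL_LEN = 7          # "GCCGGGA" added to the forward primer
REV_TAIL_LEN = 10         # "GCCGGAATTC" added to the reverse primer
FWD_START_UPSTREAM = 38   # forward primer starts 38 nt upstream of the exon 1 donor site
REV_END_DOWNSTREAM = 33   # reverse primer ends 33 nt downstream of the variant B acceptor
EXTRA_EXON_A = 27         # exon 2 segment present only in variant A


def cdna_amplicon_bp(variant: str) -> int:
    """Expected size of the spliced (cDNA) PCR product, tails included."""
    insert = FWD_START_UPSTREAM + REV_END_DOWNSTREAM   # exon 1 part + variant B exon 2 part
    if variant == "A":
        insert += EXTRA_EXON_A
    elif variant != "B":
        raise ValueError("variant must be 'A' or 'B'")
    return FWD_TAIL_LEN + insert + REV_TAIL_LEN


print(cdna_amplicon_bp("B"), cdna_amplicon_bp("A"))   # 88 115
```

Under these assumptions the two cDNA products differ by exactly the 27 bp that separates the acceptor sites, a difference that is readily resolved on the 4% agarose gels used here.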

Cloning of genomic PCR products and transfection assay
Since the correlation between the selected splice site and the length of the allele is not absolute, it is unlikely that the number of repeats dictates the selection of the splice acceptor site. We therefore tried to delimit the region that contains sufficient information to determine the alternative splicing. DNA was isolated from tissues expressing either only variant A or only variant B, and part of the episialin gene was amplified using the same primers as before. The PCR products were cloned into the expression vector pJ3Ω (10) (figure 3A). COS cells were transfected with the resulting plasmids, and RNA was isolated 48 hours post-transfection and analyzed for the use of either the first (corresponding to variant A) or the second (corresponding to variant B) splice acceptor site, using the same PCR method as before. Figure 3B shows that COS cells transfected with the construct containing the amplified DNA from tissue 532, which expresses variant A, used the first splice acceptor site, whereas transfectants containing DNA from tissue 568, which expresses variant B, predominantly used the second. These experiments indicate that the information for splice site selection is located within the amplified region, which comprises the 3' region of exon 1, intron 1 and the 5' region of exon 2.

[Figure 3 image. (A) Transcriptional unit of pJ3Ω: SV40 early promoter, cap site, cloned insert, SV40 small t intron (IVS), poly A. (B) Agarose gel with variant A and B controls, tissues 568 and 532 with (+RT) and without (-RT) reverse transcription, vector without insert, and a 100 bp size marker.]

Figure 3. (A) Schematic representation of the eukaryotic transcriptional domain of the transient expression vector pJ3Ω and of the genomic PCR products cloned into its BamHI and EcoRI sites. (B) Ethidium bromide-stained 4% agarose gel containing PCR products of the region surrounding the alternative splice sites, amplified with the primers described above. Lanes 1 and 2 are control lanes showing amplification products of cDNA clones of variant B and A. Lanes 3 and 4 show PCR products of the reverse-transcribed RNA preparations of COS cells transfected with pJ3Ω containing intron 1 of tissue 568 or tissue 532. The lower bands correspond to the PCR products of variant B and A (lanes 1 and 2) and represent spliced mRNAs; the upper band corresponds to the insert region still containing the intron. This band is also present in the PCR products of the non-reverse-transcribed RNAs (lanes 6 and 7), indicating that it is derived from copurified plasmid DNA. RNA isolated from COS cells transfected with pJ3Ω alone gives no signal (lanes 5 and 8), demonstrating the absence of background in the assay.

Figure 4A (the first | marks the splice acceptor site used by variant A, the second | the site used by variant B; R = A or G):
genomic:    CCAACCCTATGACMAACTTCTTACAG | CTACCACAGCCCCTAAACCCGCAACAG | TTGTTACRGGTTCTGG
variant A:  exon 1 | CTACCACAGCCCCTAAACCCGCAACAG TTGTTACGGGTTCTGG
variant B:  exon 1 | TTGTTACAGGTTCTGG

[Figure 4B image: putative stem-loop structures of the pre-mRNA around the variant B splice acceptor site, with a G (variant A) or an A (variant B) at position +8.]

Figure 4. (A) Sequence of the genomic DNA and the cDNAs surrounding the splice acceptor sites for exon 2. The vertical lines indicate the intron-exon boundaries present in variant A or B. (B) Schematic representation of the putative secondary structure of the region surrounding the downstream splice acceptor site with a G at position +8 (as in variant A) or an A at position +8 (as in variant B); the G and the A are circled. Only the stem-loop structure of variant A is thought to be stable under physiological conditions. The arrows indicate the splice acceptor site. The cross in the arrow of variant A indicates that this splice site is not used in this variant.
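The relationship between the sequences in Figure 4A can be illustrated by joining them in silico. This sketch only concatenates sequences and does not model how the acceptor site is chosen; the last 12 nucleotides of exon 1 are inferred from the variant B probe sequence in Materials and Methods, and `spliced_junction` is a hypothetical helper introduced for illustration.

```python
# Sequences transcribed from Figure 4A; R marks the polymorphic A/G at +8.
SEG_A_ONLY = "CTACCACAGCCCCTAAACCCGCAACAG"   # exon 2 segment present only in variant A
EXON2_B = "TTGTTACRGGTTCTGG"                # exon 2 start used by variant B
EXON1_END = "CAGTGCTTACAG"                  # inferred from the variant B probe


def spliced_junction(variant: str, allele: str) -> str:
    """Exon 1 end joined to the exon 2 start chosen by one splice variant.

    variant: 'A' (upstream acceptor) or 'B' (downstream acceptor)
    allele:  'A' or 'G', the nucleotide at +8 of the variant B exon
    """
    if variant not in ("A", "B") or allele not in ("A", "G"):
        raise ValueError("variant must be 'A' or 'B' and allele 'A' or 'G'")
    exon2_b = EXON2_B.replace("R", allele)
    exon2 = SEG_A_ONLY + exon2_b if variant == "A" else exon2_b
    return EXON1_END + exon2


a = spliced_junction("A", "G")    # as drawn for variant A in Figure 4A
b = spliced_junction("B", "A")    # as drawn for variant B in Figure 4A
print(len(a) - len(b))            # 27
print(SEG_A_ONLY.endswith("AG"))  # True: the variant B acceptor follows this AG
print(b)                          # CAGTGCTTACAGTTGTTACAGGTTCTGG
```

The check that the 27-nucleotide segment ends in AG shows that, in variant A, the dinucleotide that would serve as the variant B acceptor is retained as exon sequence.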

Determination of the difference between the genomic PCR products specifying variant A and B
To determine the molecular basis of this effect, the amplified genomic regions were sequenced. No differences were found between the intron sequences of these PCR products, but a single nucleotide difference was observed in exon 2, eight nucleotides downstream of the second splice acceptor site (see figure 4A). In the DNA of tissue 532, which expresses variant A, a G was found at this position, whereas the DNA of tissue 568, which expresses variant B, contained an A. This substitution does not affect the amino acid sequence. This region was also analyzed in all the other tissue samples shown in figure 2, using either the genomic DNA or the cDNA. In the PCR products in which the first splice acceptor site was used (variant A), a G was present in all cases, whereas in all products representing B variants an A was found (table 1). This was also observed in the two tissue samples (571 and 584) for which the splice site choice did not correlate with the number of repeats. Thus, there is a 100% correlation between the presence of a G at position +8 and splicing to variant A, and between an A at position +8 and expression of variant B. Since this is the only nucleotide difference between the two genomic regions that show differential splicing when expressed in COS cells, the nucleotide at this position is most probably also responsible for the splice site choice in the original tissues.
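The correlation described above amounts to a lookup on position +8. The following sketch encodes that observed association; it is not a mechanistic model of splicing. The function names are hypothetical, and positions are counted from +1, the first exon 2 nucleotide used by variant B.

```python
from typing import Iterable

# Observed association (Table 1): G at +8 -> variant A, A at +8 -> variant B.
VARIANT_BY_NT = {"G": "A", "A": "B"}


def nucleotide_at(exon2_b: str, position: int) -> str:
    """Nucleotide at a 1-based position downstream of the variant B acceptor."""
    if position < 1:
        raise ValueError("positions downstream of the acceptor start at +1")
    return exon2_b[position - 1].upper()


def predict_variants(alleles: Iterable[str]) -> str:
    """Predicted splice variant(s) for the exon 2 sequences of one individual."""
    variants = sorted({VARIANT_BY_NT[nucleotide_at(seq, 8)] for seq in alleles})
    return "/".join(variants)


g_allele = "TTGTTACGGGTTCTGG"   # as in tissue 532
a_allele = "TTGTTACAGGTTCTGG"   # as in tissue 568
print(predict_variants([g_allele, g_allele]))  # A
print(predict_variants([a_allele, a_allele]))  # B
print(predict_variants([g_allele, a_allele]))  # A/B
```

Unlike a prediction based on repeat number, this lookup also assigns the correct variants to tissues 571 and 584.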

Table 1. The length of the alleles and the splice variants found in tissue samples from different individuals, together with the nucleotide present 8 nucleotides downstream of the splice acceptor site specific for variant B.

Tissue   Type of allele   Splice variant(s)   Sequence at +8
1.4      L                A                   g
3.5      L                A                   g
532      L                A                   g
555      L                A                   g
575      L                A                   g
584      L                B                   a
625      L                A                   g
2.2      S                B                   a
3.1      S                B                   a
544      S                B                   a
562      S                B                   a
568      S                B                   a
571      S                A/B                 A-g/B-a
1.3      L/S              A/B                 A-g/B-a
2.5      L/S              A/B                 A-g/B-a

(L = large; S = small. The nucleotide at +8 was determined from cDNA and/or genomic DNA.)
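The two correlations discussed in the text can be tallied directly from Table 1. In this sketch the naive prediction from allele size maps large alleles to variant A, small alleles to variant B and mixed genotypes to A/B; that mapping is an illustrative assumption, while the rows are copied from the table.

```python
# (tissue, allele size class, splice variant(s), nucleotide(s) at +8) from Table 1
TABLE_1 = [
    ("1.4", "L", "A", "g"), ("3.5", "L", "A", "g"), ("532", "L", "A", "g"),
    ("555", "L", "A", "g"), ("575", "L", "A", "g"), ("584", "L", "B", "a"),
    ("625", "L", "A", "g"), ("2.2", "S", "B", "a"), ("3.1", "S", "B", "a"),
    ("544", "S", "B", "a"), ("562", "S", "B", "a"), ("568", "S", "B", "a"),
    ("571", "S", "A/B", "g/a"), ("1.3", "L/S", "A/B", "g/a"),
    ("2.5", "L/S", "A/B", "g/a"),
]

# Assumed naive predictors of the splice variant(s).
SIZE_TO_VARIANT = {"L": "A", "S": "B", "L/S": "A/B"}
NT_TO_VARIANT = {"g": "A", "a": "B", "g/a": "A/B"}

size_hits = sum(SIZE_TO_VARIANT[size] == var for _, size, var, _ in TABLE_1)
nt_hits = sum(NT_TO_VARIANT[nt] == var for _, _, var, nt in TABLE_1)

print(f"allele size predicts variant: {size_hits}/{len(TABLE_1)}")       # 13/15
print(f"nucleotide at +8 predicts variant: {nt_hits}/{len(TABLE_1)}")    # 15/15
```

The two mismatches for allele size are tissues 584 and 571, the samples singled out in the Results.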

DISCUSSION
The choice of the splice acceptor site of the second exon of the episialin gene is dictated by a single polymorphic nucleotide. Most alleles containing a large number of repeats have a G at the polymorphic site and splice to the upstream splice acceptor site, resulting in variant A, whereas alleles containing a small number of repeats have an A at this position and splice to an acceptor site located 27 bp further downstream, resulting in splice variant B. This system is unique in that a naturally occurring allelic difference of only a single nucleotide can dictate the choice between splice acceptor sites in a human gene. This is in contrast with other naturally occurring alternative splice site selections, which are either developmentally regulated or stochastic (for review see 11). Therefore, the term differential splicing appears to be more appropriate for the splicing of the episialin gene. The mechanism employed in this system remains to be elucidated. The most likely interpretation is that this one-nucleotide change acts on the splice acceptor site of variant B. Two possible explanations are an effect on a previously unidentified splice-determining sequence, or the sequestering of the variant B splice acceptor site in a stem-loop structure (as described for both splice donor (12) and acceptor (13) sites). With respect to the first possibility, it is very striking that the A at position +8 is part of a CAG trinucleotide that is present within twelve nucleotides downstream of six out of seven splice acceptor sites in the episialin gene (1,2). However, there is no precedent showing that this trinucleotide is important, although in two systems AG dinucleotides located downstream of the actual processing site have been shown to play an important role in the splicing process: these downstream AG dinucleotides are essential for the selection of the splice sites, but the more upstream AG dinucleotides are used as the actual cleavage sites (14,15). The putative effect of the secondary structure is illustrated in figure 4B. If a G is present at +8, the region surrounding the splice acceptor site can form a stem-loop structure that is much more stable than the one formed if an A is present at this position (the free energies of these stem-loop structures are -7.1 and -3.1 kcal/mol, respectively, calculated with the fold program of Zuker and Stiegler (16)).
This putative secondary structure of the pre-mRNA could make the variant B splice acceptor site inaccessible to the splicing machinery, thereby favoring the variant A splice site. In addition to its important role in splice site selection, the single nucleotide polymorphism enables us to assess the role of unequal cross-over in the generation of the variable numbers of repeats. Most alleles containing a relatively large number of repeats have a G at position +8, whereas most alleles containing a relatively small number of repeats have an A at this position. This indicates that little recombination has occurred between the alleles, although the repetitive sequences were assumed to be prone to unequal cross-over (17). A low recombination frequency has also been found for another polymorphic, repetitive locus (18). Thus, unequal cross-over does not appear to be the major mechanism in generating variable numbers of tandem repeats. This lack of meiotic and mitotic recombination could explain the continued existence of two separate subpopulations of episialin alleles: one with ±30 to ±45 repeats and the other with ±60 to ±90 repeats. Probably these two size classes were generated by a duplication of the repeat region, and the minor variations in size are most likely due to mechanisms involving slippage during DNA replication. No striking differences in allele frequencies are observed. Therefore, neither the splice variants nor the number of repeats seem to be under strong selective pressure, suggesting that no significant functional difference exists between the glycoproteins derived from them.
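The CAG observation discussed above can be checked mechanically on the sequences of Figure 4A. This sketch assumes that "within twelve nucleotides downstream" means a complete CAG lying in the first 12 exon nucleotides after the acceptor; the helper names are hypothetical.

```python
def cag_within(exon_start: str, window: int = 12) -> bool:
    """True if a complete CAG lies within the first `window` exon nucleotides."""
    return "CAG" in exon_start[:window].upper()


def cag_position(exon_start: str):
    """1-based start of the first CAG downstream of the acceptor, or None."""
    i = exon_start.upper().find("CAG")
    return i + 1 if i >= 0 else None


EXON2_A_START = "CTACCACAGCCCCTAAACCCGCAACAG"  # exon 2 start used by variant A
EXON2_B_A_ALLELE = "TTGTTACAGGTTCTGG"          # variant B exon start, A at +8
EXON2_B_G_ALLELE = "TTGTTACGGGTTCTGG"          # variant B exon start, G at +8

for name, seq in [
    ("variant A site", EXON2_A_START),
    ("variant B site, A at +8", EXON2_B_A_ALLELE),
    ("variant B site, G at +8", EXON2_B_G_ALLELE),
]:
    print(name, cag_within(seq), cag_position(seq))
```

With an A at +8 the CAG occupies positions +7 to +9, the same spacing seen at the variant A acceptor; the G allele destroys it, which is consistent with (though not proof of) the first explanation considered above.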

ACKNOWLEDGEMENTS
We wish to thank Drs. R. Nusse, R. Plasterk and P. Laird and Prof. P. Borst for critical reading of the manuscript. M. J. L. Ligtenberg and A. M. C. Gennissen are supported by the Dutch Cancer Society. H. L. Vos is supported by a grant from Centocor Inc., Malvern.

REFERENCES
1. Ligtenberg, M. J. L., Vos, H. L., Gennissen, A. M. C. and Hilkens, J. (1990) J. Biol. Chem. 265, 5573-5578.
2. Wreschner, D. H., Hareuveni, M., Tsarfaty, I., Smorodinsky, N., Horev, J., Zaretsky, J., Kotkes, P., Weiss, M., Lathe, R., Dion, A. and Keydar, I. (1990) Eur. J. Biochem. 189, 463-473.
3. Gendler, S. J., Lancaster, C. A., Taylor-Papadimitriou, J., Duhig, T., Peat, N., Burchell, J., Pemberton, L., Lalani, E. and Wilson, D. (1990) J. Biol. Chem. 265, 15286-15293.
4. Lan, M. S., Batra, S. K., Qi, W., Metzgar, R. S. and Hollingsworth, M. A. (1990) J. Biol. Chem. 265, 15294-15299.
5. Gendler, S., Taylor-Papadimitriou, J., Duhig, T., Rothbard, J. and Burchell, J. (1988) J. Biol. Chem. 263, 12820-12823.
6. Siddiqui, J., Abe, M., Hayes, D., Shani, E., Yunis, E. and Kufe, D. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 2320-2323.
7. Hilkens, J. and Buijs, F. (1988) J. Biol. Chem. 263, 4215-4222.
8. Linsley, P. S., Kallestad, J. C. and Horn, D. (1988) J. Biol. Chem. 263, 8390-8397.
9. Hilkens, J., Buijs, F., Hilgers, J., Hageman, Ph., Calafat, J., Sonnenberg, A. and van der Valk, M. (1984) Int. J. Cancer 34, 197-206.
10. Morgenstern, J. P. and Land, H. (1990) Nucleic Acids Res. 18, 1068.
11. Smith, C. W. J., Patton, J. G. and Nadal-Ginard, B. (1989) Annu. Rev. Genet. 23, 527-577.
12. Eperon, L. P., Graham, I. R., Griffiths, A. D. and Eperon, I. C. (1988) Cell 54, 393-401.
13. Watakabe, A., Inoue, K., Sakamoto, H. and Shimura, Y. (1989) Nucleic Acids Res. 17, 8159-8169.
14. Ge, H., Noble, J., Colgan, J. and Manley, J. L. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 3338-3342.
15. Zhuang, Y. and Weiner, A. M. (1990) Gene 90, 263-269.
16. Zuker, M. and Stiegler, P. (1981) Nucleic Acids Res. 9, 133-148.
17. Swallow, D. M., Gendler, S., Griffiths, B., Corney, G., Taylor-Papadimitriou, J. and Bramwell, M. E. (1987) Nature 328, 82-84.
18. Jeffreys, A. J., Neumann, R. and Wilson, V. (1990) Cell 60, 473-485.
