Eur. J. Biochem. 190, 195-200 (1990)

© FEBS 1990

Nucleotide sequence and exon-intron organization of the human proacrosin gene

Sabine KEIME, Ibrahim M. ADHAM and Wolfgang ENGEL
Institut für Humangenetik, Universität Göttingen, Federal Republic of Germany

(Received September 15/December 27, 1989) - EJB 89 1124

Acrosin is a serine proteinase located in a zymogen form, proacrosin, in the acrosome of the sperm. As deduced from the cDNA sequences for human and boar proacrosin, the enzyme is synthesized as a preproenzyme, preproacrosin, which contains a hydrophobic leader sequence. Using cDNA clones as probes, we have isolated the gene coding for human proacrosin from a human leucocyte genomic library and a human cosmid library. The gene contains four introns between 0.2 kb and 4.5 kb in length. As in other serine proteinases, the coding sequence of the preproacrosin gene is spread over all five exons of the gene and the three active-site residues His, Asp and Ser are encoded by three different exons. According to the exon-intron structure, preproacrosin is suggested to be closely related to the serine proteinase subfamily containing trypsin and kallikrein. However, the light chain of proacrosin seems to be similar to that of chymotrypsin. The coding of the serine active-site residue together with the proacrosin-specific proline-rich domain in one exon, namely exon E5, leads us to assume that the nucleotide sequence for the proline-rich domain was generated during evolution by intron-exon transfer from a foreign gene with subsequent intron excision. By primer extension analysis, the transcription initiation site of the preproacrosin mRNA could be assigned to the C residue 74 nucleotides upstream of the translation initiation codon ATG. In contrast to most other eucaryotic genes, including the known testis-specific genes, typical TATA and CAAT box sequences at conventional distances 5' of the transcription start site could not be identified in the proacrosin gene.

Correspondence to W. Engel, Institut für Humangenetik, Universität Göttingen, Goßlerstraße 12d, D-3400 Göttingen, Federal Republic of Germany.

Abbreviations. 1 × Denhardt's solution, 0.02% (mass/vol.) bovine serum albumin, 0.02% (mass/vol.) Ficoll 400, 0.02% (mass/vol.) polyvinylpyrrolidone; M-MLV, Moloney murine leukaemia virus.

Enzymes. Acrosin (EC 3.4.21.10); M-MLV reverse transcriptase (EC 2.7.7.49); restriction endonucleases SstI, BamHI, EcoRI, HindIII, StyI, KpnI, PstI, SmaI (EC 3.1.24.4).

Note. The novel nucleotide sequence data published here have been deposited with the EMBL sequence data bank and are available under the accession number X17349.

Acrosin is a serine proteinase stored in a zymogen form, proacrosin, in the acrosome of the sperm head. The enzyme has been implicated in the recognition, binding [1-3] and penetration [4] of the zona pellucida of the ovum and is therefore of great importance in the fertilization process. Recently, the cDNA sequences of boar [5, 6] and human [7, 8] preproacrosin have been published. According to the deduced amino acid sequence, the human preproacrosin is a single-chain molecule containing a 19-amino-acid leader peptide and a proline-rich domain with 129 amino acids at the C-terminal part of the protein. Such a proline-rich domain has not previously been found in any other serine proteinase and was suggested to play a role in the activation and maturation process of proacrosin [6, 9] or in the species-specific recognition or binding of the sperm to the zona pellucida [5, 8, 10]. The human proacrosin is composed of 402 amino acids, 23 in the light chain and 379 in the heavy chain, and has an approximate molecular mass of 43.9 kDa [8].

On the basis of immunofluorescent studies in spermatogenic cells of various mammals, biosynthesis of proacrosin was found to start in early haploid spermatids [11, 12]. Using cDNAs for boar preproacrosin as probes, haploid expression of the gene during boar and bull spermatogenesis was clearly demonstrated [5] and a transcript of 1.6 kb was detected in different mammalian species [8]. Human preproacrosin cDNA sequences were used to map the gene to chromosome 22q13-qter and to identify SstI restriction fragment length polymorphisms [13].

We now report the isolation and sequencing of genomic clones for human proacrosin, the intron-exon structure of the gene and the sequence of its 5'-flanking region. On the basis of the proacrosin gene organization, the relationship of the enzyme to other serine proteinases is discussed.

MATERIAL AND METHODS

Isolation of genomic clones for human preproacrosin

A human leucocyte genomic library constructed in BamHI sites of EMBL-3 phages (Clontech, Palo Alto, USA) and a human cosmid library kindly provided by A. M. Frischauf (London) were used for the isolation of genomic clones for human preproacrosin. The human leucocyte library was screened by plaque hybridization [14] with a 32P-labeled human preproacrosin cDNA [8]. Hybridization was carried out at 65°C overnight in the following solution: 5 × buffer A (1 × buffer A consists of 10 mM NaH2PO4 pH 7.0, 180 mM NaCl, 1 mM EDTA), 5 × Denhardt's solution, 0.1% SDS containing 200 µg denatured salmon sperm DNA/ml. Filters were washed twice at 65°C to a final stringency of 0.2 × buffer A. The human cosmid library h2a constructed in pcos 2 EMBL [15] was amplified in seven fractions. To identify the fractions containing the full-length human proacrosin gene,

Fig. 1. Diagram of the human proacrosin gene showing the fragments used for nucleotide sequence analysis. Restriction sites used for subcloning into pUC8 and subsequent sequencing are marked as (E) EcoRI, (P) PstI, (K) KpnI, (X) SmaI, (Y) StyI, (B) BamHI, (H) HindIII, (S) SstI. Overlaps were established by reference to the sequence of cDNA [8]. The coding portion of the mRNA is shown as solid boxes (exons E1-E5) at the bottom line. The gene contains four introns I1-I4

5 µg cosmid DNA was prepared from each fraction by the alkaline lysis protocol [16], digested with EcoRI, electrophoresed in 1% agarose gels, transferred to nitrocellulose filters and hybridized with 32P-labeled human preproacrosin cDNA [8]. 1.2 × 10⁵ cosmid clones from a positive fraction were screened by replicating clones onto nitrocellulose filters [17] with 32P-labeled cDNA for human preproacrosin [8].
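The hybridization and wash solutions in this section are specified as multiples of buffer A (1 × buffer A: 10 mM NaH2PO4, 180 mM NaCl, 1 mM EDTA). The following minimal sketch converts such multiples into millimolar concentrations. The function names and the 20 × stock used in the usage example are illustrative assumptions and are not part of the published protocol.

```python
# 1x buffer A as defined in the hybridization protocol (concentrations in mM).
BUFFER_A_1X_MM = {"NaH2PO4": 10.0, "NaCl": 180.0, "EDTA": 1.0}


def buffer_a(fold):
    """Return the component concentrations (mM) of `fold` x buffer A."""
    return {name: round(conc * fold, 6) for name, conc in BUFFER_A_1X_MM.items()}


def stock_volume_ml(stock_fold, final_fold, final_volume_ml):
    """Volume of concentrated stock needed, from C1 * V1 = C2 * V2."""
    return final_fold * final_volume_ml / stock_fold


if __name__ == "__main__":
    print("hybridization (5x):", buffer_a(5))
    print("final wash (0.2x): ", buffer_a(0.2))
    # Hypothetical example: 100 ml of 5x buffer A from an assumed 20x stock.
    print("ml of 20x stock:", stock_volume_ml(20, 5, 100))
```

Under these definitions the 5 × hybridization solution contains 900 mM NaCl, whereas the final 0.2 × wash contains only 36 mM NaCl, which is what makes the wash the more stringent step.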

Characterization of genomic clones for human proacrosin

DNA from phage and cosmid clones was digested with various restriction endonucleases and electrophoresed on agarose gels. DNA fragments were then transferred to nitrocellulose filters and hybridized with human preproacrosin cDNA probes which correspond to different regions of the proacrosin gene. Three EcoRI genomic fragments containing exonic sequences and 5'- and 3'-flanking regions (Fig. 1) were subcloned into pUC8. Additional restriction mapping of the subclones and Southern blot experiments were performed to identify the location of the exonic regions. The nucleotide sequences of the exons, the exon-intron junctions and the 5'- and 3'-flanking regions of the gene were determined by the dideoxy chain-termination method [18] using the Sequenase sequencing kit (US Biochemical, Cleveland, USA). The 650-bp BamHI-SstI fragment in exon E5 (Fig. 1) was sequenced by the method of Maxam and Gilbert [19] to resolve the compressions in this G+C-rich region. The nucleotide sequence of the gene was assembled and compared to the known cDNA sequence [7, 8] using the DNASTAR computer program.

Primer extension

Total cellular RNA was extracted from human testes and brain by the guanidinium isothiocyanate method [20] and poly(A)-rich RNA was prepared by oligo(dT)-cellulose column chromatography [21]. A sample (20 ng) of a 5'-labeled synthetic oligonucleotide (5'-CTC AAC CAT ACT CCT GGC ACT-3'), which is complementary to nucleotides -12 to +9 of the proacrosin gene, was hybridized with 30 µg poly(A)-rich RNA in 60 µl 0.3 M NaCl, 10 mM Tris pH 7.5 and 1 mM EDTA. Mixtures were heated at 85°C for 15 min, then allowed to cool slowly to 42°C and kept at that temperature for 3 h. The hybridization mixture was then ethanol-precipitated and resuspended in 120 µl of reverse transcription buffer, which consisted of 50 mM Tris pH 8.3, 8 mM MgCl2, 50 mM KCl, 10 mM dithiothreitol, 1 mM each of the four dNTPs and 10 U of the RNase inhibitor RNasin. The primer was extended by incubation with 300 U cloned Moloney murine leukaemia virus (M-MLV) reverse transcriptase (BRL) at 42°C for 30 min. After phenol extractions, the transcripts were precipitated with ethanol and analysed on 6% acrylamide/6 M urea gels.

RESULTS AND DISCUSSION

Isolation of the human proacrosin gene

We have recently shown that the proacrosin gene, which exhibits testis-specific expression limited to haploid spermiogenesis [5], is present as a single copy in the human genome [13]. In order to isolate genomic clones for the human proacrosin gene, 1.2 × 10⁶ phages of a human genomic library were screened with the human preproacrosin cDNA clone HA, [8]. Six clones were isolated and characterized by restriction enzyme analysis and were found to contain identical restriction fragments. The nucleotide sequences of the restriction fragments of one clone (BamHI-SstI and SstI-HindIII) were found to contain exon E5 and a 3'-flanking region of the proacrosin gene (Fig. 1) but lacked the other exons in the 5' region of the gene.

To isolate the entire proacrosin gene, a human cosmid library was screened with 32P-labeled cDNA for human preproacrosin [8]. The cosmid clone cos 2 was isolated and restricted with EcoRI, and the pattern of hybridization with the cDNA clone HA, was compared with that obtained with genomic DNA restricted with the same enzyme. Identical hybridization patterns were obtained (data not shown). Therefore, it was concluded that the cosmid clone cos 2 contained the entire proacrosin gene. The total preproacrosin cDNA was found to be included within a 10.9-kb DNA fragment which consists of three contiguous EcoRI fragments of 6.2, 2.5 and 2.2 kb (Fig. 1). The three EcoRI fragments were subcloned into pUC8.

Nucleotide sequence and genomic organization of the human proacrosin gene

The human proacrosin gene was further characterized by subcloning and sequencing the cos 2 restriction fragments containing exons and exon-intron boundaries according to the strategy given in Fig. 1. Comparison of the nucleotide sequences of the genomic clones with that of the cDNA [8] showed that the proacrosin gene contains five exons (E1, 151 bp; E2, 204 bp; E3, 285 bp; E4, 145 bp; E5, 555 bp). The exons are arranged in two clusters which are separated by a 4.5-kb intron (I3). The four introns are between 0.2 kb (I2) and 4.5 kb (I3) in size (Fig. 2).
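The exon lengths quoted above can be checked against the reading frame. The sketch below is an illustration rather than part of the original analysis. It assumes that the first 74 nt of exon E1 are untranslated (as implied by a transcription start 74 nt upstream of the ATG), that the leader peptide has 19 residues and proacrosin 402 residues, and that the active-site residues are His-69, Asp-123 and Ser-221 in proacrosin numbering. The function names are hypothetical.

```python
# Exon lengths (bp) as reported for the human proacrosin gene.
EXON_BP = {"E1": 151, "E2": 204, "E3": 285, "E4": 145, "E5": 555}
UTR5_NT = 74      # assumption: untranslated nucleotides at the start of E1
LEADER_AA = 19    # leader (signal) peptide residues
MATURE_AA = 402   # proacrosin residues


def coding_ranges(exon_bp, utr5):
    """Map each exon to (first, last) coding nucleotide, with the A of ATG = 1."""
    ranges, pos = {}, 0
    for index, (name, length) in enumerate(exon_bp.items()):
        coding = length - utr5 if index == 0 else length
        ranges[name] = (pos + 1, pos + coding)
        pos += coding
    return ranges


def intron_phases(ranges):
    """Phase of each intron: number of codon nucleotides lying 5' of it (0, 1 or 2)."""
    names = list(ranges)
    return {f"I{i + 1}": ranges[name][1] % 3 for i, name in enumerate(names[:-1])}


def exon_of_residue(ranges, residue, leader=LEADER_AA):
    """Exon holding the first codon nucleotide of a proacrosin residue (1-based)."""
    first_nt = (leader + residue - 1) * 3 + 1
    for name, (start, end) in ranges.items():
        if start <= first_nt <= end:
            return name
    raise ValueError(f"residue {residue} lies outside the coding sequence")


if __name__ == "__main__":
    ranges = coding_ranges(EXON_BP, UTR5_NT)
    print(ranges)
    print(intron_phases(ranges))
    for label, residue in (("His", 69), ("Asp", 123), ("Ser", 221)):
        print(label, residue, exon_of_residue(ranges, residue))
```

Under these assumptions the five exon lengths add up to 74 + 3 × 422 = 1340 nt, that is, the untranslated leader plus an open reading frame of 421 codons and a stop codon. The first three introns fall after the second nucleotide of a codon and the fourth falls between codons, and the three active-site codons map to E2, E3 and E5. How these phases correspond to the junction types 0, I and II used in Fig. 3 is an assumption about that figure's convention.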

Fig. 2. Nucleotide sequence of the five exons of the human proacrosin gene including intron-exon boundaries, 5'- and 3'-flanking sequences. The presumed transcriptional start point is designated by a single star. The encoded amino acid sequence is shown below the nucleotide sequence. The catalytic triad, highly conserved in the whole serine proteinase family, is formed by His, Asp and Ser, which are marked by arrows. The four introns (lower case letters) are identified by comparison with the cDNA sequence [8]. Sequences in the 5'-flanking region which show similarity to those of other testis-specific genes are underlined. The termination codon TGA and the polyadenylation signal AATAAA are marked
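Because Fig. 2 prints exons in upper case and introns in lower case, the GT...AG rule for intron ends [22] can be checked mechanically on any sequence written in that convention. The following is a minimal sketch of such a check. The toy sequences are hypothetical and are not taken from the proacrosin gene, and the function names are illustrative.

```python
import re


def introns(seq):
    """Return lowercase runs (introns) flanked on both sides by uppercase exon sequence."""
    return [m.group(1) for m in re.finditer(r"(?<=[A-Z])([a-z]+)(?=[A-Z])", seq)]


def follows_gt_ag(intron):
    """True if the intron starts with the donor GT and ends with the acceptor AG."""
    return intron.startswith("gt") and intron.endswith("ag")


if __name__ == "__main__":
    # Hypothetical toy gene: lowercase flanks, three uppercase exons, two introns.
    toy = "ccgATGGTTGAGgtaagtcttcccagGGCCCCTGTgtatgacttcagTGAttt"
    for intron in introns(toy):
        print(intron, follows_gt_ag(intron))
```

Lowercase flanking sequence at either end of the input is ignored, because only lowercase runs with an exon on both sides are treated as introns.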

As can be deduced from the cDNA sequences for preproacrosin of boar [5, 6], human [7, 8], mouse, rat and bull (unpublished), the C-terminus of the proacrosin molecule is endowed with a proline-rich domain which has not previously been found in any other serine proteinase. As can be inferred from Fig. 2, the proline-rich domain is encoded in exon E5 and is not separated by an intron from the coding sequence for the proteinase part of the molecule. We have obtained an identical result for the rat proacrosin gene (unpublished).

The nucleotide sequences at the intron-exon boundaries are known to be highly conserved in eucaryotic genes [22]. Accordingly, the dinucleotide GT of the splice donor consensus sequence and the dinucleotide AG of the splice acceptor consensus sequence were found at all four intron-exon boundaries of the human proacrosin gene (Fig. 2). The sequences of the splice acceptor sites of the gene are rich in pyrimidines, as is the case in many other genes [22].

Recently, Adham et al. [8] and Baba et al. [7] have published the nucleotide sequence of the human preproacrosin cDNA and the deduced amino acid sequence. The amino acid sequence determined by Adham et al. [8] matches that deduced from the nucleotide sequence of the genomic clones except at position 207, where valine (GTC) was deduced from the cDNA sequence but alanine (GCG) from the genomic sequence. According to our cDNA and genomic sequences for preproacrosin, the amino acids at positions 101, 147 and 326 are leucine (CTG), phenylalanine (TTT) and proline (CCA), while Baba et al. [7] deduced valine (GTG), leucine (TTG) and arginine (CGA), respectively, from their cDNA sequencing results.

Fig. 3. Diagram of exon-intron structure of different serine proteinase genes including the proacrosin gene. Exons coding for proteinase domains are shown as open boxes, exons coding for other structural domains are shown by solid boxes. F IX, factor IX; PT, prothrombin; TPA, tissue plasminogen activator; UPA, urokinase; KAL, kallikrein; TRY, trypsin; CHY, chymotrypsin; ELA, elastase; ACR, proacrosin. The positions of the nucleotides coding for the three active-site residues of serine proteinases are given: H, histidine site; D, aspartic acid site; S, serine site. 0, I and II represent the splice junction types relative to the reading frame for the different serine proteinases
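The codon discrepancies reported in the text (position 207 between the genomic and cDNA sequences, and positions 101, 147 and 326 between the two published cDNA sequences) can be tabulated with a short script. The sketch below is only an illustration: it contains just the eight codons quoted in the text, taken from the standard genetic code, and the names used are hypothetical.

```python
# Standard genetic code, restricted to the codons quoted in the text.
CODON_TO_AA = {
    "GCG": "Ala", "GTC": "Val", "CTG": "Leu", "GTG": "Val",
    "TTT": "Phe", "TTG": "Leu", "CCA": "Pro", "CGA": "Arg",
}

# (position, codon in the genomic sequence, codon in the differing cDNA report)
DISCREPANCIES = [
    (207, "GCG", "GTC"),  # genomic clone vs. cDNA of ref. [8]
    (101, "CTG", "GTG"),  # this work vs. ref. [7]
    (147, "TTT", "TTG"),  # this work vs. ref. [7]
    (326, "CCA", "CGA"),  # this work vs. ref. [7]
]


def mismatches(codon_a, codon_b):
    """Return the 1-based codon positions at which two codons differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(codon_a, codon_b)) if a != b]


if __name__ == "__main__":
    for position, genomic, other in DISCREPANCIES:
        print(position,
              f"{genomic} ({CODON_TO_AA[genomic]})",
              f"{other} ({CODON_TO_AA[other]})",
              "differs at codon position(s)", mismatches(genomic, other))
```

In this tabulation the three differences from ref. [7] are single-nucleotide changes, whereas the codons at position 207 differ at two nucleotides.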

Comparison of the genomic organization of the human proacrosin gene with that of related serine proteinase genes

The analysis of the amino acid sequences and of the gene structure for members of the serine proteinase superfamily has shown that the functionally important amino acid residues are present in separate exons [23] and that the intron-exon junctions map to the surface of the protein [24]. As can be inferred from Fig. 2, the coding sequence of the serine proteinase proacrosin is also spread over all five exons of the gene. The first exon (E1) contains the 5'-untranslated sequence and the sequences coding for the signal peptide and the first seven amino acids of proacrosin. The three active-site residues (His-69, Asp-123 and Ser-221), which are involved in the catalytic activity of the enzyme, are encoded by three different exons (E2, E3, E5). In Fig. 3, the exon-intron structure for different members of the serine proteinase superfamily is shown schematically. It can be seen that the four intron-exon boundaries of the proacrosin gene are at positions equivalent to those determined for members of a serine proteinase subfamily containing trypsin [25] and kallikrein [26]. This close relationship between the genes for trypsin, kallikrein and proacrosin is supported by the observation that acrosin is more closely related to trypsin than to the chymotrypsin subfamily according to protein structure and substrate specificity [27]. However, a relevant difference in the gene structure between proacrosin and trypsin is evident. The amino acid sequence of the light chain of proacrosin, which is encoded in exons E1 and E2, is similar to the light chain of chymotrypsin [28], while trypsin and kallikrein lack corresponding amino acid sequences. In Fig. 3, the splice junction types, relative to the reading frame, are given for the proacrosin gene and are found to be different from those of other serine proteinase genes. In some serine proteinases, like thrombin [29], tissue plasminogen activator [30] and plasmin [31], domains outside the proteinase domain have been found.
These extra domains are encoded by separate exons or groups of exons and they are mostly located at the 5' side of the exons coding for the proteinase domain. In contrast, the proline-rich domain of proacrosin, which is lost during the activation of proacrosin to the proteolytically active enzyme β-acrosin [6, 9], is encoded together with the serine active-site residue of the enzyme in one exon, namely exon E5. Rogers [23] has suggested that the origin of separate exons coding for the nonenzymatic portion of serine proteinases is due to exon transfer from other genes during evolution. Thus, exon E5 in the proacrosin gene could have arisen by a two-step process: intron-exon transfer from a foreign gene and then excision of the intron, resulting in an exon coding for both the serine active-site residue and the proline-rich domain. Intron excision was suggested to have occurred during evolution of several other mammalian genes [32, 33].

Analysis of the 5'-flanking sequences of the proacrosin gene

To determine the transcription initiation site in the human proacrosin gene, primer extension analysis was performed. A 21-mer oligonucleotide complementary to nucleotides -12 to +9 of the proacrosin gene (Fig. 2) was hybridized to testicular poly(A)-rich RNA and the hybridization product was then extended with M-MLV reverse transcriptase. As shown in Fig. 4, a major doublet of 83/84 bp and minor bands were detected in testis RNA. Since the band corresponding to the 83-bp transcript was fainter than that of the 84-bp transcript, we assigned the major transcription initiation site of the preproacrosin mRNA to the C residue at position -74 relative to the translation initiation codon ATG.

Fig. 4. Determination of the transcription start site of the proacrosin gene. A synthetic 21-mer oligonucleotide was end-labeled and hybridized to poly(A)-rich RNA from testis (lane A) and brain (lane B) and extended with reverse transcriptase. The reaction products were electrophoresed and sized against a sequencing reaction (lanes 1-4). The sizes of the major primer extension products are 83 and 84 bp

However, the nucleotide sequence of the promoter region of the proacrosin gene lacks typical TATA and CCAAT boxes at conventional distances 5' of the transcription start site [22, 34]. One typical CCAAT and two modified CAAT box sequences are present quite far from the transcription start site at positions -502, -384 and -303, respectively (Fig. 2). Because the proacrosin gene is only expressed in haploid spermatogenic cells, it might be suggested that the absence of these promoter elements is a general characteristic of haploid expressed genes. However, the presence of TATA and CCAAT boxes has been clearly demonstrated in the 5'-flanking regions of the testis-specific genes for mouse protamine 1 and 2 [35], bovine protamine 1 [36], rat H1t histone [37] and rat transition protein 1 [38].

Using cDNA clones for boar preproacrosin [5], we have isolated and sequenced the respective genomic clones containing 180 nucleotides of the 5'-flanking region of the boar proacrosin gene. A comparison of the nucleotide sequence of this region of the human gene with that of the boar reveals an 83% sequence similarity in the noncoding region of exon E1, while the 100-bp sequence upstream from the transcription start site shows a 70% similarity between both species. Most importantly, the 5'-flanking region of the porcine gene also lacks the typical TATA and CAAT boxes. Thus, it can be suggested that the absence of these promoter elements is a general characteristic of the mammalian proacrosin gene. The absence of TATA and CAAT boxes was found in promoter regions of many housekeeping genes [39], but also in some genes with tissue-specific expression [40]. Many of these genes are transcribed from multiple sites, while others, such as the human insulin receptor gene [41], are transcribed from one transcription initiation site despite the absence of a TATA box.

Johnson et al. [35] compared the 5'-flanking regions of the testis-specific genes for mouse protamines 1 and 2, for rat H1t histone and for human phosphoglycerate kinase 2, in order to identify DNA elements which could be responsible for testis-specific expression. They reported nine different motifs which are present in both protamine genes as well as in the rat H1t histone gene or in the human phosphoglycerate kinase 2 gene. If one includes the 5'-flanking region of the human proacrosin gene in such a comparison, four different motifs can be identified which are also present in the other testis-specific genes (Table 1). The functional significance of these similarities remains unclear in the absence of biochemical analyses.

Table 1. Comparison of the 5'-flanking region of the human proacrosin gene with the corresponding regions of other testis-specific genes

Sequence          Position in the     Sequence found in the gene for      Position
                  proacrosin gene
AGC~CGAT          -323                mouse protamine 1                   -338
                                      mouse protamine 2                   -216
                                      rat H1t                             -138
FCGGTGGG          -94; -88            mouse protamine 1                   -455; -140
                                      mouse protamine 2                   -319; -10
CCTCGCCTTGGG      -289                human phosphoglycerate kinase 2     -454; -390
AGG~AGTCAAGAAG    -465                mouse protamine 1                   -233
                                      mouse protamine 2                   -162

We thank Dr C. Peters (Institut für Biochemie II, Göttingen) for critical discussion, Ute Schrader for technical help and Andrea Lutze for secretarial help. This work was supported by the Deutsche Forschungsgemeinschaft (En 84/18-3).

REFERENCES

1. Saling, P. M. (1981) Proc. Natl Acad. Sci. USA 78, 6231-6235.
2. Jones, R., Brown, C. R. & Lancaster, R. T. (1988) Development 102, 181-191.
3. Töpfer-Petersen, E. & Henschen, A. (1988) Biol. Chem. Hoppe-Seyler 369, 69-76.
4. Yanagimachi, R. (1981) in Fertilization and embryonic development in vitro (Mastroianni, L. & Biggers, J., eds) pp. 32-155, Plenum Press, New York.
5. Adham, I. M., Klemm, U., Maier, W.-M., Hoyer-Fender, S., Tsaousidou, S. & Engel, W. (1989) Eur. J. Biochem. 182, 563-568.
6. Baba, T., Kashiwabara, S. I., Watanabe, K., Itho, H., Michikawa, Y., Kimura, K., Takada, M., Fukamizu, A. & Arai, Y. (1989) J. Biol. Chem. 264, 11920-11927.
7. Baba, T., Watanabe, K., Kashiwabara, S. I. & Arai, Y. (1989) FEBS Lett. 244, 296-300.
8. Adham, I. M., Klemm, U., Maier, W.-M. & Engel, W. (1990) Hum. Genet. 84, 125-128.
9. Zelezna, B. & Cechova, D. (1982) Hoppe-Seyler's Z. Physiol. Chem. 363, 151-166.
10. Zelezna, B., Cechova, D. & Henschen, A. (1989) Biol. Chem. Hoppe-Seyler 370, 323-327.
11. Phi-van, L., Müller-Esterl, W., Flörke, S., Schmid, M. & Engel, W. (1983) Biol. Reprod. 29, 479-486.
12. Flörke, S., Phi-van, L., Müller-Esterl, W., Scheuber, H. P. & Engel, W. (1983) Differentiation 24, 250-256.
13. Adham, I. M., Grzeschik, K.-H., Geurts van Kessel, A. H. M. & Engel, W. (1989) Hum. Genet. 84, 59-62.
14. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.
15. Poustka, A., Rackwitz, H.-R., Frischauf, A. M., Hohn, B. & Lehrach, H. (1984) Proc. Natl Acad. Sci. USA 81, 4129-4133.
16. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
17. Hanahan, D. & Meselson, M. (1980) Gene 10, 63-67.
18. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
19. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-559.
20. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J. & Rutter, W. J. (1979) Biochemistry 18, 5294-5299.
21. Aviv, H. & Leder, P. (1972) Proc. Natl Acad. Sci. USA 69, 1408-1412.
22. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
23. Rogers, J. (1985) Nature 315, 458-459.
24. Craik, C. S., Sprang, S., Fletterick, R. & Rutter, W. J. (1982) Nature 299, 180-182.
25. Craik, C. S., Choo, Q. L., Swift, G. H., Quinto, C., MacDonald, R. J. & Rutter, W. J. (1984) J. Biol. Chem. 259, 14255-14264.
26. Wines, D. R., Brady, J. M., Pritchett, D. B., Roberts, J. L. & MacDonald, R. J. (1989) J. Biol. Chem. 264, 7653-7662.
27. Müller-Esterl, W., Fritz, H., Fock-Nutzel, R., Lottspeich, F. & Henschen, A. (1984) in Chemistry of peptides and proteins (Voelter, W., Bayer, E., Ovchinnikov, Y. A. & Wünsch, E., eds) pp. 377-386, de Gruyter, Berlin.
28. Smillie, L. B., Furka, A., Nagabhushan, N., Stevenson, K. J. & Parkes, C. O. (1968) Nature 218, 343-346.
29. Degen, S. J. F. & Davie, E. W. (1987) Biochemistry 26, 6165-6177.
30. Ny, T., Elgh, F. & Lund, B. (1984) Proc. Natl Acad. Sci. USA 81, 5355-5359.
31. Malinowski, D. P., Sadler, J. E. & Davie, E. W. (1987) Biochemistry 23, 4243-4250.
32. Blake, C. (1983) Nature 306, 535-537.
33. Rogers, J. H. (1985) Int. Rev. Cytol. 93, 187-279.
34. McKnight, S. L. & Kingsbury, R. (1982) Science 217, 316-324.
35. Johnson, P. A., Peschon, J. J., Yelick, P. C., Palmiter, R. D. & Hecht, N. B. (1988) Biochim. Biophys. Acta 950, 45-53.
36. Krawetz, S. A., Connor, W. & Dixon, G. H. (1988) J. Biol. Chem. 263, 321-326.
37. Cole, K. D., Kandala, J. C. & Kistler, W. S. (1986) J. Biol. Chem. 261, 7178-7183.
38. Heidaran, M. A., Kozak, C. A. & Kistler, W. S. (1989) Gene 75, 39-46.
39. Martini, G., Toniolo, D., Vulliamy, T., Luzzatto, L., Dono, R., Viglietto, G., Paonessa, G., D'Urso, M. & Persico, M. G. (1986) EMBO J. 5, 1849-1855.
40. Sakimura, K., Kushiya, E., Takahashi, Y. & Suzuki, Y. (1987) Gene 60, 103-113.
41. Araki, E., Shimada, F., Uzawa, H., Mori, M. & Ebina, Y. (1987) J. Biol. Chem. 262, 16186-16191.
