Matrix Vol. 11/1991, pp. 375-379 © 1991 by Gustav Fischer Verlag, Stuttgart

Original Papers

Completion of the Last Half of the Structure of the Human Gene for the Proα1(I) Chain of Type I Procollagen (COL1A1)

ANDREA WESTERHAUSEN 1, CONSTANTINOS D. CONSTANTINOU 2, MICHAEL PACK 3, MINZHONG PENG, CHARLES HANNING, ANNE S. OLSEN 4 and DARWIN J. PROCKOP

Department of Biochemistry and Molecular Biology, Jefferson Institute of Molecular Medicine, Jefferson Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA.

Abstract

The nucleotide sequences of the 3'-half of the human gene for the proα1(I) chain of type I procollagen (COL1A1) are presented. The results provide the nucleotide sequences for 26 introns not previously analyzed. The sequences that are presented, together with those previously published, make it possible to design primers for the polymerase chain reaction for amplifying and sequencing the gene. The availability of such primers will greatly facilitate the current search for mutations that can cause common and rare diseases of connective tissue.

Key words: procollagen gene, sequences of introns.

Introduction

The genes for type I procollagen (COL1A1 and COL1A2) are of interest for several reasons. One is that the coding sequences for the large triple-helical domain of the protein are contained in exons that display an unusual 54-bp motif (for reviews, see Vuorio and de Crombrugghe, 1990; Chu and Prockop, 1991). The structure of the exons has been highly conserved during evolution, and essentially the same exon structure is found in the genes for the other two major fibrillar procollagens, type II procollagen (COL2A1) and type III procollagen (COL3A1). Another reason for interest in the genes for type I procollagen is that

1 Present address: Department of Ophthalmology, University of Cincinnati College of Medicine, Cincinnati, OH
2 Present address: Department of Medicine, Division of Neurology, Duke University Medical Center, Durham, NC 27710
3 Present address: GI Unit, Massachusetts General Hospital, Boston, MA 02114
4 Present address: Biomedical Sciences Division, Lawrence Livermore Laboratory, Livermore, CA 94551

they harbor a large number of mutations that cause heritable disorders in man. Over 70 different mutations in the genes cause osteogenesis imperfecta, a heritable disorder characterized by fragility of bone and other tissues rich in type I collagen (for reviews, see Byers, 1990; Kuivaniemi et al., 1991). Also, several recent reports suggest that some mutations in the type I procollagen genes may cause more common diseases such as some forms of osteoporosis (see Spotila et al., 1991). Previous reports have established the nucleotide sequences of about half of the gene for the proα1(I) chain of human type I procollagen (COL1A1) (Bernard et al., 1983; Chu et al., 1984; 1985a; 1985b; Barsh et al., 1984; 1985; Ramirez et al., 1985; D'Alessio et al., 1988; Tromp et al., 1988). Here we report the sequence of the remaining half of the gene and provide nucleotide sequences for 26 introns not previously analyzed.


A. Westerhausen et al.

Materials and Methods

Cloning of a 5-kb fragment of the proα1(I) chain

For analysis of sequences extending from intron 25 to exon 40, a genomic fragment of the COL1A1 gene was cloned from the cultured skin fibroblasts of a proband with osteogenesis imperfecta (Tsuneyoshi et al., 1991). The genomic DNA was digested with BamHI in order to generate a 5-kb fragment from the proα1(I) gene. The BamHI digest was separated by electrophoresis on agarose gels and fragments of 2 to 6 kb were electroeluted. The fragments were cloned into Lambda-ZAP (Stratagene) and the resulting library was screened (Maniatis et al., 1982) with a cDNA (Hf-404) probe for the proα1(I) chain (Bernard et

Table 1. Oligonucleotide primers.

Code number   Target sequence a   Primer sequence b                       Restriction site
CDC27         Exon 41             GGA-ATT-CAA-GCG-TGG-TGT-GGT-CGG-CCT-G   EcoRI
CDC28         Exon 44             GGG-GAT-CCC-TCA-CCA-CGA-TCA-CCA-CTC-T   BamHI
CDC22         Exon 44             GGA-ATT-CCT-TGG-CCC-TGC-TGG-CAA-GAG-T   EcoRI
CDC26         Intron 45           AAG-GAT-CCC-AGG-CGG-AAG-TTC-CAT-TGG-C   BamHI
CDC30         Intron 44           CCG-GAA-TTC-CTG-GCC-AAG-AGC-TCA-TGC-T   EcoRI
CDC23         Intron 46           AAG-GAT-CCC-CTC-CTA-TCC-CAC-AGC-ACA-G   BamHI
I46F          Intron 46           ATG-GAT-CCA-TGC-TGT-GCT-GTG-GGA-TAG-G   BamHI
E48R          Exon 48             ATG-ATT-TCC-GTT-GAG-TCC-ATC-TTT-GCC-A   EcoRI
E48F          Exon 48             ATG-GAT-CCT-CGC-GGT-CGC-ACT-GGT-GAT-G   BamHI
E50R          Exon 50             ATG-AAT-TCC-AGC-CTT-GGT-TGG-GGT-CAA-T   EcoRI
E50F          Exon 50             ATG-GAT-CCA-TGT-CTG-GTT-CGG-CGA-GAG-C   BamHI
E52R          Exon 52             ATG-AAT-TCT-CAA-TCA-CTG-TCT-TGC-CCC-A   EcoRI

a Primers were paired in the order presented, e.g. CDC27 with CDC28, CDC22 with CDC26, etc.
b Underlined nucleotides indicate sequences added to provide the restriction site used to clone the PCR products.
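The restriction-site and pairing notes in Table 1 can be checked mechanically. The following is a minimal Python sketch, not part of the original methods: it takes the intron 44 and intron 46 primers from Table 1 together with short stretches of the corresponding intron sequences shown in Fig. 2, confirms that each primer carries the stated recognition site (GAATTC for EcoRI, GGATCC for BamHI), and reports whether the gene-specific part of the primer matches the sense strand or its reverse complement. The dictionary keys and function names are illustrative assumptions.

```python
# Recognition sequences of the two restriction enzymes named in Table 1.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Two primers from Table 1 (hyphens removed); the keys are illustrative labels.
PRIMERS = {
    "intron44_primer": ("CCGGAATTCCTGGCCAAGAGCTCATGCT", "EcoRI"),
    "intron46_primer": ("AAGGATCCCCTCCTATCCCACAGCACAG", "BamHI"),
}

# Short stretches of intron sequence copied from Fig. 2 (sense strand).
TEMPLATES = {
    "intron44_primer": "ccctggccaagagctcatgctttccttgctctccccag",
    "intron46_primer": "gtaagtatgctcagcccctccccagtccccatgctgtgctgtgggataggagggggagct",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orientation(primer, enzyme, template):
    """Classify the part of the primer that follows the added restriction site."""
    site = SITES[enzyme]
    start = primer.find(site)
    if start < 0:
        raise ValueError(f"{enzyme} site not found in primer")
    core = primer[start + len(site):]      # gene-specific 3' portion
    template = template.upper()
    if core in template:
        return "sense"
    if reverse_complement(core) in template:
        return "antisense"
    return None


RESULTS = {
    name: orientation(seq, enzyme, TEMPLATES[name])
    for name, (seq, enzyme) in PRIMERS.items()
}
print(RESULTS)
```

Under these assumptions the intron 44 primer reads in the sense direction and the intron 46 primer in the antisense direction, which is what a primer pair flanking exons 45 and 46 requires.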

Fig. 1. Schematic diagram of the 3'-half of the human proα1(I) gene, showing the BamHI sites and the numbered exons.

Structure of the Human COL1A1 Gene

Fig. 2. Nucleotide sequences of the human COL1A1 gene extending from intron 25 to exon 52. Capital letters indicate terminal codons of exon sequences or five ambiguous bases in introns. The alternative bases in some cloned PCR products were G/T at position +185 in intron 46; A/C at position +291 in intron 46; T/C at position +112 in intron 49; A/C at position +126 in intron 49; and C/T at position +179 in intron 50.

cccacgaaga gctagggaca gctgtgccgc tgacctgccg atctgtaaaa tgaaagactt tttgggaggc caaggcgggc tagtgaaacc acgtctctac tgtaaccccc aagctagtca aggttgcagt gagctgagat catctcaaaa aaaaaaaaaa ggcctccatg accactgctc cgtggggata ggacaggcct ggtccaaaag gagatagcag ctaacaggga aaaggcagag ggcccagcaa gaagcacctg aagtccctta aaaagtccca cttcagcccc ctcaacccta gtaagtatct cctttccatc tgcaggaggg gtgctagagg tcccctttgc tctctccctg ctgatgggga gatctgggga cctccccgtg actgtagtgt gtgaggcctc atggctgtca cttccagatt ctaaaccttc exon 29 ... CAG gaatccagag ggacaggccc tgaggttaga caccggaaag ggctgcatga ttgctagtct ttccctcaac ccttgtattg gacacatccc taggaggaag ggggttcctc tctaatcacg ggccccaggt aggttttgct exon 30 ... AGA tacttggccc tatctccaga exon 31 ... AAG gaccaggtct cacccctcct agggtgtggg gtggggggaa tgtgagtctt ttcatctttt taccaagcgg ggcttagggc ggttgtcaca tagGGT ... cccctgagcc ctacatggct tctctggatc tctcctcttc gcctttttgc tgatgaatcc tggtgatgag aggtgggggc actcaggggc ctgttaacct aaagaatctt tgagtccttg ggaaacccag acacaagcag aagcttctcc atgtctttag cgcctcccca ccttctggcc tggggttgcg ttgggagagc ggaccagaag gaggggaggc tgaccctgag cctcttgtcc aagacctgtt aagaccccat gtctgtgcct tcaccccttg cgcatctctc caatctgact gtgagtacca agacccccat ggaatgagga caaaggcgtc

al., 1983). One positive clone was isolated. To sequence the clone, a deletion library was prepared with exonuclease III and a commercial kit (Erase-a-base, Stratagene). Double-stranded DNA from the deletion library was used for sequencing with the dideoxynucleotide procedure (Sanger et al., 1977) and with T7 DNA polymerase (Sequenase; US Biochemicals). As reported elsewhere (Tsuneyoshi et al., 1991), all the coding sequences agreed with the consensus cDNA sequences with the exception of a single-base mutation in the codon for glycine α1-637 that was found in a proband with OI but was not found in normal alleles.
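The size-selection step in the cloning procedure (BamHI digestion followed by recovery of fragments of 2 to 6 kb) can be pictured with a small in-silico digest. This Python sketch is only an illustration under stated assumptions: the toy sequence is hypothetical filler DNA with two BamHI sites placed 5,000 bp apart, and the function names are invented for the example. The only enzyme fact it relies on is that BamHI recognizes GGATCC and cuts after the first G.

```python
def bamhi_fragments(seq):
    """Lengths of the fragments produced by cutting a linear sequence at G^GATCC."""
    seq = seq.upper()
    cuts = []
    pos = seq.find("GGATCC")
    while pos != -1:
        cuts.append(pos + 1)               # cut falls after the first G of the site
        pos = seq.find("GGATCC", pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(lengths, low=2000, high=6000):
    """Keep fragments within the size window recovered from the gel."""
    return [n for n in lengths if low <= n <= high]


# Hypothetical toy sequence: filler DNA with two BamHI sites 5,000 bp apart.
toy = "A" * 1500 + "GGATCC" + "T" * 4994 + "GGATCC" + "C" * 800

fragments = bamhi_fragments(toy)
selected = size_select(fragments)
print(fragments, selected)
```

Only the fragment bounded by the two sites falls inside the 2 to 6 kb window, which is the reason for size-selecting before building the library.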


aacacacccg agctcgaagg agtcttgggc tctgggctca tgtggccact cactctcact ttctggacct cagcctccct ctcggcgggg cacggtggct catgcctgta atcccagcac agaccatgac ctcaggagtt tgagaccagt cgggccaaca taaaaataca aaagattagc tgggtgtggt ggtgtgcacc ggaggctgag gcaggagaat tgcatgaacc cgggaggtgg cacgccattg cactccagcc tgggcaacag tgcgagattc aaagaagaaa gaaagaaaga aaaaatgaaa cacttctcca tgtccttgga atagtgtgtt ggtggccctc caccccgaca ttgatatgat aggcaccccc agtcttggtg gattctttga agaagagaaa gccctttgca gtgcaggcca cagcgggcat gagcctggaa gggtctcttg ggaggagtgg gctcagaaag caggggcatt ccccgggggc caaacagtct tttgaaaaga ctcagagtaa atgagagccc caggaggccc tggcttctca actccctttc tccacagGGA exon 26 ... GTC cctacctcct tcccattgct gccccggcac tttctctccc ccacggtcct cagctgctcg gggccctcct aaccctgagt cagGGT... exon 27 ... GCT gtgagtgtcc gcagaaaagg ggagacaccc tcagcccctc gtctcctcgg tctctctgtg cagGGT... exon 28 ... CAG ggatgatggg aggtaggggt aggaaacacc tctttggtct cctcccttct tcccccattt cccacctaca gGGT ... gtaagaggga gcagccggcc agaggggtgg gagatgcagg ccgcctccta gctaatcaga cagccatcaa ctagagggat aacttcctcc catgaaggga gcagcacaga gggaagtggg gggtgacttc ttttaagagc tgctggaata tgctgtgact ataaatcttg gtccatagtt tggggagggg ggaagccttt agaggggctg tttgggataa tctcaattca gtgctgagaa gccagacccc aggaggaagg accgtgcttt ccagcagagt cactgtctgt tcctctctcc ctccccctca gGGT ... gtaagtaggc ctctcgcgct gcatccgtca aggtgcgttg gcagccttca catgccctgt ccttcccttc tagGGC •.. gtgagggcag cgtggaaggg gctctggcaa gtggcccagg gcagcagggg ctggcgggcc atgaccaaag ccatggagat aagaccaggg caggggccac acacagcctg gagtctgggc ctcaaggctt gtcgttggcc ttggaaacaa gcctgggaga tgtgaccact cttggggccc caggtcactc cagtcttctt exon 32 ... AGA gtaagttcaa ccttccccct cccatctctg cctcctttga atctctcagc atctctcctc tcggctaatc ctcccctctt cccccttttc tcccctcctg tctccctgtg gtccaggccc atctatccca tgggttacca atctccttgg tggaggctcc cttattcatc ccgctacaca cagttccacc tcagtctcca ggcaggcacc ctttttcctg gcccaggtgg aggcagggca gagctgcaga gggcctctca aacactatag gtcacctcct tgccccacac tggaaatctc GGT... exon 33/34 ... 
AAG gtgacctggc ctaacacata gcctcctcag caggcctggg cacggttccg aggtcctgcc aaactgagct gtcaacctgg gaacctggag tctcctgggg tcatctacta ggagtattca ggggaggccc cttgctctca gGGT ••• exon 35 ... CCC gtaagtacag acttggccct tccctccctt cacacagcac ccctggccct cctctcccct caccgcatcc ccgccttccc tcctgtcaga ccttttcttc tagGGA... exon 36 ... CCT catttttcat caccgactgg gacctgggac ctcgagggac cgtcctcagg ggagaagggt ggagacggga ttgtttccca

The other sequences reported here were obtained by using the primers indicated in Table 1 and genomic DNA as a template for the PCR (Saiki et al., 1985). The PCR products were cloned into the filamentous bacteriophage M13, and the M13 clones were used for single-stranded sequencing with the dideoxynucleotide procedure. To confirm the sequences, about ten M13 clones of each PCR product were sequenced.
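The paper does not describe how the reads from the ten or so clones were compared. One simple way to picture it is a per-column majority vote over aligned reads, with any position at which a clone disagrees written as a capital letter, following the convention of the Fig. 2 legend. The Python sketch below makes that assumption explicit; the reads are hypothetical toy data built around a short stretch of intron 49, and the function name is illustrative.

```python
from collections import Counter


def consensus(reads):
    """Majority base per column; columns with any disagreement are upper-cased."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*reads):
        counts = Counter(base.lower() for base in column)
        base, _ = counts.most_common(1)[0]   # most frequent base in this column
        called.append(base.upper() if len(counts) > 1 else base)
    return "".join(called)


# Hypothetical reads: nine clones agree, one carries an alternative base.
reads = ["gactccatcttg"] * 9 + ["gactccaccttg"]

print(consensus(reads))
```

A position flagged in this way is a candidate PCR artifact rather than a confirmed polymorphism; further analysis would be needed to decide between the two.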



cccaagcatc ttcctgcctc cattactgct cctcccccag gtagtggaaa ctcctcctcc ttccctccat tcacgccctg cttcctcccc cagGGT ... exon 37 ... ATT
gtgccccatt gtgagtggct tggcctctgt gcccacgagg ctggtgggct gggacccagg acgggtccag gcttgatgcg tctgtgctct cctacagGGT ... exon 38 ... CCT
gtgagtatca cccgcctctc tgttgagcct ctcccctctc cccaggcagc ggtggcaggt gagggcagct gggtcggatg agttggctgt tctccctctg actgttccta tgttctctcc ttccagGGT ... exon 39 ... TCT
gtaagtctct gcagcagagt ccactgctct aggttggggg tgctgggtgg gggctgccag aaggatggtg gggctgactg aggacccaat gatgcaccag agccccctgg agtctgacag cccctcctat cctcatccagGGA ... exon 40 ... GCT
gtaagtgcca gctcagatct ctgcagctcc ggaggtgtgc agagctgggg aggggtccct gtgctgctgt ctggcacctc acccctgttt gcctcccaaa gGGT ... exon 41 ... TCT
gtaagtgccc ccctcacctt gggggccctg agaaaaacca tcacaggact tggagtgggc ggagccaagg agaacagatt tggtagagat gactccagcg gactcaagtc ctcccagacc ctatctctgg cctgactctt tcttctccct tagGGT ... exon 42 ... GAG
gtgagcagtc cccagccccc atgccagtac cctcagcatg gccattgtgg ccttgcctaa gccctcttcc ccggctgact ctcacttctc tctctctctc tctgcagGGG ... exon 43 ... AAG
gtaagatggc aacactccat gaccacagcc ttgtctgctg cttccctgcc ccatcctggc ccttcacccg gggctgaccc atattcccct gctctccccg ccagGGT ... exon 44 ... ACT
gtaatagctg ggctccagtt ccctgtacct ggtcaggcca gggactcttc aggcctcctt agaggctggg gatgggtgtc ggacttcacc caggcagggg gaggaaagga gatcctgcaa gatgtcaggg ccttaatcca aaaaactgag ttaaagctca gccctaagtc ccctctccca gacaggaccg cctctcccat gagttggccc cagctcccgt naagattgca gtggggaggt ttccctggga gttgggagag atggccacag tgggaagcag ctgaggagag agagatccag cagaggggag gcctcatcct gcagccccag cctcagcctt ccctggccaa gagctcatgc tttccttgct ctccccagGGT ... exon 45 ... GCC
gtaagtaccc ctgctgtgtc ccccatgcct tcagaactct acagatgcag acagtgcccc actcgatgcc aatggaactt ccgcctgaca gtttgtccct ttctctcttc agGGA ... exon 46 ... CCT
gtaagtatgc tcagcccctc cccagtcccc atgctgtgct gtgggatagg agggggagct tcgcctcagt ttccccctct ggatagtcat tctttcccct ccctagtggg gactggggtc tgaagatttg tgggcatgtc caagtagctt ctgagagggt gaggggtaca cagagaggga ttatGggaga ggtctctgcc tatggacacc ctcgggctag atttccagaa taatgaaggg gcatgggttg ccacactgcc cttgtctctc cagccaggcc ctcaggctac Atttgacgct cactgggcct gaactgcctt tttatctgtc cttcagGGC ... exon 47 ... CGA
gtaagtcatg ccttctctct cctcttctga gccccaagcc caggctcacc tcgggacctt gccaggaccc aggcaccctt tgcctctctg gagaagggtt cagggacagg gagtgggcaa agaaaggaag aatcctgaac aaacaatctg atctagcttt ggcctctctg ctccccaatc cgtcctcccc tggctcagcg gctgggagga gctatggcat gtcctatgga aagaggctga ggctggctct atgagccgtg gggccagagc cagcagggag ggtggtgggc ctctcctcca gagctggggt tgttcgggct tctggcagcc tttctcaaac catttccccc actccagGGT ... exon 48 ... GTT
gtatgtagcc cctcatcccc tctgctcatg gccctccagc ccccatagca cttggatgcc ggaatcccca ctctcttccc tctctgtgca gGGT ... exon 49 ... GTG
gtgtgggcct gccctagcct ctccctccct cctactcctg ccatgccagg gtccccatgc ccatatgtgc ccctaccata tggtgctggc tgctcccttt ccctgactcc aTcttgccct gccctAccac agGAG ... exon 50 ... CAG
gtgcgtgagc tggacctcag agccagtgtt aggagatggg ctagcccagt gctcagaagg gacatgaagt cctggagtag gtctctgcta agggtgatgg acagagctgg gctgggaggc aggggtctca ggtccctgct agtggttcag acacaggctg ccgatgggca ggtggtgccc ctctgatata acggtgcatt gggcagctct ctgaggaccc tggacaggag gccagcagga ctagaggttc ccgcatagct cactcttccc tctctctcct ccctgcagTTC ... exon 51 ... ACG
gtgagtgccc agaatcccca ggcagggccc cacctctccg gccttgggct ttttggccag gccatagtgc cctctctcca tcactcccac gtggtaatgc cccctcccgt tgtctccgcc ccaccccagAGT ... exon 52
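The intron sequences in Fig. 2 can be cross-checked against the sizes listed in Table II and against the usual GT...AG splice-site rule. The Python sketch below is an illustrative check rather than anything performed in the paper: it uses introns 48 and 49 as transcribed from Fig. 2, strips the spaces between the ten-base groups, and reports the length, whether the intron starts with gt and ends with ag, and the 1-based positions of capital letters (the ambiguous bases described in the figure legend). The function and variable names are assumptions made for the example.

```python
# Introns 48 and 49 as transcribed from Fig. 2, with the sizes given in Table II.
INTRONS = {
    48: ("gtatgtagcc cctcatcccc tctgctcatg gccctccagc ccccatagca "
         "cttggatgcc ggaatcccca ctctcttccc tctctgtgca g", 91),
    49: ("gtgtgggcct gccctagcct ctccctccct cctactcctg ccatgccagg "
         "gtccccatgc ccatatgtgc ccctaccata tggtgctggc tgctcccttt "
         "ccctgactcc aTcttgccct gccctAccac ag", 132),
}


def summarize(raw):
    """Length, GT...AG check and positions of capitalized (ambiguous) bases."""
    seq = "".join(raw.split())             # drop spaces between 10-base groups
    return {
        "length": len(seq),
        "gt_ag": seq[:2].lower() == "gt" and seq[-2:].lower() == "ag",
        "ambiguous": [i for i, base in enumerate(seq, start=1) if base.isupper()],
    }


for number, (raw, expected) in INTRONS.items():
    info = summarize(raw)
    print(number, info, "matches Table II:", info["length"] == expected)
```

For intron 49 the capitalized bases fall at positions 112 and 126, in agreement with the positions quoted in the legend of Fig. 2.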

Results and Discussion

To complete the structure of the COL1A1 gene, nucleotides from intron 25 to exon 40 were sequenced from a bacteriophage clone. The sequences from exon 40 to exon 52 were defined by amplifying genomic DNA by the PCR, cloning the products into M13, and sequencing the inserts. The data provided here (Figs. 1 and 2 and Table II) provide all the intron sequences extending from intron 25 to exon 52, the 3'-end of the gene. All the sequences that flanked the exons had consensus sequences for correct RNA splicing. The intron sizes varied from 91 bp to 902 bp (Table II). As indicated by the capital letters for intron sequences in Figure 2, five of the nucleotide positions were ambiguous in that most clones of PCR products contained the base indicated but an alternative base was seen in one or more clones. Therefore, the alternative base was probably an artifact of the PCR, but this conclusion was not confirmed by any further analyses. The nucleotide sequences presented here together with those published previously (Bernard et al., 1983; Chu et al., 1984, 1985a, 1985b; Barsh et al., 1984, 1985; Ramirez et al., 1985; D'Alessio et al., 1988; Tromp et al., 1988) will make it possible to design PCR primers to sequence each of the exons and the intron/exon boundaries of the entire proα1(I) gene. The availability of such primers for the PCR will greatly

Table II. Intron and exon sizes for the 3'-half of the human COL1A1 gene.

Number   Exon (bp)   Intron (bp)
25       -           902
26       54          143
27       54          103
28       54          111
29       54          451
30       45          93
31       99          293
32       108         460
33/34    108         221
35       54          163
36       54          223
37       108         97
38       54          126
39       54          140
40       162         101
41       108         153
42       108         107
43       54          104
44       108         378
45       54          112
46       108         336
47       54          357
48       108         91
49       283         132
50       191         288
51       243         129

facilitate the search for mutations that can cause rare and common diseases of connective tissue.

Acknowledgements

The work presented here was supported in part by NIH research grants AR38188 and AR39740, a grant from the March of Dimes-Birth Defects Foundation, and a grant from the Lucille P. Markey Charitable Trust.

References

Barsh, G.S., Roush, C.L. and Gelinas, R.E.: DNA and chromatin structure of the human α1(I) collagen gene. J. Biol. Chem. 259: 14906-14913, 1984.
Barsh, G.S., Roush, C.L., Bonadio, J., Byers, P.H. and Gelinas, R.E.: Intron-mediated recombination may cause a deletion in an α1 type I collagen chain in a lethal form of osteogenesis imperfecta. Proc. Natl. Acad. Sci. USA 82: 2870-2874, 1985.
Bernard, M.P., Chu, M.-L., Myers, J.C., Ramirez, F., Eikenberry, E.F. and Prockop, D.J.: Nucleotide sequences of complementary deoxyribonucleic acids for the proα1 chain of human type I procollagen. Statistical evaluation of structures that are conserved during evolution. Biochemistry 22: 5213-5223, 1983.
Byers, P.H.: Brittle bones-fragile molecules: Disorders of collagen gene structure and expression. Trends in Genet. 6: 293-300, 1990.
Chu, M.-L., de Wet, W., Bernard, M., Ding, J.-F., Morabito, M., Myers, J., Williams, C. and Ramirez, F.: Human proα1(I) collagen gene structure reveals evolutionary conservation of a pattern of introns and exons. Nature 310: 337-340, 1984.
Chu, M.-L., de Wet, W., Bernard, M. and Ramirez, F.: Fine structural analysis of the human proα1(I) collagen gene. Promoter structure, AluI repeats, and polymorphic transcripts. J. Biol. Chem. 260: 2315-2320, 1985a.
Chu, M.-L., Garguilo, V., Williams, C. and Ramirez, F.: Multiexon deletion in an OI variant with increased type III collagen mRNA. J. Biol. Chem. 260: 691-694, 1985b.
Chu, M.-L. and Prockop, D.J.: Collagen: Gene Structures. In: Extracellular Matrix and Inheritable Disorders of Connective Tissue, ed. by Royce, P.M. and Steinmann, B., Alan R. Liss, Inc., in press.
D'Alessio, M., Bernard, M., Pretorius, P.J., de Wet, W. and Ramirez, F.: Complete nucleotide sequence of the region encompassing the first twenty-five exons of the human proα1(I) collagen gene (COL1A1). Gene 67: 105-115, 1988.
Kuivaniemi, H., Tromp, G. and Prockop, D.J.: Mutations in collagen genes. Causes of rare and some common diseases in man. FASEB J. 5: 2052-2060, 1991.
Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning: A Laboratory Manual. Cold Spring Harbor University Press, Cold Spring Harbor, 1982.
Ramirez, F., Bernard, M., Chu, M.-L., Dickson, L., Sangiorgi, F., Weil, D., de Wet, W., Junien, C. and Sobel, M.: Isolation and characterization of the human fibrillar collagen genes. Ann. N.Y. Acad. Sci. 460: 117-129, 1985.
Saiki, R.K., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A. and Arnheim, N.: Enzymatic amplification of β-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230: 1350-1354, 1985.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467, 1977.
Tromp, G., Kuivaniemi, H., Stacey, A., Shikata, H., Baldwin, C.T., Jaenisch, R. and Prockop, D.J.: Structure of a full-length cDNA clone for the preproα1(I) chain of human type I procollagen. Biochem. J. 253: 919-922, 1988.
Tsuneyoshi, T., Westerhausen, A., Constantinou, C.D. and Prockop, D.J.: Substitutions for glycine α1-637 and glycine α2-694 of type I procollagen in lethal osteogenesis imperfecta. The conformational strain on the triple helix introduced by a glycine substitution can be transmitted along the helix. J. Biol. Chem., in press, 1991.
Vuorio, E. and de Crombrugghe, B.: The family of collagen genes. Ann. Rev. Biochem. 59: 837-872, 1990.

Dr. Darwin J. Prockop, Department of Biochemistry and Molecular Biology, Jefferson Institute of Molecular Medicine, Jefferson Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA.
