© 1992 Oxford University Press

Human Molecular Genetics, Vol. 1, No. 7

521-527

The genomic structure of the human skeletal muscle sodium channel gene

Andrea I.McClatchey, Carol S.Lin, Jianzhou Wang1, Eric P.Hoffman1, Cecilia Rojas1 and James F.Gusella*

Molecular Neurogenetics Laboratory, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129 and 1Departments of Molecular Genetics and Biochemistry, Human Genetics, and Pediatrics, University of Pittsburgh, PA 15261, USA

Received June 1, 1992; Revised and Accepted August 18, 1992

GenBank accession nos L01962-L01983 (incl.)

Electrical excitability of neurons and muscle cells reflects the actions of a family of structurally related sodium channels. Mutations in the adult skeletal muscle sodium channel have been associated with the inherited neuromuscular disorders paramyotonia congenita (PMC) and hyperkalemic periodic paralysis (HPP). We have deciphered the entire genomic structure of the human skeletal muscle sodium channel gene and developed a restriction map of the locus. SCN4A consists of 24 exons spanning 35 kb on chromosome 17q. We describe the sequence of all intron/exon boundaries, the presence of several polymorphisms in the coding sequence, and the locations within introns of two dinucleotide repeat polymorphisms. This is the first sodium channel for which the entire genomic structure has been resolved. The organization of the SCN4A exons relative to the proposed protein structure is presented and represents a foundation for functional and evolutionary comparisons of sodium channels. Knowledge of the exon structure and flanking intron sequences for SCN4A will permit a systematic search for mutations in PMC and HPP.

INTRODUCTION

The voltage-gated sodium channel is responsible for the initiation and propagation of the action potential in excitable cells (Barchi, 1988; Catterall, 1988). The mammalian sodium channel is composed of a large, membrane-spanning alpha-subunit and at least one smaller beta-subunit. A voltage-gated sodium current can be generated in Xenopus oocytes by expression of the alpha-subunit alone (Noda et al., 1986; Goldin et al., 1986; Suzuki et al., 1988; Auld et al., 1988). Consequently, there has been considerable interest in delineating the functional elements of the alpha-subunit, which is composed of four large, internally homologous domains, each of which is predicted to contain six membrane-spanning alpha-helices.
It has been proposed that sodium channels may have evolved by genetic duplication of the phylogenetically older potassium channels, which resemble a single sodium channel domain (Hille, 1984). Six different sodium channels have been described in the rat, including four from brain, along with separate skeletal and cardiac muscle isoforms (Noda et al., 1986b; Kayano et al., 1988; Auld et al., 1988; Rogart et al., 1989; Trimmer et al., 1989; Kallen et al., 1990). Recently the cDNA sequences of the human skeletal and cardiac muscle homologues also have been described (Gellens et al., 1992; George et al., 1992; Wang et al., 1992). However, in none of these cases has the genomic organization of the alpha-subunit gene been determined. We have paid particular attention to the human skeletal muscle sodium channel gene (SCN4A), where mutations have been discovered that underlie two distinct neuromuscular disorders: paramyotonia congenita (PMC) and hyperkalemic periodic paralysis (HPP) (McClatchey et al., 1992a; Ptacek et al., 1991; Rojas et al., 1991). To facilitate the analysis of these disorders, and to provide a basis for functional and evolutionary comparisons of sodium channels, we have deciphered the complete genomic structure and physical map of the SCN4A gene.

RESULTS

Cloning and sequencing of the SCN4A gene

A genomic cosmid library was screened with the 672 bp cDNA probe HNA2, which contains coding sequence corresponding to much of the intracellular loop connecting domains II and III (bp 2449-3120) (Fontaine et al., 1990). Positively hybridizing colonies from a secondary screen were chosen for further study. Several cosmids were found to overlap based on restriction enzyme digestion and hybridization to HNA2, which detected two HindIII fragments of 2 kb and 8 kb, respectively. Since HNA2 contains no HindIII sites, it was evident that this segment is encoded by at least two exons. Sequence analysis of the 2 kb HindIII fragment from cosmid 322 revealed a single exon corresponding to 477 bp of the HNA2 probe, confirming the presence of SCN4A sequences in this cosmid. Using primers based on the rat cDNA sequence, on partial human cDNA sequence, on the subcloned 2 kb HindIII fragment and on the vector sequence adjacent to the cloning site, we generated DNA sequence using cosmid 322 as a template.
For each section obtained, we synthesized new primers to extend the sequence and to sequence the reverse strand. We were able to locate the boundaries of 13 exons, and to sequence across the entire adjacent intron in some instances. Eleven of these exons encode segments corresponding to domains II and III, with the remaining two

* To whom correspondence should be addressed at: Molecular Neurogenetics Laboratory, Massachusetts General Hospital East, Building 149, 13th Street, Charlestown, MA 02129, USA

Downloaded from http://hmg.oxfordjournals.org/ at East Carolina University on April 20, 2015

ABSTRACT

[Figure 1 sequence panel not reproduced: annotated nucleotide sequence of SCN4A numbered from -450 to 7706, with the deduced amino acid sequence (residues 1 to 1836) aligned beneath the coding region. GenBank accession nos L01962-L01983.]

Figure 1. Complete nucleotide sequence of the SCN4A coding sequence including 5' and 3' untranslated sequences. The locations of introns are indicated by a black bar in the sequence. Shaded bases and amino acids indicate polymorphic residues or positions where our coding sequence differs from that of George et al. (1992). Differences from George et al. (1992) in the untranslated regions are not shown. Confirmed polymorphisms vary as follows: bp 366 C/T; 2328 C/A; 4123 A/G (Asn or Asp); 4536 C/A; 4916 A/G. We found the following coding sequence differences when compared to the recently published cDNA sequence (in parentheses): bp 29 T(C) and 30 G(T) which encode a Val (Ala); 32 C(G) which encodes a Pro (Arg); 744 A(G); 1111 G(A) which encodes a Glu (Lys); 1675 A(G) which encodes an Asn (Asp); 3018 T(C); 3396 T(G); 4101 T(C).
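The amino acid consequences listed in the caption can be checked against the standard genetic code. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, position 1 is assumed to be the A of the initiating ATG, the AAC/GAC codons are those quoted in the Polymorphisms section, and the CTG/CTA pair used in the note below is an invented example of a silent change.

```python
from itertools import product

# Standard genetic code, written in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_position(bp):
    """Map a 1-based coding position to (codon number, position within codon).

    Assumes bp 1 is the A of the initiating ATG.
    """
    if bp < 1:
        raise ValueError("coding positions are 1-based")
    return (bp - 1) // 3 + 1, (bp - 1) % 3 + 1


def classify_change(ref_codon, alt_codon):
    """Return (reference amino acid, variant amino acid, 'silent' or 'missense')."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "silent" if ref_aa == alt_aa else "missense"
    return ref_aa, alt_aa, kind
```

Under these assumptions, `codon_position(1111)` places the G(A) difference at the first base of codon 371, which fits the Glu (Lys) change given in the caption, and `classify_change("AAC", "GAC")` reports the Asn to Asp substitution as missense. A third-position change such as the invented `classify_change("CTG", "CTA")` is reported as silent.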


comprising domain IV. Most of domain IV is encoded in the final exon, which also contains the 3' untranslated sequences and the polyadenylation signal. To define the exons encoding domain I and the 5' untranslated sequences, we generated a probe by PCR amplification, using the recently defined sequence from the 5' end of the human cDNA (Wang et al., 1992). The same cosmid library was rescreened with this probe and a single cosmid (51.22) was identified. Direct sequencing of the cosmid was again used to determine exon boundaries for 11 additional exons. The complete sequence of the 24 exons as well as 5' and 3' untranslated regions as determined from both strands is shown in Figure 1.

Exon boundaries

The precise exon boundaries were determined by comparing the genomic sequence to that of the cDNA. Figure 2 shows the

Consensus donor site (exon|intron): MAG|GTRAGT

Physical map of SCN4A

We generated a map of the SCN4A locus using lambda terminase to linearize the cosmids, followed by partial digestion with HindIII and EcoRI, Southern blotting and hybridization to an oligonucleotide corresponding to the cos site in the vector. Subsequently, the location of each exon was determined by hybridizing single and double digests of the overlapping cosmids 322 and 51.22 with oligonucleotides. These data, combined with 25,268 bp of raw sequence information from the cosmids, enabled us to construct the map in Figure 3. The sequence information

Consensus branchpoint: YNYTRAY

[Figure 2 alignment: the donor, branchpoint and acceptor sequences flanking each intron are not reproduced here. Intron sizes in bp, in order from intron 1 to intron 23: 98, 122, 286, 470, 2900, 1800, 240, 800, 600, 2100, 1700, 3900, 1600, 1900, 2200, 300, 1200, 1100, 300, 300, 800, 600, 800. Exon 24 ends with GTC TAG, followed by about 2100 bp of 3' untranslated sequence up to the AATAAA polyadenylation signal.]

Figure 2. Alignment of intron/exon boundaries and comparison to consensus sequences. Intron sequences are represented by lower case letters; exon sequences are represented by capital letters. Deduced amino acid sequences are aligned below each exon codon. Exons are numbered starting at the 5' end of the SCN4A gene. The sequences of the first three introns have been determined from both strands, while the sizes of the remaining 20 introns are approximate. Consensus sequences for acceptor, donor and branchpoint sites are given on the top line using the following abbreviations: Y=C/T, M=A/C, R=A/G and N=A/G/C or T. Intron and exon sequences corresponding to the consensus acceptor and donor sequences are underlined. Possible branchpoint sequences (at least 5/7 conforming to the consensus) are in bold (Krainer and Maniatis, 1988). The termination codon TAG is represented by an asterisk (*). Residues in agreement with the consensus eukaryotic initiation site (CCACCATG; Kozak, 1984) and an upstream in-frame ATG found in all sodium channels sequenced to date are underlined.
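The scoring rule in this caption (a site counts as a possible branchpoint when at least 5 of its 7 bases conform to the consensus) can be written down compactly. The sketch below is an illustration only and is not the procedure used to prepare the figure: the function names are hypothetical and the test sequences in the note are invented. The ambiguity codes are those defined in the caption.

```python
# Ambiguity codes as defined in the Figure 2 caption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "M": "AC", "R": "AG", "N": "ACGT",
}


def consensus_matches(site, consensus):
    """Count positions at which `site` agrees with a consensus of equal length."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        base in IUPAC[code]
        for base, code in zip(site.upper(), consensus.upper())
    )


def candidate_branchpoints(intron, consensus="YNYTRAY", min_matches=5):
    """List (offset, window, score) for every window meeting the threshold.

    Offsets are 0-based from the start of the supplied intron sequence.
    """
    width = len(consensus)
    hits = []
    for offset in range(len(intron) - width + 1):
        window = intron[offset:offset + width].upper()
        score = consensus_matches(window, consensus)
        if score >= min_matches:
            hits.append((offset, window, score))
    return hits
```

For example, the invented heptamer `CTCTAAC` matches YNYTRAY at all seven positions, while `GTGTAAC` matches at five and would still be reported. The same `consensus_matches` function can score a donor site against `MAGGTRAGT`.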


putative branchpoint, acceptor and donor sequences for each of the 24 exons. While most of these correspond well to the consensus sequences, there are several exceptions, most notably the donor sequence of exon 2. The first and last exons, containing the 5' and 3' untranslated sequences, respectively, are the largest.

also provided the location of polymorphic (dGdA)n and (dGdT)n dinucleotide repeats used in our previous linkage study (McClatchey et al., 1992b). Interestingly, the first five exons were found to be connected by several very small introns. The entire gene is surprisingly small, with 8.2 kb of coding sequence spanning only 35 kb of physical distance. In fact, cosmid 51.22 was found to contain nearly the entire gene, including 5' UTR and partial 3' UTR sequences, stopping short of the polyadenylation signal.
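As a rough consistency check on these figures, the intron sizes given in Figure 2 can be added to the 8.2 kb of exon sequence. The sketch below is illustrative only: the intron sizes are transcribed from Figure 2, where all but the first three are stated to be approximate, and the variable and function names are hypothetical.

```python
# Intron sizes in bp, introns 1 to 23, as read from Figure 2 (mostly approximate).
INTRON_SIZES_BP = [
    98, 122, 286, 470, 2900, 1800, 240, 800, 600, 2100, 1700, 3900,
    1600, 1900, 2200, 300, 1200, 1100, 300, 300, 800, 600, 800,
]

# Total exon sequence, taken from the "8.2 kb" quoted in the text.
EXON_TOTAL_BP = 8200


def locus_span_bp(intron_sizes, exon_total_bp):
    """Estimate the genomic span as the sum of all introns plus all exon sequence."""
    return sum(intron_sizes) + exon_total_bp


if __name__ == "__main__":
    span = locus_span_bp(INTRON_SIZES_BP, EXON_TOTAL_BP)
    print(f"introns: {sum(INTRON_SIZES_BP)} bp, estimated span: {span / 1000:.1f} kb")
```

With these inputs the introns sum to 26,116 bp and the estimated span is about 34.3 kb, in line with the 35 kb stated above. The first four introns (98, 122, 286 and 470 bp) are each under 500 bp, matching the observation that the first five exons are joined by very small introns.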

Polymorphisms

By comparing SCN4A sequences from different individuals by genomic PCR and direct sequencing, we detected several polymorphisms in the SCN4A gene. Several are intronic variations removed from any obvious splicing components (not shown). Others are found in the coding sequence and represent 'silent' polymorphisms, predicted to have no effect on the translated amino acid sequence (Figure 1). In addition, however, we found one polymorphism that would alter an amino acid. At bp 4126, an AAC codon encodes Asn in some individuals, while a GAC codon encodes a negatively charged Asp in others. This amino acid is predicted to be in the tiny extracellular loop connecting segments S1 and S2 of domain IV, and it is unclear how this would affect the function of the sodium channel. In each case, we were able to associate the polymorphism with a particular haplotype with respect to our (dGdA)n and (dGdT)n repeats, further defining different SCN4A alleles. In addition, there are several positions where our sequence differs from that of George et al. (1992), many of which may also represent polymorphisms (also shown in Figure 1).

DISCUSSION

The human skeletal muscle sodium channel is the first sodium channel from any species for which the genomic structure has been completely delineated. Knowledge of the intron sequences

[Figure 3 graphic not reproduced: exon map with domain labels, restriction sites (H, R, B), cosmids 322 and 51.22, and a 1 kb scale bar.]

Figure 3. Structure of the SCN4A gene. Schematic of the gene with exons shown as blackened boxes and introns shown as connecting solid lines. The locations of exons encoding transmembrane segments of domains I, II, III and IV are shown above. The alignment of cosmids 322 and 51.22 and locations of polymorphic (dGdA)n and (dGdT)n repeats and restriction sites are shown below (H, HindIII; R, EcoRI; B, BamHI). Cosmids 322 and 51.22 extend for several kilobases to the right and left, respectively.


3' and 5' UTR

Exon 24 contains 2197 bp of untranslated sequence from the stop codon to the polyadenylation signal. There are patches of the human 3' UTR which share significant homology to that of the rat. The 5' UTR is also similar to that of the rat for the 450 bp of 5' untranslated sequence described for the rat cDNA. There is no consensus TATA box within 2 kb of the translational start site, and we are currently characterizing the transcriptional initiation site and promoter elements.

flanking each exon has practical applications as well as evolutionary and structural implications. The human SCN4A coding sequence shows both strong similarity and a number of notable differences when compared to the sequence of the rat cDNA. In the coding regions, the sequence identity is 88%. The human 5' UTR also displays strong similarity to that of the rat (68%) across the 450 bp of the rat 5' UTR which have been described. The 3' UTRs share patches of >72% similarity, suggesting that participation in regulation of expression might constrain their divergence. For example, the 5' UTR of the rat brain II sodium channel is known to contain a powerful transcriptional silencer (Maue et al., 1990). The human and rat proteins share 91.8% amino acid identity, with the greatest similarity extending throughout the proposed membrane-spanning segments and the intracellular loop connecting domains III and IV. The most striking deviation from the rat cDNA sequence occurs near the carboxyl terminus of the protein, where there is a region of poor homology, and the human sequence is shortened by nine amino acids. An examination of the human nucleic acid sequence in this region of difference reveals an inverted repeat of ten base pairs (bp 5415-5436), which could form a large stem loop analogous to structures known to act as translational regulators. When the secondary structure of this region is simulated using the Zuker algorithm (Zuker and Steinberg, 1981) this stem loop is predicted to form, even when large flanking regions are included in the analysis. This region is followed by the final thirteen amino acids of the protein, which are identical in human and rat. The human and rat proteins also differ in the extracellular loop between segments S5 and S6 of domain I, which is poorly conserved among other sodium channel isoforms.
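An inverted repeat of the kind described above, in which a segment is followed a short distance downstream by its own reverse complement, can be located with a simple exact-match scan. The sketch below is a hedged illustration and does not implement the Zuker folding algorithm or any energy calculation: the function names, the `max_loop` parameter and the test sequence in the note are all hypothetical, and the test sequence is not taken from SCN4A.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=10, max_loop=20):
    """Find exact inverted repeats with arms of length `arm`.

    Returns (left_start, right_start, left_arm) tuples with 0-based starts,
    where the right arm is the reverse complement of the left arm and the
    two arms are separated by at most `max_loop` bases.
    """
    seq = seq.upper()
    hits = []
    for left_start in range(len(seq) - 2 * arm + 1):
        left_arm = seq[left_start:left_start + arm]
        target = reverse_complement(left_arm)
        for loop in range(max_loop + 1):
            right_start = left_start + arm + loop
            if right_start + arm > len(seq):
                break
            if seq[right_start:right_start + arm] == target:
                hits.append((left_start, right_start, left_arm))
    return hits
```

On an invented sequence built as `"TT" + "GACCTGAGCA" + "AA" + reverse_complement("GACCTGAGCA") + "CC"`, the scan reports the arm pair starting at offsets 2 and 14. Exact matching is a deliberate simplification; a stem containing mismatches or G-U pairs would need a scoring scheme rather than string equality.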
The rat amino acid sequence predicts five potential N-linked glycosylation sites in a region containing four repeats of the motif WY(G,I,S)NDT proposed to have arisen by sequential duplication of the underlying DNA sequence to form direct repeats (Trimmer et al., 1989). The human sequence predicts the addition of the closely related motif (WYGNEM), thereby producing a sixth potential glycosylation site. The location of each of the 24 introns can be viewed in the context of the proposed topology of the sodium channel in the membrane as shown in Figure 4. The four domains are similar to each other, with domains II and IV sharing the greatest amino acid similarity (30% identity, 60% similarity). Interestingly, domains I and III share a strikingly similar intron placement. It has been proposed that the sodium channel may have arisen by genetic duplication of an ancestral potassium channel (Hille, 1984). Potassium channels are tetramers


composed of subunits resembling a single sodium channel domain; each contains six hydrophobic alpha helices that have limited similarity to those of the sodium channel. There are several different types of potassium channels which differ in their physiological properties. Given the genomic organization presented here, and the resemblance between domains within the gene, it is tempting to speculate that the sodium channel arose by the joining of two different potassium channel genes, followed by genetic duplication to produce four domains. The genomic structures of most potassium channels have not been resolved, but some of the Shaker-related genes are known to be intronless (Chandy et al., 1990), and a member of the Shaw-related family of genes has been shown to contain a single intron located immediately preceding the S1 segment of its single domain (Ghanshani et al., 1991). It will be of great interest to track the evolutionary progress of the genomic organization of the skeletal muscle sodium channel, and to compare it to those of other sodium channels as they are deciphered. To date, only the locations of a few introns within the Drosophila type I sodium channel have been described (Salkoff et al., 1987). A comparison reveals that the placement of the 8th, 11th and 12th SCN4A introns is identical to the placement of introns in the Drosophila channel, while the 21st intron is located one codon 3' of the site of an intron in the Drosophila gene. We have also determined that an intron is present in the mouse gene at a position corresponding precisely to that of the 21st intron in SCN4A (McClatchey, unpublished results). The SCN4A locus has been tightly linked to two neuromuscular diseases, paramyotonia congenita (PMC) and hyperkalemic periodic paralysis (HPP), for which abnormal tetrodotoxin-sensitive sodium currents are detected in affected muscle

(Fontaine etal., 1990; Koch etal., 1991a, 1991b; Ptaceketal., 1991a, 1991b; Ebers et al., 1991; McClatchey et al., 1992b). Clinically, PMC is characterized by muscle stiffness (myotonia) which is particularly exacerbated by cold temperatures, while HPP involves intermittent attacks of paralysis associated with elevated serum potassium (Engel, 1986). Using partial knowledge of the SCN4A genomic structure, we have recently discovered two different single base alterations in the same exon in affected members of three families with PMC (McClatchey et al., 1992a). Both mutations affect amino acids in the intracellular loop connecting domains m and IV of the channel. Two additional PMC mutations, both affecting the same codon at the external edge of the S4 segment of domain IV, have been described (Ptacek et al., 1992). Two different single base alterations have been described in affected members of families with HPP (Ptacek et al., 1991c; Rojas et al., 1991). These HPP mutations were found in transmembrane segments of domains II and IV, distinct from the affected regions in PMC. We have located several polymorphisms within the SCN4A gene, including one predicted to alter an amino acid located in the extracellular linker between segments SI and S2 in domain IV. Rojas et al. (1991), have also described a polymorphism in the intracellular loop connecting domains II and in which effects an amino acid change Glu to Asp. Unlike the PMC and HPP changes, these naturally occurring variations do not produce an obvious phenotype. However, it is conceivable that these polymorphisms, when present in a sodium channel carrying a PMC or HPP mutation, may act to modify the disease phenotype. The knowledge of the genomic structure of SCN4A should greatly facilitate the scanning of exons for sequence alterations. Ultimately, delineation of the full array of disease mutations and protein polymorphisms in SCN4A and characterization of the

Downloaded from http://hmg.oxfordjournals.org/ at East Carolina University on April 20, 2015

Figure 4. Superimposition of inlrons on the proposed secondary structure of the sodium channel protein. Locations of introns designated by blackened boxes (modified from brain channel model of Caterall, 1988, and Kandcl et al., 1991).

corresponding phenotypes should serve to identify segments of the protein crucial to particular aspects of sodium channel function.

EXPERIMENTAL PROCEDURES

Standard techniques

DNA and RNA purification, oligonucleotide labeling and hybridization were performed as described (Sambrook et al., 1989), with the following alteration: oligonucleotide hybridization was at 42°C, followed by three five-minute washes at 25°C in 5× SSC. Labeling by the random primer method was performed according to Feinberg and Vogelstein (1983) and Southern blotting according to Gusella et al. (1983).

Sequencing

Cosmid DNA was purified using a cesium chloride gradient and sequenced according to the United States Biochemical (USB) protocol with the following modifications: cosmid DNA was denatured in 0.2 M NaOH, followed by neutralization with 2 M ammonium acetate. At this point 8 ng of primer was added at room temperature, directly followed by the addition of ethanol and precipitation. The DNA-primer complex was later pelleted and resuspended in 1× Sequenase buffer. The USB sequencing protocol was completed as recommended. Direct PCR sequencing was performed on PCR products that had been purified using a Centricon-30 column (Amicon) with 2 Tris-EDTA washes. To 7 µl of the resulting product, 2 µl of 5× Sequenase buffer and 4 ng of primer were added, and the sample was denatured at 95°C for 3 minutes. Subsequent sequencing was completed as recommended in the USB protocol. The sequence of each exon, as well as 50 bp of intron sequence flanking each exon, was determined from both strands and submitted to GenBank [Accession nos L01962-L01983 (incl.)].
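As a worked check of the direct PCR sequencing mix described above (7 µl of product plus 2 µl of 5× Sequenase buffer plus primer), the short sketch below applies the usual dilution relation C1V1 = C2V2 to estimate the final buffer concentration. The primer volume is not stated in the text; the 1 µl used here is an assumption for illustration only, and all function and variable names are hypothetical.

```python
def final_concentration(stock_x: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Dilution relation C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock_x * stock_volume_ul / total_volume_ul


# Volumes stated in the text.
PRODUCT_UL = 7.0   # purified PCR product
BUFFER_UL = 2.0    # 5x Sequenase buffer
BUFFER_STOCK_X = 5.0

# ASSUMPTION: the text gives the primer as a mass (4 ng), not a volume.
# 1 ul is used here purely for illustration.
PRIMER_UL = 1.0

total_ul = PRODUCT_UL + BUFFER_UL + PRIMER_UL
buffer_final_x = final_concentration(BUFFER_STOCK_X, BUFFER_UL, total_ul)

print(f"total volume: {total_ul} ul")
print(f"final buffer concentration: {buffer_final_x}x")
```

Under that assumed primer volume, the 2 µl of 5× buffer is diluted into 10 µl, giving a nominal 1× buffer; a different primer volume would shift the result accordingly.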

ACKNOWLEDGMENTS

This work was supported by NIH grants NS22224 and NS24279. C.L. was the recipient of a Huntington's Disease Society of America post-doctoral fellowship. We thank John Lester for assistance with the artwork.

REFERENCES

1. Auld, V.J., Goldin, A.L., Krafte, D.S., Marshall, J., Dunn, J.M., Catterall, W.A., Lester, H.A., Davidson, N., and Dunn, R.J. (1988) Neuron 1, 449-461.
2. Barchi, R.L. (1988) Annu. Rev. Neurosci. 11, 455-495.
3. Catterall, W.A. (1988) Science 242, 50-61.
4. Ebers, G.C., George, A.L., Barchi, R.L., Ting-Passador, S.S., Kallen, R.G., Lathrop, G.M., Beckmann, J.S., Hahn, A.F., Brown, W.F., Campbell, R.D., and Hudson, A.J. (1991) Ann. Neurol. 30, 810-816.
5. Engel, A.G. (1986) In Myology, A.G. Engel and B.Q. Banker, eds. (New York: McGraw-Hill), pp. 1843-1870.
6. Feinberg, A.P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13.
7. Fontaine, B., Khurana, T.S., Hoffman, E.P., Bruns, G., Haines, J.L., Trofatter, J., Hanson, M., Rich, J., McFarlane, H., Yasek, D.M., Romano, D., Gusella, J.F., and Brown, R.H., Jr. (1990) Science 250, 1000-1002.
8. Gellens, M.E., George, A.L., Jr., Chen, L., Chahine, M., Horn, R., Barchi, R.L., and Kallen, R.G. (1992) Proc. Natl. Acad. Sci. USA 89, 554-558.
9. George, A.L., Jr., Komisarof, J., Kallen, R.G., and Barchi, R.L. (1992) Ann. Neurol. 31, 131-137.
10. Ghanshani, S., Pak, M., McPherson, J.D., Strong, M., Dethlefs, B., Wasmuth, J.J., Salkoff, L., Gutman, G., and Chandy, G. (1992) Genomics 12, 190-196.


Sodium channel cosmids and probes

The cosmids 322 and 51.22 were isolated from a library of human lymphoblast DNA in pWE15. The library was plated on nylon filters and incubated at 37°C overnight, after which replica filters were prepared and processed for hybridization (Gusella et al., 1980). The HNA2 probe was previously isolated and described (Fontaine et al., 1990). The 125 bp 5' end probe was generated by PCR using the oligonucleotides 5'-CCTTCACCCGGGAGTCACTGG-3' and 5'-AGCCTCCAAGTCACTTCGTGG-3' (bp 56 to 180 in Fig. 1). PCR was performed on approximately 200 ng of total human genomic DNA using the following amplification scheme: 1 min at 94°C, 1 min at 62°C, 3 min at 72°C. The product was resolved on a 1% low melt agarose gel, excised and labeled by the random primer method (Feinberg and Vogelstein, 1983).
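The primer and coordinate details given for the 5' end probe can be sanity-checked with a few lines of Python. This is a minimal sketch and not part of the original methods: the helper names are hypothetical, and the Wallace rule (2°C per A/T, 4°C per G/C) is only a rough rule of thumb for short oligonucleotides that the authors do not say they used.

```python
# Primer sequences and Fig. 1 coordinates as stated in the text.
FORWARD_PRIMER = "CCTTCACCCGGGAGTCACTGG"
REVERSE_PRIMER = "AGCCTCCAAGTCACTTCGTGG"
START_BP, END_BP = 56, 180

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    return sum(1 for base in seq if base in "GC")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C).

    ASSUMPTION: a rule of thumb chosen for illustration; the source
    does not state how the 62 deg C annealing step was selected.
    """
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


# Inclusive coordinates, so the product length is end - start + 1.
amplicon_length = END_BP - START_BP + 1

for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
    print(name, len(primer), "nt,", gc_count(primer), "G/C,",
          "Wallace Tm ~", wallace_tm(primer), "deg C")
print("amplicon length:", amplicon_length, "bp")
```

Both primers are 21 nucleotides long, and the inclusive coordinates bp 56 to 180 are consistent with the stated 125 bp product.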

11. Goldin, A., Snutch, T., Lubbert, H., Dowsett, A., Marshall, J., Auld, V., Downey, W., Fritz, L., Lester, H., Dunn, R., Catterall, W.A., and Davidson, N. (1986) Proc. Natl. Acad. Sci. USA 83, 7503-7507.
12. Gusella, J.F., Keys, C., Varsanyi-Breiner, A., Kao, F.T., Jones, C., Puck, T.T., and Housman, D. (1980) Proc. Natl. Acad. Sci. USA 77, 2829-2833.
13. Gusella, J.F., Wexler, N.S., Conneally, P.M., Naylor, S., Anderson, M.A., Tanzi, R.E., Watkins, P.C., Ottina, K., Wallace, M., Sakaguchi, A., Young, A., Shoulson, I., Bonilla, E., and Martin, J.B. (1983) Nature 306, 234-238.
14. Hille, B. (1984) Ionic Channels of Excitable Membranes, Sinauer Associates, Inc., Sunderland, MA.
15. Kallen, R.G., Sheng, Z.-H., Yang, J., Chen, L., Rogart, R.B., and Barchi, R.L. (1990) Neuron 4, 233-242.
16. Kandel, E.R., Schwartz, J.H., and Jessel, T.M. (1991) Principles of Neural Science, 3rd ed. New York: Elsevier, p. 116.
17. Kayano, T., Noda, M., Flockerzi, V., Takahashi, H., and Numa, S. (1988) FEBS Lett. 228, 187-194.
18. Kozak, M. (1984) Nature 308, 241-246.
19. Krainer, A.R., and Maniatis, T. (1988) Transcription and Splicing, IRL Press, Oxford, pp. 131-143.
20. Maue, R.A., Krainer, S.D., Goodman, R.H., and Mandel, G. (1990) Neuron 4, 223-231.
21. McClatchey, A.I., Van den Bergh, P., Pericak-Vance, M.A., Raskind, W., Verellen, C., McKenna-Yasek, D., Rao, K., Haines, J.L., Bird, T., Brown, R.H., Jr., and Gusella, J.F. (1992a) Cell 68, 769-774.
22. McClatchey, A.I., Trofatter, J., McKenna-Yasek, D., Raskind, W., Bird, T., Pericak-Vance, M., Gilchrist, J., Arahata, K., Radosavljevic, D., Van den Bergh, P., Haines, J.L., Gusella, J.F., and Brown, R.H., Jr. (1992b) Am. J. Hum. Genet. 50, 896-901.
23. Ptacek, L.J., George, A.L., Jr., Griggs, R.C., Tawil, R., Kallen, R.G., Barchi, R.L., Robertson, M., and Leppert, M.F. (1991a) Cell 67, 1021-1027.
24. Ptacek, L.J., Tyler, F., Trimmer, J.S., and Leppert, M. (1991b) Am. J. Hum. Genet. 49, 378-382.
25. Ptacek, L.J., Trimmer, J.S., Agnew, W.S., Roberts, J.W., Petajan, J.H., and Leppert, M. (1991c) Am. J. Hum. Genet. 49, 851-854.
26. Ptacek, L.J., George, A.L., Barchi, R.L., Griggs, R.C., Riggs, J.E., Robertson, M., and Leppert, M. (1992) Neuron 8, 891-897.
27. Noda, M., Ikeda, T., Suzuki, H., Takeshima, T., Kuno, M., and Numa, S. (1986) Nature 322, 826-828.
28. Noda, M., Ikeda, T., Kayano, T., Suzuki, H., Tashima, H., Kurasaki, M., Takahashi, H., and Numa, S. (1986a) Nature 320, 188-192.
29. Rogart, R.B., Cribbs, L.L., Muglia, L.K., Kephart, D., and Kaiser, M.W. (1989) Proc. Natl. Acad. Sci. USA 86, 8170-8174.
30. Rojas, C.V., Wang, J., Hoffman, E.P., Powell, B.R., and Brown, R.H., Jr. (1991) Nature 354, 387-389.
31. Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
32. Salkoff, L., Butler, A., Wei, A., Scavarda, N., Giffen, K., Ifune, C., Goodman, R., and Mandel, G. (1987) Science 237, 744-749.
33. Suzuki, H., Beckh, S., Kubo, H., Yahagi, N., Ishida, H., Kayano, T., Noda, M., and Numa, S. (1988) FEBS Lett. 228, 195-200.
34. Tempel, B.L., Papazian, D.M., Schwarz, T.L., Jan, Y.N., and Jan, L.Y. (1987) Science 237, 770-774.
35. Wang, J.Z., Rojas, C.V., Schwartz, L.S., Nicholas, H., and Hoffman, E.P. (1992) Biochem. Biophys. Res. Commun. 182, 794-801.
36. Zuker, M., and Steigler, P. (1981) Nucleic Acids Res. 9, 133-148.
