Gene, 112 (1992) 197-204 © 1992 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/92/$05.00
Cloning of cDNAs encoding human caldesmons (Recombinant DNA; actomyosin regulation; exon skipping; actin; calmodulin; myosin; tropomyosin; smooth muscle; human aorta)
Mary Beth Humphrey, Haydee Herrera-Sosa, Gabriela Gonzalez, Robyn Lee and Joseph Bryan
Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030 (U.S.A.)
Received by T.D. McKnight: 22 September 1991; Revised/Accepted: 15 November/22 November 1991; Received at publishers: 7 January 1992
Caldesmon (CDM) is a potential actomyosin regulatory protein found in smooth muscle and nonmuscle cells. Domain mapping and physical studies suggest that CDM is an elongated molecule with an N-terminal myosin/calmodulin-binding domain and a C-terminal tropomyosin/actin/calmodulin-binding domain separated by a 40-nm-long central helix. An 1100-nucleotide (nt) cDNA probe encoding the C terminus of avian caldesmon (aCDM) was used to screen a human aorta library and clone smooth-muscle and non-muscle CDM-encoding cDNAs (CDM). The human (h) smooth-muscle hCDM is 3050-3630 nt long, having variation in length in the 3'-untranslated region. The predicted hCDM protein has a high degree of identity, > 90%, to aCDM in the N- and C-terminal-binding domains. The central helical domain is more variable, but retains characteristic repeated peptides and an 'i, i + 4' acidic/basic amino acid (aa) motif found in aCDM which can form intra-helical salt bridges to stabilize the central helix. The predicted smooth-muscle protein is 793 aa long (93 262 Da) with a calculated pI of 5.75. As is the case for the chicken, nonmuscle hCDM is missing the central helical domain, 256 aa overall. Our nonmuscle clone is not full length, but the C-terminal end is identical to the smooth-muscle form. If the N-terminal domain is identical, as it is in the chicken, the predicted protein is 537 aa (62 558 Da). Examination of the 'junctions' at either end of the deleted central domain gives a clear indication of the splice sites and suggests that the nonmuscle form is generated by exon skipping. The results suggest the CDMs are a small family of highly conserved proteins probably derived from a single gene.
Correspondence to: Dr. J. Bryan, Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030 (U.S.A.) Tel. (713)798-4007; Fax (713)790-0545; email: [email protected]
Abbreviations: aa, amino acid(s); aCDM, avian CDM; bp, base pair(s); cDNA, DNA complementary to RNA; CDM, caldesmon; CDM, gene (DNA) encoding CDM; hCDM, human CDM; kb, kilobase(s) or 1000 bp; nt, nucleotide(s); ORF, open reading frame(s); SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl/0.015 M Na3·citrate pH 7.6.
Caldesmon (CDM) is an actin- and myosin-binding protein found on thin filaments in smooth-muscle and nonmuscle cells (Marston and Redwood, 1991; Sobue and Sellers, 1991). It has been implicated in the regulation of actomyosin interactions because it inhibits superprecipitation of actomyosin (Sobue et al., 1981), inhibits the actin-activated ATPase activity of myosin (Dąbrowska et al., 1985; Ngai and Walsh, 1985; Smith and Marston, 1985; Sobue et al., 1985) and of myosin fragments (Lash et al., 1986; Chalovich et al., 1987), and can crosslink actin and myosin filaments (Lash et al., 1986; Chalovich et al., 1987). Understanding this regulation in detail requires knowing how CDM interacts with several proteins. Toward this end we cloned and sequenced aCDMs (Bryan et al., 1989; Bryan and Lee, 1991; Hayashi et al., 1991). The primary structure of CDM has provided a framework for interpreting the extensive domain-mapping experiments done on gizzard CDM. Physical studies and mapping data indicate that CDM has binding domains for actin (Sobue et al., 1981; Marston and Smith, 1985), myosin (Hemric and Chalovich, 1988; Ikebe and Reardon, 1988), calmodulin (Sobue et al., 1981; Marston and Smith, 1985), S-100 (Fujii et al., 1990) and tropomyosin (Graceffa, 1987; Fujii et al., 1988; Horiuchi and Chacko, 1988). These domains are located primarily in the N- and C-terminal thirds of the molecule (Marston and Redwood, 1991; Sobue and Sellers, 1991; and references therein). In addition, the available data suggested the presence of a variable-length central helical domain that is deleted in nonmuscle CDM (Bryan et al., 1989; Bryan and Lee, 1991; Hayashi et al., 1991). To confirm this general picture, we have cloned and sequenced smooth-muscle and nonmuscle hCDMs. The sequence in the N- and C-terminal-binding regions is nearly identical to that found in aCDM. The middle region is more variable, but has the hallmarks of the chicken sequence: a repeating peptide motif and an extensive 'i, i+4' pattern of acidic and basic aa with the potential to form stabilizing intra-helical salt bridges (Marston and Redwood, 1991; Wang et al., 1991a). The general conclusion is that vertebrate CDMs are a small family of proteins with highly conserved N-terminal myosin/calmodulin-binding domains (Riseman et al., 1989; Wang, 1988; Katayama et al., 1989; Velaz et al., 1990) and C-terminal tropomyosin/actin/calmodulin-binding regions (Szpacenko and Dąbrowska, 1986; Fujii et al., 1987; Yazawa et al., 1987; Bartegi et al., 1990; Takagi et al., 1989; Wang et al., 1991b; Watson et al., 1990a,b) separated by an extended helix of variable length. In the nonmuscle form the helix is short or missing, while in the smooth-muscle forms it is approx. 40 nm in length.
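The 'i, i+4' pattern of acidic and basic residues described above can be made concrete with a short scan. The Python sketch below is not the authors' procedure; it assumes that a candidate salt bridge is any pair of residues four positions apart in which one is acidic (Asp, Glu) and the other basic (Lys, Arg), and it ignores His and side-chain geometry. The function names i_i4_pairs and fraction_in_pairs and the toy peptide are hypothetical.

```python
# Hypothetical sketch: list residue pairs (i, i+4) that could form intra-helical
# salt bridges, assuming Asp/Glu are acidic and Lys/Arg are basic (His ignored).
from typing import List, Tuple

ACIDIC = frozenset("DE")
BASIC = frozenset("KR")


def i_i4_pairs(seq: str, spacing: int = 4) -> List[Tuple[int, int]]:
    """Return 0-based index pairs (i, i + spacing) with one acidic and one basic residue."""
    seq = seq.upper()
    pairs = []
    for i in range(len(seq) - spacing):
        a, b = seq[i], seq[i + spacing]
        if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC):
            pairs.append((i, i + spacing))
    return pairs


def fraction_in_pairs(seq: str, spacing: int = 4) -> float:
    """Fraction of residues that take part in at least one candidate pair."""
    if not seq:
        return 0.0
    involved = {idx for pair in i_i4_pairs(seq, spacing) for idx in pair}
    return len(involved) / len(seq)


if __name__ == "__main__":
    toy = "EAAAKEAAAK"  # illustrative peptide, not a caldesmon sequence
    print(i_i4_pairs(toy))         # [(0, 4), (5, 9)]
    print(fraction_in_pairs(toy))  # 0.4
```

A residue can sit in two pairs (with i - 4 and with i + 4), so the sketch reports the number of pairs and the fraction of residues involved separately.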
RESULTS AND DISCUSSION
(a) General results of screening
A human aorta library in λZAPII was screened with an 1100-nt fragment coding for the C-terminal actin/calmodulin-binding domains of aCDM (Bryan et al., 1989). Approximately 35 overlapping clones were recovered in the primary screen, or in subsequent screens with 5'-end fragments, and were further characterized. Sequencing indicates that the longest complete cDNA, obtained from several overlapping clones, is about 3630 nt. This contains a 2379-nt ORF encoding a 793-aa protein with a high degree of identity to avian gizzard CDM. One partial cDNA clone was recovered that overlaps the smooth-muscle form at its 3'-end and partially at its 5'-end, but is missing 768 nt from the central region. This deletion specifies 256 aa, including the helical domain, and is 24 aa longer than the deletion characterized in nonmuscle aCDM. This clone is incomplete at the 5'-end, missing 147 aa, but by similarity with aCDMs it is presumed to be a nonmuscle form of hCDM. We note that the overlapping sequences of this putative nonmuscle CDM and the longer muscle form are identical, as was the case for the aCDM variants. We have not observed the C-terminal variation reported by Hayashi et al. (1991) for chicken brain CDM. We are currently examining hCDM genomic clones and have identified splice sites that suggest the two forms are produced by exon skipping.
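The length arithmetic for the nonmuscle form can be checked in a few lines. Skipping an internal segment of a coding sequence preserves the downstream reading frame, and hence the shared C-terminal sequence, only if the segment length is a multiple of 3. The 768-nt deletion is, so it removes 256 aa and leaves 793 - 256 = 537 aa, assuming the N-terminal region matches the smooth-muscle form. skip_segment is a hypothetical helper written for illustration, not part of any library.

```python
# Hypothetical sketch: effect of skipping an internal segment of an ORF.
from typing import Optional, Tuple


def skip_segment(full_length_aa: int, skipped_nt: int) -> Tuple[Optional[int], bool]:
    """Return (remaining length in aa, in_frame) after removing skipped_nt from the ORF.

    If the skipped length is not a multiple of 3, the downstream reading frame
    shifts, so no simple remaining length exists and None is returned.
    """
    if skipped_nt < 0 or skipped_nt > 3 * full_length_aa:
        raise ValueError("skipped length must lie within the ORF")
    if skipped_nt % 3 != 0:
        return None, False
    return full_length_aa - skipped_nt // 3, True


if __name__ == "__main__":
    print(skip_segment(793, 768))  # (537, True): a 256-aa in-frame deletion
    print(skip_segment(793, 767))  # (None, False): would cause a frameshift
```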
(b) Polyadenylation signals
There is variability in the length of the 3'-untranslated region of hCDM. We isolated 15 clones with poly(A) tracts; these could be put into five groups differing in length by at most about 550 nt. These groups have been designated hCDM-1 through -5; each group has at least two members with different 5'-ends, indicating that they are separate cloning events. These groups are shown in Fig. 1, beginning with nt 3001. There is a polyadenylation signal, AATAAA, beginning at nt 3028. The poly(A) tracts of hCDM-1 and hCDM-2 begin 15 and 23 nt downstream, respectively. hCDM-3 was probably generated by oligo(dT) hybridization to the internal oligo(A) tract beginning at nt 3125; chicken CDM mRNA has a similar oligo(A) tract in this region that hybridizes effectively with oligo(dT). hCDM-4 may have been generated by oligo(dT) priming from the A-rich region beginning at nt 3410. Polyadenylation of hCDM-5 is presumably controlled by the degenerate signal, AATTAA, beginning at nt 3608. hCDM-4 is the nonmuscle clone, while hCDM-5 is a smooth-muscle clone, suggesting that longer 3'-untranslated regions are associated with both the muscle and nonmuscle forms.

(c) Deduced aa sequence and properties
The ORF, given in Fig. 2, begins at nt 231 with a Kozak consensus sequence. There are two in-frame stop codons immediately upstream, at nt 213 and 219; the first downstream stop codon is at nt 2609. The ORF specifies a protein of 793 aa (93 262 Da). Like aCDM, it has high contents of Glu, Lys and Arg and a low content of aromatic aa. The calculated pI is approx. 5.75, and the calculated extinction coefficient, εM,280, is 34 610/M/cm. The former was estimated assuming a pKa of 4.5 for -COOH groups. The latter was calculated using εTrp = 5850/M/cm and εTyr = 1340/M/cm (Magne et al., 1977). A comparison of the chicken gizzard, human aorta and nonmuscle hCDM sequences is given in Fig. 3.
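The two sequence-derived parameters in section (c) can be sketched as follows. The extinction coefficient is a weighted count of Trp and Tyr using the coefficients quoted from Magne et al. (1977); with these values, 5 Trp and 4 Tyr would give exactly 34 610/M/cm, although the real counts should be read from the sequence in Fig. 2. The pI estimate uses the stated carboxyl pKa of 4.5, but the pKa values for Lys, Arg, His and the N terminus are assumptions, so this sketch need not reproduce the value of 5.75. All function names are hypothetical.

```python
# Hypothetical sketch of the extinction coefficient and pI calculations.
# Only EPS_TRP, EPS_TYR and PKA_COOH come from the text; the other pKa values
# are assumed placeholders, and Cys/Tyr ionization is ignored.

EPS_TRP = 5850   # /M/cm at 280 nm (Magne et al., 1977)
EPS_TYR = 1340   # /M/cm at 280 nm (Magne et al., 1977)
PKA_COOH = 4.5   # single pKa used for Asp, Glu and the C-terminal carboxyl
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}  # assumed values
PKA_NTERM = 8.0  # assumed value


def extinction_coefficient_280(seq: str) -> int:
    """Molar extinction coefficient at 280 nm from Trp and Tyr counts."""
    seq = seq.upper()
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    n_cooh = seq.count("D") + seq.count("E") + 1  # side chains + C terminus
    negative = n_cooh / (1.0 + 10 ** (PKA_COOH - ph))
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))  # N terminus
    for aa, pka in PKA_BASIC.items():
        positive += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0) -> float:
    """Bisection for the pH where net_charge is zero (charge falls as pH rises)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(extinction_coefficient_280("W" * 5 + "Y" * 4))  # 34610
    # prints a value below 7 for this acidic toy peptide
    print(round(isoelectric_point("DEEKAAE"), 2))
```

Bisection is used because the net charge decreases monotonically with pH, so a single sign change brackets the pI.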
Chicken gizzard CDM extends 19 aa beyond hCDM at the C terminus, but overall is 37 aa shorter, 756 vs. 793 aa. [Note that the numbers will differ by 15 if the total number of aa, 771, for the longer aCDM isoform sequenced by Hayashi et al. (1991) is employed.] In addition to the C-terminal deletion, there are seven internal deletions in hCDM, giving a total length difference (additions + deletions) of 63 aa. In the N- and C-terminal regions of greatest identity, the majority of the substitutions are chemically conservative.

[Fig. 1 image: aligned 3'-end sequences of the consensus and clones hCDM-1 to hCDM-5 (upper block), continued for the consensus, hCDM-4 and hCDM-5 (lower block)]
Fig. 1. Comparison of 3'-end variants of hCDM. The sequences of five groups of clones isolated from a human aorta cDNA library are shown. Each group had a minimum of two clones with different lengths. The numbering begins at nt 3000 on the complete sequence for the shortest clone, hCDM-1, as given in Fig. 2. The last digits of the numerals are aligned with the corresponding aa.
Methods. The human aorta library used to clone these cDNAs was prepared using mRNA isolated from surgical samples, a gift from Dr. Samuel George, Baylor College of Medicine. The cDNA library was constructed in λZAPII using reagents supplied by Stratagene, Inc. (La Jolla, CA) and was screened with an 1100-nt HindIII fragment encoding the C-terminal actin/calmodulin-binding domain of aCDM. Hybridization was done at 55°C for 18 h in 5× SSC/1% sodium lauryl sarcosine. Blots were washed sequentially as follows: four times for 15 min in 2× SSC/0.1% SDS at 20°C; twice for 30 min in 2× SSC/0.1% SDS at 55°C; and once for 30 min in 0.1× SSC/0.1% SDS at 55°C. The blots were then dried and autoradiographed. Subsequent screening with hCDM fragments was done under more stringent conditions: hybridization at 65°C, four 15-min washes in 2× SSC/0.1% SDS at 20°C, two 30-min washes in 2× SSC/0.1% SDS at 65°C and three 30-min washes in 0.1× SSC/0.1% SDS at 65°C. Plasmid excisions were done following the directions provided by Stratagene, Inc. Sequencing was done by either the M13 dideoxy or plasmid methods using synthetic oligodeoxyribonucleotides. Cloning procedures were performed as described (Bryan et al., 1989). Sequenase was used following the manufacturer's directions.

(d) Sequence similarities and structural predictions
A comparison matrix of hCDM shows the same centrosymmetric structure exhibited by aCDM (Fig. 4, top panel). The central repeat region is conserved along with the troponin T-like sequence; these are identifiable as blocks of repeats at the center and at the lower right end of the diagonal, respectively. Secondary-structure prediction algorithms (Garnier et al., 1978) indicate that both of these regions are α-helical, in agreement with data for troponin T (Parry, 1981) and with recent spectroscopic (Wang et al., 1991a) and electron microscopy data (Mabuchi and Wang, 1990) for the central region of aCDM. The bottom panel of Fig. 4 is a plot of the Garnier-Osguthorpe-Robson helix parameter (Garnier et al., 1978), while the middle panel shows the overall helix, sheet, turn and coil predictions. Wang et al. (1991a) and Marston and Redwood (1991) have pointed out that the long helix may be stabilized by intra-helical salt bridges between acidic and basic aa that follow an 'i, i+4' rule. Comparison of aCDM with hCDM (Fig. 3) indicates that, although the number of identical aa is lowest in the center of the molecules, the 'i, i+4' motif is retained. We estimate there are 32 potential intra-helical salt bridges in the 151 aa between Lys256 and Glu407, involving more than 40% of the residues, which could function to stabilize a long central helix in hCDM. There is an additional feature to the arrangement of aa in this region. If, for