Proc. Natl. Acad. Sci. USA Vol. 89, pp. 9804-9808, October 1992 Biochemistry
Cloning and expression of chicken erythrocyte transglutaminase (protein-glutamine γ-glutamyltransferase/mRNA expression/erythroid development)
N. WERAARCHAKUL-BOONMARK, J.-M. JEONG, S. N. P. MURTHY, J. D. ENGEL, AND L. LORAND Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, Evanston, IL 60208-3500
Contributed by L. Lorand, July 29, 1992
ABSTRACT We report the sequences of cDNAs encoding chicken erythrocyte transglutaminase (EC 2.3.2.13). The complete mRNA consists of 3345/3349 nucleotides and predicts a single open reading frame. Nine peptide sequences derived from partial digests of the isolated protein agreed with the corresponding translation of the open reading frame. Approximately 60% identities between the avian protein and three related mammalian enzymes were found. Chicken erythrocyte transglutaminase mRNA is most abundant in red blood cells and kidney, and it accumulates during erythroid cell differentiation.
Cytosolic transglutaminases (protein-glutamine γ-glutamyltransferases, EC 2.3.2.13) were first described by Waelsch and collaborators in the 1950s (1) and were assumed to be primarily involved in the metabolism of amines (such as histamine) by conjugation to proteins (2). These Ca2+-dependent thiol enzymes are widely distributed in vertebrates and are also present in a number of invertebrates [e.g., sea urchin egg, Homarus hemocyte, Limulus amoebocyte (3), Homarus muscle (4), and marine sponge cells (5)], some of which served as a rich source for their purification. The intracellular function of transglutaminases is now thought to be the posttranslational crosslinking of proteins, triggered perhaps by a substantial increase in free Ca2+ (approaching 0.1 mM), the appearance of an activating metabolite, or the removal of an inhibiting substance (6). Significant amounts of N6-(γ-glutamyl)lysine, the product of the transglutaminase-mediated crosslinking of proteins, were identified in a variety of cellular structures [e.g., in membrane skeletal polymers in human red blood cells (7, 8), in the cornified envelope of human keratinocytes (9), in polymers from human lens cataracts (10), and in apoptotic bodies of degenerated liver cells (11)]. Membrane-bound variants of intracellular transglutaminases have been identified in keratinocytes (12). Secreted forms of transglutaminases participate in the clotting of seminal vesicle secretory proteins of the prostatic fluid of rodents (13-16) and in the clotting of Homarus and Limulus blood (3, 17, 18). One of the subunits (designated A) of the factor XIII zymogen (fibrin-stabilizing factor) circulating in human plasma also belongs to this class of gene products (19, 20).
Transglutaminase was shown to become activated in sea urchin eggs soon after fertilization (21, 22) and in A431 epidermal carcinoma cells exposed to epidermal growth factor (23), and it is expressed in induced murine erythroleukemia cells well before the appearance of hemoglobin (24). This paper deals with the identification, isolation, and sequence analysis of cDNAs encoding chicken red blood cell transglutaminase* and shows that the mRNA for this protein is present only in very low amounts in embryonic erythroid cells or in retrovirally transformed erythroid progenitor cells but increases significantly during erythroid cell maturation.
MATERIALS AND METHODS
Purification and Peptide Sequencing of Chicken Red Blood Cell Transglutaminase. Erythrocyte transglutaminase was purified from chicken blood (collected in heparin; Pel-Freez Biologicals) to apparent SDS/PAGE homogeneity by David Schilling, using slight modifications of the procedure described for human red cells (25). To obtain partial peptide sequences, purified protein was digested with 100:1 weight ratios of either Staphylococcus aureus V8 protease, endoproteinase Lys-C (Boehringer Mannheim), L-1-tosylamido-2-phenylethyl chloromethyl ketone (TPCK)-treated trypsin (Worthington), or 7-amino-1-chloro-3-tosylamido-2-heptanone (Nα-p-tosyl-L-lysine chloromethyl ketone, TLCK)-treated α-chymotrypsin (Sigma). Some of the peptides were purified by reverse-phase (C3 column) HPLC. Other fragments were separated by SDS/PAGE (26) and were electroblotted onto poly(vinylidene difluoride) transfer membranes (27) prior to sequencing (Applied Biosystems model 177A; Northwestern University Biotechnology Facility).
Screening of cDNA Libraries. Two types of cDNA libraries were used to isolate the chicken erythrocyte transglutaminase cDNA clones. The BV4 cDNA library in λgt11 was derived from poly(A)+ RNA isolated from a pool of 11-day-old chicken embryos (28), whereas the B21 cDNA library in λZAPII vector was derived from poly(A)+ RNA isolated from erythroid cells of 13- to 14-day-old B21 chicken embryos (29). Recombinant cDNAs were identified by immunoscreening (30) of the BV4 library with a rabbit antiserum raised against purified chicken erythrocyte transglutaminase. After treatment with dithiothreitol, the protein was subjected to SDS/PAGE, and rabbits were injected with the gel slice corresponding to a band of Mr 78,000 (using Freund's complete adjuvant for initial injection and Freund's incomplete adjuvant for subsequent injections).
The antiserum recognized chicken erythrocyte transglutaminase preferentially in both the native and the denatured form; it crossreacted with human red cell transglutaminase and with guinea pig liver transglutaminase but did not recognize human factor XIII subunit a or human erythrocyte membrane protein 4.2. Positive clones were plaque-purified, and phage DNA was isolated (31). The BV4 library was then screened successively by plaque hybridization using as probes either the EcoRI-Sac I [297 nucleotides (nt)] cDNA fragment of clone 27c (Fig. 1), the 54 nt of DNAsynI (5'-GAATTTGGGGTTGATGTCCAGCATCTCGAGGCAGATGGCCAAGATCTCATCTTC-3') complementary to nt 1034-1087 in the compiled cDNA sequence (Fig. 2), or the EcoRI-Kpn I fragment (374 nt) from clone NW1. The B21 library was screened using the synthetic 21-base oligonucleotide TG1 (5'-GTCTCCAGCACCAGCTCTTCG-3', complementary to nt 484-504 in the cDNA sequence; Fig. 2) and the EcoRI-Kpn I fragment (374 nt) as probes. The positive clones derived from hybridization screening were amplified by PCR using an internal cDNA
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Abbreviations: aa, amino acid(s); nt, nucleotide(s). *The sequence reported in this paper has been deposited in the GenBank data base (accession no. L02270).
[Figs. 1 and 2 (restriction map of clones 27c, D2040, NW1, and NW3; nucleotide and deduced amino acid sequence of the cDNA) are not reproduced here; see the figure legends.]
sequence and a primer within the λ vector. Clones that contained the longest 5' inserts were then plaque-purified and further characterized.
DNA Sequencing. All independent cDNA clones were subcloned into pGEM-3Z(+) or pGEM-7Zf(+) (Promega) or pBluescript SK(+) (Stratagene). Plasmid DNA was sequenced (32) using 2'-deoxyadenosine 5'-[α-[35S]thio]triphosphate and Sequenase (United States Biochemical). Occasionally the Taq polymerase sequencing system (Promega) was employed. For three clones (D2040, NW1, and NW3), nucleotide sequence determinations were derived from both DNA strands by using a combination of random deletion subcloning and synthetic oligonucleotide priming. Conveniently located restriction sites (Fig. 1) were used to prepare most subclones. Additional subclones were prepared using exonuclease III (33). Three other clones (27c, 30, and 31a) were sequenced for their full length, but only on one DNA strand. The nucleotide sequences were assembled and analyzed with the IBI/Pustell sequence-analysis software.
Northern Blot Analysis. Samples of poly(A)+ RNA (2 μg) from a variety of chicken tissues and cell lines were denatured, electrophoresed in a 1.3% agarose gel containing 0.2 M formaldehyde (34), and transferred to a nitrocellulose membrane (GeneScreenPlus, DuPont). The blot was hybridized with a 32P random-labeled EcoRI-Sac I cDNA fragment (297 nt) from clone 27c and washed as described (35). The blot was stripped and rehybridized to a mixture of cDNA probes encoding the chicken β-actin and erythrocyte band 3 proteins in order to determine the integrity, size, and relative amounts of the mRNA samples. The band 3 hybridization signal was used to check the degree of erythroid cell contamination of the various nonerythroid chicken tissues.
Primer Extension Analysis. The 21-mer TG6 (5'-GAACTGGTACAGCTTCAGCAC-3'), complementary to nt 72-92 of
FIG. 1. Restriction map and sequencing strategy for the chicken red blood cell transglutaminase (Ckrbctg) cDNA clones. The shaded box represents the coding region, and the unshaded boxes show the 5' and 3' untranslated regions. Originally, clone 27c was isolated by immunological screening (30) of a λgt11 cDNA library prepared from 11-day-old chicken embryos (BV4; ref. 29) using an antiserum raised in rabbits against purified chicken red blood cell transglutaminase. D2040 was isolated by using an α-32P random-labeled EcoRI-Sac I (297 bases) cDNA fragment of 27c as probe; NW1 was isolated using a γ-32P end-labeled oligonucleotide, DNAsynI (54 nt), as probe. NW3 was isolated from a λZAPII cDNA library (B21; ref. 29) prepared from erythroid cells of 13- to 14-day-old chicken embryos by using an α-32P random-labeled EcoRI-Kpn I (374 nt) cDNA fragment of NW1 and a γ-32P end-labeled oligonucleotide, TG1 (21 nt), as probes. Solid arrows indicate the extent and direction of DNA sequencing for each strand of the individual cDNA clones; dotted arrows indicate the relative position and orientation of synthetic oligonucleotides. Restriction sites employed for subcloning and DNA sequencing are indicated. kb, Kilobase.
Proc. Natl. Acad. Sci. USA 89 (1992)
Biochemistry: Weraarchakul-Boonmark et al.
FIG. 2. Nucleotide and deduced amino acid sequence of chicken red blood cell transglutaminase cDNA. Sequences of the overlapping cDNA clones (Fig. 1) were determined by the dideoxynucleotide technique (32). Nucleotide residues are shown in the 5' to 3' orientation beginning at the 5' end of recombinant NW3 (Fig. 1). The sequence reveals a single open reading frame of 2094 nt [698 amino acids (aa)] flanked by 451 nt at the 5' end and by 700 nt at the 3' end. The stop codon is indicated by an open circle. The consensus polyadenylation signal AATAAA is located 26 nt 5' to the poly(A) tail. The pentapeptide sequence containing the active-site Cys is shaded. Matching amino acid sequences of peptides isolated from partial proteolytic digests of purified chicken red blood cell transglutaminase (Materials and Methods) are underlined. A peptide isolated from the endoproteinase Lys-C digest of the protein (PNLHGPEILDVP), and three other peptides obtained in very low yields, did not match the cDNA-derived sequence.
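The region lengths quoted in the legend can be turned into nucleotide coordinates with a few lines of arithmetic. The following is a minimal sketch, assuming that the 2094-nt open reading frame excludes the stop codon and that the 700-nt 3' region begins with it; the function name `layout` and its dictionary keys are hypothetical and serve only as illustration.

```python
def layout(utr5_nt: int, orf_nt: int, utr3_nt: int) -> dict:
    """Derive 1-based codon coordinates from the three region lengths.

    Assumption (not stated explicitly in the legend): the ORF length
    excludes the stop codon, which is counted as the first 3 nt of the
    3' flanking region.
    """
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    start_first = utr5_nt + 1           # first nt of the initiation codon
    stop_first = utr5_nt + orf_nt + 1   # first nt of the stop codon
    return {
        "start_codon": (start_first, start_first + 2),
        "stop_codon": (stop_first, stop_first + 2),
        "residues": orf_nt // 3,
        "total_nt": utr5_nt + orf_nt + utr3_nt,
    }


if __name__ == "__main__":
    # Lengths taken from the Fig. 2 legend: 451 nt, 2094 nt, 700 nt.
    for key, value in layout(451, 2094, 700).items():
        print(key, value)
```

Under these assumptions the derived values (initiation codon at nt 452-454, stop codon at nt 2546-2548, 698 residues, 3245 nt in total) agree with the coordinates given for the compiled sequence in Results.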
the chicken transglutaminase cDNA, was 32P-end-labeled by polynucleotide kinase. RNA from MSB-1 lymphoid cells (36) or adult chicken (definitive) reticulocytes was hybridized to
this oligonucleotide, and cDNA was synthesized by avian myeloblastosis virus reverse transcriptase (37). To accurately assess the position of the mRNA cap site (the 5' end of the primary transcript), the primer extension products were coelectrophoresed in a DNA sequencing gel directly adjacent to a "ladder" produced by dideoxy sequencing of clone NW3 with the same primer.
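Each synthetic oligonucleotide in Materials and Methods (DNAsynI, TG1, and TG6) is described as complementary to a stated nucleotide interval of the cDNA. A minimal sketch of how such a description can be checked is given below: the oligonucleotide length should equal the inclusive length of the interval, and its reverse complement gives the sense-strand sequence expected at that position. The probe sequences and coordinates are copied from the text; the helper names (`reverse_complement`, `span_length`, `check_probes`) are hypothetical.

```python
# Translation table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def span_length(first_nt: int, last_nt: int) -> int:
    """Length of an inclusive, 1-based nucleotide interval."""
    return last_nt - first_nt + 1


# name: (5'->3' oligonucleotide, first nt, last nt of the complementary region)
PROBES = {
    "DNAsynI": (
        "GAATTTGGGGTTGATGTCCAGCATCTCGAGGCAGATGGCCAAGATCTCATCTTC",
        1034,
        1087,
    ),
    "TG1": ("GTCTCCAGCACCAGCTCTTCG", 484, 504),
    "TG6": ("GAACTGGTACAGCTTCAGCAC", 72, 92),
}


def check_probes(probes=PROBES) -> dict:
    """Report oligo length, interval length, and expected sense-strand target."""
    report = {}
    for name, (oligo, first_nt, last_nt) in probes.items():
        report[name] = {
            "length": len(oligo),
            "span": span_length(first_nt, last_nt),
            "sense_target": reverse_complement(oligo),
        }
    return report


if __name__ == "__main__":
    for name, info in check_probes().items():
        print(name, info["length"], info["span"], info["sense_target"])
```

For all three probes the oligonucleotide length matches the length of the quoted interval (54, 21, and 21 nt, respectively).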
RESULTS AND DISCUSSION
Isolation of Chicken Transglutaminase cDNA Clones. Immunoscreening with a rabbit antiserum raised against chicken erythrocyte transglutaminase allowed the initial identification of 20 clones in the BV4 cDNA library, and these were plaque-purified. Three clones (27c, 30, and 31a) harboring the largest fusion proteins (on Western blots) were further analyzed and found to have overlapping nucleotide sequences (Fig. 1). Clone 27c contained an insert of 1704 nt, whereas clones 30 and 31a contained overlapping inserts of 270 and 250 nt (data not shown). The 1704 nt of 27c represented a single open reading frame followed by a stop codon and an untranslated sequence of 678 nt at the 3' end. This clone was confirmed to be an authentic cDNA segment encoding chicken erythroid transglutaminase by matching to peptide sequences derived from a partial digest of the purified protein. A DNA fragment (297 nt, EcoRI-Sac I; Fig. 1) corresponding to the 5' end of clone 27c was then used to rescreen the same library, resulting in the identification of four more positive clones. One of these (D2040) contained 3446 nt with an open reading frame from nt 1219 to nt 2746 (Fig. 1). This clone encoded 509 aa, including the conserved pentapeptide around the active-site Cys (38) for transglutaminase (aa 283-287 in Fig. 2) as well as all the 3' sequence encoded within 27c. Eight peptide fragments from the purified enzyme [aa 189-199 (chymotrypsin digest), 250-270 (trypsin digest), 391-409 (S. aureus V8 protease digest), 414-424 (chymotrypsin digest), 438-456 (endoproteinase Lys-C digest), 465-480 (chymotrypsin digest), and 472-491 and 658-669 (endoproteinase Lys-C digest) as indicated in Fig. 2] matched with segments of the conceptually translated nucleotide sequence. A sequence of 1218 nt at the 5' end of this clone (Fig. 1; dashed line on the map) was subsequently found to represent an unspliced intron of 647 nt, as well as additional 5' cDNA coding sequence of 571 nt.
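The extent to which the eight matching peptides confirm the translated sequence can be estimated by merging their residue ranges, two of which (aa 465-480 and 472-491) overlap. The sketch below is illustrative only: the ranges are those listed above, the 698-residue length is taken from the Fig. 2 legend, and the helper names (`merge_intervals`, `covered_residues`) are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent inclusive residue ranges."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def covered_residues(intervals) -> int:
    """Count distinct residues covered by a set of inclusive ranges."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


# Residue ranges of the eight peptides matched to clone D2040 (from the text).
PEPTIDES = [
    (189, 199), (250, 270), (391, 409), (414, 424),
    (438, 456), (465, 480), (472, 491), (658, 669),
]
ORF_LENGTH_AA = 698  # from the Fig. 2 legend

if __name__ == "__main__":
    n = covered_residues(PEPTIDES)
    print(merge_intervals(PEPTIDES))
    print(f"{n} of {ORF_LENGTH_AA} residues ({100 * n / ORF_LENGTH_AA:.1f}%)")
```

With these ranges, 120 distinct residues, roughly 17% of the predicted protein, are directly confirmed by peptide sequencing.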
After screening of the same library with a 54-base oligonucleotide (DNAsynI, Fig. 1), two additional clones were obtained of which the longer (NW1) was partially sequenced (a total of 1250 nt; of these, 1244 nt matched the 5' end of D2040, corresponding to aa 13-426 of Fig. 2). Unlike D2040, the recombinant NW1 contained no intron, but in conceptual translation NW1 was found to be only 6 nt longer than D2040. An EcoRI-Kpn I fragment (374 nt) of clone NW1 and oligonucleotide TG1 (see Materials and Methods) were next used as probes to screen a different (B21) cDNA library (29). Sixty independent recombinants were isolated, and 10 of these were further characterized by PCR. Three clones were found to contain longer inserts than NW1; the longest one, NW3 (484 nt longer than NW1) was subcloned, and the 5' end was sequenced (a total of 856 nt; of these, 372 nt matched the 5' end of NW1, corresponding to aa 11-134 of Fig. 2). NW3 contained the Met initiation codon preceded by an untranslated region of 451 nt.
Nucleotide Sequence of cDNA and the Deduced Amino Acid Sequence. The nucleotide sequence of the cDNAs coding for chicken erythrocyte transglutaminase was constructed from overlapping sequences of cDNA clones D2040, NW1, and NW3 (Fig. 2). The compiled sequence contained 3245 nt with a single open reading frame beginning with an ATG initiation codon (nt 452-454) and ending with a TGA stop codon (nt
2546-2548), predicting a coding sequence of 698 aa. Analysis of NW1 and NW3 provided further results matching one peptide sequence (aa 17-30) isolated from the V8 protease digests of purified chicken erythrocyte transglutaminase. Two ATG triplets were found, at nt 452-454 and 479-481. According to Kozak's rule (39), either of these could serve as an initiation codon; the latter exactly fits the consensus (ACCATGG) as the translation initiating site, while the former is divergent (AATATGG). However, the location of a purine residue at position -2 has been shown to have a