BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS
Vol. 178, No. 3, 1991, Pages 1072-1077
August 15, 1991

PARTIAL STRUCTURE OF THE HUMAN H-PROTEIN GENE

Hirohisa Koyata and Koichi Hiraga

The Department of Biochemistry, Toyama Medical and Pharmaceutical University School of Medicine, Toyama 930-01, Japan

Received June 18, 1991
SUMMARY: In a span of approximately 13 kb, two genomic fragments showing an obvious overlap encoded the 1,192 base pair (bp) cDNA sequence for human H-protein. Because of the close similarity of the genomic organization, comprised of five exons, to that of the chicken H-protein gene, this region was assigned to the true H-protein gene in human. Primer extension analysis suggested and S1 protection analysis confirmed that at least one additional exon for the 14 bases-long 5' untranslated region in H-protein mRNA exists as the presumable first exon. None of the 5.0- and 5.5-kb SacI and the 5.2-kb EcoRI fragments which were undetectable in the genomes of patients with nonketotic hyperglycinemia was included in the true H-protein gene. © 1991 Academic Press, Inc.
Human hydrogen of the
glycine
of the
intermediary
the
stimulatory
this
enzyme
protein
carrier
cleavage
system
product protein
system
as well
and is
as those
nonketotic
hyperglycinemia hyperglycinemia,
(5),
two -EcoRI has demonstrated
the from
the
more
human H-protein
gene.
This
this paper
fact
to the aberrant
that
the
H-protein
by the
issue,
loci
for
the
we attempted
reports gene
found
includes in the
none patient
and as of
of H-
results
in
of patients
with
24L and 24S by us
undetectable
5.5-
copies prevented
aberrations
to characterize
on the organization
fragments
activity
multiple genome
above
carrier
component
cDNA cloned
However, human haploid
the
hyperglycinemia.
genomes both
(2)
another
(5).
in
this
is
and T-protein
identified
fragments
components
as the
defective
using
longest
four
and synthesis that
In the
of genomic
into
functions
(3,4),
analysis
existing
verification insight
of its cleavage
structures
EcoRI
one of the
decarboxylase
of the
aberrant
is
decarboxylase
Southern fragments
gene and on the
corresponding
glycine by itself
of glycine
cDNA sequence
accurate
To gain protein
inactive
Sac1 or 5.2-kb
H-protein
the
glycine
Because
in man (1,5,6).
the
and 5.0-kb
(H-protein)
(1).
in
for
nonketotic probes,
protein
(5). the
of the
true
of fragments with
of us
nonketotic
H-
EXPERIMENTAL PROCEDURES
Isolation of Genomic Clones. The two H-protein cDNA fragments, the 5' 800-bp 24L and the 3' 300-bp 24S, which are produced at the internal EcoRI site (5), were nick-translated and were used as probes either simultaneously or separately. Aliquots of a human genomic library constructed in the λDASH vector (Stratagene Cloning Systems) (7,8) were subjected to a couple of rounds of selection for clones containing the human H-protein cDNA sequence by the method described previously (5). This library contained a strongly amplified genomic clone that has a processed H-protein gene. To avoid the selection of this clone, and to obtain a clone encoding the 5' region of the cDNA, a fragment produced at the 5' EcoRI and *I sites of the 24L cDNA (see Fig. 1 of Ref. 5) was also employed as a probe for the screening. Isolated genomic fragments were subcloned by using Bluescript plasmid vectors (Stratagene Cloning Systems).

DNA Sequencing. Nucleotide sequences were determined on the subcloned plasmids by the method of Sanger et al. (9) using 7-deaza-dGTP instead of dGTP (10). The promoter sequences for T7 and T3 RNA polymerases on the vector DNA were used as priming sites. Oligodeoxynucleotides (17-mers) identical or complementary to segments of the human H-protein cDNA sequence were synthesized by using a DNA synthesizer, model 381A (Applied Biosystems Inc. Japan, Tokyo), and were also employed as primers.

Primer Extension Analysis. An oligonucleotide complementary to nucleotides 4-20 of the cDNA sequence was synthesized, 5' end-labeled, and annealed with human liver total RNA. The H-protein mRNA sequence was reverse-transcribed by using reverse transcriptase from Moloney murine leukemia virus (Bethesda Research Laboratory) essentially according to the method of Calzone et al. (11).

S1 Protection Analysis. A genomic subclone from λHHG102, pHHG102EcoRI/PvuII (see Figs. 1 and 2), which codes for nucleotides 1-188 of the cDNA with a preceding 500-bp genomic sequence, was linearized at the EcoRI site (5' end of this insert). Part of this genomic fragment was replicated by using both E. coli DNA polymerase I Klenow fragment and a 5'-end-labeled synthetic DNA primer complementary to nucleotides 90-106 of the cDNA sequence. The DNAs were denatured with 0.2 N NaOH and then subjected to 3.5% polyacrylamide gel electrophoresis in the absence of urea. The single-stranded, radioactive product (about 600 bases) was recovered by diffusion into 1 ml of 10 mM Tris-HCl buffer (pH 7.5) containing 0.1 mM EDTA. The phenol-extracted DNA probe was annealed with human liver total RNA and treated with 5 units/ml of S1 nuclease. The sizes of the DNA fragments were determined by electrophoresis on a polyacrylamide gel containing 7 M urea, followed by autoradiography.
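The procedures above rely on 17-mer oligonucleotides that are complementary to a stated window of the cDNA (for example nucleotides 4-20). As a minimal sketch of what "complementary to nucleotides 4-20" means in practice, the following Python derives such a primer as the reverse complement of a 1-based, inclusive window. The helper names and the `toy_cdna` sequence are hypothetical placeholders for illustration only; the toy sequence is not the H-protein cDNA.

```python
# Hypothetical helpers for illustration; not the authors' software.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_primer(cdna, start, end):
    """Primer complementary to cDNA nucleotides start..end (1-based, inclusive).

    The primer is written 5'->3', so its 5' end pairs with nucleotide `end`
    and its 3' end (the end that is extended) pairs with nucleotide `start`.
    """
    if not (1 <= start <= end <= len(cdna)):
        raise ValueError("window lies outside the cDNA")
    return reverse_complement(cdna[start - 1:end])


# Placeholder sequence (assumption), used only to exercise the functions.
toy_cdna = "ATGGCGCTGCGAGTGGTGCGGAGC"
primer = antisense_primer(toy_cdna, 4, 20)
print(len(primer), primer)
```

Because the primer is antisense, reverse-complementing it again returns the sense-strand window it was designed against, which is a convenient sanity check.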
RESULTS Organization
of
cDNA, we have partially
obtained
to the
λHHG9
appeared kb in
hybridized
to
with
size)
hybridizes
to both alone
involve
(5). by that
clones
5 '-end
the
segments
This
to 24L.
differ
the
λHHG102
6.4-kb
and 1.3-kb
fragments
6.4-
and 1.3-kb
fragments,
Although
the
7.8-kb
analysis
fragment in
the
sites, genomic
two
fragment is
are
and
4 kb (Fig.
fragments
(7.8
was also
a polymorphic produced.
and 24s to the seems to have previous
insert
These
of approximately
1.3-kb there
by their
and ApaI
cDNA.
λHHG9 by SacI,
of H-protein
which
__ EcoRI
of the
overlap
from
human genome,
Southern
segments
Among them,
an obvious
In the the
between
formed
different
of genomic maps.
downstream
hybridized
24S.
consecutive
underestimated
types
fragments
at which fragment
several
to more
Using
Gene
restriction
Among several
and 1.3
H-Protein
cDNA fragment
insert
fragments 1).
True
characterized
hybridized the
The
study,
-Sac1 site The 24L probe 1.3-kb been the SacI
site
Fig. 1. Organization of the human H-protein gene. A, Recognition sites for restriction endonucleases in the genomic region assigned to the true H-protein gene. The recognition sites for EcoRI, E; HindIII, H; *I, S; SacI, Sa; SmaI, Sm; and =I, X; are shown for the λHHG102 (upper row) and λHHG9 (lower row) inserts. The filled circle denotes the polymorphic SacI site suggested in a previous paper (5). B, The distribution of exons in the genomic region between the EcoRI and *I sites shown with asterisks in A. Exons illustrated with closed boxes are tentatively designated A to E, because the 14-bp unknown untranslated region is predicted for H-protein mRNA (cf. Fig. 2). Pv and Hc indicate the recognition sites for PvuII and HincII, respectively. The regions upstream and downstream from the EcoRI site in exon E hybridize separately with 24L and 24S, respectively.

Fig. 2. The sizes of both the primer-extended and the S1-protected products. By using two 17-mer primers, complementary to nucleotides 4 to 20 of the cDNA (lane 1) and -23 to -7 of the genomic sequence (cf. Fig. 3) (lane 2), and liver total RNA (5), H-protein mRNA was examined for the unknown 5' region. The single-stranded DNA illustrated under the autoradiogram was prepared as described in EXPERIMENTAL PROCEDURES. Products from reactions for 7 (lane 5) and 15 min (lane 6) with S1 nuclease (5 units/ml) were determined for their sizes together with that of the probe after annealing with the RNA (zero time control, lane 4). Lane 3 is for the end-labeled WI-treated pBR322 DNA.
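The size of a primer-extension product can be converted into the length of mRNA lying upstream of cDNA nucleotide 1 by simple subtraction. The sketch below illustrates that arithmetic under stated assumptions: the function name is hypothetical, the primer is taken to cover cDNA nucleotides 4-20 (so its 5' end pairs with nucleotide 20), and the 96-base product size is the value quoted in the Results for this primer.

```python
def upstream_length(product_len, primer_5prime_pos):
    """Bases of mRNA upstream of cDNA nucleotide 1, from a primer-extension product.

    product_len       -- size of the extended product in bases
    primer_5prime_pos -- cDNA coordinate (1-based) that pairs with the primer's
                         5' end, i.e. the highest coordinate the primer covers

    The product runs from the primer's 5' end back to the 5' end of the mRNA,
    so it contains `primer_5prime_pos` bases of known cDNA plus the upstream
    extension.
    """
    if product_len < primer_5prime_pos:
        raise ValueError("product is shorter than the known cDNA segment")
    return product_len - primer_5prime_pos


# Primer complementary to nucleotides 4-20; 96-base product (values from the text).
print(upstream_length(96, 20))  # 76
```

The same subtraction applies to any antisense primer whose 5' end lies within the known cDNA; a primer lying entirely in the upstream (negative-numbered) region would need its coordinates converted first, which this sketch does not attempt.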
between
the
assigned
7.8-
to the
λHHG102
a 12-kb
insert,
even when
However, hybridization
As would
inserts
exhibited
whereas
the
reported
base
not
a genomic
remaining
sites cDNA are of the
separate
yielded from
tested
boundaries, 3). in the replaced 21st
each
insert
facts,
both
comprised carried
cDNA sequence
(Fig.
in the
segments
in
these
organization
clones
substitutions
154T of the substitution
were
exon/intron
and acceptor
by using
be expected
observed
present
study
was
(Fig. 1). In the -SacI-treated -Sac1 site was detected only by a 5' region of the
fragment
human H-protein
At all
fragments
probed
signals
shown).
four
and 1.3-kb
above polymorphic
the
(not
Compared
with
with
Leu in the
λHHG102
of five
exons
sequences
clones
(not
and λHHG9 (Fig. similar
1 B), to
the
shown). the
is
conserved
for
cDNA sequence,
elucidated.
T and C, respectively, deduced
cDNA, multiple
of remaining the
processed
consensus
exon sequence
of the
cDNA.
mitochondrial
splice there
Nucleotides resulting presequence
donor are
only
69C and in the to Ser
in the acid
genomic
sequence.
change.
Unknown
a 96-bases
was reverse-transcribed
product
of
to nucleotides
mRNA.
the Fig.
with
size
unknown
total
than
the
In this
shorter the
context,
than
consensus
for
(Fig.
3, -2 and -1,
bases
in
size)
splice
and -46
are
further
-23
in the
sequence,
(Fig.
poly(A) this
2, lane
(96 bases)
Considering
from
normal tail
upstream
2).
This 28-bases bases
H-protein
long
5'
mRNA (1.4
to -1,
the
100-200
in the
400-bases
H-protein
genomic
segment
exon was assigned
which
value
is
in (5))
to the
far
downstream
long
poly(A)
addition in
A
z
A
231 E
:i::
311 x
Exon B TTTrGcAcAG&attggattatatt----2:; :: 8
375 Hu Ch
Exon c AAcAAAcAAGgtgagtgttcttagg----95N K Q D 86N K D D
-Y+
::
taagcgcggcgggc-----
Exon D 507 TATGAAGI4 taagctgttgctag----Hu 139Y E D l-s Ch 130Y Q D G 1121
-GCAAAATMT-TA
5.3
2.3
2.5
1.5
kb
kb
kb
kb
probe
fragment l-106
suitable the
for
genomic
the
longer
the
a predictable
of the
the
-----tcttgttttattta
-----tttttttccaCttxA2AG;?G
-----tcttttgttcggc~~~T
exceeds
5' end of the
5' end of the
CTCCGGCCGCGAAC
-----ggtcttctgtttt~~Tf.A~
-33
39-bp
mRNA having
cDNA sequence
A
Bxon
B
R
K
20 250 F
BXcm c GAA;CGf2 E A I, Rmn
D
E
F
Exon
E
w
I,
cDNA
330 G 334 G 526 I
Bxon E ~aatcatgtttgttttgatgttaatatttcatttagta
Fig. 3. Exon/intron boundary sequences of the human H-protein gene. Upper- and lower-case letters indicate exon sequences and intron and flanking sequences at both 5' and 3' regions, respectively. Numbering of nucleotides begins at the position corresponding to the 5' end of the cDNA for exons toward the downstream, and is referred with minus to the 5' upstream from the 5' end of exon A. The 3' end of exon E is the site at which the poly(A) region is joined in the mRNA. The consensus for splice sites and an AG dinucleotide (-46, -45) are underlined.
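The upper-case/lower-case convention of Fig. 3 makes it straightforward to check a boundary string against the usual splice consensus. The following is a minimal sketch that assumes the consensus referred to in the caption is the common GT (donor) and AG (acceptor) rule. The function names are hypothetical. The donor example is the exon C boundary as read from Fig. 3 with the extraction-damaged letter case restored; the acceptor example is an invented placeholder, not a sequence from the paper.

```python
def split_donor(boundary):
    """Split an 'EXONintron' string at the first lower-case letter."""
    for i, ch in enumerate(boundary):
        if ch.islower():
            return boundary[:i], boundary[i:]
    return boundary, ""


def split_acceptor(boundary):
    """Split an 'intronEXON' string at the first upper-case letter."""
    for i, ch in enumerate(boundary):
        if ch.isupper():
            return boundary[:i], boundary[i:]
    return boundary, ""


def has_gt_donor(boundary):
    """True if the intron part of an 'EXONintron' string begins with 'gt'."""
    _, intron = split_donor(boundary)
    return intron.startswith("gt")


def has_ag_acceptor(boundary):
    """True if the intron part of an 'intronEXON' string ends with 'ag'."""
    intron, _ = split_acceptor(boundary)
    return intron.endswith("ag")


# Exon C donor boundary read from Fig. 3 (letter case restored; an assumption).
exon_c_donor = "AACAAACAAGgtgagtgttcttagg"
# Invented acceptor boundary, for illustration only.
toy_acceptor = "ttttccagGAAGCG"

print(split_donor(exon_c_donor))
print(has_gt_donor(exon_c_donor), has_ag_acceptor(toy_acceptor))
```

Relying on letter case keeps the check independent of any coordinate system, which suits boundaries whose flanking sequence is only partly legible.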
DNA
mRNA sequence.
in mRNA, and the
of the
and 152
of about
size
of the possible
far,
the
two AG
a size
than
of
sequence
a synthetic
to 1,192-bp
2 bp upstream
contained
a primer,
ttaagctctgtcccgcccccgcggcaccgcctccgcgcctccatccaaccGGcTcc Bxon
the
a cDNA with
Thus
HWhen
(108
for
(5),
RNA, in
two fragments
EXO" -48
in
in
region
size.
poly(A)+
probe
are between
produced
cDNA sequence
kilobases
present
When used
dinucleotides
A
exon A as nucleotides
Therefore,
examined.
1).
RNA,
analysis.
two AG dinucleotides are
total primer
By the incubation fragments (lanes 5 and
included
in
amino
exists
Sl protection
sequence
and -45).
to nucleotides
400 bases
from
several
site
complementary
product
human liver of 76 bases
produced
Moreover,
presumable
2, lane
(Fig.
from
region
consecutively
2).
without
sequence.
RNA, the single stranded-DNA 200 bases (lane 4).
nucleotide
exists
(Fig.
the
C, but
approximately the
106 bases
cDNA sequence
formed
From human liver by using the 17-mer
cDNA
result
7 and 15 min at 37OC, Sl nuclease
6).
genomic
of H-protein
shows
human liver
C in the
5' untranslated
2 also
shorter
with
mRNA
was also
replaced
H-Protein
4-20
an identical
no fragment for
cDNA is
long
that
annealed
the
A is
Region
with
protein
G in
Untranslated
complementary suggesting
The 428th
The 771th
5'
product
most
sequence
reported
previously
suggesting
(5),
the
occurrence
exon for the 74-bases long exon was tentatively The most 5'
region 5' untranslated named A, and subsequent
Structural
Homology
of
and
assignment
of the
currently
first
gene, the above chicken H-protein located
genomic 49th
chicken
gene,
formed
boundaries
sizes
the
is
of introns
protein
genes,
the
genomic
the
true
H-protein
in both
are quite region
genes.
different
analyzed
gene
in
the
in the
H-protein
are
Moreover,
From this
H-
(5,12).
unknown
structures.
the
chicken
residues still
human and chicken
primary
for are
Ser for
acid
A which
valid
H-protein
reported H-protein
40th
of 125 amino
in their
positions
the for
the
5' end of exon
true
A and B, C and D, and D and E produced
exons
nucleotide
although codes
comprised for
that
mature
and at the
are
of exons
to the
with
of the
presumable
For the
Genes
region
was compared
human H-protein
positions
between
at comparable
genomic
of the
in H-protein mRNA. exons B to E.
H-Protein
Amino-termini
3, except
at identical
boundaries
(12).
proteins
in Fig.
Chicken
characterized
Ser for
and both
As shown
Human
organization
gene
at the
protein,
The
the
split
codons
conservativeness,
in the
chicken
present
study
and human Hdoubtlessly
in human. DISCUSSION
In a span of about entire
H-protein
of five is
Only
exons.
closely
similar
Remaining
cDNA (Koyata
the
as basal
component transcription
that
is
appeared
to contribute activity
that
because this
is is
liver,
specified
of the
gene
chicken
glycine
respects chicken
(1,5,6,8). In the and human H-proteins
similarity could
further be regulated
cleavage
supports
system
occurs
in
the
glycine
locus
at present
by conserved
idea
that
The chicken
mechanisms
cloned. the
Both to resemble
H-protein
glycine
H-protein However,
organization
of
the human and in many
of the conserved. gene
the gene
has
of the
(12).
reported
the
coordination
has been
present study, the organization has been shown to be highly the
exhibiting
decarboxylase
extents
(13).
of cDNAs now available,
have been
tissues
This
tissue-specific and brain
region
manners
The tissue-specific
with
genomic
length 5'
among multiple
by two different
tissue-specific.
by a single at the
to H-protein
λHHG102 and λHHG9 inserts
expressed
transcription
kidney,
insufficient
unknown
is
and coordinates to determining
in
in the
that
by us (12).
similar
Therefore,
transcription.
exclusively
structure
reported
the
comprised
gene.
gene
gene
activity
cleavage gene
H-protein
exon
gene
sequences
included
H-protein
H-protein
cleavage
H-protein
encoded
structure
in the
observation).
region
true
the
substituted
chicken
clones
exon/intron
processed
and tissue-specific
of the
glycine
were
unpublished
to the
human genomic
an organized
of the
genomic
chicken,
in
included
and Hiraga,
can be assigned such
two of several
bases
isolated
copies,
In the
four
to that
clones
genomic
13 kb,
cDNA sequence
genes This
transcription
in human and chicken.
In this
for
context,
Sakakibara
decarboxylase exon
(7).
their
5'
Further flanking
and 1.3
from
both
the
probes
in the
that
(56 bp)
for
the
of the
the
human glycine
as the presumable
unknown
glycine
structure
first
of the
decarboxylase
gene
true
and
structures
Implication remains
which are 12, SacI fragments, with either or both of 24L and 24S,
gene.
In patients
Sac1 fragment
was undetectable.
the
not
however,
genomic
is
aberrations
existing of the
from
with
nonketotic
24s or the
5.5-kb
The 5.2-kb
of patients
fragments,
display
three
hybridized
5.0-kb
genomes
that
hyperglycinemia
required
H-protein
gene, suggesting abnormal
sequence
as that
and are
true
None of these
sequence.
is
indicated
RESULTS section,
either
24L was absent
(5,6).
the
the
hyperglycinemia, fragment
previously
a short
as well
kb in size in
sequence. in
involved
have
analysis
gene
As detailed 7.8,
--et al. includes
gene
human H-protein
are
probably
aberrations
Sac1 from
-EcoRI fragment having the 5.5-kb -Sac1 fragment included in the true H-protein observed
in the
near
presumable
in
the
pathogenesis
previous
study
processed
of nonketotic
to be examined.
Acknowledgment: The authors are grateful to Dr. Eiji Tsukamoto for his kind encouragement and continuous support of our work.
REFERENCES
1. Hiraga, K., Kochi, H., Hayasaka, K., Kikuchi, G., and Nyhan, W.L. (1981) J. Clin. Invest. 68, 525-534
2. Motokawa, Y., and Kikuchi, G. (1974) Arch. Biochem. Biophys. 164, 624-633
3. Hiraga, K., and Kikuchi, G. (1980) J. Biol. Chem. 255, 11664-11670
4. Hiraga, K., and Kikuchi, G. (1980) J. Biol. Chem. 255, 11671-11676
5. Koyata, H., and Hiraga, K. (1991) Am. J. Hum. Genet. 48, 351-361
6. Hiraga, K., Koyata, H., Sakakibara, T., Ishiguro, Y., and Matsui, C. (1991) Mol. Biol. Med. in press
7. Sakakibara, T., Koyata, H., Ishiguro, Y., Kure, S., Kume, A., Tada, K., and Hiraga, K. (1990) Biochem. Biophys. Res. Commun. 173, 801-806
8. Kume, A., Koyata, H., Sakakibara, T., Ishiguro, Y., Kure, S., and Hiraga, K. (1991) J. Biol. Chem. 266, 3323-3329
9. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467
10. Mizusawa, S., Nishimura, S., and Seela, F. (1986) Nucleic Acids Res. 14, 1319-1324
11. Calzone, F.J., Britten, R.J., and Davidson, E.H. (1987) Methods Enzymol. 152, 611-632
12. Yamamoto, M., Koyata, H., Matsui, C., and Hiraga, K. (1991) J. Biol. Chem. 266, 3317-3322
13. Kure, S., Koyata, H., Kume, A., Ishiguro, Y., and Hiraga, K. (1991) J. Biol. Chem. 266, 3330-3334