DEVELOPMENTAL GENETICS 11:77-87 (1990)

Characterization and Expression of Two Sea Urchin Homeobox Gene Sequences

GORDON V.L. WANG, GREGORY J. DOLECKI, RUBEN CARLOS, AND TOM HUMPHREYS
Pacific Biomedical Research Center, University of Hawaii, Honolulu

ABSTRACT We describe two homeobox sequences, TgHbox5 and TgHbox6, isolated from the Hawaiian sea urchin Tripneustes gratilla using a Drosophila Sex combs reduced probe. Sequence analysis shows that the encoded TgHbox5 homeodomain shares only 30-52% amino acid identity with homeodomains encoded by previously characterized genes, establishing that it is a divergent homeobox that does not belong to any known class of homeoboxes. TgHbox5 is expressed in the embryo as two major developmentally regulated transcripts, one of 5.0 kilobases (kb) appearing by blastula stage and the other of 2.7 kb appearing at pluteus stage. Multiple transcripts from TgHbox5 are present at a much lower level in adult tissues and are predominantly expressed in small and large intestines. The TgHbox6 homeobox is an Antennapedia-class homeobox, which appears not to be expressed during embryogenesis but produces abundant 3.6 and 3.2 kb transcripts in the six adult tissues examined.

Key words: Divergent homeobox, embryonic transcription, homeobox evolution

© 1990 WILEY-LISS, INC.

INTRODUCTION

The homeobox is an approximately 180 base pair (bp) conserved protein-coding DNA sequence initially found in the Drosophila homeotic selector genes Antennapedia (Antp) and Ultrabithorax (Ubx), and the segmentation gene fushi tarazu (ftz) [McGinnis et al., 1984; Scott and Weiner, 1984]. Mutations in such homeobox-containing Drosophila genes have shown that they control the pattern of morphogenesis in the developing embryo by specifying the number, polarity, and identity of larval segments [Nusslein-Volhard and Wieschaus, 1980]. Homeobox genes have subsequently been identified in the genomes of a wide variety of eukaryotes ranging from yeast to mammals [McGinnis, 1985; Holland and Hogan, 1986; Way and Chalfie, 1988].

Several lines of evidence indicate that many homeobox genes are involved in controlling developmental processes at the level of transcriptional regulation through DNA binding by the 60 amino acid homeodomain encoded by the homeobox. Soon after the discovery of the homeobox a similarity was recognized between the homeodomain and the DNA-binding helix-turn-helix motif of proteins such as the yeast MAT α2 and λ cro [Laughon and Scott, 1984; Shepherd et al., 1984]. The idea that the homeodomain possesses DNA-binding activity was further supported by immunolocalization of homeobox proteins to the nucleus [Laughon et al., 1985; Carroll et al., 1986; Kessel et al., 1987], in vitro demonstrations of sequence-specific DNA binding by fusion peptides containing homeodomains [Desplan et al., 1985; Fainsod et al., 1986], and structural studies of bacterially expressed homeodomains [Otting et al., 1988; Muller et al., 1988]. Genetic analyses examining changes in the expression of one homeobox gene in Drosophila embryos mutant for another indicate a complex hierarchical, cross-regulatory network of interaction with examples of both negative and positive regulation [Akam, 1987; Ingham, 1988; Scott et al., 1989]. The recent demonstration that three mammalian transcription factors, Oct-1, Oct-2, and Pit-1, contain homeodomains supports the hypothesis that homeobox proteins can act as sequence-specific transcription factors [Levine and Hoey, 1988 for review].

The majority of homeobox genes that have been described come from insects and vertebrates, the two groups that each represent the highest development of morphological complexity in the protostomes and deuterostomes, respectively. Many of these homeoboxes fall into a large family of homeobox genes that has been designated the Antp-class based on their high degree of similarity to the Drosophila Antennapedia homeobox. Antp-class genes of both vertebrates and insects are organized into homologous clusters in which corresponding genes are expressed within the same relative boundaries along the antero-posterior body axis during embryogenesis [Akam, 1989; Duboule and Dolle, 1989; Graham et al., 1989].
The spatial order of expression of each Antp-class gene along this axis is a direct reflection of its genomic order within these clusters. Within the mouse genome four Antp-class gene clusters, each residing on a different chromosome, have been identified. Sequence comparison of these mouse genes demonstrates homologous inter-cluster cognates, leading to the suggestion that these clusters arose by duplication of an ancestral cluster of homeobox genes [Kappen et al., 1989; Duboule and Dolle, 1989; Graham et al., 1989]. The functional and structural homologies between vertebrate and insect homeobox gene clusters suggest that this ancestral Antp-class gene cluster was present in primitive coelomate metazoans before the divergence of the protostomes and deuterostomes. It would be reasonable to predict that an Antp-class gene cluster exists within the sea urchin genome.

The sea urchin embryo has long been a favorite experimental organism among developmental biologists and has contributed significantly to our ideas about embryogenesis [reviewed by Wilt, 1987]. As a model system, the sea urchin offers advantages such as large quantities of synchronously developing embryos and rapid, simple, and accessible embryogenesis. While standard diploid genetic analysis is not practical in sea urchins, the power of paragenetic analysis afforded by microinjection of nucleic acid molecules and constructs is now applicable [Flytzanis et al., 1985]. Evolutionarily, the sea urchin embryo may be thought of as a modern-day relic of an early coelomate metazoan; it represents one of the most primitive enterocoelous forms extant. Though the sea urchin adult has radial symmetry, the embryo is bilaterally symmetric and lies securely in the evolutionary line that gave rise to the bilateral chordate body plan [Hymen, 1951]. An understanding of the activity of homeobox genes in sea urchin embryogenesis will contribute to interpreting and contrasting the roles of homeobox proteins in the more complex developmental programs of vertebrates and Drosophila.

Received for publication October 3, 1989; accepted December 15, 1989. Address reprint requests to Gordon Wang, Kewalo Marine Laboratory, 41 Ahui St., Honolulu, HI 96813.
We have previously reported the isolation of four sea urchin homeobox genes, TgHbox1 [HB1, Dolecki et al., 1986], TgHbox3, TgHbox4 [HB3 and HB4, Dolecki et al., 1988], and TgHbox-En [SU-HB-en, Dolecki and Humphreys, 1988]. The first three genes contain Antp-class homeoboxes that are transcribed in a temporally regulated manner during embryogenesis. In situ hybridization experiments have localized the TgHbox1 transcripts to an ever-decreasing subset of the aboral ectoderm cells between blastula and pluteus stages [Angerer et al., 1989]. The TgHbox4 homeobox appears to be the sea urchin homologue of the Drosophila iab-7 [Regulski et al., 1985], mouse Hox-1.7 [Rubin et al., 1987], mouse Hox-3.2 [Breier et al., 1988], mouse Hox-5.2 [Dolle and Duboule, 1989], Xenopus XlHbox6 [Sharpe et al., 1987; De Robertis et al., 1988], and human Hox-2.5 [Boncinelli et al., 1988] homeoboxes. The TgHbox-En gene contains an engrailed-class homeobox that appears not to be transcribed during embryogenesis, but expression in adults is restricted to the Aristotle's lantern, presumably in the neural ring. We report here the characterization of two additional sea urchin homeobox genes, TgHbox5 and TgHbox6, and their pattern of transcription during embryogenesis and in adult tissues.

MATERIALS AND METHODS

Genomic Clones

Recombinants (5 × 10^5) from a Charon 4A Tripneustes gratilla genomic library [Dolecki et al., 1986] were screened with a 32P-labeled homeobox probe derived from the Sex combs reduced (Scr) gene [Harding et al., 1985] of the Drosophila Antennapedia complex. Nitrocellulose lifts of the library were prepared by the method of Benton and Davis [1977] and hybridized with the Scr probe under low-stringency conditions [McGinnis et al., 1984]. Positive recombinants were isolated and plaque-purified. Two of these, λHB5 and λHB6, were restriction mapped, and their homeoboxes were localized by standard Southern techniques.

Nucleotide Sequence Analysis

Restriction fragments spanning the two homeoboxes were subcloned into M13 vectors [Messing and Vieira, 1982]. Several of these restriction subclones were then used as single-stranded templates to generate deletion subclones using the 3' to 5' single-stranded exonuclease activity of T4 DNA polymerase [Dale et al., 1985]. DNA was sequenced along both strands by the Sanger dideoxy method [Sanger et al., 1977]. A synthetic oligomer, 5'-ATGACGACGACGATC-3', was used to sequence across the internal KpnI restriction site of the TgHbox5 homeobox because we were unable to find a deletion clone covering this area.

Most of the computer programs used to carry out our studies were provided by the BIONET National Computer Resource for Molecular Biology [Smith et al., 1986]. Codon bias analysis was performed using the IBI Sequence Analysis System to determine DNA sequences with a high potential for being protein-coding. The IBI program works by establishing the codon bias at each position in a sequence based on a bias table of known coding genes. The bias value at each position is the product of the values over a user-set window of up to 50 codons (25 in this case), and then the root of the product is determined.
Biases higher than the minimum value measured for any of the sequences used to make the table are judged to be significant.

Probes for Hybridization

The following restriction fragments were used as probes: 1) p109A-1 (Fig. 1A), a 345 bp HaeIII-HaeIII genomic subclone of λHB5 that contains the TgHbox5 homeobox, 2) p92-1 (Fig. 2A), a 1.9 kb HindIII-EcoRI genomic subclone of λHB6 that contains the TgHbox6 homeobox, and 3) pHB6-1 (Fig. 2A), a 356 bp AccI-AccI genomic subclone of λHB6 that also contains the TgHbox6 homeobox.
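The windowed codon bias score described under Nucleotide Sequence Analysis can be illustrated with a minimal sketch. It assumes that "the root of the product" means the window-length root, that is, a geometric mean of per-codon bias values. The function name, the toy bias table, and the default value for unlisted codons are hypothetical and are not taken from the IBI software.

```python
import math


def codon_bias_profile(seq, bias_table, window=25, default=1.0):
    """Return the geometric mean of per-codon bias values for each window.

    seq        -- DNA string read in frame from its first base
    bias_table -- hypothetical mapping of codon -> positive bias value
    window     -- number of codons per window (25 in the analysis above)
    default    -- assumed value for codons missing from the table
    """
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    values = [bias_table.get(codon, default) for codon in codons]

    profile = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        # Summing logs is equivalent to multiplying, but avoids underflow.
        log_sum = sum(math.log(v) for v in chunk)
        profile.append(math.exp(log_sum / window))   # window-th root of product
    return profile


# Toy demonstration with made-up bias values (not the IBI bias table).
toy_table = {"GAG": 2.0, "GAT": 0.5}
toy_profile = codon_bias_profile("GAGGATGAG", toy_table, window=2)
```

Under this reading, a stretch would be called significant when its score exceeds the lowest score obtained for any sequence used to build the table; that threshold is data dependent and is not reproduced here.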

RNA and DNA Analysis

Sea urchin embryos were grown, and their RNA was extracted as previously described [Fregien et al., 1983]. Adult tissues were collected, and RNA was extracted from them as described by Dolecki and Humphreys [1988]. Poly(A)+ RNA was electrophoresed on denaturing 1.0% agarose/formaldehyde gels and then transferred onto nitrocellulose. Genomic DNA was extracted from sperm [Fregien et al., 1983], and Southern analysis was performed according to standard techniques [Maniatis et al., 1982]. DNA and RNA transfers were hybridized with labeled DNA probes produced by random hexamer oligonucleotide-primed synthesis with the Klenow fragment of DNA polymerase I using [α-32P]dCTP to achieve a specific activity of approximately 1 × 10^9 dpm/µg [Feinberg and Vogelstein, 1983]. The blots were washed under stringent conditions [Maniatis et al., 1982] and then exposed to Kodak XAR-5 film for 1 to 3 days at -70°C with two intensifying screens. RNA sizes were estimated by comparison with λ DNA/HindIII markers.
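Sizing a band against markers is commonly done by fitting log10(size) against migration distance, and the sketch below shows that calculation. It is an illustration only: the paper does not state how the comparison was made, the marker sizes are the labels that appear beside Figures 3 and 4, and every migration distance is a hypothetical number chosen for the example.

```python
import math


def fit_log_size(marker_kb, marker_mm):
    """Least-squares fit of log10(size in kb) = a + b * distance (mm)."""
    xs = marker_mm
    ys = [math.log10(size) for size in marker_kb]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_kb(distance_mm, a, b):
    """Convert a migration distance into a size estimate using the fit."""
    return 10 ** (a + b * distance_mm)


marker_kb = [8.9, 6.2, 4.0, 2.1, 1.8]        # size labels shown in Figs. 3 and 4
marker_mm = [18.0, 25.0, 34.0, 47.0, 50.0]   # hypothetical migration distances
a, b = fit_log_size(marker_kb, marker_mm)
band_kb = estimate_kb(30.0, a, b)            # hypothetical band position
```

The log-linear relation is only approximate and tends to fail for the largest fragments, so estimates are best restricted to bands that run between the outermost markers.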

RESULTS

A Tripneustes gratilla genomic library was screened for clones that cross-hybridize to a Drosophila Scr homeobox probe under low-stringency conditions [McGinnis et al., 1984]. Eighteen recombinants that reacted to the Drosophila probe were plaque-purified. Southern analysis of EcoRI digests of the 18 clones revealed four distinct reactive fragments of 8.9, 1.7, 3.4, and 3.8 kb. Sequence analysis of the first two, designated λHB3 and λHB4, had uncovered two Antp-class homeoboxes, TgHbox3 and TgHbox4 [Dolecki et al., 1988]. In this report, we characterize the two latter clones, which we have named λHB5 and λHB6, respectively. Restriction maps for λHB5 and λHB6 are shown in Figures 1A and 2A, respectively.

Sequence analysis showed that λHB5 and λHB6 each contain a definitive homeobox. The nucleotide sequences are presented in Figures 1B and 2B along with the conceptual translation of the open reading frames (ORF) from the putative exons encoding the homeodomains. We designate the homeobox genes represented in λHB5 and λHB6 TgHbox5 and TgHbox6, respectively. The putative exons encoding 232 amino acids for TgHbox5 and 87 amino acids for TgHbox6 in Figures 1B and 2B, respectively, are supported by codon bias analysis (IBI Sequence Analysis Software System) of these ORFs. Both are demarcated at the 5' end by a putative splice acceptor site and at the 3' end by a stop codon. In most homeobox genes, the 180 bp homeobox is found in the 3' terminal exon. If this is the case for TgHbox5 and TgHbox6, then the translations presented in Figures 1B and 2B may represent the C-terminal ends of these gene products. Until we have isolated and sequenced the corresponding cDNAs, the assignment of these 232 and 87 amino acid segments as being entirely coding must be tentative.

The 232 amino acid segment containing the TgHbox5 homeodomain is unusual in several respects. The putative splice acceptor site is in good agreement with the consensus eukaryotic splice acceptor junction [(T/C)nN(C/T)AG/G] [Mount, 1982] and is located relatively far upstream from the homeobox, whereas in many other genes the acceptor lies close to the 5' end of the homeobox. The region preceding the TgHbox5 homeodomain is rich in glutamic and aspartic acid residues, with one area containing nine of these acidic residues in a row. Many vertebrate homeobox proteins contain regions rich in glutamic acid residues, but these areas, in contrast to TgHbox5, usually follow the homeodomain rather than precede it [Kessel et al., 1987; Falzon et al., 1987; Simeone et al., 1987; Wright et al., 1987]. The amino acid sequence following the TgHbox5 homeodomain is rich in alanine, followed by a C-terminal region rich in serine. Such homopolymeric amino acid stretches of alanine and serine have been seen in the Drosophila caudal, even-skipped, engrailed (en), and human HHO.c13 homeobox genes [Mlodzik and Gehring, 1987; Frasch et al., 1987; Poole et al., 1985; Mavillo et al., 1986].

The level of similarity between the TgHbox5 homeodomain and other published metazoan homeodomains ranges from 30% to 52%. Much of the similarity is concentrated at the C-terminal end of the homeodomain, where the helix-turn-helix DNA-binding subdomain resides. Comparison of the TgHbox5 homeodomain with the original members of the two largest classes of homeodomains, Drosophila Antp and en, revealed that they share 45% and 43% amino acid identity, respectively. TgHbox5 can thus be considered quite divergent from other homeoboxes so far identified.
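The identity percentages quoted above are position-by-position comparisons over the 60 residues of the homeodomain. The minimal sketch below shows the arithmetic, assuming the two sequences are already aligned to equal length without gaps; the function name and the eight-residue toy strings are hypothetical and are not taken from the figures.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions in two pre-aligned, equal-length sequences.

    Returns (number of matches, percent identity).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches, 100.0 * matches / len(seq_a)


# Toy example: two made-up eight-residue strings differing at one position.
toy_matches, toy_percent = percent_identity("KIWFQNRR", "KIWFENRR")

# For a 60-residue homeodomain, 27 identical positions correspond to 45%.
antp_like_percent = 100.0 * 27 / 60
```

On this scale each identical residue in a 60 amino acid homeodomain contributes about 1.7 percentage points, so the reported 30% to 52% range spans roughly 18 to 31 shared residues.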
If homeoboxes similar to TgHbox5 can be found in other organisms, they would together constitute a new class of homeoboxes.

The TgHbox6 homeobox is flanked just 23 bp upstream by a putative splice acceptor site, which is also in good agreement with the consensus splice acceptor site described above (Fig. 2B). The intron/exon border 5' of many vertebrate Antp-class homeoboxes is typically just upstream of the homeobox, as is the case here. The portion of the TgHbox6 ORF downstream of the homeodomain encodes a 40 amino acid sequence rich in aspartic and glutamic acid residues (17/40 amino acids). Comparison of the TgHbox6 and Drosophila Antp homeodomains revealed that they share 87% sequence identity at the amino acid level, which establishes TgHbox6 to be a mainstream Antp-class gene.

Southern transfers (data not shown) of EcoRI and HindIII digested genomic DNA from the individual used to construct the Charon 4A library were reacted with a 32P-labeled 345 bp HaeIII-HaeIII homeobox-containing fragment (Fig. 1A) of λHB5 (TgHbox5) and a 32P-labeled 356 bp AccI-AccI homeobox-containing fragment (Fig. 2A) of λHB6 (TgHbox6). After high stringency post-hybridization washes, the TgHbox5 probe reacted with a single 3.4 kb EcoRI fragment and a single 2.1 kb HindIII fragment; the TgHbox6 probe reacted with a single 3.8 kb EcoRI fragment and a single 2.4 kb HindIII fragment. These reactive fragments correspond in size to restriction fragments seen in the original genomic clones and to two of the bands on a genomic Southern reactive with a Drosophila Scr probe at low stringency [Dolecki et al., 1986]. These results indicate that the 3.4 kb and 3.8 kb homeobox-containing EcoRI fragments from λHB5 and λHB6, respectively, accurately represent the single-copy genes from which they derive.

[Figure 1: restriction map and annotated nucleotide sequence of TgHbox5; graphic not reproduced.]

Fig. 1. A: TgHbox5 organization and sequence. The upper line shows clone λHB5 containing approximately 16 kb of genomic sea urchin DNA inserted in a λCharon4A vector (hatched boxes). The position of a 1.9 kb EcoRI-HindIII fragment containing the TgHbox5 homeobox (solid box) is shown. The lower line shows an expanded map of the 1.9 kb EcoRI-HindIII fragment carrying the TgHbox5 homeobox (solid box) oriented for transcription left to right. The positions of HindIII (H), HaeIII (Ha), KpnI (K), and EcoRI (E) restriction sites are indicated. The putative 3' splice acceptor site is marked. The thick line represents the long ORF, which contains the TgHbox5 homeodomain; the stop codon (TAA) for this ORF is indicated. The horizontal arrows indicate the sequencing strategy. The arrow terminated by a circle at its 5' end indicates the sequence determined by using the synthetic oligomer 5'-ATGACGACGACGATC-3'. p109A-1 represents a 345 bp HaeIII-HaeIII fragment used as a probe. B: Nucleotide sequence of a portion of the λHB5 genomic clone and the conceptual translation of the ORF defined by the putative 3' splice acceptor site (overlined from nucleotides 62 to 77) and stop codon (Trm). The region encoding the 60 amino acid TgHbox5 homeodomain is boxed. HaeIII and KpnI restriction sites are indicated.

The expression of TgHbox5 and TgHbox6 was studied using transfers of poly(A)+ RNA from sea urchin embryos and adult tissues reacted with 32P-labeled TgHbox5 and TgHbox6 restriction fragments containing the homeoboxes, followed by high stringency washes. The TgHbox5 p109A-1 probe (Fig. 1A) hybridized to two major transcripts in poly(A)+ RNA obtained from sea urchin embryos at 3-hr, blastula, gastrula, and pluteus stage (Fig. 3A). The larger 5.0 kb RNA first appears at blastula stage, increases slightly in abundance at gastrula, and then decreases substantially by pluteus. The second, smaller 2.7 kb transcript appears only at pluteus stage and is expressed at a slightly lower level than the larger 5.0 kb message at this same time point. Several very faintly reactive smaller bands are also seen in the gastrula and pluteus lanes.

The reactions of the TgHbox5 probe to transfers of poly(A)+ RNA from six adult tissues (small and large intestines, Aristotle's lantern, ovary, testes, and coelomocytes) are shown in Figure 3B. In small intestine poly(A)+ RNA a distinct 5.0 kb transcript is seen along with faint reactions at 3.5, 3.1, and 2.7 kb; the 2.7 kb message is seen more clearly in Figure 3A. The large intestine RNA shows a similar pattern of hybridization with the same four sizes of RNA but at different levels of expression. The 3.5 and 3.1 kb reactions in the large intestine lane are approximately equal in abundance to the 5.0 kb transcript in the small intestine lane. The 5.0 kb message in the large intestine lane is fainter and is expressed at about the same level as the 3.5 and 3.1 kb species in the small intestine lane. The smallest band at 2.7 kb in the large intestine lane is barely detectable. The 5.0, 3.5, and 3.1 kb RNA transcripts in the Aristotle's lantern and ovary lanes are expressed faintly and at about equal levels in each lane. There is no detectable reaction to poly(A)+ RNA prepared from testis and coelomocytes. A lane of small intestine poly(A)+ RNA run on the same gel as the embryonic RNA (Fig. 3A), as well as a lane of pluteus poly(A)+ RNA run on the same gel as the adult tissue RNA (Fig. 3B), shows that the overall level of expression of TgHbox5 is much higher in embryos than in any adult tissue examined.

We were unable to detect any transcription of TgHbox6 during embryogenesis (data not shown) after probing poly(A)+ RNA prepared from 3-hr, blastula, gastrula, and pluteus stage embryos with both the 1.9 kb EcoRI-HindIII and 356 bp AccI-AccI genomic fragments (Fig. 2A) of λHB6 in separate experiments. Adult tissue RNA transfers were probed with the 1.9 kb EcoRI-HindIII probe (Fig. 2A), and a 3.2 kb RNA hybridized in all tissues, with an additional 3.6 kb RNA appearing in two of the tissues examined (Fig. 4). Coelomocyte RNA shows the highest level of transcription for the 3.2 kb transcript. The relative abundance of the 3.2 kb transcript in the adult tissues examined was as follows: coelomocyte > small intestine = large intestine > ovary = testes > Aristotle's lantern.
The 3.6 kb RNA appeared in ovary and testes at levels more abundant than the 3.2 kb message in the same tissues. The smaller 356 bp AccI-AccI probe that was used in hybridizations to the embryonic RNA transfers produced the same pattern of reaction and relative signal strength as seen in Figure 4 when hybridized to transfers of adult tissue poly(A)+ RNA (data not shown).
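The splice acceptor consensus cited in the Results, (T/C)nN(C/T)AG/G, can be expressed as a simple pattern search. The sketch below is an illustration under stated assumptions: the minimum pyrimidine tract length of 8 is an arbitrary choice for the example rather than a value from the paper or from Mount [1982], the function name is hypothetical, and the test sequence is invented rather than taken from Figures 1B or 2B.

```python
import re


def find_acceptor_sites(seq, min_tract=8):
    """Return 0-based indices of the first exon base after candidate acceptors.

    A candidate is a run of at least `min_tract` pyrimidines, any base,
    one pyrimidine, then AG, with the exon required to begin with G.
    """
    pattern = re.compile(r"[CT]{%d,}[ACGT][CT]AG(?=G)" % min_tract)
    # match.end() is the index just past AG, i.e. the first exon base.
    return [match.end() for match in pattern.finditer(seq.upper())]


# Invented sequence: 4-base lead, 10-base pyrimidine tract, A, C, AG, exon GTAC.
toy_sequence = "GAGA" + "TTCTTTCCTT" + "A" + "C" + "AG" + "GTAC"
toy_sites = find_acceptor_sites(toy_sequence)
```

A match of this kind only flags candidates. As the text notes, the assignment of an exon boundary remains tentative until the corresponding cDNA has been sequenced.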

[Figure 2: restriction map and annotated nucleotide sequence of TgHbox6; graphic not reproduced.]

Fig. 2. A: TgHbox6 organization and sequence. The upper line shows clone λHB6 containing approximately 12 kb of genomic sea urchin DNA; the position of the 1.9 kb HindIII-EcoRI fragment containing the TgHbox6 homeobox (solid box) is indicated. Symbols are as in Figure 1A. pHB6-1 and p92-1 represent 356 bp AccI-AccI and 1.9 kb HindIII-EcoRI genomic fragments, respectively, used as probes. H, HindIII; A, AccI; S, SacI; E, EcoRI. B: Nucleotide sequence of a portion of the λHB6 genomic clone and the conceptual translation of the ORF defined by the putative 3' splice acceptor site (overlined from nucleotides 104 to 119) and the indicated stop codon (Trm); the 60 amino acid TgHbox6 homeodomain is boxed. AccI and SacI restriction sites are marked.

[Figure 3: RNA gel blots, panels A and B; graphic not reproduced. Size markers shown: 8.9, 6.2, 4.0, 2.1, and 1.8 kb.]

Fig. 3. RNA gel blot analysis of TgHbox5 transcripts: 3, 3-hr embryo; B, blastula; G, gastrula; P, pluteus; SI, small intestine; LI, large intestine; L, Aristotle's lantern; O, ovary; T, testes; C, coelomocytes. A: Poly(A)+ RNA, 2 µg per lane, from the developmental stages indicated, was electrophoresed on denaturing 1% formaldehyde-agarose gels, transferred to nitrocellulose, and hybridized to the 32P-labeled, 345 bp HaeIII-HaeIII homeobox-containing fragment (see Fig. 1A). A lane of adult small intestine poly(A)+ RNA was included for comparison. Denatured, HindIII-digested λ DNA was used for a size marker. B: RNA gel blot analysis of TgHbox5 adult tissue transcripts. Blot prepared and hybridized as in A. A lane of embryonic poly(A)+ RNA from pluteus stage embryos was included for comparison.

[Figure 4: RNA gel blot; graphic not reproduced. Size markers shown: 8.9, 6.2, 4.0, 2.1, and 1.8 kb.]

Fig. 4. RNA blot analysis of TgHbox6 adult tissue transcripts. Blot prepared as in Figure 3. The TgHbox6 probe used was a 32P-labeled, 1.9 kb EcoRI-HindIII homeobox-containing fragment (see Fig. 2A). SI, small intestine; LI, large intestine; L, Aristotle's lantern; O, ovary; T, testes; C, coelomocytes.

[Figure 5: homeodomain alignment (A), helical wheel (B), and helical net (C); graphic not reproduced. Homeodomains compared with TgHbox5: TgHbox-En, TgHbox4, TgHbox3, TgHbox6, TgHbox1, D-Antp, X-Hox3, D-Prd, M-Hox1.6, H-Hox2.7, D-Rough, and D-Bcd. TgHbox5 homeodomain, residues 1 to 60: KKKTRTVFSRSQVFQLESTFEVKRYLSSSERAGLAANLHLTETQVKIWFQNRRNKWKRQM.]

Fig. 5. A: Alignment of the 60 amino acid TgHbox5 homeodomain with five Tripneustes homeodomains and representative homeodomains from different subclasses [Scott et al., 1989]. At each position where there is identity with TgHbox5, a dash is indicated. The number of amino acids that have identity with TgHbox5 for each comparison is shown on the right. Below the alignment is a consensus sequence where dashes indicate variance of two or more residues at a given position (except for position 8) and invariant amino acids between the 13 homeodomains compared are shown. The locations of three α-helical and β-turn secondary structures are indicated as identified by NMR data [Otting et al., 1988]. B: Helical wheel analysis of residues 8 through 21 of TgHbox5. This portion of the TgHbox5 homeodomain is displayed end-to-end down the axis of a schematic α-helix with an incremental angle of 100°. This helical wheel consists of seven spokes corresponding to the fit of seven amino acids into every two α-helical turns. The direction of the helical turn is clockwise going N- to C-terminal into the plane of the page. Highly conserved residues from the consensus sequence shown in A are boxed. C: Helical net analysis of amino acids 8 through 21 of TgHbox5. The helical region containing the conserved residues (boxed) is flattened by splitting the helix lengthwise along the opposite face. The display starts at the lower left corner with phenylalanine 8. The first helical turn is completed with arginine 10, and the second turn is initiated by serine 11.

DISCUSSION

We have characterized two new sea urchin homeoboxes, TgHbox5 and TgHbox6. The TgHbox6 homeobox is a mainstream Antp-class homeobox that appears not to be expressed during embryogenesis. Its lack of embryonic expression is surprising since other Antp-class homeobox genes from many species are expressed abundantly in embryos. This might be explained as the result of transcription at very low levels, in a narrow time window, and/or in a very restricted cell type during early development. TgHbox6 transcripts were easily detected in all the adult tissues examined.

Expression of TgHbox5 is more typical of homeobox genes; multiple temporally regulated messages appear in embryos. There are two major transcripts of 5.0 and 2.7 kb that may be the product of distinct promoters and/or differential splicing. The TgHbox5 messages could differ in their 5' exons, as is the case with the XlHbox 1 gene [Cho et al., 1988], where differential splicing caused by the use of two separate promoters results in the differential utilization of the same ORF, leading to the production of a long and a short XlHbox 1 protein. Both Xenopus proteins have the same homeodomain and the same DNA-binding specificity, but the long version has an extra 82 amino acid domain at the amino terminus. The Drosophila Ubx gene [O'Connor et al., 1988] has been shown to produce five distinct splicing products, which form a family of related proteins that share common 5' and 3' exons but differ with respect to their number of internal "micro" exons. Analysis of the molecular processes that produce the 5.0 and 2.7 kb embryonic transcripts of TgHbox5 will require isolation of the entire gene as well as the corresponding cDNAs.

Adult transcription of TgHbox5 is predominantly found in both the large and small intestines and may represent tissue-specific expression; it is at a much lower level than during early development. In addition to the 5.0 and 2.7 kb transcripts seen in embryos, two additional transcripts measuring 3.5 and 3.1 kb are also seen and may be the product of differential splicing. The sea urchin homeobox gene TgHbox3 [Dolecki et al., 1988] shows tissue-specific expression in adults, where transcripts of 5.7 kb are restricted to ovaries and testes (from which mature gametes have been shed). Examples of tissue-specific homeobox genes in vertebrates and insects include the mouse Hox-1.4 gene [Wolgemuth et al., 1986], which exhibits testis-specific expression, and the Drosophila H2.0 gene [Barad et al., 1988], which contains a divergent homeobox that shows transcription restricted to the visceral musculature and its anlagen.

The temporal patterns of expression during embryogenesis for the sea urchin homeobox genes TgHbox1 [Dolecki et al., 1986; Angerer et al., 1989] and TgHbox5 are very similar. Transcripts first appear at blastula stage, increase at gastrula, and then decrease at pluteus. For each gene, there are two major transcripts. The similar temporal pattern of expression of TgHbox1 and TgHbox5 may be due to coordinated regulation, as is the case with certain Drosophila homeobox genes. It is difficult to assign possible roles in the specification of cell fate for either of these sea urchin homeobox genes since specification, determination, and initial differentiation of larval tissues occur before these two genes are activated [Angerer et al., 1989].

The TgHbox5 homeodomain is very divergent, sharing between 30% and 52% similarity at the amino acid level with the majority of known homeobox genes. The TgHbox5 homeobox shows only 48% nucleotide sequence identity with the Scr probe used to isolate the sea urchin homeoboxes. The ability of the Scr homeobox to hybridize to the TgHbox5 gene apparently depends on a homologous stretch of 21 nucleotides with only one mismatch near the 3' end of the TgHbox5 homeobox. This region encodes the most highly conserved amino acid sequence, K-I-W-F-Q-N-R, corresponding to the second helix (helix 3 in Fig. 5A) of the DNA-binding region of the homeodomain.
Eighty-seven homeodomains from a wide range of organisms were organized into different classes on the basis of amino acid sequence similarity [Scott et al., 1989]. TgHbox5, like 14 other homeodomains from various species, could not be assigned to any of these classes. We have compared the TgHbox5 homeodomain (Fig. 5A) to a variety of divergent homeodomains in order to determine which amino acids are conserved. In addition to the amino acids conserved in the helix-turn-helix motif, the consensus sequence of this comparison shows three invariant amino acids at positions 12/Gln, 16/Leu, and 20/Phe in the helix 1 region of the homeodomain defined by recent three-dimensional NMR structural studies on a 68 amino acid, bacterially expressed, Drosophila Antp homeodomain [Otting et al., 1988]. These NMR experiments defined an α-helix of approximately 12 residues extending from positions 10 to 21 in Figure 5A. If this proposed helix 1 subdomain is extended to position 8, which is usually Tyr (Tyr in 63/95 cases and Phe in 18/95 cases), there are four “invariant” residues within helix 1 forming the tetrad repeat Y/F XXX Q XXX L XXX F. Helical wheel (Fig. 5B) and net analysis (Fig. 5C) of an idealized α-helix stretching from amino acids 8 to 21 of TgHbox5 indicate that these highly conserved residues line up along a face of the α-helix. Conservation of this possible structural feature among widely divergent homeodomains implies an important functionality. These conserved residues may represent a domain involved in protein-protein interaction [Barad et al., 1988; Scott et al., 1989] or in protein-DNA interactions [Garcia-Blanco et al., 1989].
Since the sea urchin evolutionary line branches off from the deuterostome lineage after the divergence of the protostome branch, it seems likely that sea urchin Antp-class homeoboxes would also be clustered and contain cognates of the insect and vertebrate homologues. Sequence comparisons show that the Tripneustes TgHbox4 gene is clearly the homologue of the Drosophila iab-7 gene [Dolecki et al., 1988] and the corresponding vertebrate cognates (e.g., mouse Hox-1.7, -2.5, -5.2, and -3.2). The iab-7 gene and its mouse cognates are the 5′-most genes in their respective clusters and are expressed most posteriorly along the antero-posterior axis in comparison to the expression of the other homeobox genes from these complexes. Sequence comparison of the other three sea urchin Antp-class homeobox genes, TgHbox1, 3, and 6, with Antp-class homeobox genes of other species has not succeeded in defining the cognates of these sea urchin genes owing to the high degree of similarity among homeoboxes from many Antp-class genes. Determination of the true cognates of the urchin genes will require sequence data outside the homeobox, information on genomic linkage relationships, and determination of their spatial patterns of embryonic expression.

ACKNOWLEDGMENTS We appreciate the expert technical help of Dawn Haeckel, Bert Fukunaga, and Lynn Godfrey. This work was supported by grants from the National Institutes of Health to T.H. and G.J.D. (HD22483 and HD06574). The BIONET resource was supported by a grant from the National Institutes of Health Division of Research Resources (RR01685).

REFERENCES
Akam M (1987): The molecular basis for metameric pattern in the Drosophila embryo. Development 101:1-22.
Akam M (1989): Hox and HOM: Homologous gene clusters in insects and vertebrates. Cell 57:347-349.
Angerer LM, Dolecki GJ, Gagnon M, Lum R, Wang G, Yang Q, Humphreys T, Angerer RC (1989): Progressively restricted expression of a homeobox gene within the aboral ectoderm of developing sea urchin embryos. Genes Dev 3:370-383.
Barad M, Jack T, Chadwick R, McGinnis W (1988): A novel, tissue-specific, Drosophila homeobox gene. EMBO J 7:2151-2161.
Benton WD, Davis RW (1977): Screening λgt recombinant clones by hybridization to single plaques in situ. Science 196:180-182.
Boncinelli E, Somma R, Acampora D, Pannese M, D'Esposito M, Simeone A (1988): Organization of human homeobox genes. Hum Reprod 3:880-886.
Breier G, Dressler GR, Gruss P (1988): Primary structure and developmental expression pattern of Hox 3.1, a member of the murine Hox 3 homeobox gene cluster. EMBO J 7:1329-1336.
Carroll SB, Laymon RA, McCutcheon MA, Riley PD, Scott MP (1986): The localization and regulation of Antennapedia protein expression in Drosophila embryos. Cell 47:113-122.
Cho KWY, Goetz J, Wright CVE, Fritz A, Hardwicke J, De Robertis EM (1988): Differential utilization of the same reading frame in a Xenopus homeobox gene encodes two related proteins sharing the same DNA-binding specificity. EMBO J 7:2139-2149.
Dale RM, McClure BA, Houchins JP (1985): A rapid single-stranded cloning strategy for producing a sequential series of overlapping clones for use in DNA sequencing: Application to sequencing the corn mitochondrial 18S rDNA. Plasmid 13:31-41.
De Robertis EM, Burglin TR, Fritz A, Wright CVE, Jegalian B, Schnegelsberg BP, Bittner D, Morita E, Oliver G, Cho KWY (1988): Families of vertebrate homeodomain proteins. In Gralla J (ed): “DNA-Protein Interactions in Transcription.” UCLA Symposium, Vol. 95. New York: Alan R. Liss, Inc., pp 1-9.
Desplan C, Theis J, O'Farrell PH (1985): The Drosophila developmental gene, engrailed, encodes a sequence-specific DNA binding activity. Nature 318:630-635.
Dolecki GJ, Humphreys T (1988): An engrailed class homeobox gene in sea urchins. Gene 64:21-31.
Dolecki GJ, Wannakrairoj S, Lum R, Wang G, Riley HD, Carlos R, Wang A, Humphreys T (1986): Stage-specific expression of a homeobox-containing gene in the non-segmented sea urchin embryo. EMBO J 5:925-930.
Dolecki GJ, Wang G, Humphreys T (1988): Stage- and tissue-specific expression of two homeobox genes in sea urchin embryos and adults. Nucleic Acids Res 16:11543-11558.
Duboule D, Dollé P (1989): The structural and functional organization of the murine HOX gene family resembles that of Drosophila homeotic genes. EMBO J 8:1497-1505.
Dollé P, Duboule D (1989): Two gene members of the murine HOX-5 complex show regional and cell-type specific expression in developing limbs and gonads. EMBO J 8:1507-1515.
Fainsod A, Bogarad LD, Ruusala T, Lubin M, Crothers DM, Ruddle FH (1986): The homeo domain of a murine protein binds 5′ to its own homeobox. Proc Natl Acad Sci USA 83:9532-9536.
Falzon M, Sanderson N, Chung SY (1987): Cloning and expression of rat homeo-box-containing sequences. Gene 54:23-32.
Feinberg AP, Vogelstein B (1983): A technique for radiolabelling DNA restriction endonuclease fragments to high specific activity. Anal Biochem 132:6-13.
Flytzanis CN, McMahon AP, Hough-Evans BR, Katula KS, Britten RJ, Davidson EH (1985): Persistence and integration of cloned DNA in postembryonic sea urchins. Dev Biol 108:431-442.
Frasch M, Hoey T, Rushlow C, Doyle H, Levine M (1987): Characterization and localization of the even-skipped protein of Drosophila. EMBO J 6:749-759.
Fregien N, Dolecki GJ, Mandel M, Humphreys T (1983): Molecular cloning of five individual stage- and tissue-specific mRNA sequences from sea urchin pluteus embryos. Mol Cell Biol 3:1021-1031.
Garcia-Blanco MA, Clerc RG, Sharp PA (1989): The DNA-binding homeo domain of the Oct-2 protein. Genes Dev 3:739-745.
Graham A, Papalopulu N, Krumlauf R (1989): The murine and Drosophila homeobox gene complexes have common features of organization and expression. Cell 57:367-378.
Harding K, Wedeen C, McGinnis W, Levine M (1985): Spatially regulated expression of homeotic genes in Drosophila. Science 229:1236-1242.
Holland PWH, Hogan BLM (1986): Phylogenetic distribution of Antennapedia-like homeo boxes. Nature 321:251-253.
Hyman LH (1951): “The Invertebrates.” Vol. II. New York: McGraw-Hill.
Ingham PW (1988): The molecular genetics of embryonic pattern formation in Drosophila. Nature 335:25-34.
Kappen C, Schughart K, Ruddle FH (1989): Two steps in the evolution of Antennapedia-class vertebrate homeobox genes. Proc Natl Acad Sci USA 86:5459-5463.
Kessel M, Schulze F, Fibi M, Gruss P (1987): Primary structure and nuclear localization of a murine homeodomain protein. Proc Natl Acad Sci USA 84:5306-5310.
Laughon A, Scott MP (1984): Sequence of a Drosophila segmentation gene: Protein structure homology with DNA-binding proteins. Nature 310:25-31.
Laughon A, Carroll SB, Storfer FA, Riley PD, Scott MP (1985): Common properties of proteins encoded by the Antennapedia complex genes of Drosophila melanogaster. Cold Spring Harbor Symp Quant Biol 50:253-262.
Levine M, Hoey T (1988): Homeobox proteins as sequence-specific transcription factors. Cell 55:537-540.
Maniatis T, Fritsch EF, Sambrook J (1982): “Molecular Cloning: A Laboratory Manual.” Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.
Mavilio F, Simeone A, Giampaolo A, Faiella A, Zappavigna V, Acampora D, Poiana G, Russo G, Peschle C, Boncinelli E (1986): Differential and stage-related expression in embryonic tissues of a new human homeobox gene. Nature 324:664-667.
McGinnis W (1985): Homeobox sequences of the Antennapedia class are conserved only in higher animal genomes. Cold Spring Harbor Symp Quant Biol 50:263-270.
McGinnis W, Levine MS, Hafen E, Kuroiwa A, Gehring WJ (1984): A conserved DNA sequence in homeotic genes of the Drosophila Antennapedia and bithorax complexes. Nature 308:428-433.
Messing J, Vieira J (1982): A new pair of M13 vectors for selecting either DNA strand of double-digest restriction fragments. Gene 19:269-276.
Mount S (1982): A catalogue of splice junction sequences. Nucleic Acids Res 10:459-472.
Mlodzik M, Gehring WJ (1987): Expression of the caudal gene in the germ line of Drosophila: Formation of an RNA and protein gradient during early embryogenesis. Cell 48:465-478.
Müller M, Affolter M, Leupin W, Otting G, Wüthrich K, Gehring WJ (1988): Isolation and sequence-specific DNA binding of the Antennapedia homeodomain. EMBO J 7:4299-4304.
Nüsslein-Volhard C, Wieschaus E (1980): Mutations affecting segment number and polarity in Drosophila. Nature 287:795-801.
O'Connor MB, Binari R, Perkins LA, Bender W (1988): Alternative RNA products from the Ultrabithorax domain of the bithorax complex. EMBO J 7:435-445.
Otting G, Qian Y, Müller M, Affolter M, Gehring WJ, Wüthrich K (1988): Secondary structure determination for the Antennapedia homeodomain by nuclear magnetic resonance and evidence for a helix-turn-helix motif. EMBO J 7:4305-4309.
Poole SJ, Kauvar L, Drees B, Kornberg T (1985): The engrailed locus of Drosophila: Structural analysis of an embryonic transcript. Cell 40:37-43.
Regulski M, Harding K, Kostriken R, Karch F, Levine M, McGinnis W (1985): Homeobox genes of the Antennapedia and bithorax complexes of Drosophila. Cell 43:71-80.
Rubin MR, King W, Toth LE, Sawczuk IS, Levine MS, D'Eustachio P, Nguyen-Huu MD (1987): Murine Hox-1.7 homeobox gene: Cloning, chromosomal location and expression. Mol Cell Biol 7:3836-3841.
Sanger F, Nicklen S, Coulson AR (1977): DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467.
Scott MP, Weiner A (1984): Structural relationships among genes that control development: Sequence homology between the Antennapedia, Ultrabithorax and fushi tarazu loci of Drosophila. Proc Natl Acad Sci USA 81:4115-4119.
Scott MP, Tamkun JW, Hartzell GW, III (1989): The structure and function of the homeodomain. BBA Rev Cancer 989:25-48.
Sharpe CR, Fritz A, De Robertis EM, Gurdon JB (1987): A homeobox-containing marker of posterior neural differentiation shows the importance of predetermination in neural induction. Cell 50:749-758.
Shepherd JCW, McGinnis W, Carrasco AE, De Robertis EM, Gehring WJ (1984): Fly and frog homeo domains show homologies with yeast mating type regulatory proteins. Nature 310:70-71.
Simeone A, Mavilio F, Acampora D, Giampaolo A, Faiella A, Zappavigna V, D'Esposito M, Pannese M, Russo G, Boncinelli E, Peschle C (1987): Two human homeobox genes, c1 and c8: Structure analysis and expression in embryonic development. Proc Natl Acad Sci USA 84:4914-4918.
Smith DH, Brutlag D, Friedland P, Kedes L (1986): BIONET: National computer resource for molecular biology. Nucleic Acids Res 14:17-20.
Way JC, Chalfie M (1988): Mec-3, a homeobox-containing gene that specifies differentiation of the touch receptor neuron in C. elegans. Cell 38:361-369.
Wilt F (1987): Determination and morphogenesis in the sea urchin embryo. Development 100:559-575.
Wolgemuth DJ, Engelmyer E, Duggal RN, Gizang-Ginsberg E, Mutter GL, Ponzetto C, Viviano C, Zakeri ZF (1986): Isolation of a mouse cDNA coding for a developmentally regulated testis-specific transcript containing homeobox homology. EMBO J 5:1229-1235.
Wright CVE, Cho KWY, Fritz A, Burglin TR, De Robertis EM (1987): A Xenopus laevis gene encodes both homeobox-containing and homeobox-less transcripts. EMBO J 6:4083-4094.
