Copyright © 1992 by the Genetics Society of America

Comparative Studies of Drosophila Antennapedia Genes

Joan E. Hooper,*,1,2 Manuel Pérez-Alonso,†,1 John R. Bermingham,*,3 Mary Prout,*,4 Bryan A. Rocklein,* Michael Wagenbach,* Jan-Erik Edström,‡ Rosa de Frutos† and Matthew P. Scott*,5

*Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, Colorado 80309, †Department of Genetics, University of Valencia, E-46100 Burjasot, Valencia, Spain, and ‡Department of Genetics, University of Lund, 223 62 Lund, Sweden

Manuscript received December 4, 1991
Accepted for publication June 12, 1992

ABSTRACT

The Antennapedia (Antp) homeotic gene of Drosophila melanogaster controls cell fates and pattern formation in the epidermis, nervous system and mesoderm of thoracic segments. Its expression is controlled at the levels of transcription, alternative RNA splicing, polyadenylation and translation. Two nested Antp transcription units extend over 103 kb and produce sixteen different transcripts. We have compared the Antp genes of Drosophila virilis, Drosophila subobscura and D. melanogaster to determine which structural features are conserved and therefore may be important to the gene’s function. The overall gene structures are similar. There are many conserved sequence blocks throughout the large introns, at least 15 kb upstream of the first promoter, and at least 3 kb downstream of the last polyadenylation site. Intron and exon sequence conservation around alternative splice sites indicates that alternative protein coding forms may also be conserved. Protein coding potential is perfectly conserved around the C-terminal homeodomain, well conserved in the N-terminal region, and more variable in the middle. The large size of the Antp gene may reflect a large number of control elements necessary for appropriate Antp protein expression. The conservation of transcript complexity suggests functional requirements for the different protein forms.

HOMEOTIC genes control the formation of structures that distinguish one segment from another in Drosophila. During development, homeotic gene products appear in elaborate spatial and temporal patterns. The proteins act as transcription factors to coordinate the expression of downstream “target” genes. There are five protein-coding homeotic genes clustered in the Antennapedia complex (ANT-C) (KAUFMAN, LEWIS and WAKIMOTO 1980; KAUFMAN, SEEGER and OLSEN 1990) and three in the bithorax complex (BX-C) (LEWIS 1978; DUNCAN 1987; SANCHEZ-HERRERO et al. 1985). Most of these genes have large transcription units with cis-acting regulatory sequences extending over tens of kilobases (BENDER et al. 1983, 1985; BOULET, LLOYD and SAKONJU 1991; BOULET and SCOTT 1988; CELNIKER et al. 1990; GARBER, KUROIWA and GEHRING 1983; IRVINE, HELFAND and HOGNESS 1991; KARCH et al. 1985; SCOTT et al. 1983; SIMON et al. 1990). The Antennapedia (Antp) homeotic gene controls



1 Both authors should be considered as equivalent first authors. J.H. is the primary author of the Dv project and M.P.A. is the primary author of the Ds project.
2 Current address: Department of Cell and Structural Biology, University of Colorado Health Sciences Center, Denver, Colorado 80262.
3 Current address: Howard Hughes Medical Institute, School of Medicine M-013, University of California, San Diego, La Jolla, California 92093.
4 Current address: Department of Molecular and Cell Biology, University of California, Berkeley, California 94720.
5 Current address: Departments of Developmental Biology and Genetics, Stanford University School of Medicine, Stanford, California 94305-5427.



Genetics 132: 453-469 (October, 1992)

cell fates and pattern formation in the epidermis, nervous system and mesoderm of thoracic segments. Transcriptional regulation provides the primary control of its elaborate spatial and temporal expression pattern (reviewed in AKAM 1987; SCOTT and CARROLL 1987; INGHAM 1988). The initial differential transcription of Antp along the anterior-posterior axis of the embryo at the blastoderm stage is dependent on “gap” gene products (IRISH, MARTINEZ-ARIAS and AKAM 1989; REINITZ and LEVINE 1990). Slightly later, “pair-rule” genes regulate Antp (INGHAM and MARTINEZ-ARIAS 1986; TREMML and BIENZ 1989). The maintenance of segmentally restricted transcription of Antp depends both on other genes of the ANT-C and BX-C (HAFEN, LEVINE and GEHRING 1984; HARDING et al. 1985; RILEY, CARROLL and SCOTT 1987; STRUHL and AKAM 1985) and on regulators of homeotic genes such as Polycomb (LEWIS 1978; WEDEEN, HARDING and LEVINE 1986; ZINK et al. 1991), extra sex combs (STRUHL 1983), and many others (KENNISON and TAMKUN 1988). Striking dorsal-ventral differences in Antp expression (e.g., BERMINGHAM et al. 1990) indicate responsiveness to genes that control dorsal-ventral differentiation (reviewed in ANDERSON 1987). As development proceeds, Antp is expressed in complex and precise patterns in most tissues, including the nervous system, the epidermis, and the somatic and visceral mesoderm (LEVINE et al. 1983; MARTINEZ-


J. E. Hooper et al.

ARIAS 1986; REGULSKI et al. 1985). Large numbers of tissue- and position-specific transcription factors are suggested by the highly complex patterns of Antp expression.

The Antp gene has two promoters that are differently regulated (LAUGHON et al. 1986; SCHNEUWLY et al. 1986; STROEHER, JORGENSEN and GARBER 1986). The Antp P1 promoter drives a 103-kb transcription unit. The Antp P2 promoter drives a 36-kb transcription unit that is nested within the larger transcription unit. The processed transcripts from the two promoters differ in their leaders but share protein-coding exons (Figure 1). The two Antp promoters are expressed in different patterns both in embryos and in imaginal discs (JORGENSEN and GARBER 1987; BERMINGHAM et al. 1990). The two promoters can be independently controlled; only P1 is responsive to the Krüppel “gap” gene, while only P2 is responsive to the hunchback “gap” gene (IRISH, MARTINEZ-ARIAS and AKAM 1989) and the fushi tarazu “pair-rule” gene (INGHAM and MARTINEZ-ARIAS 1986).

Large cis-acting sequences are required for correct P1 and P2 transcription. Twenty-five kilobases of DNA roughly centered on the P1 promoter direct lacZ expression in the epidermis and mesoderm in a pattern that approximates the authentic P1 pattern (M. PETITT and M. P. SCOTT, unpublished). A 10-kb fragment of DNA, containing mostly sequence just upstream of P2, is sufficient to direct lacZ expression in a pattern that roughly approximates the normal pattern of P2 transcription (BOULET and SCOTT 1988). In each case the patterns are imperfect, so additional regulatory elements are needed. Therefore, Antp control elements are dispersed over large regions of DNA.

The Antp protein includes features shared by many homeotic gene products: a 61-amino acid DNA binding homeodomain, opa or M repeat sequences, a nuclear localization sequence, and two other short motifs, MXSYF and YPWM (reviewed in SCOTT, TAMKUN and HARTZELL 1989). Alternative RNA splicing generates four different Antp protein forms (BERMINGHAM and SCOTT 1988; STROEHER, GAISER and GARBER 1988). They vary by insertion/deletion of a few amino acids just upstream of the homeodomain. The significance of the different Antp protein forms is unknown.

What are the essential aspects of the Antp gene organization? An interspecific comparison should in principle reveal essential elements as evolutionarily preserved. Diverged regions might correspond to nonessential features, or might identify species-specific elements. Here we compare the Antp gene of Drosophila virilis to the gene from Drosophila melanogaster. The common ancestor of the two species is estimated to have lived about 60 million years ago (BEVERLEY and WILSON 1984; THROCKMORTON 1975). We also compare the protein coding exons of the Antp gene from Drosophila subobscura to those of the other two species. D. subobscura and D. melanogaster probably diverged 20-50 million years ago. The large size of the Antp gene is common to D. virilis and D. melanogaster. We find striking colinear homologies throughout the 100 kb of introns. There is extensive sequence conservation around the two promoters and the alternative splice sites. The sequence of the Antp protein is highly conserved among three species of flies, with perfect conservation of the alternative forms.

MATERIALS AND METHODS

Construction of the D. virilis genomic library: Vector DNA was prepared by digesting EMBL-3 (DENTE, CESARENI and CORTESE 1983) with BamHI and EcoRI followed by three isopropanol precipitations. High molecular weight genomic DNA prepared from adult D. virilis (SCOTT et al. 1983) was partially cleaved with Sau3A, sized on a sucrose gradient (16-23 kb), phosphatased, and ligated to the phage DNA. More than 99% of the phage in the library were recombinant, with an average insert size of 19 kb.

Identification, isolation and sequencing of the D. virilis Antp gene: Southern blots of EcoRI-restricted genomic DNA from D. virilis were probed with fragments from the D. melanogaster Antp gene at moderate (hybridized at 55° in 6 × SSC, washed at 55° in 2 × SSC) or high (hybridized at 65° in 6 × SSC, washed at 65° in 2 × SSC) stringencies. A probe from the first promoter (P1) of the D. melanogaster gene (4.0-kb EcoRI fragment) detected 1.2- and 11-kb fragments in D. virilis, the second promoter (P2; 1.9-kb EcoRI fragment) detected 3.0- and 9.6-kb fragments, exons D-G (3.0-kb EcoRI fragment) detected a 7.0-kb fragment, and exon H (1.7-kb EcoRI fragment) detected a 2.5-kb fragment.

The D. virilis genomic library was screened at moderate stringency with the D. melanogaster probes just described for P1, P2, exons D-G and exon H. Authentic Antp clones were separated from spurious positives, based on the EcoRI fragment sizes predicted from the Southern blots. The intron regions between the putative exon clones were then obtained by chromosomal walking. The map of the D. virilis Antp gene was reconstructed from the lambda phage clones (Figure 1).

Exons were subcloned into pEMBL18, pEMBL19 (DENTE, CESARENI and CORTESE 1983), pGEM1, or pGEM7 (Promega, Madison, Wisconsin). Double-stranded sequence was obtained using dideoxy chain termination (SANGER, NICKLEN and COULSON 1977) with nested deletions generated by Exonuclease III digestion (LAUGHON et al. 1986), oligonucleotide primers, 35S, and either Klenow fragment for single-stranded sequencing, or Sequenase (U.S. Biochemical Corp., Cleveland, Ohio) for double-stranded sequencing.

Identification, isolation, and sequencing of the D. subobscura Antp gene: Southern blots of D. subobscura genomic DNA were probed with a D. melanogaster Antp cDNA probe c903 (SCHNEUWLY et al. 1986) kindly provided by W. J. GEHRING. The blots were hybridized at 42° in 5 × SSC, 46% formamide and washed at 50° in 2 × SSC to identify the homologous EcoRI fragments. A D. subobscura genomic library in EMBL4 (provided by R. GONZÀLEZ-DUARTE and G. MARFANY) was screened under the same hybridization and washing conditions described above using the D. melanogaster cDNA probe. The authentic Antp clones were identified on the basis of the EcoRI fragments detected by Southern blotting. A map of the 3’ part of the D. subobscura Antp gene is shown in Figure 1. EcoRI fragments of the lambda phage clones hybridizing to the cDNA were subcloned into pUC18 and sequenced on both strands by the method of SANGER, NICKLEN and COULSON (1977) using T7 DNA polymerase (Pharmacia-LKB) and 35S. Nested deletions were generated using the deletion kit of Pharmacia-LKB and oligonucleotides were used when necessary to fill gaps in the sequence.

RNA analysis: mRNA isolation and Northern blot analysis methods are described in LAUGHON et al. (1986).

Sequence alignment and analysis: Protein alignment was carried out using the CLUSTAL package of HIGGINS and SHARP (1988). Each sequence in Figures 8 and 9 was analyzed for potential RNA hairpin loops using HAIRPINS (PCGENE software package, IntelliGenetics, Inc., Mountain View, California) with minimum stem length of 5 bases and loop size from 3 to 20 bases. The free energy value (ΔG) is computed using the method and numerical values of TINOCO et al. (1973). For the exon D-G region, the 13 predicted hairpins of ΔG greater than -7 kcal were then compared between the three species. For the exon H region, the two predicted hairpins of ΔG greater than -7 kcal were compared between the three species. Conserved predicted hairpins are noted in the text and figures.

DNA sequences were compared by dot matrix, using DOTPLOT (University of Wisconsin Genetics Computer Group, Madison, Wisconsin) or Pustell DNA matrix (MacVector software package, International Biotechnologies, Inc., New Haven, Connecticut). Window size and stringency were varied to optimize elements on diagonals.

P1 and P2 DNA sequences were aligned with GAP (University of Wisconsin Genetics Computer Group, Madison, Wisconsin), which uses the algorithm of NEEDLEMAN and WUNSCH (1970). The P1 sequences are too large for GAP to evaluate in a single step.
GAP was first run with the /LIMIT option, to detect large segments of homology but not necessarily the best possible alignment. Each P1 sequence was then subdivided into four segments. Each segment was optimally aligned with its homolog, without the /LIMIT option. The final P1 alignment was the product of four contiguous segment alignments. The final P2 alignment was the product of two contiguous segments. The preliminary alignment (GAP/LIMIT) was performed using a gap weight (GW) of 3.0 and a gap length weight (GL) of 0, while the alignments of the subdivided sequences were performed with GW = 5.0 and GL = 0.

DNA sequences were also aligned semimanually, using the interactive ALIGN program developed by R. MARVIL (University of Colorado at Boulder). The program first lists all perfect matches between two sequences, with the minimum length specified by the user. The user then selects from this list a colinear set of matches, and the program introduces gaps and reports the number of aligned nucleotides (nt). The program is then instructed to align imperfect matches detected by the user in the vicinity of perfect matches. A core perfect match of eight was the minimum to introduce a gap for two-way alignments. For three-way alignments, gaps were introduced either to align core perfect matches of seven, 8/9 imperfect matches, or longer imperfect matches (>90%) that aligned the third sequence with a match between the other two. The validity of the final alignments was verified by comparing them to the dot-matrix plots. The resulting alignments were formatted using ALIGN and edited for final presentation using MacDraw II (CLARIS Corp., Mountain View, California).

The two algorithms produced similar alignments that differ in details. For P1, GAP chose six longer imperfect matches over shorter perfect matches. It missed four perfect matches of 11, 14, 14 and 15 nt, and found six additional imperfect matches (16/21, 15/17, 19/28, 21/28, 10/13, 18/21). The alignments differed most in regions with no apparent similarity. Because alignment of weak sequence similarities depends on subjective judgments, we limit our discussion to alignments detected by both algorithms.
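The global alignment step can be illustrated with a minimal Python sketch of the Needleman and Wunsch dynamic-programming algorithm. This is not the GAP program itself: the function name `needleman_wunsch`, the integer scores, and the single per-position `gap` penalty are assumptions made for illustration, and the linear penalty is a simplification of the gap weight and gap length weight scheme described above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (illustrative sketch).

    Scores are integers so that the traceback comparisons are exact.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Because the table has one cell per pair of positions, memory and time grow with the product of the two sequence lengths, which is consistent with the need to subdivide the long P1 sequences into segments before optimal alignment.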


RESULTS

Organization of the D. virilis and D. subobscura Antp genes: The Antp gene of D. melanogaster is a single copy gene (LAUGHON et al. 1986; SCHNEUWLY et al. 1986; STROEHER, JORGENSEN and GARBER 1986). We probed Southern blots of genomic DNA from D. virilis and D. subobscura with exon fragments from the D. melanogaster Antp gene. At moderate stringency we detected only one or two major restriction endonuclease fragments per probe (data not shown). Our restriction maps of cloned genomic DNA from the Antp genes of D. virilis and D. subobscura (Figure 1) predict exon restriction endonuclease fragments consistent with the major bands detected on Southern blots. We conclude that we have cloned the authentic Antp homologues of D. subobscura and D. virilis, and that they are single copy genes.

Molecular maps of the D. virilis and D. melanogaster Antp genes, and of the 3’ part of the D. subobscura Antp gene, are shown in Figure 1. The overall organization of the Antp genes from the three species is similar. We crudely mapped homologies between the D. virilis and D. melanogaster genes by probing Southern blots of cloned D. virilis Antp at moderate stringency with intron, exon, and upstream fragments from D. melanogaster. The boxes plotted in Figure 2 indicate the uncertainties in the position of the homologies. Virtually every D. virilis fragment larger than 1 kb, and many smaller, is detected by a D. melanogaster probe. This is equally true for fragments in the middle of the large introns, near the promoters, and around the exons. This is not due to indiscriminate hybridization since D. melanogaster probes generally only recognized the corresponding region of the D. virilis gene; most homologies fall on the diagonal of the “box-plot.” Three cross-hybridizing fragments do not fall on the diagonal. They might represent repetitive elements or counterexamples to the general colinearity of conserved elements. Higher resolution descriptions of the homologies around P1, P2, and the exons are presented below.

Transcripts from the D. virilis Antp gene: The cross-hybridization between clones of D. virilis and D. melanogaster Antp genomic DNA (Figure 2) suggests that the two promoters and exon structure of the Antp gene are evolutionarily conserved. The D. melanogaster gene encodes four major transcripts, utilizing two alternative promoters (exons A + B or exon C), sharing exons D-H, and utilizing two alternative

FIGURE 1.-Molecular map of three Drosophila Antennapedia genes. The Antennapedia genes of D. virilis (Dv), D. melanogaster (Dm), and D. subobscura (Ds) are aligned by their homeoboxes (exon H). The coordinates (in kb) are set with zero at the point of transcription initiation from the first promoter of the D. virilis gene. The individual D. virilis phage clones are shown above the Dv map, and the individual D. subobscura phage clones are shown below the Ds map. The restriction sites for EcoRI and SalI are shown. Exons (A-H), as mapped first by Southern blotting and then by DNA sequencing, are indicated. The splicing patterns for the P1 and P2 transcripts of D. melanogaster are indicated relative to the D. melanogaster exons. The transcription units of D. melanogaster P1 and P2 are, respectively, 103 and 36 kb. In D. virilis the corresponding distances are 121 and 41 kb.

FIGURE 2.-“Box-plot” of homologies between the D. virilis and D. melanogaster Antennapedia genes. The horizontal and vertical axes respectively represent the D. virilis and D. melanogaster Antennapedia genes. Exons A-H are indicated for each gene. Southern blots of cloned DNA from the entire D. virilis gene were probed at high stringency with seven large probes (sizes between 8.8 and 19.3 kb) which together covered 74% of the D. melanogaster gene. The cross-species homologies are shown by boxes. The vertical length of each box represents the length and position of each probe. The width of each box represents the size and position of the D. virilis fragments recognized by each probe. Darkly shaded boxes correspond to strong signals on the blots and lightly shaded boxes to weaker signals. Six smaller probes (1.7-4.4 kb) derived from the exon regions were also used (black boxes).

FIGURE 3.-Northern blot detecting D. virilis Antennapedia transcripts. Probes from the putative P1, P2, and homeobox (H) exon regions were used on polyadenylated RNA (1 µg/lane) from 0-20 hr D. virilis embryos. Each probe recognized a major transcript of 4.0 kb and a minor one of 5.0 kb. P1 probe, SalI-EcoRI 1.2-kb fragment (3383-4538; Figure 5); P2 probe, SalI 2.0-kb fragment (618-2738, Figure 6); exon H probe (1.2-kb EcoRI fragment, just downstream of the fragment whose sequence is shown in Figure 8).

polyadenylation signals in exon H (Figure 1). Mature transcripts from the two promoters are coincidentally of similar size; the alternative polyadenylation signals generate transcripts of ~3.5 and ~5.0 kb from each promoter (LAUGHON et al. 1986; SCHNEUWLY et al. 1986; STROEHER, JORGENSEN and GARBER 1986). Northern blots of D. virilis embryonic mRNA confirmed that the two putative promoter regions and the putative homeobox identified by Southern blotting are transcribed (Figure 3). The D. virilis Antp homeobox probe, which should identify all Antp transcripts, identifies two size classes of polyadenylated

Antennapedia Sequence Conservation

FIGURE 4.-Dot matrix comparisons between genomic Antennapedia sequences of D. melanogaster and of D. virilis or D. subobscura. The top left panel compares P1 sequences. The top right panel compares P2 sequences. The middle panels compare the cluster of exons D-G. The bottom panels compare exon H sequences. For each panel, the D. melanogaster sequence is the x-axis and the position of the exons is indicated above the D. melanogaster sequence. For all plots, the window is 21 and the stringency 66%. D. melanogaster (Dm), D. virilis (Dv), D. subobscura (Ds).
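The window and stringency settings quoted for the dot-matrix plots can be illustrated with a minimal Python sketch. It is not the DOTPLOT or Pustell implementation: the function names `dot_matrix` and `reverse_complement` are hypothetical, and the sketch assumes that "stringency" means the minimum fraction of identical bases within a window.

```python
def dot_matrix(seq_x, seq_y, window=21, stringency=0.66):
    """Return the set of (i, j) window start positions that would be plotted.

    A dot is placed at (i, j) when the window of seq_x starting at i and the
    window of seq_y starting at j are identical at a fraction of positions
    that is at least `stringency`.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(1 for k in range(window)
                          if seq_x[i + k] == seq_y[j + k])
            if matches / window >= stringency:
                dots.add((i, j))
    return dots


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))
```

With a window of 21 and a stringency of 0.66, a dot requires at least 14 identical positions out of 21. Conserved colinear blocks appear as runs of dots with constant j - i, and a search for conserved sequence in reversed orientation could be sketched as `dot_matrix(seq_x, reverse_complement(seq_y))`.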

RNA, ~4.0 and ~5.0 kb. This transcript pattern is also seen with the P1- and the P2-homologous D. virilis probes. Thus both D. virilis putative promoters generate both sizes of transcripts. This is consistent with a D. virilis gene structure identical to that of D. melanogaster: P1 and P2 transcripts are coincidentally of the same size, and the different transcript sizes might be accounted for by alternative polyadenylation sites.

Sequence comparisons between the promoter regions of D. melanogaster Antp and the D. virilis homologs: A dot-matrix comparison of 6180 bases of genomic DNA sequence from the D. melanogaster P1 region and 5807 bases of the D. virilis P1-homologous region is shown in the first panel of Figure 4. The interrupted diagonals show that blocks of conserved sequence are interspersed with diverged sequences. The shifts from diagonal to diagonal show that the length of diverged sequences is not conserved. The monotonic character of the diagonals shows that conserved sequences are colinear. A dot-matrix plot of the reverse complement of the D. virilis sequence against the D. melanogaster sequence has no hint of diagonals (not shown), demonstrating the lack of conserved sequences with reversed orientation. The position of exon A, which begins with P1, is indicated for the D. melanogaster sequence in Figure 4. Surprisingly, good diagonals are equally represented 3.9 kb upstream of the promoter, at the promoter, and 0.7 kb into the first intron.

[FIGURE 5: aligned D. melanogaster (mel) and D. virilis (vir) P1-region sequences.]

The alignment of the sequences in Figure 5 documents the conclusions of the dot-matrix plot. Blocks of conserved sequence up to 134 nt in length are interspersed with diverged sequences. Large conserved blocks are found upstream of the promoter and within the first intron. For instance, the two largest conserved blocks, of 119/127 and 112/134 bases (D. melanogaster coordinates 377 and 2287) are respectively 3864 and 1967 bases upstream of P1. Variable spacing is clear with the group of three conserved blocks downstream of exon A (coordinates 6015 to 6178 in D. melanogaster). This group is sepa-

[FIGURE 5, continued: aligned D. melanogaster (mel) and D. virilis (vir) P1-region sequences.]

me1

CCAAACTATTTCAPIATATTCMTTTAATAATAATCGTAACTGAAAGTGAAG~TCTTAACTTTAAATTGTAGTAAAAAATCAATATTGTTAAATAATTGCAAAGGTAATTGTACAT5 5 8 1

me1

G T A A T G T M T G T C A T C T A T G C A T A C C G T C A A A G M T T A T T C G A T T G T C A C T A A A A A T T A T A T A T A A T T T T T A A A C A A A A T G G 5701

me1

TTGACATAAGTATATTATGTTACTATTAGCTGTCCATTAAATGTTTGAATAAAATCAATCATTTCAATGAAGCTTACATGGATTAGAATATCAGTCTATGTATATGCCGACGGAAACTTC 5 8 2 1

me1

ATACATTTCCAPIATACTTGTCTAAATATGTGGAAGTTTTGACATTAATGACTCTGTACAAAGAATAGCGCATCTGTTTCTCGATGTCCAGTGGGCAAGGTATAAGTTTTTTGCCCGAACT

me1 vir

CAGATTCRACACTTCATAACTCAAAGGGCGTTTTTCGGAAGCTACAGCCAGGTCATCACTCATTTGGCCTGCAAACTGC~VVICTGTTGACAGGCTTATTAGCCATTTGCATAAATTTGCA 6 0 6 1 A-TTACG-CCA-T-CAAGGACAT -ATTC----------------------------A----------C----TTG 4730

me1 vir

AAGTGGGAGGAGGGGTGGTGGGATGCTTTCAGGAATCCCGGGAGTTGAAGCTGGAAAACTGAAAACGGAGCTCCGAACTTAACTGGCCACGAGTAATGGGCTCTTTGAACTTGGAATTC GG-GCATGCAGA-A-A-CC-A-TCCAGAGTC--GCAA-GA-ATAGATTCAGGA-T -----CCAC-----G-AGCTGG-GT----------------C-G---------------CGT

5941

"""-""""""-

6180 4843

FIGURE 5.-Alignment of the sequences around the PI promoter. D.virilis sequence i s aligned underneath the D. melanogaster sequence. Dashes indicate identities between the two genes. PI is indicated, exon A is underlined and the initiation codons within exon A are boxed. Exon A is 1 159 bases in D.virilis and 103 1 bases in D. melanogaster. Details of the alignment algorithm and its alternatives are discussed in the MATERIALS A N D METHODS.

rated from the homology around the 5' splice site by 656 bases in D.melanogaster but by only 36 bases from the 5'-splice site-homologous sequence in D.virilis. The ends of the "P1 exon" identifiedin D.virilis by Northern blot (Figure 3), were inferred by sequence similarities to D melanogaster. T h e transcriptional initiation site for D. melanogaster P1 is embedded in a conserved block of 107/110 (D.melanogaster coordinate 4 172). The sequence identity is assumed to identify the PI transcriptional initiation for D. virilis. The 5' splice site of D. melanogaster exon A (D.melanogaster coordinate 5258) is included in a weakly conserved block of 18/24. T h e D. virilis CGGUGAGU aligned with the D.melanogaster 5' splice site is a good match to the canonical AGGUA/GAGU splice donor (SHAPIRO and SENAPATHY 1987) and is assumed to identify the homologous5' splice site for D. virilis exon A. The "Pl"probe used in Figure 3 overlaps by

908 bases the putative exon A delimited by sequence similarity, consistent with our inferred boundaries for exon A in D.virilis. No attempt was made to identify and sequence aD. virilis region homologous to exon B of D.melanogaster. A dot-matrix comparison of 3984 basesofDNA sequence from the D.melanogaster P2 region to 3675 bases of sequence from the D. virilis P2-homologous region is shown in the second panel of Figure 4. The position of exon C is indicated above the D. melanogaster sequence. The pattern of conserved and diverged regions is similar to that for the PI region. An alignment of the two sequences is presented in Figure 6. Six blocks of excellent conservation arefound between 1056 and 1728bases upstream of P2 (30/32 at D.melanogaster coordinate 804, 54/57 at 905, 78/ 81 at 1078, 23/24 at 1173, 53/56 at 1261, 40/45 at 1475). Four smaller conserved segments are within

J. E. Hooper et al.

460

TCTACCAAATGATTGTCGTATGTTGATTTTTAATCAACAAAAGAACGAGAAAAGATGGACGTTGCTAGTTGTTGCGTTGCAAATGCATTAAATGAAGTTATCAACTCCAT 840 CCCGG-CG----G---------------A----------GT-TG 45

_"_

CGCTCCCTCTCTGTCTGACCTGCTCGGCTTCTTCGTGCCATCTCTGTTTGAATAAAACACAGAATTTAGATTTATGGCTTCACGCAATTAGTGCTTAAATATATGCAAAGGGATATTGAA

GGG---TAT-T-T-GG-TTCATTA--T-CTTG-TACG-ATA--CAA-TA----GCAGA-T----A----------------------C--------G-------------------

AATCTTATATAGATTTGAATCCAGGAGAA~ACAGGACC~CGCGGCGTAI;TGAAGCCATACCG~TGACT~AGGGAAAGGI;GAGGGAAG

-

1048 282

-C-G-A-AT---A--ACA-CAGC--C----AC-A-AGCAAC-AC--CTGG--A-AGT---CCCAG--CCTG--TC---A-C-G-ACCTTAGCATAATGAAGCCATTCACGTGGCCT TGAGGTCCTTGGCCGTCGTCGTTCACTGA

960 165

CCGACGACACTCTCGATGACTTCCCCGTTT

GC-CGCAGT-GCC----G--A-T--G--C--CCGAGCAGCAAA~ACAACAGCAACAGCAACAGCAACAACAACAATAACAACAACATCG------------------------A-----

1107 402

CCACGCTTTTCACTTTCAATTGCGGCTCAACTTTTTACTATGT """""""""--"

1179 522

AGAA~CGGCAAC AACAATTGCGCCGCCAA TA-----------------T-CAT--G---CT-T-TCTCTCCCTCTCTCTCTGTCTCTGCGTCTCTTTCCAGCTCACTGTT---------""-"-

AAGCCATAAATCTTATT TCTCCGCTTTTTATTTATTTTTTTTTTTTGTGTGCGTGCTCTTCGA GGGGGGATTTCCGAGA 1258 "--A""""-"GTAAAAAAAAAAA-AACAA-AAAT--CG-G---G"C-----CTT-TGG---CGTCAGCAGCGCCGTTGACGTTGACGTCGACGTC--C-CTGGCGA--TCG 642 GGGGGATATTTGTATTTCATATGCATGGCGGTAATTTTTGTGATTATGTTGTTGCTGT ATGTAGCCCAAAGAAACGAAAATTCAA TC--C---------------------------------------------C--T---TGTTGTTGTTTTTGTTGTTGTTGTTGTTATTGTTGT---T--TGCTCAC---TG--T-----

1343 162

GTTCAGCCTGTTCCCATAAACAAAAAATAAATAAAAAGGGTTCCCTGG CTACTAAACAAACA CAGTTCCTCACCCTACGTGGCT 1427 --------A---T--CACC-A---T--A---A----CA---GAAA--GTTTT-GG-----------ACAAGCTGGCAGCACAGCGCAGGGCCGCCTT-CC-G-TCA----CC-T---A882

C T G A A T T A T T A T A T A T T C AAG--GCCA-GC-TTGTC--TGCC-CAA-T-TG--TTTT-C-ATTA"TATTT G G C A G C A A T T A T C G G C T A R 1527 A

A

-GGAGAAGAGTTGAATI\TTTGTA--------T----------A--------------ACC--------

1002

GGGCAAAGCTCTCAATTATTTAATTATTTTTTTATATTCCACAACTATTATTTATTTATTTAC ATTTTTCTTTATTGCTTTCAATGTGTATTT AT-ATTTC-AT-TCCA-T-CA-T--CAA----G-G---TGTA-G-GC----CG---GCA--C-CAATAAGCTGGCTATCATTTGTATTTAT-AGC-T--A---TGCC--T--"-C-C--

1620 1122

AATTTATGCCCGTTTATAAATGTCTAGTC~CAAGTTGTTACTTCTGCCATTTGAATGTAATTATTATTTCGAATATTACTTCACTGTTCGCCGGCAACATTTTGGCCAA~AAGTTGGCCA 1740 -"-"-" AGC---G-TTC-T-TT--T-A-T--TCATTTGC--TCA-AT-C-T-A-C-G-A-T----T-----T-TGCGCTTCAGC-ACA-GCA-----------T"AG 1234

AAGCGCGGCAACAAATGTTGCTAAAGAACACAGTGCACGGAAATTTATAATAACAATTTATCCTAAACAAAAATTATTTC ATAAATAAAAGATCGGAACGTAACTAATTCAAA CGCG----GC-AC-GCT--CA---TGTTGGCA-C--GCC-G-G-CT--A-GT-G-G--G-T-C-T-GC-AG-CACGGCA-GT----C---TCTAA------TTGC---T---

1853 1347

ATTTTTTTTCTGAAATTTTCGTAGTGAAT~TTAGCTGCAAAATTATTTTATTTAATGAACAAGCTTCTATTCTGTTCAT~TTTACTTGTTTTCTATAAAAGTTCTCAGATCGTAAACGTA --GC--G-AAAT"-AA-T--GTG----T---A------CA-TT--CGAAA----A--G----T-A--A----G-TACGT-TT-ACT--A--

1973 1442

TATAATATATAAGCAATTAAGGTAAACAGTAAATGGTATAAAGTGTTCAAATACTCAATTGAAAAAA~AGATAGGTGGGACCATATTGATATGCACATAAAAACTAATCCATAATCTGT ATGT----T---CACT-AT-ACA-G---TA---ATTAT--T-TAA--TT-T--A-TTT--TT-GGC-T-TCT-TAT-T-ACA---CAC-----CTT-"c-

2093 1544

TACATAACGTCCATTTAATATGAGCAAACAACAAATTTACTTTTAAATGCATTGATCGTTCTAACATTAACTTGTATAATTACAAAAGTTTTAGAATGGCCTAAACATT;;TACTGAGAAA

2213 1631

-CTT-CGTA--AT-----ATGATAA---TT-T-T-CC-CT---A--CATTCACC---A---C--TT-AT-T---CC-TCA-T-T-T-

ACGTCTAACTTGGATAATAAATTATTTGCAGTTAGCTAGCTTAAATCACGTTCAGTGTAGCAAAACCACACGATTCTCCGAAAAGCGCCTAAAAAAATACTTAACGAGCCACTAGTTACC 2333 T---TTA--TTTA----T---TC-----G-T-AA---CCCCT----AA-C----TC-C-CA-T----C-CTT---GCAC--ACT---T-TAC--AA 1728

1968 2658 GTG~~~CCAAATAGACGT;ICCGCCGCCGCTCGATTCGC~CTTTGCTTTCGGTTTTGCCGTCGTTTCAC~CGTTTAGTT~CGTTCGGTT~ATTCCCAGTTCTTAAATAC~ ___ - __ A------------------TT--------CGCTCGG-T--C----TCG-GA-C---A-TCCA-A2045

GGACGTAAAAATACACTCTAGCGGTCCCGCGAAGAAAAAGATAAAGAAATCTCGTAGAAATATAAAAATAAATTCCTAAAGTC T """ CTC-C--ATTGA"-GAA-ACCAG----ATT--T--CG-G-T-GT-T-T

CTCGTTCACTTTCGCTGCCTGCTCAGGAC

TAA-GAAGT-----AATT-G-T-TC-T--

__

-

2778 2143

GAGG~ACCAAGAGG TTTCTA---CC---GTG-C-TCTT---C A A G A G A A A C A A AT A A A G A G G G A ~ C A T A G G A A C A G G A A C C A G A T A A T A G G--A-A-A-A---GTGCC-C-T---------GA~ATAAGCGACCCTTTCGCAAA~AATTTGGCG CA-AAG--A-C-C ~ A ~ C G G C G C C A A G2898 T

2226

TTTGGCAGCATCAATCCCAAAGCACAAAATTAATTTCTATCATAATTTCTG 3005 GGAC-GACAA----G-CGC-AA---C-CAT-T--A-T-GAA-TAA--TGA--TCT 2346

.

GGTGCAACACGGACCCATAATTGAATCGAATATAGGGCTTATCTGATAGCCCGGCAGC AAC~TTGAACTTT~CGGCTGCAA~WC G CACAAT--T--C-ATTG-T-A-TGCG-CC-AC-TT-C------A-CC--AG-A----cTGAACTG----C-CGTT-C-G---GG--TGAT- -ATCGGCACTTGGGCAGACACTGG-

3094 2466

TTGGGCTCG ACACCGAGATC GCCAATTTTCG ---T---C---AAATTCATCAGCAGCATCGAATCCAAATATCT---------T-GCGCTGGCTCAGCAAAGCACGCCGAACCCAAGACTCAGAGCCAGACCAATACCAAGCCC-AA-CT-

3125 2586 3233 2706

3351 2826

ATTAATTTTGCTTCTGTATCGTTTGCAAAATCTCAAAAGATTCAAAAAGTTCGTCATCAGCAGCCGCAACACAAAAACCAACGAGTGTAAAGCCGAGCATACAAATATCAATAAAAACAT

3471 2946

ATC-T-ACA-AA--T--G-TA A C A T T T A ~ C C A A T C-GTTTT---T-GTGAT---A T C A ~ T C T C A A A A C ~ T T C GT C A-CT C G T- T T-GGTG-ATCTGTGTGCGTGTATATC~AR~TTA~TT

3570 3063

- _"" _ - _"" - --AAGCTCTAAATTAA~TGTAAATCTTACAAAACGTCTACGTTTTTA

TTAAATACAAAAGATT

CGCCCAAATTGTGATTGTATATATATATTTAACGGCA GTT-AT "

CA--CA----G-T---GTGATTCTCAATTTTGTAAATACAACATTATATAAAGAAAACAAATTCAGCGCAACGGA-TCG-T-------------"-T-A"""----G-""""

3703 TCGCGAAATTCGAAGTATCAGAA~AAAACGGTGTATA~AA 3302

_""_"_""

AACAAGAAATTGTGATATTATATATTAA "

3631 3183

e

ATATTTTAGCTGAGCA~AGTACACAC~~ATATAAATATA~AAATATA

T

3780 3422 3074 3540

ATGGGTTTA~GTTAA

'ATTTCCCAA~;CTAGTTTAGTTTCAATGCCAGTTTCTGAC~ATAGATTT~TTAAAACTATGTATAAGTCACTTGATTTCTCTGGAGGCT~CGATT3984 3660

T-T-C-AA--AAA--TAAGT-AGG-AA-T-TA-AA---TG-CA---AAG-A-----T--TT-A-ATAA-T--CTT--GT-T--A-GTCT-AAAAC----ATG-AT--T--ATTA-TAAAC

FIGURE6.-Alignment of the sequences around the P2 promoter. See legend for Figure 5. Exon C is underlined. Exon C is 1606 bases i n I ) . uirilis anti 13 19 bases in D. melanogaster.

me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub me1

me1 sub me1

vir sub me1

vir sub me1

vir sub me1

vir sub vir sub

GTGGGTTGC TGTAG CTAGTCATTGAATGTAGTTTAACTAGTCGTAGTAGTTGCATACAACAAAAACAACAACATTTTAAAC --------AA-----A-------CCCCCGTTGTATTAGATTTTAGTAGCTCCAATTAGTAGTAGTCGTGCGATCMCAACAAGAGCCAAGGG

1984 2016

FIGURE7.-Alignment of the sequences of the D-G exons. See legend for Figure 5. The initiation codon (boxed) at 575 in the D. melanogaster sequence is the authentic Antp initiation codon. Two potential hairpin stem-loop structures flanking the alternative 5' splice sites of exon G are indicated by opposing arrows beneath the sequence. D. subobscura (sub). The exon sizes in D. melanogaster, D.virilis and D. subobscura are 252, 274 and 273 bases for exon D; 777, 849 and 938bases for exon E; 39 bases for all three F exons; and 226, 235, and 232 bases for exon G.

152 bases upstream of the promoter (31/33, 10/13, 12/12,and 14/15).Conserved blocks within the exon include 1 1/11 at D. melanogaster coordinate 2660,29/ 32 at 2818, 15/15 at 2871, 145/163 at 3242, 58/63 a t 3594, 33/38 at 3734,and 20/21 at 3791.
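The dot-matrix comparisons cited above (Figure 4) can be illustrated with a minimal Python sketch. It records every pair of window start positions at which two sequences share at least a threshold number of identical bases. The function name, the window length and the threshold are illustrative assumptions; they are not the parameters described in the MATERIALS AND METHODS, and the toy sequences are invented rather than taken from Antp.

```python
def dot_matrix(seq_a, seq_b, window=10, min_matches=8):
    """Return (i, j) window start positions (0-based) where seq_a[i:i+window]
    and seq_b[j:j+window] share at least min_matches identical bases.

    window and min_matches are hypothetical defaults chosen for illustration.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count position-by-position identities inside the two windows.
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Invented toy sequences: the shared ACGT sits at different offsets.
    for i, j in dot_matrix("GGGGACGT", "ACGTCCCC", window=4, min_matches=4):
        print(i, j)
```

In such a plot, conserved blocks with preserved order appear as runs of hits along a diagonal, and variable spacing between blocks shifts successive runs onto different diagonals.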

The limits of the "P2 exon" identified in D. virilis by Northern blot (Figure 3) were inferred from sequence similarities around the P2 transcription initiation site and the 5' splice site of the D. melanogaster exon C. 49/52 bases are conserved at the promoter (D. melanogaster coordinate 2530) and 37/40 bases are conserved at the 5' splice site (D. melanogaster coordinate 3821). The "P2" probe used in Figure 3 overlaps by 777 nt the 1557 nt of the putative D. virilis exon C, consistent with the inferred boundaries of exon C in D. virilis.

FIGURE 8.-Alignment of the sequences of the H exons. See legend for Figure 5. The termination codon (boxed) at 564 in the D. melanogaster sequence is the end of the Antp protein. The short form of exon H ends at the AATAAA (boxed) polyadenylation signals beginning at 1168 in the D. melanogaster sequence. The longer form of exon H (not underlined) extends to the last boxed AATAAA at 2575 in the D. melanogaster sequence and for undetermined distances in the other two species. A likely hairpin structure just proximal to the distal polyadenylation site is indicated underneath by opposing arrows.

The D. melanogaster Antp cDNAs have a striking array of AUG codons upstream of the long open reading frame (ORF) which encodes the Antennapedia protein: 8 for P1 and 15 for P2! The P1 leader AUGs initiate ORFs ranging from six to 84 bases, while the P2 AUGs initiate ORFs ranging from three to 264 bases (LAUGHON et al. 1986). Two of the P1 AUGs and six of the P2 AUGs fit the consensus for translation initiation (CAVENER 1987), although there is no data addressing whether they are actually utilized. The putative P1 leader of D. virilis contains two AUGs in exon A within a perfectly conserved match of 13 nt (4765 in D. melanogaster, not aligned in Figure 5). They initiate ORFs of 24 and 21 nt which are conserved neither in size nor sequence. The putative P2 leader of D. virilis contains 14 AUGs initiating ORFs of 18 to 336 bases. Eight of these AUGs fall in conserved segments but neither the sizes nor the sequences of the ORFs are conserved.

Comparison of Antp exons D-G and exon H from three species: Genomic sequences which include exons D-G (Figure 4, middle panels) and exon H (Figure 4, bottom panels) from D. melanogaster are compared with the homologous sequences from D. virilis and D. subobscura. The protein coding region stands out as strong diagonals. The introns separating exons D, E and F are entirely diverged. The ambiguous region in the middle of exon E corresponds to the opa repeats. The 5' and 3' noncoding regions display the familiar pattern of discontinuous diagonals. The alignments are presented in Figures 7 and 8. We identified the probable exons in D. virilis and D. subobscura by sequence similarity to the D. melanogaster exons. Each putative splice junction retains all essential features of consensus splice sites (SHAPIRO and SENAPATHY 1987). For the purposes of discussion, we assume that the sequence identity at the splice junctions delineates exons D-H in D. virilis and D. subobscura. Intron sequences, with the exception discussed below, are not conserved. Seven blocks of sequence within the leader encoded by exons D and E are conserved in all three species, though the spacings are variable.

The protein-coding region displays a different pattern of sequence conservation; silent third position changes are common while divergent regions are absent. The protein sequence is discussed in more detail below. The transition from divergence to third position substitutions immediately at the first AUG of the long ORF (575 in D. melanogaster, Figure 7) implies that this AUG is the bona fide initiation codon, despite its poor match (3/8 in D. melanogaster) to the Drosophila translation initiation consensus sequence (CAVENER 1987). The 3' untranslated region is almost as conserved as the protein coding region (Figure 8). Of the 627 bases between the termination codon and the first cluster of polyadenylation signals of D. melanogaster, only 61 do not align with at least one of the two other species, and 311 are included in blocks conserved among all three species.

The 112 bases surrounding the two alternative 5' splice sites at the end of exon G may contain information which affects splice site selection. Even silent substitutions are virtually absent in the last 60 bases of exon G. The first 50 bases distal to the alternative 5' splice sites of exon G (90 between D. virilis and D. subobscura) are also highly conserved. Two possible hairpin loops (opposed arrows in Figure 7) flank the 5' splice site in all three species. The predicted structures for these two hairpin loops (ΔG = -8.0 and -9.8 kcal respectively) are shown in Figure 9, A and B. We speculate that these hairpin loops might form in the nascent transcript and affect its processing.

FIGURE 9.-Conserved potential hairpin loops in Antp mRNA. (A) is the loop centered at position 1930 of the D. melanogaster sequence (Figure 7) and (B) is the one centered at position 1999, upstream and downstream, respectively, of the alternative donor sites of exon G. (C) is the loop centered at position 2552 of the D. melanogaster sequence (Figure 8), just upstream of the distal polyadenylation site.

The proximal polyadenylation signals are the least conserved of the transcript processing signals. A proximal cluster of four polyadenylation signals (D. melanogaster coordinates 1168-1193, Figure 8) can be approximately aligned with a cluster of three polyadenylation signals in D. subobscura and one signal in D. virilis. However, the sequence in that region is poorly conserved among the three species. Around the distal polyadenylation site (D. melanogaster coordinate 2575, Figure 8) 91/94 bases are conserved in D. subobscura. (The sequence is not available for D. virilis.) Within this conserved region, a possible hairpin loop (ΔG = -7.4 kcal) just upstream of the distal AATAAA polyadenylation signal (opposed arrows in Figures 8 and 9C) might be involved in regulation of transcript processing or termination. Additional blocks of sequence conservation are found downstream of the distal polyadenylation signal (79/83 at 2872 of D. melanogaster and 26/29 at 3046 of D. melanogaster).

The D. melanogaster P1 and P2 transcripts, including exon F and the long form of exon G, and using the proximal polyadenylation signals, have lengths of 3287 and 3492 bases, respectively. The D. virilis P2 transcript should be 3964 bases, summing all the exons. This is in good agreement with the 4.0-kb major transcript seen in Figure 3. The D. virilis P1 transcript (without exon B) sums to 3512 bases. Exon B, which is 88 nt in D. melanogaster and which we have not isolated from D. virilis, would need to contribute another 200-300 bases to make the P1 transcript indistinguishable in size from the P2 transcript on RNA blots, as it appears to be.
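Clusters of AATAAA polyadenylation signals such as those described above can be located with a simple scan. The following Python sketch finds every occurrence of the hexamer, including overlapping ones, and groups nearby occurrences into clusters. The function names and the `max_gap` value are hypothetical choices made for illustration, and the toy sequence is invented rather than taken from exon H.

```python
def find_signals(seq, motif="AATAAA"):
    """Return 0-based start positions of every occurrence of motif in seq,
    including overlapping occurrences."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one base so that overlapping motifs are not skipped.
        start = seq.find(motif, start + 1)
    return hits


def cluster_positions(positions, max_gap=25):
    """Group sorted positions into clusters; a new cluster starts whenever the
    gap to the previous position exceeds max_gap (an assumed threshold)."""
    clusters = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


if __name__ == "__main__":
    # Invented toy sequence: two close signals followed by a distant one.
    toy = "GG" + "AATAAA" + "CC" + "AATAAA" + "T" * 24 + "AATAAA"
    print(cluster_positions(find_signals(toy)))
```

Applied to aligned sequences from several species, the cluster boundaries give a quick way to compare the number and spacing of signals even where the surrounding sequence is poorly conserved.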
The minor 5.0-kb transcript in the D. virilis blots, seen with both P1 and P2 probes, might indicate use of more distal polyadenylation signals, which are well conserved at least in D. subobscura.

Comparison of the proteins encoded by the three species: Predicted Antp protein sequences encoded by the three Drosophila species are aligned in Figure 10. The predicted sizes of the D. melanogaster, D. virilis and D. subobscura proteins are respectively 378, 380 and 394 amino acids. The D. virilis Antp protein is identical to the D. melanogaster protein at 360 of 378 amino acids (95%) and the D. subobscura homolog is identical at 367 amino acids (97%). The N-terminal 104 amino acids, which include the MXSYF motif, are highly conserved. The C-terminal 126 amino acids, which include the homeodomain and the YPWM motif, are also highly conserved. This presents a strong contrast to the glutamine-rich region between amino acids 105 and 156 where most of the variation is seen. The variation is mainly due to insertion/deletion events. Nine positions in the central glutamine-rich region have insertions of extra amino acids. Of the additional 19 amino acid variations, three are conservative substitutions. Only one results in a charge change.

DISCUSSION

Antp has a major role in morphogenesis. The morphological similarities among Drosophila species suggest that many functional aspects of the Antp gene structure should be preserved. During the 60 million years of Drosophila evolution, random base substitutions under no selective pressure should lead to 60% divergence (e.g., HAYASHIDA and MIYATA 1983; MARTÍNEZ-CRUZADO 1990; O'NEIL and BELOTE 1992). Therefore, we interpret preserved features of Antp as required for aspects of its function common to all Drosophila. Diverged features might contain species-specific information or unselected sequences. At 121 kb, the D. virilis Antp transcription unit is 17% longer than that of D. melanogaster. D. virilis and D. melanogaster Antp both use two promoters and at least two polyadenylation sites. Strong homologies around the optional exon F and the alternative splice sites in all three species indicate that the RNA processing options have been preserved. On the other hand, many intron, leader, and trailer sequences are unique to the Antp gene of each species. Most of the base substitutions in the protein coding region are silent and do not alter the encoded protein. Islands of sequence similarity with preserved order and orientation are abundant throughout the introns and upstream of the promoters. Other Drosophila genes show similar patterns (KASSIS, WONG and O'FARRELL 1985; KASSIS et al. 1986; BLACKMAN and MESSELSON 1986; BRAY and HIRSH 1986; HENIKOFF and EGHTEDARZADEH 1987; MARTINEZ et al. 1988; JOHNSON et al. 1989; TREIER, PFEIFLE and TAUTZ 1989; KASSIS et al. 1989; KONSOLAKI et al. 1990; MAIER, PREISS and POWELL 1990; MICHAEL, BOWTELL and RUBIN 1990; SEEGER and KAUFMAN 1990; TAGHERT and SCHEIDER 1990; HEBERLEIN and RUBIN 1990; JONES, DALTON and TOWNLEY 1991; CRIBBS et al. 1992).

[Figure: Schematic of the alignment of the three proteins (Dv, D. virilis; Dm, D. melanogaster; Ds, D. subobscura). The rectangle represents the D. melanogaster protein. Insertions are arrows pointing toward the rectangle, with the number of amino acids indicated. Deletions are arrows pointing away from the rectangle. Substitutions are indicated by vertical lines. Labeled features: opa repeats and homeodomain.]

Conservation of transcriptional control elements: Sequence conservation can identify cis-acting regulatory elements. A short upstream sequence that activates heat shock genes during times of stress (AYME, SOUTHGATE and TISSIERES 1985; XIAO and LIS 1988) was first noticed because it is conserved among different Drosophila heat shock genes (HOLMGREN et al. 1981). A region upstream of the promoter of chorion genes that is conserved between flies and moths can direct sex-specific and ovary-specific gene expression in the heterologous species (MITSIALIS and KAFATOS 1985). Thirty-one random mutations upstream of a D. melanogaster rhodopsin gene promoter were analyzed in vivo for their effects on expression (FORTINI and RUBIN 1990). Eight mutations fell within blocks


of sequence conserved in the D. virilis rhodopsin gene, and seven of those affected the gene's expression. None of the 23 alterations in nonconserved regions had any discernible effect. Thus sequence conservation often identifies transcriptional regulatory sequences. A block of sequence conservation in the Antp P1 leader has been implicated in transcriptional regulation by the homeotic gene Ultrabithorax (Ubx). Ubx negatively regulates Antp in vivo (HAFEN, LEVINE and GEHRING 1984). A 6.9-kb fragment including the P1 promoter mediates transcriptional regulation by the Ubx protein in cultured cells (KRASNOW et al. 1989). BEACHY et al. (1988) detected two binding sites for Ubx protein in the Antp P1 leader by DNase I protection. The Ubx A-A binding site at positions 4530-4592 of the D. melanogaster exon A is well conserved


in D. virilis, while the A-B binding site (positions 4633-4691) is apparently not conserved. The divergence of the A-B site might indicate that this site is not functional in vivo. Alternatively, this binding site might have a species-specific function. Other cis-regulatory elements of the Antp P1 promoter have not been identified, but the large numbers of conserved elements that we have identified scattered throughout the introns, as well as upstream of the promoter, pinpoint putative control elements for further study. PERKINS, DAILEY and TJIAN (1988) mapped sequences required for in vitro transcription from the D. melanogaster P2 promoter. Deletion analysis identified a negative element, a positive element, and a complex positive element which contains multiple binding sites for DTF-1. Sporadic sequence conservation around P2 coincides with many of these elements. The upstream half of the complex positive element coincides with a 32/34 conserved block (2380 of D. melanogaster, Figure 6). The negative element coincides with a 13/14 conserved block (2496 of D. melanogaster, Figure 6). The positive element downstream of the transcription start is nested in a 49/52 conserved block (2530 of D. melanogaster, Figure 6). Four DTF-1 binding sites characterize the downstream half of the complex positive element in D. melanogaster. In D. virilis, there are also four DTF-1 consensus binding sites near the putative promoter. Only the palindromic overlapping DTF-1 site appears in the alignment (12/12 at 2459 of D. melanogaster); the order of the two additional sites relative to other aligned sequences is not conserved. The poor alignment immediately upstream of P2 apparently reflects rearrangement of sites rather than lack of the sites themselves. This pattern of rearranged sites is also seen immediately upstream of the promoter of Dopa decarboxylase (BRAY and HIRSH 1986).
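The conserved blocks around P2 are reported above as matches over block length. As a quick arithmetic aid, the following sketch converts those four ratios to percent identity. The dictionary labels and the helper names `percent_identity` and `summarize` are illustrative choices and not part of the original analysis; only the match counts, block lengths and D. melanogaster coordinates are taken from the text.

```python
# Conserved blocks near the Antp P2 promoter, stored as
# (matches, block length, D. melanogaster coordinate).
# Counts and coordinates are those quoted in the text; labels are illustrative.
P2_BLOCKS = {
    "complex positive element, upstream half": (32, 34, 2380),
    "negative element": (13, 14, 2496),
    "positive element downstream of start": (49, 52, 2530),
    "palindromic overlapping DTF-1 site": (12, 12, 2459),
}


def percent_identity(matches: int, length: int) -> float:
    """Return the percentage of aligned positions that match."""
    if length <= 0 or not 0 <= matches <= length:
        raise ValueError("need length > 0 and 0 <= matches <= length")
    return 100.0 * matches / length


def summarize(blocks):
    """Map each block label to its percent identity, rounded to one decimal."""
    return {
        name: round(percent_identity(matches, length), 1)
        for name, (matches, length, _coordinate) in blocks.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(P2_BLOCKS).items():
        print(f"{name}: {pct}%")
```

Each of the four blocks comes out above 90% identity, which can be set against the 60% divergence that the text expects for sequence under no selective pressure.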
Conservation of nontranslated mRNA: The conserved sequences around P1 and P2 might function in transcription, RNA folding, splicing or stability, translation, or even chromatin structure. There are no data addressing folding of the Antp RNA forms. However, two possible hairpin structures flanking the alternative 5' splice donor sites of exon G and near the distal polyadenylation site (Figure 9) are conserved. The long leaders and the large number of upstream ORFs in D. melanogaster (eight for the P1 transcript and 15 for the P2 transcript) raise the question of translational control for Antp (LAUGHON et al. 1986). There is no obvious discrepancy between the spatial and temporal distributions of the Antp mRNA and protein in D. melanogaster embryos (BOULET and SCOTT 1988; BERMINGHAM et al. 1990; M. PETITT and M. P. SCOTT, unpublished results). Different translation efficiencies for P1 and P2 transcripts, and for P2 transcripts in different neuromeres, are suggested by similar abundance of transcripts but different abundance of protein (BOULET and SCOTT 1988; BERMINGHAM et al. 1990). Leader sequences of exon D E, common to P1 and P2 transcripts, mediate internal ribosome binding in cultured cells (P. SARNOW and S.-K. OH, personal communication). The first 56 nt of exon D are absolutely required for internal ribosome binding, and are almost perfectly conserved (coordinates 15-71 in D. melanogaster, Figure 7). This suggests intriguing possibilities for translational control of Antp. The lack of sequence conservation around the proximal polyadenylation site suggests that the exact location and context of the poly(A) tail may be relatively unimportant. The dramatically conserved sequences in much of the 3' nontranslated part of the mRNAs could be involved in folding the RNA, translational control, control of message stability, or mRNA localization.

Conservation of the Antp protein: The primary sequence of Antp is 95% conserved between D. virilis and D. melanogaster and 97% conserved between D. subobscura and D. melanogaster. This is more conserved than the products of other homeotic and segmentation genes (KASSIS et al. 1986; WILDE and AKAM 1987; TREIER, PFEIFLE and TAUTZ 1989; SEEGER and KAUFMAN 1990; CRIBBS et al. 1992). The frequent changes in the third positions of the Antp codons (Figures 7 and 8) suggest that the selection for conservation in the protein coding region acts on the protein rather than on DNA or RNA. Most of the variation among Antp proteins consists of insertion/deletion events in the glutamine-rich central region, which derives from CAG or opa repeats (WHARTON et al. 1985). Insertion/deletion in protein-coding repetitive DNA is common (KASSIS et al. 1986; WILDE and AKAM 1987; TREIER, PFEIFLE and TAUTZ 1989; SEEGER and KAUFMAN 1990; HEBERLEIN and RUBIN 1990) and might arise by codon slippage (DOVER 1989). What selective pressure might retain these sequences, given the apparent genomic instability which they cause?
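The third-position argument above rests on distinguishing silent codon changes from those that replace an amino acid. The sketch below shows one minimal way to make that distinction for two aligned, gap-free coding sequences, assuming the standard genetic code. The function name `classify_codon_changes` and the toy sequences are hypothetical; they are not drawn from the Antp alignments or from the methods used in this study.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (silent, replacement) codon differences between two aligned,
    gap-free, in-frame coding sequences containing only A, C, G and T."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    silent = replacement = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons contribute nothing
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1  # different codon, same amino acid
        else:
            replacement += 1  # different codon, different amino acid
    return silent, replacement


if __name__ == "__main__":
    # Hypothetical toy sequences, NOT taken from any Antp gene.
    toy_a = "CAGCAGATGAAACGT"  # Gln Gln Met Lys Arg
    toy_b = "CAACAGATGAGACGC"  # Gln Gln Met Arg Arg
    print(classify_codon_changes(toy_a, toy_b))  # (2, 1)
```

This count treats each differing codon as a single event and does not normalize by the number of potentially silent sites, so it is only a rough indicator of whether differences are concentrated in silent positions.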
The selection appears to be for high glutamine content rather than any specific amino acid sequence, yet Antp protein deleted for the glutamine-rich region is an effective transcriptional activator (see below). Perhaps the CAG repeats have some entirely different function. Alternative RNA splicing results in four possible Antp protein forms in D. melanogaster (BERMINGHAM and SCOTT 1988; STROEHER, GAISER and GARBER 1988). The forms differ by the presence or absence of the 13 amino acids encoded by exon F and the presence or absence of four amino acids encoded at the 3' end of exon G. Precisely the same amino acid sequences are encoded in all three species by the presumed alternative splice forms. Sequence conservation around the exon F splice sites and around the 3' end of exon G suggests that these alternative splice events may be conserved. Even the unprecedented 5' splice site at the 3' end of the longer form of exon G, which begins the intron with GA rather than GT (BERMINGHAM and SCOTT 1988), is conserved in all three species. The conservation of the alternative splicing options suggests that the finely altered protein forms are important. Paradoxically, a highly deleted form of the Antp protein or the highly diverged mouse Hox-2.2 protein can mimic authentic Antp in an ectopic expression assay (GIBSON, KURIOWA and GEHRING 1990; MALICKI, SCHUGHART and MCGINNIS 1990). Ubiquitous overexpression of a protein lacking amino acids 66-245 transforms embryonic body segments and causes legs to develop in lieu of antennae in adults. This "minimal" Antp protein can recognize and activate appropriate target genes in the fly, at least in the epidermis. Yet it is lacking substantial regions whose conservation implies strong selective pressure. Perhaps those regions are involved in modulating efficiency or tissue specificity of Antp action, necessary functions that would be missed in a gross overexpression assay. Or perhaps substantial regions of the Antp protein are truly dispensable. This question will only be resolved by transgenic flies whose endogenous Antp gene is replaced by one encoding the deleted protein form with all control elements intact. It raises an important cautionary note: sequence conservation may suggest important features, but proof lies in further experiments.

We are grateful to JOHN TAMKUN for help with the D. virilis genomic library, to ROSER GONZALEZ-DUARTE and GEMMA MARFANY for the D. subobscura genomic library, to MARK KISSINGER and PILAR BURILLO for technical assistance, to REBECCA MARVIL for the sequence alignment formatting program, and to NURIA PARICIO, ALLEN LAUGHON and JOHN TAMKUN for constructive criticism. We also thank SANDRA MARTIN and GABRIEL DOVER for critiques of the manuscript. The D. subobscura sequences were determined at Servicio de Secuenciación de la Universitat de Valencia. A special thanks to LUIS PEZZI (Centro Nacional de Biotecnología) and the EMBnet for assistance with the sequence analysis software. Part of the sequence analysis was performed at Servicio de Bioinformática. The research was supported by a postdoctoral fellowship from the National Institutes of Health to J.E.H., by National Institutes of Health grant 18163 to M.P.S., and by Dirección General de Investigación Científica y Técnica grant PB87-0952 to R.deF. M.P.-A. was recipient of a European Molecular Biology Organization short term fellowship during the initiation of the D. subobscura project. M.P.S. was an investigator of the Howard Hughes Medical Institute.

LITERATURE CITED

AKAM, M., 1987 The molecular basis for metameric pattern in the Drosophila embryo. Development 101: 1-22.
ANDERSON, K., 1987 Dorsal-ventral embryonic pattern genes of Drosophila. Trends Genet. 3: 91-97.
AYME, A., R. SOUTHGATE and A. TISSIERES, 1985 Nucleotide sequences responsible for the thermal inducibility of the Drosophila small heat-shock protein genes in monkey COS cells. J. Mol. Biol. 182: 469-475.
BEACHY, P. A., M. A. KRASNOW, E. R. GAVIS and D. S. HOGNESS, 1988 An Ultrabithorax protein binds sequences near its own and the Antennapedia P1 promoters. Cell 55: 1069-1081.
BENDER, W., M. AKAM, F. KARCH, P. A. BEACHY, M. PEIFER, P. SPIERER, E. B. LEWIS and D. S. HOGNESS, 1983 Molecular genetics of the Bithorax complex in Drosophila melanogaster. Science 221: 23-29.
BENDER, W., B. WEIFFENBACH, F. KARCH and M. PEIFER, 1985 Domains of cis-interaction in the bithorax complex. Cold Spring Harbor Symp. Quant. Biol. 50: 173-180.
BERMINGHAM, J. R. J., and M. P. SCOTT, 1988 Developmentally regulated alternative splicing of transcripts from the Drosophila homeotic gene Antennapedia can produce four different proteins. EMBO J. 7: 3211-3222.
BERMINGHAM, J. R., A. MARTINEZ-ARIAS, M. G. PETIT and M. P. SCOTT, 1990 Different patterns of transcription from the two Antennapedia promoters during Drosophila embryogenesis. Development 109: 553-566.
BEVERLY, S. M., and A. C. WILSON, 1984 Molecular evolution in Drosophila and the higher Diptera. II. A time scale for fly evolution. J. Mol. Evol. 21: 1-13.
BLACKMAN, R. K., and M. MESSELSON, 1986 Interspecific nucleotide sequence comparisons used to identify regulatory and structural features of the Drosophila hsp82 gene. J. Mol. Biol. 188: 499-515.
BOULET, A. M., A. LLOYD and S. SAKANJU, 1991 Molecular definition of the morphogenetic and regulatory functions and the cis-regulatory elements of the Drosophila Abd-B homeotic gene. Development 111: 393-405.
BOULET, A. M., and M. P. SCOTT, 1988 Control elements of the P2 promoter of the Antennapedia gene. Genes Dev. 2: 1600-1614.
BRAY, S. J., and J. HIRSH, 1986 The Drosophila virilis dopa decarboxylase gene is developmentally regulated when integrated into Drosophila melanogaster. EMBO J. 5: 2305-2311.
CAVENER, D. R., 1987 Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res. 15: 1353-1361.
CELNIKER, S. E., S. SHARMA, D. J. KEELAN and E. B. LEWIS, 1990 The molecular genetics of the bithorax complex of Drosophila. cis-Regulation in the Abdominal-B domain. EMBO J. 9: 4277-4286.
CRIBBS, D. L., M. A. PULTZ, D. JOHNSON, M. MAZZULLA and T. C. KAUFMAN, 1992 Structural complexity and evolutionary conservation of the Drosophila homeotic gene proboscipedia. EMBO J. 11: 1437-1449.
DENTE, L., G. CESARENI and R. CORTESE, 1983 pEMBL: a new family of single-stranded plasmids. Nucleic Acids Res. 11: 1645-1655.
DOVER, G. A., 1989 Slips, strings, and species. Trends Genet. 5: 100-102.
DUNCAN, I. M., 1987 The bithorax complex. Annu. Rev. Genet. 21: 285-319.
FORTINI, M. E., and G. M. RUBIN, 1990 Analysis of cis-acting requirements of the Rh3 and Rh4 genes reveals a bipartite organization to rhodopsin promoters in Drosophila melanogaster. Genes Dev. 4: 444-463.
GARBER, R. L., A. KUROIWA and W. J. GEHRING, 1983 Genomic and cDNA clones of the homeotic locus Antennapedia in Drosophila. EMBO J. 2: 2027-2036.
GIBSON, G. A., A. KURIOWA and W. J. GEHRING, 1990 The specificities of Sex combs reduced and Antennapedia are defined by a distinct portion of each protein that includes the homeodomain. Cell 62: 1087-1103.
HAFEN, E., M. LEVINE and W. J. GEHRING, 1984 Regulation of Antennapedia transcript distribution by the bithorax complex in Drosophila. Nature 307: 287-289.
HARDING, K., C. WEDEEN, W. MCGINNIS and M. LEVINE, 1985 Spatially regulated expression of homeotic genes in Drosophila. Science 229: 1236-1242.


HAYASHIDA, H., and T. MIYATA, 1983 Unusual evolutionary conservation and frequent DNA segment exchange in class I genes of the major histocompatibility complex. Proc. Natl. Acad. Sci. USA 80: 2671-2675.
HEBERLEIN, U., and G. M. RUBIN, 1990 Structural and functional comparisons of the Drosophila virilis and Drosophila melanogaster rough genes. Proc. Natl. Acad. Sci. USA 87: 5916-5920.
HENIKOFF, S., and M. K. EGHTEDARZADEH, 1987 Conserved arrangement of nested genes at the Drosophila Gart locus. Genetics 117: 711-725.
HIGGINS, D. G., and P. M. SHARP, 1988 CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73: 237-244.
HOLMGREN, R., V. CORCES, R. MORIMOTO, R. BLACKMAN and M. MESSELSON, 1981 Sequence homologies in the 5' regions of four Drosophila heat-shock genes. Proc. Natl. Acad. Sci. USA 78: 3775-3778.
INGHAM, P. W., 1988 The molecular genetics of embryonic pattern formation in Drosophila. Nature 335: 25-34.
INGHAM, P. W., and A. MARTINEZ-ARIAS, 1986 The correct activation of Antennapedia and bithorax complex genes requires the fushi tarazu gene. Nature 324: 592-597.
IRISH, V. F., A. MARTINEZ-ARIAS and M. AKAM, 1989 Spatial regulation of the Antennapedia and Ultrabithorax genes during Drosophila embryogenesis. EMBO J. 8: 1527-1537.
IRVINE, K. D., S. L. HELFAND and D. S. HOGNESS, 1991 The large upstream control region of the Drosophila homeotic gene Ultrabithorax. Development 111: 407-424.
JOHNSON, W. A., C. A. MCCORMICK, S. J. BRAY and J. HIRSH, 1989 A neuron-specific enhancer of the Drosophila dopa decarboxylase gene. Genes Dev. 3: 676-686.
JONES, C. W., M. W. DALTON and L. H. TOWNLEY, 1991 Interspecific comparisons of the structure and regulation of the Drosophila ecdysone-inducible gene E74. Genetics 127: 535-543.
JORGENSEN, E. M., and R. L. GARBER, 1987 Function and misfunction of the two promoters of the Drosophila Antennapedia gene. Genes Dev. 1: 544-555.
KARCH, F., B. WEIFFENBACH, M. PEIFER, W. BENDER, I. DUNCAN, S. CELNIKER, M. CROSBY and E. B. LEWIS, 1985 The abdominal region of the bithorax complex. Cell 43: 81-96.
KASSIS, J. A., M. L. WONG and P. H. O'FARRELL, 1985 Electron microscopic heteroduplex mapping identifies regions of the engrailed locus that are conserved between Drosophila melanogaster and Drosophila virilis. Mol. Cell. Biol. 5: 3600-3609.
KASSIS, J. A., S. J. POOLE, D. K. WRIGHT and P. H. O'FARRELL, 1986 Sequence conservation in the protein coding and intron regions of the engrailed transcription unit. EMBO J. 5: 3583-3589.
KASSIS, J. A., C. DESPLAN, D. K. WRIGHT and P. H. O'FARRELL, 1989 Evolutionary conservation of homeo-domain binding sites and other sequences upstream and within the major transcription unit of the Drosophila segmentation gene en. Mol. Cell. Biol. 9: 4304-4311.
KAUFMAN, T. C., R. LEWIS and B. WAKIMOTO, 1980 Cytogenetic analysis of chromosome 3 in Drosophila melanogaster: the homeotic gene complex in polytene chromosomal interval 84A, B. Genetics 94: 115-133.
KAUFMAN, T. C., M. A. SEEGER and G. OLSEN, 1990 Molecular and genetic organization of the Antennapedia gene complex of Drosophila melanogaster. Adv. Genet. 27: 309-362.
KENNISON, J. A., and J. W. TAMKUN, 1988 Dosage-dependent modifiers of Polycomb and Antennapedia mutations in Drosophila. Proc. Natl. Acad. Sci. USA 85: 8136-8140.
KONSOLAKI, M., K. KOMITOPOULOU, P. P. TOLIAS, D. L. KING, C. SWIMMER and F. C. KAFATOS, 1990 The chorion genes of the medfly, Ceratitis capitata. I. Structural and regulatory conservation of the s36 gene relative to two Drosophila species. Nucleic Acids Res. 18: 1731-1737.

KRASNOW, M. A., E. E. SAFFMAN, K. KORNFELD and D. S. HOGNESS, 1989 Transcriptional activation and repression by Ultrabithorax proteins in cultured Drosophila cells. Cell 57: 1031-1043.
LAUGHON, A., A. M. BOULET, J. R. J. BERMINGHAM, R. A. LAYMON and M. P. SCOTT, 1986 Structure of transcripts from the homeotic Antennapedia gene of Drosophila melanogaster: two promoters control the major protein-coding region. Mol. Cell. Biol. 6: 4676-4689.
LEVINE, M., E. HAFEN, R. L. GARBER and W. J. GEHRING, 1983 Spatial distribution of Antennapedia transcripts during Drosophila development. EMBO J. 2: 2037-2046.
LEWIS, E. B., 1978 A gene complex controlling segmentation in Drosophila. Nature 276: 565-570.
MAIER, D., A. PREISS and J. R. POWELL, 1990 Regulation of the segmentation gene fushi tarazu has been functionally conserved in Drosophila. EMBO J. 9: 3957-3966.
MALICKI, J., K. SCHUGHART and W. MCGINNIS, 1990 Mouse Hox-2.2 specifies thoracic segmental identity in Drosophila embryos and larvae. Cell 63: 961-967.
MARTINEZ, C. J. C., C. SWIMMER, M. G. FENERJIAN and F. C. KAFATOS, 1988 Evolution of the autosomal chorion locus in Drosophila. I. General organization of the locus and sequence comparisons of genes s15 and s19 in evolutionary distant species. Genetics 119: 663-677.
MARTINEZ-ARIAS, A., 1986 The Antennapedia gene is required and expressed in parasegments 4 and 5 of the Drosophila embryo. EMBO J. 5: 135-141.
MARTINEZ-CRUZADO, 1990 Evolution of the autosomal chorion cluster in Drosophila. IV. The Hawaiian Drosophila: rapid protein evolution and constancy in the rate of DNA divergence. J. Mol. Evol. 31: 402-423.
MICHAEL, W. M., D. D. BOWTELL and G. M. RUBIN, 1990 Comparison of the sevenless genes of Drosophila virilis and Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 87: 5351-5353.
MITSIALIS, S. A., and F. C. KAFATOS, 1985 Regulatory elements controlling chorion gene expression are conserved between flies and moths. Nature 317: 453-546.
NEEDLEMAN, S. B., and C. D. WUNSCH, 1970 A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48: 443-453.
O'NEIL, M. T., and J. M. BELOTE, 1992 Interspecific comparison of the transformer gene of Drosophila reveals an unusually high degree of evolutionary divergence. Genetics 131: 113-128.
PERKINS, K. K., G. M. DAILEY and R. TJIAN, 1988 In vitro analysis of the Antennapedia P2 promoter: identification of a new Drosophila transcription factor. Genes Dev. 2: 1615-1626.
REGULSKI, M., K. HARDING, R. KOSTRIKEN, F. KARCH, M. LEVINE and W. MCGINNIS, 1985 Homeobox genes of the Antennapedia and bithorax complexes of Drosophila. Cell 43: 71-80.
REINITZ, J., and M. LEVINE, 1990 Control of the initiation of homeotic gene expression by the gap genes giant and tailless in Drosophila. Dev. Biol. 140: 57-72.
RILEY, P. D., S. B. CARROLL and M. P. SCOTT, 1987 The expression and regulation of Sex combs reduced protein in Drosophila embryos. Genes Dev. 1: 716-730.
SANCHEZ-HERRERO, E., I. VERNOS, R. MARCO and G. MORATA, 1985 Genetic organization of Drosophila bithorax complex. Nature 313: 108-113.
SANGER, F., S. NICKLEN and A. R. COULSON, 1977 DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
SCHNEUWLY, S., A. KURIOWA, P. BAUMGARTNER and W. J. GEHRING, 1986 Structural organization and sequence of the homeotic gene Antennapedia of Drosophila melanogaster. EMBO J. 5: 733-739.
SCOTT, M. P., and S. B. CARROLL, 1987 The segmentation and homeotic gene network in early Drosophila development. Cell 51: 689-698.
SCOTT, M. P., J. W. TAMKUN and G. W. HARTZELL III, 1989 The structure and function of the homeodomain. BBA Rev. Cancer 989: 25-48.
SCOTT, M. P., A. J. WEINER, T. I. HAZELRIGG, B. A. POLISKY, V. PIRROTTA, F. SCALENGHE and T. C. KAUFMAN, 1983 The molecular organization of the Antennapedia locus of Drosophila. Cell 35: 763-776.
SEEGER, M. A., and T. C. KAUFMAN, 1990 Molecular analysis of the bicoid gene from Drosophila pseudoobscura: identification of conserved domains within coding and noncoding regions of the bicoid mRNA. EMBO J. 9: 2977-2987.
SHAPIRO, M. B., and P. SENAPATHY, 1987 RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 15: 7155-7174.
SIMON, J., M. PEIFER, W. BENDER and M. O'CONNER, 1990 Regulatory elements of the bithorax complex that control expression along the anterior-posterior axis. EMBO J. 9: 3945-3956.
STROEHER, V. L., J. C. GAISER and R. L. GARBER, 1988 Alternative RNA splicing that is spatially regulated: generation of transcripts from the Antennapedia gene of Drosophila melanogaster with different protein-coding regions. Mol. Cell. Biol. 8: 4143-4154.
STROEHER, V. L., E. M. JORGENSEN and R. L. GARBER, 1986 Multiple transcripts from the Antennapedia gene of Drosophila melanogaster. Mol. Cell. Biol. 6: 4667-4675.
STRUHL, G., 1983 Role of the esc+ gene product in ensuring the selective expression of segment-specific homeotic genes in Drosophila. J. Embryol. Exp. Morphol. 76: 297-331.
STRUHL, G., and M. AKAM, 1985 Altered distributions of Ultrabithorax transcripts in extra sex combs mutant embryos of Drosophila. EMBO J. 4: 3259-3264.


TAGHERT, P. H., and L. E. SCHEIDER, 1990 Interspecific comparison of a Drosophila gene encoding FMRFamide-related neuropeptides. J. Neurosci. 10: 1929-1942.
THROCKMORTON, L. H., 1975 The phylogeny, ecology and geography of Drosophila, pp. 421-469 in Handbook of Genetics, edited by R. C. KING. Plenum Press, New York.
TINOCO, I., P. N. BORER, B. DENGLER, M. D. LEVINE, O. C. UHLENBECK, D. M. CROTHERS and J. GRALLA, 1973 Improved estimation of secondary structure of ribonucleic acids. Nature New Biol. 246: 40-41.
TREIER, M., C. PFEIFLE and D. TAUTZ, 1989 Comparison of the gap segmentation gene hunchback between Drosophila melanogaster and Drosophila virilis reveals novel modes of evolutionary change. EMBO J. 8: 1517-1525.
TREMML, G., and M. BIENZ, 1989 An essential role of even-skipped for homeotic gene regulation in the Drosophila visceral mesoderm. EMBO J. 8: 2687-2693.
WEDEEN, C., K. HARDING and M. LEVINE, 1986 Spatial regulation of Antennapedia and bithorax gene expression by the Polycomb locus in Drosophila. Cell 44: 739-748.
WHARTON, K. A., B. YEDVOBNICK, V. G. FINNERTY and S. ARTAVANIS-TSAKONAS, 1985 opa: a novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40: 55-62.
WILDE, C. D., and M. AKAM, 1987 Conserved sequence elements in the 5' region of the Ultrabithorax transcription unit. EMBO J. 6: 1393-1401.
XIAO, H., and J. T. LIS, 1988 Germline transformation used to define key features of heat-shock response elements. Science 239: 1139-1142.
ZINK, B., Y. ENGSTROM, W. J. GEHRING and R. PARO, 1991 Direct interaction of the Polycomb protein with Antennapedia regulatory sequences in polytene chromosomes of Drosophila melanogaster. EMBO J. 10: 153-162.

Communicating editor: R. E. DENELL
