Mol Gen Genet (1992) 232:126-134

MGG

© Springer-Verlag 1992

The pogo transposable element family of Drosophila melanogaster

Mark Tudor, Malgorzata Lobocka*, Margaret Goodell, Jonathan Pettitt, and Kevin O'Hare

Department of Biochemistry, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK

Summary. A 190 bp insertion is associated with the white-eosin mutation in Drosophila melanogaster. This insertion is a member of a family of transposable elements, pogo elements, which belongs to the same class as the P and hobo elements of D. melanogaster. Strains typically have many copies of a 190 bp element, 10-15 elements 1.1-1.5 kb in size and several copies of a 2.1 kb element. The smaller elements all appear to be derived from the largest by single internal deletions, so that all elements share terminal sequences. Pogo elements either always insert at the dinucleotide TA and have perfect 21 bp terminal inverse repeats, or have 22 bp inverse repeats and produce no duplication upon insertion. Analysis by DNA blotting of their distribution and of the occupancy of insertion sites in different strains suggests that they may be less mobile than P or hobo. The DNA sequence of the largest element has two long open reading frames on one strand which, as indicated by cDNA analysis, are joined by splicing. RNAs are transcribed from this strand, and their sizes are similar to those of the major size classes of elements. A protein predicted by the DNA sequence has significant homology with a human centromere-associated protein, CENP-B. Homologous sequences were not detected in other Drosophila species, suggesting that this transposable element family may be restricted to D. melanogaster.

Key words: Transposable element - Hybrid dysgenesis

Introduction

The white-eosin (w^e) mutation of Drosophila melanogaster arose spontaneously in a stock of white-one (w^1) in August 1911 (Morgan and Bridges 1916). We have shown that in w^1 there is a Doc transposable element inserted in the promoter region of the white (w) locus (Driver et al. 1989) and that in w^e there is an additional secondary insert of 190 bp within the Doc element (O'Hare et al. 1991). This secondary insertion appeared to be a member of a previously undescribed transposable element family, which we called pogo. In this paper we describe the family of pogo elements in detail. It appears to belong to the same class of transposable element as the D. melanogaster P and hobo elements, but may be less mobile than them.

* Present address: National Institutes of Health, NCI, Laboratory of Biochemistry, Building 37, Bethesda, MD 20892, USA

Offprint requests to: K. O'Hare

Materials and methods

Drosophila stocks. Canton S and Oregon R were from the American Stock Center at Bowling Green State University, Ohio, USA. Harwich was from M.G. Kidwell (University of Arizona, Tucson). Stocks of D. simulans, D. mauritiana, D. yakuba, D. erecta and D. pseudoobscura were from M. Ashburner (University of Cambridge, UK). All stocks were raised on standard cornmeal/sugar/yeast medium at 25 °C.

Recombinant DNA techniques. Two genomic λ libraries were used. The first was constructed during the course of unrelated studies on the suppressor of forked locus (Mitchelson 1988; A. Mitchelson, M. Simonelig and K. O'Hare, in preparation). The genomic DNA was from flies heterozygous for the X chromosome balancer FM6 and a chromosome carrying a lethal su(f) mutation. The autosomes were derived from the wild-type strain Canton S. The insert DNA was made by partial digestion with Sau3AI, and a 14-20 kb size fraction was purified by agarose gel electrophoresis. The vector was EMBL4 digested with BamHI. The second library was that of Mariani et al. (1985); it was similar to the library we prepared, except that the insert DNA was from the wild-type strain Oregon R. The cDNA libraries of Poole et al. (1985) were screened for pogo cDNAs. Hybridising phage were purified and their inserts analysed by restriction enzyme mapping, DNA blotting and DNA sequencing. RNA preparation and analysis were as described by O'Hare et al. (1983). Standard techniques were used throughout (Sambrook et al. 1989).

DNA sequences were determined using the dideoxy technique on M13 recombinants. For pogo4, pogo5 and pogo6, suitable restriction enzyme fragments containing the entire element were first subcloned into the vector pBluescript M13−. For pogoR11, a ClaI-XhoI fragment was cloned between the ClaI and SalI sites of the vector pBR322 to give the recombinant plasmid pogoR11XC. Fragments generated by restriction enzymes with 4 bp recognition sequences were subcloned into M13 vectors and the sequences of the inserts determined. To complete the sequence of the ClaI-XhoI interval of pogoR11 independently on both strands, an oligonucleotide primer (complementary to positions 1223-1237 of Fig. 3) was made and used on the M13 recombinants. The termini of the pogoR11 element are not present in the plasmid pogoR11XC, so the terminal sequences were determined on one strand only by cloning fragments from λpogoR11: BamHI-HindIII for the ClaI end and EcoRI for the XhoI end (see Fig. 2). The DNA sequence of the 2.1 kb element pogoR11 has been given the accession number X59837 in the EMBL Nucleic Acid Database. The sequences of the cDNA inserts were determined by subcloning fragments from the lambda phage DNA directly into M13.
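The subcloning and mapping strategy above rests on locating short recognition sequences. As a minimal sketch (not the authors' software), the Python below scans a sequence for several of the enzymes named in this paper. The recognition sequences are the standard ones for these enzymes; the two test strings are the element-side terminal 27 bp at the ClaI and XhoI ends as they appear in the lower panel of Fig. 5, transcribed by us from the scanned figure, so individual bases could be wrong. All of these sites are palindromic, so scanning one strand finds every site.

```python
# Standard recognition sequences for enzymes mentioned in the text.
# All are palindromic, so a single-strand search finds every site.
RECOGNITION = {
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "ClaI": "ATCGAT",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "Sau3AI": "GATC",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def map_sites(seq: str) -> dict[str, list[int]]:
    """Map every enzyme in RECOGNITION to its sites in seq (empty list if none)."""
    return {enzyme: find_sites(seq, site) for enzyme, site in RECOGNITION.items()}


# Element-side terminal 27 bp, transcribed (assumption: free of OCR errors) from Fig. 5.
CLAI_END = "CAGTATAATTCGCTTAGCTGCATCGAT"
XHOI_END = "CTCGAGGCAGCTAAGCGAATTATACTG"

if __name__ == "__main__":
    print(map_sites(CLAI_END)["ClaI"])  # [22]: ClaI site just inside the left end
    print(map_sites(XHOI_END)["XhoI"])  # [1]: XhoI site at the inner edge of the right end
```

Applied to these termini, the sketch places a ClaI site a few bases inside one end and an XhoI site at the other, which is what makes the ClaI-XhoI fragment a convenient measure of element size.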

Results

Size, number and distribution of pogo elements

There are ClaI and XhoI sites just inside either end of the insertion in w^e, and we have used the size of the fragment generated by digestion with these enzymes to characterise other elements. The sizes of pogo elements in three different wild-type strains (Canton S, Oregon R and Harwich) were examined by DNA blotting (Fig. 1A). Digests with ClaI and XhoI were probed with the ClaI-XhoI fragment from the element in w^e. All three strains have a prominent band at about 200 bp, several in the size range 1.1-1.5 kb and one at 2.1 kb. DNA from Canton S males and females showed the same pattern of bands (data not shown), indicating that pogo elements are not restricted to the Y chromosome. Digests of Canton S and Oregon R DNA with XhoI or BamHI were probed to examine the number and distribution of pogo elements (Fig. 1B). These enzymes cut once (XhoI) or not at all (BamHI) within pogo elements. The number of hybridising bands gives an estimate of the number of elements, and the size of each band is determined by the location of restriction enzyme sites in the sequences flanking the pogo element. Both strains have many hybridising fragments, and there are strain-specific bands. This suggests that the two strains have elements in different genomic locations and is consistent with transposition of pogo to new sites. Some bands appear to be shared but, as so many bands are detected, this could be due to co-migration of DNA fragments from different genomic sites. However, a single-copy probe from the sequences flanking an element cloned from Oregon R, pogoR11 (see below), detects the same size fragment in Canton S, Oregon R and Harwich (Fig. 1C). Digests using other enzymes such as EcoRI and HindIII (not shown) confirm that this element, cloned from Oregon R, is also present at this site in four other wild-type strains: Canton S, Amherst M56i, Hikone J and Harwich. This may indicate that pogo transposes less frequently than other transposable elements. No homologous sequences were found by DNA blotting in D. simulans, D. erecta, D. mauritiana, D. yakuba or D. pseudoobscura (data not shown).

[Fig. 1 A-C: DNA blots of Harwich (H), Canton S (CS) and Oregon R (OR) DNA with marker lanes (M); size scale 2.1, 1.5, 1.1 and 0.2 kb]

Fig. 1 A-C. DNA blot analysis of pogo elements. OR, Oregon R; CS, Canton S; H, Harwich. A ClaI-XhoI double digests were probed with the ClaI-XhoI fragment from the element in w^e. B BamHI digests (left of the marker lane M of λ digested with HindIII) and XhoI digests (right of the marker lane) were probed with the ClaI-XhoI fragment from the element in w^e. C The filter from A was re-probed with the HindIII-BamHI fragment flanking the element cloned in λpogoR11 (see Fig. 2). M is a marker lane of λ digested with BstEII
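The reasoning behind Fig. 1B and 1C can be sketched in a few lines. When an enzyme does not cut inside the element (BamHI here), the hybridising band runs from the nearest flanking site on one side to the nearest on the other, plus the element itself, so its size reports the insertion site rather than the element. This is a minimal illustration only: the site coordinates, region length and function name are hypothetical, chosen to show that one element at two different sites gives two different bands, while the same site gives the same band in every strain that carries it.

```python
from bisect import bisect_right


def band_size(insert_at: int, cut_sites: list[int], seq_len: int, element_len: int) -> int:
    """Size of the fragment carrying an element inserted at boundary insert_at.

    Coordinates are boundaries in the uninterrupted flanking DNA (0..seq_len);
    cut_sites must be sorted and must not coincide with insert_at. The enzyme is
    assumed not to cut inside the element, as for BamHI and pogo. If no site lies
    on one side, the end of the modelled region is used instead.
    """
    i = bisect_right(cut_sites, insert_at)
    left = cut_sites[i - 1] if i > 0 else 0
    right = cut_sites[i] if i < len(cut_sites) else seq_len
    return (right - left) + element_len


# Hypothetical BamHI sites in a hypothetical 12 kb stretch of flanking DNA.
cuts = [1000, 6000, 9000]
print(band_size(3000, cuts, 12000, 2100))  # 7100: element between the 1 kb and 6 kb sites
print(band_size(7000, cuts, 12000, 2100))  # 5100: same element, different site
```

A single-copy flanking probe, as in Fig. 1C, isolates one such fragment, which is why an identical band in several strains points to a shared insertion rather than co-migration.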

Isolation of additional pogo elements

To isolate other pogo elements, we screened an unamplified λ library constructed during the course of unrelated studies on the suppressor of forked locus (see Materials and methods). The ClaI-XhoI fragment from the insertion in w^e was used as probe. Based on the number of phage which hybridised (about 1 in 150) and the average size of the insert, we estimate the copy number as about 80 per haploid genome. We purified and analysed six phage. The size of the ClaI-XhoI fragment with homology to the probe was determined by DNA blotting. Four of the six (λpogo1, 2, 3 and 4) appeared to contain elements similar, if not identical, in size to that in w^e. There was a 1.1 kb element in λpogo5, and in λpogo6 the region of homology had an XhoI site at one end but no ClaI site at the other. DNA sequence analysis of pogo6 showed that the position where the homology to other pogo elements terminated coincided with a Sau3AI site (position 293 in Fig. 3). This indicates that, in λpogo6, two Sau3AI fragments have been ligated together to produce a chimaeric insert. Other apparently chimaeric phage have been isolated from this library (our unpublished results).

To clone the larger pogo elements, we used the 1.1 kb ClaI-XhoI fragment from pogo5 as a probe and screened an amplified library of the wild-type strain Oregon R. Twelve phage which gave strong hybridisation signals were purified and found to contain ten different genomic inserts. Elements were characterised by the size of the ClaI-XhoI fragment with homology to the probe. M13 subclones of pogo5 were used as probes to investigate the structure of the elements. Nine of the cloned intervals contain elements between 1.1 and 2.1 kb in size. In λpogoR2 the region of homology was bounded at one end by a ClaI site, but there was no XhoI site corresponding to the other end of the element. At least 170 bp of the sequences at the XhoI end of pogo elements are missing in λpogoR2. This apparent truncation coincides with a Sau3AI site (position 1948 in Fig. 3) and may correspond to the end of the cloned region. The nine cloned elements represent at least six different structures. Restriction enzyme maps were constructed for the two phage (λpogoR11 and λpogoR15) with the largest ClaI-XhoI fragments (Fig. 2). They carry apparently identical elements from different genomic locations. These elements are 2.1 kb in size and correspond to the largest ClaI-XhoI fragments seen in the genomic DNA blotting experiments (Fig. 1A). The ClaI-XhoI fragment from λpogoR11 was subcloned into pBR322 between the ClaI and SalI sites as the recombinant plasmid pogoR11XC; it is a useful probe for pogo elements as it has no flanking sequences.

[Fig. 2: restriction maps of the cloned intervals containing the elements of w^e, pogo4, pogo5, pogo6, pogoR11 and pogoR15; scale bar 1 kb]

Fig. 2. Restriction enzyme maps of cloned intervals containing pogo elements. The elements are shown as shaded bars. B, BamHI; C, ClaI; E, EcoRI; G, BglII; H, HindIII; X, XhoI. The probe used in Fig. 1C is shown under the map of pogoR11. The EcoRI sites in parentheses are from the vector at the end of the cloned interval
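The copy-number estimate from the library screen is simple arithmetic: one haploid genome is spread over (genome size / mean insert size) phage, and about 1 in 150 of those hybridised. The paper gives neither the genome size nor the exact mean insert size it used, so both values below are assumptions: roughly 180 Mb for the D. melanogaster haploid genome and a 15 kb mean insert (inside the 14-20 kb size fraction). With these values the sketch reproduces about 80 copies; other plausible choices shift the answer in inverse proportion to the assumed insert size.

```python
def copies_per_haploid_genome(positive_fraction: float, mean_insert_bp: float, genome_bp: float) -> float:
    """Estimate element copy number from the fraction of hybridising phage.

    Assumes each positive phage carries exactly one element and that elements
    are much shorter than the inserts; phage with two elements, or elements
    split across two phage, would bias the estimate.
    """
    phage_per_genome = genome_bp / mean_insert_bp
    return positive_fraction * phage_per_genome


# Assumed values (not stated in the text): ~180 Mb genome, 15 kb mean insert.
estimate = copies_per_haploid_genome(1 / 150, 15_000, 180e6)
print(round(estimate))  # 80
```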
The possible association of pogo with other repeated sequences (as occurs in the w^e mutation, where it is inserted into a Doc transposable element) was investigated by probing restriction enzyme digests of the λpogo recombinants with a total genomic DNA probe. Because repeated sequences are more abundant than single-copy sequences in such a probe, it detects fragments that include repeated sequences such as transposable elements. This showed that the inserts from two sites (including that in λpogoR15) were made up mostly, if not entirely, of repeated sequences and that at five other sites there were repeated sequences in addition to the pogo element (data not shown). Some of these sites may be from heterochromatin, where transposable elements are often found close together. The site cloned in λpogoR11 appeared to be single copy apart from the pogo element (see above).

DNA sequence of pogo elements

The complete DNA sequence of the pogoR11 element was determined (Fig. 3). The complete DNA sequences of pogo4 and pogo5 and 750 bp of the DNA sequence of pogo6 were also determined. The pogo4 element has exactly the same DNA sequence as the element in w^e (O'Hare et al. 1991), and both have a 1.9 kb internal deletion compared with pogoR11. The pogo5 element has a single deletion of 982 nucleotides, while the data we have for the pogo6 element are consistent with a single deletion of 645 nucleotides. We have isolated two cDNAs (see below) which appear to have been transcribed from elements with single large deletions, of 976 bp in pogoC2 and of 620 bp in pogoC4. The positions of the deletions in different elements are shown diagrammatically in Fig. 4, and the sequences at the sites of the deletions are in the upper panel of Fig. 5. The pogo5 and pogoC2 elements are both about 1.1 kb long and have deletions between similar but non-identical positions. Similarly, the pogo6 and pogoC4 elements are both about 1.5 kb, with deletions between similar but non-identical positions. This suggests that the bands at 1.1 and 1.5 kb seen in ClaI-XhoI digests of genomic DNA (Fig. 1A) do not arise from multiple copies of identical elements but, rather, from elements with similar but different deletions. Apart from sequence differences at the proposed deletion endpoints (shown in lower case in the upper panel of Fig. 5), a few point polymorphisms were detected between different elements. Where there is an A at position 1871 in pogoR11 (Fig. 3) there is a G in pogo5. At position 2033 in pogoR11 there is a run of 11 A residues, but in pogo5 there are 12 and in pogo6 there are 13. In pogoC2, there is T9 rather than T8 at 161, a G inserted at 238, A rather than G at 324 and A rather than C at 1608. In pogoC4, the only polymorphism is the same as that described above for pogo6, with A13 at 2033.
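The deletion sizes can be checked against the endpoints given later for the two cDNAs (548 to 1523 for pogoC2, 769 to 1388 for pogoC4). Treating both endpoints as deleted, i.e. inclusive coordinates, reproduces the stated 976 and 620 bp. The full length of pogoR11 is given in the text only as "2.1 kb", so the 2100 bp constant below is an approximation, and the derived element sizes are rough.

```python
FULL_LENGTH_BP = 2100  # assumption: pogoR11 is described only as "2.1 kb"


def deletion_size(first_deleted: int, last_deleted: int) -> int:
    """Number of bases removed when both endpoints are deleted (inclusive coordinates)."""
    return last_deleted - first_deleted + 1


def derived_element_size(deleted_bp: int, full_length_bp: int = FULL_LENGTH_BP) -> int:
    """Approximate size of an element derived from the full-length one by a single deletion."""
    return full_length_bp - deleted_bp


# Deletion endpoints in pogoR11 coordinates, as given for the two cDNAs.
for name, (start, end) in {"pogoC2": (548, 1523), "pogoC4": (769, 1388)}.items():
    size = deletion_size(start, end)
    print(name, size, derived_element_size(size))  # pogoC2 976 1124 / pogoC4 620 1480
```

The same subtraction gives 1118 bp for pogo5 (982 bp deleted) and 1455 bp for pogo6 (645 bp deleted), consistent with the 1.1 and 1.5 kb size classes.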
The terminal DNA sequences of the cloned pogo elements are in the lower panel of Fig. 5. The insertion sites are generally AT-rich. The sequences immediately flanking the XhoI ends of pogoR11 and pogo5 are identical (GTATAT), but the sequences flanking the other elements are quite different. Even though the sequence of w^1 at the site where the pogo element in w^e has inserted is known (O'Hare et al. 1991), it is not possible to define the exact end of the element. If no duplication is made on insertion, then insertion always occurs between T and A and pogo has 22 bp inverse terminal repeats. Alternatively, pogo always inserts at the dinucleotide TA and has 21 bp inverse repeats. This is similar to the Tc1 element of Caenorhabditis elegans (reviewed by Moerman and Waterston 1989). An element which makes no duplication upon insertion would be more unusual than one with a specificity for insertion (TA within an AT-rich sequence), so we assume that pogo does in fact make a 2 bp duplication upon insertion. The terminal 4 bp of pogo, CAGT, would then be identical to those of Tc1.

[Fig. 3: numbered nucleotide sequence of pogoR11 with the predicted ORF1-ORF2 protein shown beneath in single-letter code]

Fig. 3. DNA sequence of pogoR11. The DNA sequence of the ClaI-XhoI strand is shown. Only the sequence of the pogo element is shown. The flanking sequences and probable TA duplication are in the lower panel of Fig. 5. The single letter amino acid code is used to show the predicted ORF1-ORF2 polypeptide
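The two interpretations of the pogo termini can be checked directly: reverse-complement one end and count matching bases from the tip inward. This is a minimal sketch; the 27 bp ends are our transcription of the element-side sequences shown in the lower panel of Fig. 5 and may contain transcription errors. Under the TA-duplication model the element starts at CAGT and the perfect repeat is 21 bp; under the no-duplication model the element also includes the A and T of the flanking TA, and the repeat grows to 22 bp.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_length(left_end: str, right_end: str) -> int:
    """Length of the perfect terminal inverted repeat shared by two element ends.

    left_end is read 5'->3' from the outer tip inward; right_end is read 5'->3'
    and ends at the outer tip. Reverse-complementing right_end lets both ends be
    compared from the tip inward; counting stops at the first mismatch.
    """
    length = 0
    for a, b in zip(left_end.upper(), reverse_complement(right_end)):
        if a != b:
            break
        length += 1
    return length


CLAI_END = "CAGTATAATTCGCTTAGCTGCATCGAT"  # element side of the TA at the ClaI end
XHOI_END = "CTCGAGGCAGCTAAGCGAATTATACTG"  # element side of the TA at the XhoI end

# Model 1: the flanking TA is a target-site duplication; element starts at CAGT.
print(inverted_repeat_length(CLAI_END, XHOI_END))              # 21
# Model 2: no duplication; the element runs from A to T between host T and A.
print(inverted_repeat_length("A" + CLAI_END, XHOI_END + "T"))  # 22
```

The sequence data alone cannot distinguish the two models; the choice between them rests on the argument in the text about how unusual a non-duplicating element would be.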

Coding potential of pogo

The DNA sequence of pogoR11 was analysed for protein-coding regions. The distribution of stop codons in all frames on both strands is shown in Fig. 6. On the upper strand, whose sequence is given in Fig. 3, there are two long open reading frames (ORFs) of 1155 and 417 bp, while the longest ORF on the lower strand was only 279 bp. Suitable splice sites can be predicted from the DNA sequence which would join the longer 5' ORF, ORF1, to the shorter 3' ORF, ORF2. Analysis of cDNAs (see below) suggests that these sites are used to splice ORF1 and ORF2 into a single continuous ORF, as shown in Fig. 3. The protein encoded by pogoR11 ORF1-ORF2 is shown in Fig. 3 below the DNA sequence, using the single letter code. We cannot exclude the possibility that sequences upstream of the start codon indicated in Fig. 3 could encode protein. The first ATG in the element is at position 179, and the one we have indicated in Fig. 3, at position 340, is the seventh. Two of the other six ATGs fit the consensus proposed by Cavener (1987) for translation start sites in D. melanogaster (C/AAAC/AATG) as well as the one at 340 does. Moreover, downstream from the stop codon at 298 which marks the beginning of ORF1, there is a polypyrimidine-rich tract followed by an AG at 330. While this is a relatively poor fit with consensus splice sites, analysis of cDNAs suggests that it can be used as a 3' splice site (see below), so it is possible that sequences upstream of position 331 are spliced to ORF1-ORF2. Analysis of the DNA sequence in this region, however, did not reveal any long ORFs (Fig. 6) or convincing splice sites.

[Fig. 4: maps of pogoR11, w^e/pogo4, pogo5, pogo6, pogoC2 and pogoC4 relative to the ClaI, HindIII, BglII, EcoRI and XhoI sites of pogoR11]

Fig. 4. Comparison of pogo elements. The relationship between different cloned pogo elements is shown. The extents of the pogo sequences are shown as horizontal lines. The ends of cloned sequences are shown by the long vertical lines and the deletion endpoints as short vertical lines. In the cDNA pogoC4, the non-pogo sequences are shown as a wavy line (not to scale) and the intron as a caret

[Fig. 6: positions of stop codons in frames +1, +2, +3, -1, -2 and -3 of pogoR11]

Fig. 6. Stop codons in pogoR11. The distribution of stop codons in all reading frames is shown. Frames +1, +2, +3 are of the upper strand (5' to 3' is from left to right) and -1, -2, -3 are of the lower strand (5' to 3' is from right to left)

[Fig. 5: sequence alignments at the deletion junctions (upper panel) and at the element termini (lower panel) for w^e, pogo4, pogo5, pogo6, pogoC2, pogoC4 and pogoR11]

Fig. 5. DNA sequences of pogo elements. Upper panel. DNA sequences at deletions in pogo elements. Polymorphisms at the position of the deletions are shown in lower case. Lower panel. DNA sequences at the termini of pogo elements. The probable TA duplications made upon insertion are underlined and flanking sequences are in lower case

Compared with standard codon usage in D. melanogaster, codons rich in A or T are preferred in pogo. For leucine (encoded by TTR or CTN), TTR is used here 60% of the time, while codon usage tables for D. melanogaster (Wada et al. 1990) show that TTR would typically encode only 19% of leucines. Furthermore, TTA is the codon used for 42% of the leucines in ORF1-ORF2 but for only 4.5% of leucines in codon tables from D. melanogaster genes.
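The two analyses in this section, a six-frame stop-codon scan (as plotted in Fig. 6) and a leucine codon count, can be sketched as follows. This is an illustrative sketch, not the program the authors used: an "open stretch" here is any run of codons between stops, with no ATG required, and the frame labels follow one common convention that may not match the numbering of Fig. 6. The pogoR11 sequence (EMBL X59837) is not reproduced in this paper's text, so the example runs on toy sequences; on the real upper strand one would expect long stretches corresponding to ORF1 and ORF2, with exact lengths depending on how ORF ends are defined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_open_stretch(seq: str) -> dict[str, int]:
    """Longest stop-free run (bp) in each of the six reading frames.

    Frames +1..+3 start at offsets 0..2 of seq; -1..-3 at offsets 0..2 of its
    reverse complement. A run counts codons between stops; no ATG is required.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": seq.translate(COMPLEMENT)[::-1]}
    result = {}
    for sign, strand in strands.items():
        for offset in range(3):
            best = run = 0
            for i in range(offset, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0
                else:
                    run += 3
                    best = max(best, run)
            result[f"{sign}{offset + 1}"] = best
    return result


def leucine_usage(cds: str) -> dict[str, float]:
    """Fractions of leucine codons that are TTR (TTA or TTG) and TTA, for an in-frame CDS."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    leu = [c for c in codons if c in LEUCINE_CODONS]
    if not leu:
        return {"TTR": 0.0, "TTA": 0.0}
    return {
        "TTR": sum(c.startswith("TT") for c in leu) / len(leu),
        "TTA": leu.count("TTA") / len(leu),
    }


print(longest_open_stretch("ATGAAATAA"))
print(leucine_usage("TTACTGTTATTGAAA"))  # {'TTR': 0.75, 'TTA': 0.5}
```

Run on the spliced ORF1-ORF2 coding sequence, leucine_usage would give the kind of figures quoted above (60% TTR, 42% TTA); the codon-table comparison values come from Wada et al. (1990), not from this code.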

Transcription of pogo

A developmental blot of RNA from various stages of the life cycle of Oregon R was probed with a single-stranded probe for the HindIII-EcoRI interval of pogoR11, which would detect RNAs of the strand shown in Fig. 3. This probe would not detect RNA from the 190 bp elements at all, and would have most homology with RNAs from the largest elements. Two RNAs of 1.1 and 1.5 kb were present throughout development (left panel of Fig. 7). A similar RNA profile is seen in samples from adult females of Canton S and Harwich (right panel of Fig. 7). Longer exposures detect low levels of a 2.1 kb RNA. These sizes (2.1, 1.5 and 1.1 kb) correspond to the major size classes of elements in these strains. Although the elements in Harwich, Canton S and Oregon R are almost identical (Fig. 1A), the RNA profile of Harwich is somewhat different from those of Canton S and Oregon R in the smaller size range (right panel of Fig. 7). The structure of pogo is not consistent with transposition through an RNA intermediate, so these RNAs are unlikely to be genuine full-length RNAs. Almost full-length RNAs would be generated if transcription initiated close to the ClaI end and terminated close to the XhoI end of pogo. The structures of pogo cDNAs suggest that this might occur (see below). RNAs might also initiate and/or terminate outside pogo elements inserted into a transcription unit. As some insertion sites appear to be conserved in different strains, some of the similarities and differences in the RNA profiles may be determined by which elements the strains happen to share. Insertion of an element close to transcriptional enhancers that affect the initiation of transcription within the element, as occurs for P elements (O'Kane and Gehring 1988), might also contribute to similarities and differences in the RNA profiles.

[Fig. 7: RNA blots; left panel lanes E1, E2, E3, L1, L2, L3, P1, P2, adult males and adult females; right panel lanes OR, CS and H; sizes 2.1, 1.5 and 1.1 kb]

Fig. 7. Transcription of pogo. The probe was a single-stranded probe of the HindIII-EcoRI interval of pogoR11 which would hybridise to RNAs of the ClaI-XhoI strand. Left panel. Samples of poly(A)+ RNA from different stages of Oregon R development. E1, 0-4 h embryos; E2, 4-8 h embryos; E3, 8-24 h embryos; L1, 1st instar larvae; L2, 2nd instar larvae; L3, 3rd instar larvae; P1, early pupae; P2, late pupae; ♂, adult males; ♀, adult females. The high molecular weight smears represent hybridisation to DNA present in the RNA preparations. The late pupae and adult female lanes were underloaded compared with the other lanes. Right panel. Samples of poly(A)+ RNA from adult females of different strains. OR, Oregon R; CS, Canton S; H, Harwich

The cDNA libraries of Poole et al. (1985), made from RNA of early pupae and adult females, were screened for pogo cDNAs, and two positively hybridising phage were found. The relationship between the cDNAs and the 2.1 kb element is shown in Fig. 4. The cDNA pogoC2 was isolated from the adult female cDNA library and is from an element about 1.1 kb in size, with a deletion from 548 to 1523. Its 5' end corresponds to position 25 of the pogoR11 DNA sequence and its 3' end to the EcoRI site at 1816. This 3' end presumably arose because of incomplete methylation of EcoRI sites within the cDNA inserts during construction of the cDNA library. The cDNA pogoC4 was from an early pupal library and appears to be from an element about 1.5 kb in size.
The 5' 173 bp of pogoC4 have no homology with pogoR11, and the sequence begins to match at position 332 of pogoR11, where there is a putative 3' splice site at the beginning of ORF1 (see above). The pogoC4 sequence continues to match that of pogoR11 up to position 769, where there is a deletion to position 1388. By comparison with pogoR11, the cDNA sequence is also missing nucleotides 1438 to 1492. This corresponds to the intron whose excision would join ORF1 in phase with ORF2. The cDNA sequence continues to match pogoR11 to the end of the pogoC4 element, and there are an additional 500 bp presumably from the genomic sequence flanking this particular element (Figs. 3 and 4). We have compared the 5' and 3' extensions of pogoC4 with other D. melanogaster sequences in nucleic acid databases and found no good matches. We presume that this cDNA comes from a transcription unit into which a 1.5 kb pogo element has inserted. One 5' and two 3' splice sites from pogo are used. There is a sequence TATAATT within the inverse repeat at position 4 which might serve as a TATA box for the initiation of transcription very close to the ClaI end of pogo (as in pogoC2). Splicing of the intron from 1438 to 1491 (as in pogoC4) and use of the polyadenylation signal at 2011 (AATAAA) would generate pogo RNAs similar in size to the pogo elements themselves.
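How excising an intron turns two reading frames into one continuous ORF can be illustrated with a toy gene. The sequence below is entirely hypothetical (it is not pogo sequence): a 12 bp exon, a 15 bp intron with the canonical GT...AG ends, and a 12 bp exon. Read through unspliced, translation hits a stop inside the intron; after removing the intron by its 1-based inclusive coordinates, as the cDNA comparison does for nucleotides 1438 to 1492 of pogoR11, the exons are read as one open frame. Translation uses the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate from the first base, stopping at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def splice(seq: str, intron_start: int, intron_end: int) -> str:
    """Remove an intron given 1-based, inclusive coordinates."""
    return seq[:intron_start - 1] + seq[intron_end:]


# Hypothetical toy gene: 12 bp exon + 15 bp GT...AG intron + 12 bp exon.
toy = "ATGAAACTGAAG" + "GTAAGTTGATTTCAG" + "GCTGGATTTTAA"
print(translate(toy))                  # MKLKVS: read-through stops at TGA in the intron
print(translate(splice(toy, 13, 27)))  # MKLKAGF: exons joined into one reading frame
```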


The pogo ORFI-ORF2 protein has homology with a major human centrosome-associated antigen

of pogoRll. The C-terminal regions of CENP-B and ORF1-ORF2 are rich in acidic amino acids, CENP-B more so than ORF1-ORF2. Of the point sequence differences detected between pogoR11 and other pogo elements, only two are within ORF1-ORF2 and both would increase the acidic nature of the ORF1-ORF2 protein. In pogoC2, the difference at 1608 changes the encoded amino acid from alanine to glutamic acid and in pogo5 the difference at 1871 changes the encoded amino acid from lysine to glutamic acid. These regions rich in acidic amino acids could be involved in interactions with other proteins, as occurs in some transcription factors. There is no extensive homology between ORF1ORF2 of pogo and proteins encoded by the P or hobo elements of D. melanogaster. The amino acid sequence motifs identified by Calvi et al. (1991) as occurring in hobo, the Ac element of Zea mays and the Tam3 element of Antirrhinum majus are not present in pogo. A small region of pogo did match with P (lower panel of Fig. 8), where there were 7 identical and 7 similar amino acids out of 22. This match is in the region common to the transposase and the 66 kDa putative repressor protein

The DNA sequence of pogoRl I and the predicted protein sequence of ORF1-ORF2 from pogoR11 were used to search databases. There were no good homologues in nucleic acid databases, but a protein homology was found with CENP-B, a major human centromere autoantigen (Earnshaw et al. 1987). The best match is between 175-384 of CENP-B and 154-369 of O R F I ORF2 where 32% of amino acids are identical and a further 25% represent conservative changes. Overall, the proteins show 51% similarity and 26% identity and can be aligned for almost their entire lengths (upper panel of Fig. 8). This degree of homology between a single copy human gene and a fruitfly transposable element was unexpected. However, CENP-B is a DNA-binding protein (Masumoto et al. 1989) and a transposase encoded by pogo would be expected to bind to DNA, so perhaps this is the basis for the homology. The DNA binding site defined for CENP-B has no TA dinucleofide, nor does it show any homology with the sequences flanking inserted pogo elements, or with the sequence

[Fig. 8: upper panel (pogo ORF1-ORF2 vs. human CENP-B) not reproduced; lower panel (pogo ORF1-ORF2 vs. P element transposase) reads:]
POGO  ALYIWFGQQESKNVILDRHVIL
P     TLYRWLSDVDIKRGCLDVVIDL

Fig. 8. Protein homologues of pogo ORF1-ORF2. Upper panel: a global match of the entire sequences of pogo ORF1-ORF2 and human CENP-B. Lower panel: a local homology between pogo ORF1-ORF2 and P element transposase. Identical amino acids (:) and similar amino acids (.) are shown.

encoded by the P element, but does not coincide with a helix-turn-helix motif (residues 308-327), which may be part of a DNA-binding domain (Rio et al. 1986). There are no good matches for this short motif in hobo, the mariner element of D. mauritiana, Tc1 of C. elegans, Ac of Z. mays or Tam3 of A. majus.
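The identity count quoted for the local pogo/P match can be checked directly against the lower panel of Fig. 8. The short Python sketch below counts identical positions in the two 22-residue segments. It assumes that the segments as read from the figure (POGO ALYIWFGQQESKNVILDRHVIL, P TLYRWLSDVDIKRGCLDVVIDL) are correct and aligned without gaps. The names `identity` and `identity_marks` are illustrative only. The sketch does not attempt the "similar" count, because the amino acid similarity groups used for Fig. 8 are not stated.

```python
from __future__ import annotations

# Segments as read from the lower panel of Fig. 8 (assumed correct and gapless).
POGO_SEGMENT = "ALYIWFGQQESKNVILDRHVIL"
P_SEGMENT = "TLYRWLSDVDIKRGCLDVVIDL"


def identity(a: str, b: str) -> tuple[int, float]:
    """Count identical positions in a gapless pairwise alignment.

    Returns (matches, percent identity over the aligned length).
    """
    if len(a) != len(b):
        raise ValueError("a gapless alignment needs sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, 100.0 * matches / len(a)


def identity_marks(a: str, b: str) -> str:
    """Put ':' under identical positions (as in Fig. 8) and a space elsewhere."""
    return "".join(":" if x == y else " " for x, y in zip(a, b))


if __name__ == "__main__":
    matches, percent = identity(POGO_SEGMENT, P_SEGMENT)
    print("POGO", POGO_SEGMENT)
    print("    ", identity_marks(POGO_SEGMENT, P_SEGMENT))
    print("P   ", P_SEGMENT)
    print(f"{matches} identical of {len(P_SEGMENT)} ({percent:.1f}%)")
```

Run as a script, it marks the seven identical positions and reports 7 identical residues out of 22 (31.8%), consistent with the count given in the text. Reproducing the 7 "similar" positions would first require choosing a substitution grouping.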

Discussion

pogo is a transposable element of the same class as P and hobo

Although there is little homology at either the DNA or the protein level when pogo is compared with P and hobo, there are striking similarities in overall structure (for reviews of P and hobo elements, see Engels 1989 and Blackman and Gelbart 1989, respectively). In all three elements, the termini are short inverse repeats. There are large, conserved multicopy elements and smaller elements probably derived from the larger ones by a single internal deletion. The putative deletion endpoints often correspond to small direct duplications in the DNA sequence of the large elements. There are sequence rearrangements (substitutions, duplications, insertions) in the small elements at the site of the putative deletion. There may be a promoter near the 5' end and a polyadenylation site near the 3' end of the element. Closely related species may lack these elements entirely. We conclude that pogo is a member of the class of transposable elements, including the P and hobo elements of D. melanogaster, that probably transpose at the DNA level.
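The first shared structural feature, short terminal inverse repeats, has a simple operational definition. Reading inwards from each end, the left end of the element matches the reverse complement of the right end. The Python sketch below measures the length of such a perfect terminal repeat. It illustrates the definition under stated assumptions only. The function names are hypothetical, only unambiguous A/C/G/T bases are handled, and the toy sequence is invented rather than taken from pogo, P or hobo.

```python
from __future__ import annotations

# Complements for unambiguous DNA bases (ambiguity codes are not handled).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str) -> int:
    """Length of the perfect inverted repeat shared by the two ends of an element.

    Base i from the left end is compared with the complement of base i counted
    from the right end; counting stops at the first mismatch or at mid-length.
    """
    seq = element.upper()
    length = 0
    while length < len(seq) // 2 and seq[length] == reverse_complement(seq[-(length + 1)]):
        length += 1
    return length


if __name__ == "__main__":
    # Hypothetical toy element: a 6 bp left end, a spacer, then its inverted copy.
    left_end = "CAGTAC"
    toy_element = left_end + "GGGGGGGG" + reverse_complement(left_end)
    print(toy_element, terminal_inverted_repeat_length(toy_element))
```

For the toy element this reports 6. Imperfect repeats (mismatches inside the repeat) or ambiguous bases would need a more tolerant comparison than this exact-match loop.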

Can pogo be mobilised?

The other two elements of this class found in D. melanogaster, P and hobo, are mobilised in appropriate crosses and are the causative agents of the P-M and H-E systems of hybrid dysgenesis. Strains of D. melanogaster can have very different numbers of P and hobo elements. Though the strains we analysed differ significantly in the number and structure of P and hobo elements in their genomes, they have very similar numbers of similar-sized pogo elements. A more extensive survey by I.A. Boussy, L. Charles, M.H. Hamelin, G. Periquet and D.Y. Shapiro (personal communication) indicates that all strains of D. melanogaster have a very similar profile of pogo elements. It has therefore not been possible to make crosses between strains with high and low numbers of pogo elements to look for reduced fertility or elevated mutation rates associated with mobilisation of pogo. Insertion mutations of P and hobo are unstable during hybrid dysgenesis, reverting and producing derivatives with altered phenotypes at high frequencies. The only mutation that we know to be associated with insertion of a pogo element, namely w^e, is very stable. Woodruff et al. (1987) used w^e as an M strain in P-M dysgenic crosses and found several white-eyed flies amongst the Fe progeny. This is the phenotype (that of w^1, from which w^e was derived) expected if the pogo element had excised. However, DNA blotting analysis of two independent white-eyed derivatives showed that, in each case, the w^e pogo element was still present (R.C. Woodruff, J.N. Thompson, J. Pettitt and K. O'Hare, unpublished). Both had additional insertions within the Doc element associated with the w^1 mutation, upstream of the pogo element. In one case the map of the insertion matched that of a P element, while the insertion in the second derivative was not identified. There is therefore no evidence that the pogo element was mobilised in these crosses.

pogo is an old invader of the D. melanogaster genome

Surveys of transposable element insertion sites in different strains by in situ hybridization to polytene chromosomes have found very little evidence for conserved insertion sites (reviewed by Charlesworth and Langley 1989), but the one site we have assayed by DNA blotting is occupied in five different strains. There are several possible reasons why an insertion might be conserved. The site may simply be a hotspot for insertion of pogo, but if so it is notable that all the strains have a 2.1 kb element at this site, whereas most of the elements in the genome are not of this size. Having an element at this site may confer some selective advantage directly, or the element might be linked to an advantageous mutation from which it cannot be separated by homologous recombination (genetic hitch-hiking). Finally, the element may have been present at this site before the strains diverged and not have moved since then. We have examined the distribution of pogo in other Drosophila species by DNA blotting. No homologous sequences were detected in D. simulans, D. mauritiana, D. yakuba, D. erecta or D. pseudoobscura. These preliminary results suggest that pogo is restricted to D. melanogaster, and may allow the mobility of pogo to be investigated by microinjection of cloned pogo DNA into eggs of these other species. The absence of pogo from the most closely related species is consistent with horizontal transmission of pogo, as has been proposed for P (Kidwell 1983). The different codon usage of pogo compared with most D. melanogaster genes supports the idea that pogo has not always been a part of the genome. Compared with P elements, pogo elements do appear to have more DNA sequence variation. P elements, even from different strains, are well conserved in DNA sequence (Engels 1989). For example, we have found only two different single-base polymorphisms, each found twice, in over 16 kb of DNA sequence (K. O'Hare, A. Driver, D. Johnson-Schlitz and S. McGrath, in preparation) from 21 P elements of the strain π2.
The DNA sequences of the seven pogo elements we have sequenced include seven different single-base polymorphisms in only 6 kb. This may indicate that pogo invaded earlier than P and has had more time to accumulate sequence polymorphisms since the founding invasion. Whatever its history, pogo now seems to be a relatively stable part of the D. melanogaster genome.

Acknowledgments. The work by M.T., J.P. and M.G. was carried out as final-year undergraduate research projects. M.L. was an exchange student under the programme of the International Association for the Exchange of Students for Technical Experience. We thank Georges Periquet and Ian Boussy for unpublished information on the distribution of pogo elements, and Jim Thompson and Ron Woodruff for derivatives of w^e in which the pogo element might have been mobilised.

References
Blackman RK, Gelbart WM (1989) The transposable element hobo of Drosophila melanogaster. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington, DC, pp 523-529
Calvi BR, Hong TJ, Findley SD, Gelbart WM (1991) Evidence for a common evolutionary origin of inverted repeat transposons in Drosophila and plants: hobo, Activator and Tam3. Cell 66:465-471
Cavener DR (1987) Comparison of the consensus sequences flanking translational start sites in Drosophila and vertebrates. Nucleic Acids Res 15:1353-1361
Charlesworth B, Langley CH (1989) The population genetics of Drosophila transposable elements. Annu Rev Genet 23:251-287
Driver A, Lacey SF, Cullingford TE, O'Hare K (1989) Structural analysis of Doc transposable elements associated with mutations at the white and suppressor of forked loci of Drosophila melanogaster. Mol Gen Genet 220:49-52
Earnshaw WC, Sullivan KF, Machlin PS, Cooke CA, Kaiser DA, Pollard TD, Rothfield NF, Cleveland DW (1987) Molecular cloning of cDNA for CENP-B, the major human centromere autoantigen. J Cell Biol 104:817-829
Engels WR (1989) P elements in Drosophila melanogaster. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington, DC, pp 437-484
Kidwell MG (1983) Evolution of hybrid dysgenesis determinants in Drosophila melanogaster. Proc Natl Acad Sci USA 80:1655-1659
Mariani C, Pirrotta V, Manet E (1985) Isolation and characterisation of the zeste locus of Drosophila. EMBO J 4:2045-2052
Masumoto H, Masukata H, Muro Y, Nozaki N, Okazaki T (1989) A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric satellite. J Cell Biol 109:1963-1973
Mitchelson A (1988) The suppressor of forked locus of Drosophila melanogaster. PhD thesis, University of London, London, UK
Moerman DG, Waterston RH (1989) Mobile elements in Caenorhabditis elegans and other nematodes. In: Berg DE, Howe MM (eds) Mobile DNA. American Society for Microbiology, Washington, DC, pp 537-556
Morgan TH, Bridges CB (1916) Sex-linked inheritance in Drosophila. Carnegie Inst Washington, publication No. 237
O'Hare K, Levis R, Rubin GM (1983) Transcription of the white locus in Drosophila melanogaster. Proc Natl Acad Sci USA 80:6917-6921
O'Hare K, Alley MRK, Cullingford TE, Sanderson MJ (1991) DNA sequence of the Doc retroposon in the white-one mutant of Drosophila melanogaster and of secondary insertions in the phenotypically altered derivatives white-honey and white-eosin. Mol Gen Genet 225:17-24
O'Kane CJ, Gehring WJ (1988) Detection in situ of genomic regulatory elements in Drosophila melanogaster. Proc Natl Acad Sci USA 84:9123-9127
Poole SJ, Kauvar LM, Drees B, Kornberg T (1985) The engrailed locus of Drosophila: structural analysis of an embryonic transcript. Cell 40:37-43
Rio DC, Laski FA, Rubin GM (1986) Identification and immunological analysis of biologically active Drosophila P transposase. Cell 44:21-32
Sambrook J, Fritsch EF, Maniatis T (1989) Molecular cloning: a laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York
Wada KN, Aota SI, Tsuchiya R, Ishibashi F, Gojobori T, Ikemura T (1990) Codon usage tabulated from the GenBank genetic sequence data. Nucleic Acids Res 18:2367-2411
Woodruff RC, Blount JL, Thompson JN (1987) Hybrid dysgenesis in D. melanogaster is not a general release mechanism for DNA transpositions. Science 237:1206-1207

Communicated by D.J. Finnegan
