© 1991 Oxford University Press

Nucleic Acids Research, Vol. 19, No. 6 1259

A transposon with an unusual LTR arrangement from Chlamydomonas reinhardtii contains an internal tandem array of 76 bp repeats

Anil Day* and Jean-David Rochaix

Departments of Molecular and Plant Biology, Sciences II, University of Geneva, 30 Quai Ernest Ansermet, CH-1211 Geneva 4, Switzerland

Received December 13, 1990; Revised and Accepted February 22, 1991

ABSTRACT

TOC1, a transposable element from Chlamydomonas reinhardtii, is 5662 bases long. The 217 and 237 base long terminal repeat sequences of TOC1 are unusually arranged around the 4600 and 123 base unique regions: [217]-4600-[237][217]-123-[237]. Although TOC1 contains long terminal repeats and most TOC1 elements are complete, features shared with virus-like retroposons, its unique 4600 base region is more similar in structure to the L1 family of non-virus retroposons: first, 11 3/4 tandemly repeated copies of a 76 base repeat are found 813 bases from the left end of TOC1, and second, using the universal genetic code, large open reading frames were not found in TOC1. The relationship between TOC1, virus-like retroposons and the L1 family of non-virus retroposons is unclear and may be very distant, since only poor similarity was found between the TOC1 encoded ORFs and retrovirus polypeptides. The length of the tandem array of 76 base repeat sequences was conserved in most TOC1 elements, and solo 76 base repeat sequences were not found outside TOC1 elements in the C.reinhardtii genome. Nucleotide substitutions allow all copies of the 76 base repeat to be distinguished from one another.

INTRODUCTION

Dispersed repetitive DNA sequences that transpose via an RNA intermediate have been called retroposons [1]. One subclass of retroposons, virus-like retroposons or retrotransposons [2, 3, 4], contain long terminal repeats or LTRs and resemble the integrated provirus DNA stage of animal retroviruses [5]. A second subclass of retroposons, nonvirus-like retroposons, is composed of a diverse set of DNA sequences that can be grouped on the basis that they lack LTRs, produce a target site duplication of varying length and are often flanked by an oligo dA tract at the 3' end of their sense strand. Some members of this subgroup, e.g. L1 elements from mammalian genomes [3, 6], encode a putative reverse transcriptase.
Although retroposons were first identified in animals, examples of both virus-like retroposons [7, 8, 9, 10, 11, 12, 13] and nonvirus-like retroposons have been found in plants [14]. We have previously isolated a 5.7 kbp transposon, named TOC1, from the nuclear genome of the unicellular green alga Chlamydomonas reinhardtii [15]. TOC1 causes mutations and was discovered as an insertion in the oxygen evolving enhancer polypeptide one (OEE1) gene. The classification of TOC1 poses a problem. It contains LTRs with a 5-bp inverted repeat located 4 bp from the end that follows the 5'-TG...CA-3' rule of retrotransposons, but the arrangement of these LTRs is unlike that of any other retrotransposon: the left end of TOC1 contains part of the LTR (217 bp repeat), the remainder of which (237 bp repeat) is present at its extreme right end and is separated from the complete right LTR (tandem 237/217 bp repeats) by a unique 123 bp sequence (see Figure 4).

A simple model for generating the unusual split LTR arrangement of TOC1 is shown in Figure 1. In this model, TOC1 elements are produced from a hypothetical transposon with the usual LTR structure of retroviral proviruses and virus-like retroposons. The LTR can be divided into the U5, U3 and R sequences [5] according to the regions of the LTR that are unique to the 5' (U5), 3' (U3), and repeated (R) at both 5' and 3' regions of the proviral transcript (see step 1 of Figure 1). The hypothetical retrovirus-like TOC1 element is transcribed and converted into a linear double-stranded DNA molecule by the normal retrotransposon replication machinery [16, 17, 18, 19]. During or after replication either a DNA copy of the primer for minus-strand synthesis (reverse transcription) or another foreign nucleic acid species is linked to the ends of the linear double-stranded DNA molecule. When this molecule circularizes by ligation of its ends the 2-LTR circular DNA molecule shown in step 2 of Figure 1 is produced.

Integration within the U3 region of the LTR produces either of the structures shown in step 3A or 3B. Therefore, in this model, the unique 123 base sequence of TOC1 is assumed to be derived from either the minus-strand primer or a foreign nucleic acid species. If a foreign nucleic acid species forms the 123 base sequence then the boundaries of the 237 and 217 bp repeats are shown in Figure 1 steps 3A (i) and 3B (i).

EMBL accession no. X56231

*Present address: Genetics Laboratory, Biochemistry Department, South Parks Road, Oxford OX1 3QU, UK

If the minus-strand primer forms the 123 base sequence then the primer-binding site would also contribute to the 237 bp repeat (step 3A ii) or 217 bp repeat (step 3B ii). Aberrant replication of the minus-strand primer has been described during Moloney murine leukemia virus DNA replication; a DNA copy of the primer is found sandwiched between the two LTRs [20]. Of the two integration events drawn in Fig. 1 steps 2A and 2B, integration in the right LTR (step 2B) produces an outcome (Fig. 1 step 3B) which is consistent with the location of a putative promoter in the left 217 bp repeat of TOC1 [21]. This integration event predicts locations for the priming sites for minus-strand (-PB) and plus-strand (+PB) synthesis and the orientation which should contain open reading frames (ORFs) that resemble the polypeptides of virus-like retroposons. In order to gain a better understanding of the relationship between TOC1 elements and other transposable elements we determined the complete nucleotide sequence of a recently transposed copy of TOC1. This copy of TOC1, designated TOC1.1, was isolated from the OEE1 gene [15].

MATERIALS AND METHODS

Media, strains and crosses

cc1373 and cc407 were from Dr. E.H.Harris at the Chlamydomonas culture collection, Duke University. cc1952 [22] was from Dr. P.A.Lefebvre (University of Minnesota). The FUD44-R2 revertant of the photosynthetic mutant FUD44 has been described [15]. The cell wall-less mutants cw15 [23] of both mating types (mt+ and mt-) were from P.Bennoun (Paris). Chlamydomonas cells were grown in a medium containing acetate [24].

DNA manipulations

Methods for DNA extraction, restriction enzyme digestion, DNA electrophoresis, DNA blotting [25], DNA hybridization and sizing [26] were as previously described [15]. Before reprobing blots, hybridization probes were removed by heating in distilled water at 80°C for 2-10 minutes. DNA restriction fragments purified on agarose gels were used as hybridization probes. Hybridization probes B and C of Figure 7 were generated by SphI and BamHI digestion of the cloned 4.1 kbp HindIII-PstI internal fragment of TOC1 [pTOC1.1R1 in 15]. To remove contamination of the gel-purified 233 base SphI + BamHI (probe B in Figure 7) and 349 base SphI fragments (probe C in Figure 7) with the 1.3 kbp BamHI-SphI fragment (see Figure 7), 32P-labelled hybridization probes were pre-hybridized with single-stranded 76 base repeat-specific plasmid (probe A in Figure 7) bound to nitrocellulose. This 76 base repeat-specific fragment (probe A) was subcloned from pTOC1.1R1 by using DNaseI and Klenow enzyme [27], followed by BamHI digestion; the DNaseI generated end is located at base 67 of the tenth copy of the 76 base tandemly repeated sequence in TOC1. To estimate contamination of probes B and C with 76 base repeat sequences, these probes were hybridized to filters bearing a 76 base repeat plasmid (probe A) and dilutions of pTOC1.1R1. No contamination of the 233 base fragment (probe B) with the 1.3 kbp BamHI-SphI fragment was detected, while the 349 base fragment (probe C) hybridized to itself with a fifty-fold higher intensity than it did to the 76 base repeat plasmid (probe A). Restriction fragments were labelled by the random primer method [28].

DNA cloning

Internal TOC1.1 restriction endonuclease fragments with BamHI, BglI, HindIII, NcoI, PstI and SacI ends were cloned into M13mp18, M13mp19 [29], and Bluescribe M13+ and M13- phagemid vectors (Vector Cloning Systems, San Diego). For directed nested deletions, DNA fragments bearing one DNaseI generated end [27] and one BamHI or PstI end were cloned into SmaI + BamHI or SmaI + PstI digested Bluescribe M13+. Ligation mixes were used to transform competent cells according to the standard transformation protocol of Hanahan [30]. Phage or phagemids were grown in E. coli JM109 [29] or E. coli XL1-blue [31]. M13 recombinant phage particles were prepared according to Messing [32]. For Bluescribe phagemids, phage particles were prepared using M13K07 helper phage according to the supplier (Vector Cloning Systems). Single-stranded template DNA was isolated from phage according to Dente and co-workers [33].

DNA sequencing

Single-stranded DNA template was sequenced using the modified T7 DNA polymerase according to the supplier (Sequenase, United States Biochemicals Ltd.) with 35S-dATP (Amersham, UK). Out of a total of 42 single-stranded templates sequenced using the


Figure 1. A model depicting the possible origin of TOC1 elements. The model suggests the possible origins of the 123 base unique region, and 217 and 237 base repeat regions of TOC1 from the U3, R and U5 LTR regions of a hypothetical progenitor of TOC1 with conventional LTRs. The locations of the promoter (P), primer binding site for minus-strand synthesis (-PB) and priming site for plus-strand synthesis (+PB) are shown. The primer for minus-strand synthesis is usually a tRNA [5]. The 3' OH priming plus-strand synthesis is formed by RNaseH action on the original RNA template base-paired to the newly synthesized minus-strand DNA [5]. 1. A hypothetical TOC1 element with a standard LTR arrangement is transcribed to produce a full-length transcript indicated by a wavy line. 2. Reverse transcription and second strand DNA synthesis via the standard retrovirus replication mechanism produce a linear double-stranded DNA molecule. During replication a copy of the primer for minus-strand synthesis (reverse transcription) or a foreign DNA species is attached to this linear dsDNA molecule. Circularization via ligation of the ends of this molecule produces the circle shown in step 2. 3. Integration in the left (2A) or right (2B) LTR of the circle produces the outcomes shown in 3A and 3B. (i) represents the consequence if the 123 base sequence is a foreign DNA species; (ii) if the primer for minus-strand synthesis forms the 123 base sequence then part of the 237 base repeat (3A) or 217 base repeat (3B) will be composed of the original minus-strand primer binding site.


Figure 2. Nucleotide sequence of TOC1.1. Numbering starts from base one of the left 217 base repeat. Locations of the 217, 76 and 237 base repeats and 123 base unique region are indicated on the right. The 76 bp repeats and four truncated copies (rpt nos. 12, 13, 14, 15) are numbered in order of distance from base one at the left end of TOC1. Also indicated are the 5' (base 165) and 3' ends (base 5648) of transcripts that map within TOC1 [21]. The amino acid sequences of ORF7A (47 amino acids) and most of the ORFs greater than 200 bases, classified as coding (ORF9, 92%) or no opinion (40%, ORF5, ORF6, ORF16, ORF23, ORF31; 77%, ORF19, ORF29) according to Fickett [40], are shown. ATGs that precede ORFs that score positive or no opinion according to Fickett [40] are underlined; ORFs truncated at their 5' end by ATG need not give the same score as the complete ORF. Unique BamHI (G GATCC) and ScaI (AGT ACT) cleavage sites that lie adjacent to repeat sequences are indicated by open triangles above the sequence.

standard dGTP reaction mixes, 24 were repeated using dITP to resolve compressed bands. Chemical sequencing was according to Maxam and Gilbert [34]. Sequence ladders were fractionated on denaturing 6 or 7% (w/v) polyacrylamide gels [35] and visualized by autoradiography.

Sequence analysis DNA sequences were manipulated using the PC Gene programmes written by A.Bairoch, Department of Medical Biochemistry, University of Geneva. Sequence comparisons with release 17 of the EMBL data bank were made using the seqft and seqh programmes of Kanehisa [36, 37].

Autoradiography and fluorography

Hybridization patterns were visualized on Kodak XS5 films by autoradiography at room temperature (Fig. 7A) or a mixture of autoradiography and fluorography in cassettes containing intensifying screens (Ilford, fast tungstate) at -70°C (Figs. 7B and 7C).

RESULTS

Both DNA strands of TOC1.1 were sequenced. The sequences of the 217 base and 237 base repeats, and 123 base unique region at the right end of TOC1 had already been determined [15]. Figure 2 presents the nucleotide sequence of the 5662 base TOC1.1 element and is annotated with the positions of repeated DNA sequences and open reading frames (ORFs) discussed below. The G+C content of TOC1 is 59%, which is similar to the 62% G+C value for C.reinhardtii nuclear DNA [38].
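The element layout quoted in the Abstract, [217]-4600-[237][217]-123-[237], can be checked against the 5662 base total with a little arithmetic. The sketch below is only an illustration: it assumes the five short segments are contiguous and do not overlap, and the variable and function names are hypothetical. Under that assumption the large unique region comes out slightly longer than the rounded 4600 figure.

```python
# Segment lengths in bases, left to right, as quoted in the paper.
# The large unique region is left out on purpose: the text rounds it to 4600.
TOTAL_LENGTH = 5662
KNOWN_SEGMENTS = [
    ("left 217 base repeat", 217),
    ("237 base repeat", 237),
    ("217 base repeat", 217),
    ("123 base unique region", 123),
    ("right 237 base repeat", 237),
]


def large_unique_region_length(total, segments):
    """Bases left for the large unique region.

    Assumes (for this sketch only) that the listed segments are
    contiguous and non-overlapping.
    """
    return total - sum(length for _, length in segments)


unique_len = large_unique_region_length(TOTAL_LENGTH, KNOWN_SEGMENTS)
print(f"large unique region: {unique_len} bases")
```

The result is within about 1% of the rounded value used in the schematic.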

Open reading frames in TOC1

Figure 3 shows the positions of ATG start codons and UAA, UAG and UGA stop codons in TOC1.1. Since the 5' to 3' orientation of the near-full length TOC1 transcript [21] is left to right with respect to the TOC1 orientation in Figure 3, we expected larger ORFs in this sense orientation. However, translation using the universal genetic code shows that TOC1.1 does not contain large ORFs (Figures 3 and 4). The largest ORFs in TOC1.1 are ORF5 (180 amino acids), ORF24 (196 amino acids), ORF29 (175 amino acids) and ORF35 (207 amino acids). The shortest ORF shown is ORF7A (47 amino acids), followed by ORF22 (71 amino acids). ORFs 2, 3, 4, 14, 15 and 25, which encode a 75 amino acid long peptide, and ORF16, which encodes a 96 amino acid peptide, all resemble each other because they are translated from an internal tandemly repeated 76 base sequence in TOC1 (see Figure 4 and below). None of the ORFs follow the general pattern of codon usage in C.reinhardtii nuclear genes [39]. ORF9 scored positive as a protein coding sequence, ORFs 5, 6, 16, 19, 23, 27, 29, 30, 31, 34 and 37 received a no opinion score of 40-77%, while all the other ORFs in Figure 4 were non-coding according to Fickett's analysis [40].

Visual inspection of TOC1 ORFs did not reveal extensive similarity to the amino acid sequence motifs normally found conserved between retroviruses and retrotransposons [41, 42, 43]; a finger peptide (ORF7A) and protease consensus (ORF29) were found (Figures 5C and 5E). The finger peptide is of the CC-HH type found in TFIIIA [44] and the carboxyl terminus of putative reverse transcriptases [45] rather than the CC-CH type found in the gag region of retroviruses and virus-like retroposons [46]. Two adjacent aspartic acid residues are found in the most highly conserved region of reverse transcriptases [42, 43] and are present in RNA dependent polymerases [47]. A DD motif is present in ORF29 (Figure 5F). However, the two threonines (residues 116 and 123) in the DD region of ORF29 (residues 111-124 of ORF29) are not found in other polymerase segments [47].

The absence of long ORFs in TOC1.1 suggests that it transposed into the OEE1 gene using functions provided by other loci. However, if TOC1 is derived from an ancestral element which once encoded proteins necessary for its transposition, we may expect to find short ORFs which are remnants of these proteins. Furthermore, since there are no functional requirements for these ORFs we would not expect the pattern of conservation of these ORFs to correspond to the peptides normally found conserved between functional DNA or RNA-mediated transposons. To address this possibility, and thereby not limit ourselves to previously identified conserved regions of retroviruses and retrotransposons, we compared 42 ORFs in Figure 4 (all the ORFs except ORF7A) to a translation of around 17,000 DNA sequences present in release 17 of the EMBL nucleic acid data bank using the seqft program of Kanehisa [36, 37]. The sequence similarities found were weak and some of these corresponded to retroviral polypeptides (Figures 5A, 5B and 5D): 17/58 (29%) residues between ORF16 and the putative gag polypeptides of HTLV-I and II [48, 49]; 11/27 (41%) residues between ORF5 and the precursor envelope proteins of HIV-I and ALARV-2 [50, 51]; 15/69 (22%) between ORF31 and a polypeptide of unknown function (molecular mass of 11,000) that is encoded by URF PX-1 located after the env gene of HTLV-I


Figure 3. Positions of possible ATG start codons, and UAA, UAG and UGA stop codons within TOC1.1. The six reading frames are indicated by horizontal lines. ATG codons are represented by vertical lines above the horizontal. Stop codons UAA, UAG and UGA are drawn as vertical lines below the horizontal.

Figure 4. Overall structural organization of TOC1. In addition to the 217 and 237 base repeats, the position of the 76 base tandemly repeated sequences that start immediately after the six base recognition site of BamHI is shown. The positions of the BamHI (Ba), HindIII (H) and two of the MluI (M) sites of TOC1.1 are indicated. The largest ORFs in Figure 3 are indicated and assigned numbers from 1 to 42. The cut-off values for the ORFs shown are 200 bases reading left to right (except ORF7A which is 141 bases) and 290 bases reading right to left.
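The six-frame scan summarized in Figures 3 and 4 can be sketched in a few lines. The example below is an assumption-laden illustration rather than the PC Gene procedure used in the paper: it treats an ORF as a stop-free run of whole codons, takes the 200 and 290 base cut-offs from the Figure 4 legend as defaults, and uses hypothetical function names. Coordinates for the reverse strand are reported on the reverse-complemented sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA spelling of UAA, UAG, UGA
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_stretches(seq, min_bases):
    """Yield (frame, start, end) for stop-free runs in the three forward frames.

    start and end are 0-based, end is exclusive, and every run is a
    whole number of codons at least min_bases long.
    """
    for frame in range(3):
        start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start >= min_bases:
                    yield frame, start, pos
                start = pos + 3  # next run begins after the stop codon
            pos += 3
        if pos - start >= min_bases:  # run that reaches the end of the sequence
            yield frame, start, pos


def six_frame_orfs(seq, min_forward=200, min_reverse=290):
    """Collect stop-free runs on both strands as (strand, frame, start, end)."""
    seq = seq.upper()
    forward = [("+", f, s, e) for f, s, e in open_stretches(seq, min_forward)]
    reverse = [("-", f, s, e)
               for f, s, e in open_stretches(reverse_complement(seq), min_reverse)]
    return forward + reverse


# Toy sequence (not from TOC1): one 33 base open stretch ending at a TAA stop.
toy = "ATG" + "GCT" * 10 + "TAA" + "GC"
for hit in six_frame_orfs(toy, min_forward=30, min_reverse=30):
    print(hit)
```

Requiring an ATG at the start of each run, as in the underlined ATGs of Figure 2, would be a small extension of the same loop.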

[48]. The lengths of the polypeptide stretches compared are too short to be statistically significant [52]. However, it is interesting to note that the relative order of ORF16 (weak gag similarity), ORF7A (finger peptide) and ORF29 (DD-containing ORF, Fig. 5F) with respect to the 5'-3' TOC1 encoded transcript is the same as the order (5' gag with finger peptide-reverse transcriptase 3') found in retroviruses and retrotransposons (see Figure 4). The protease region is closer to the reverse transcriptase region in ORF29 than is found in virus-like retroposons. However, in another class of TOC1 element (TOC1.2) these regions are separated by an insertion of 28 bases and are in different reading frames [53]. Normally the envelope precursor protein is found after the pol gene in retroviruses, is absent in many retrotransposons and is not well conserved in evolution [42]. Therefore, because ORF5 is located between ORF16 (weak resemblance to gag) and ORF29 (weak resemblance to pol) rather than to the right of ORF29, the weak similarity between ORF5 and the precursor envelope proteins of HIV-I and ALARV-2 may be spurious.

A mixture of 14-mer oligonucleotides complementary to DNA sequences that encode most of a pentapeptide that is conserved between reverse transcriptases (Y-M/V-DD-I/M) was previously found to hybridize strongly to an internal region of TOC1 that lies within a 500-bp stretch right of the unique BamHI site [15]. This hybridization is to the 76 base repeat region (see next section) and is misleading, since the polypeptides encoded by the complementary sequence bear little resemblance to the pentapeptide of reverse transcriptases (Figure 5G). We were surprised that this duplex of 10 bases with one mismatch was stable under the hybridization conditions used [5×SSC/30% (v/v) HCONH2/25°C].


Figure 5 A-F. Comparison between TOC1 encoded ORFs and retroviral polypeptides. The sequences of polypeptides are given in the single letter code for amino acids. Positions of identity are indicated by large capital letters and dashes were used to indicate gaps introduced to optimize alignment. Polypeptides were translated from: bases 1777-1941 (gag) and 6921-7127 (URF PX-1) in HTLV-I [48], bases 1800-1964 (gag) in HTLV-II [49], bases 382-462 (env) in HIV NY5 [50], bases 6103-6183 (env) in ALARV-2 [51]. The 'zinc finger' motif was from the seventh 30 amino acid tandem repeat in transcription factor IIIA or TFIIIA [44]. Protease and reverse transcriptase consensus sequences were from Fuetterer and Hohn [41]. The polymerase consensus was from Argos [47]. HTLV-I and II are human T-cell leukemia viruses I and II. HIV NY5 and ALARV-2 are different isolates of the human immune deficiency retrovirus. RSV = Rous sarcoma retrovirus, 17.6 = retrotransposon from Drosophila melanogaster, Ty1 = retrotransposon from Saccharomyces cerevisiae. F) Polymerase consensus: hy = hydrophobic amino acid and a dash indicates any amino acid. G) The basis of the observed hybridization of an oligonucleotide encoding the YMDDI/M conserved pentapeptide of reverse transcriptases to the 76 bp repeat region of TOC1 [15].
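The identity fractions quoted in the Results for the ORF comparisons (17/58, 11/27 and 15/69) translate into the stated percentages by simple division. The short sketch below only reproduces that arithmetic; the dictionary keys and the function name are illustrative labels, not identifiers from the paper or from the seqft program.

```python
# (identical residues, residues compared) as quoted in the Results.
COMPARISONS = {
    "ORF16 vs HTLV-I/II gag": (17, 58),
    "ORF5 vs envelope precursor": (11, 27),
    "ORF31 vs HTLV-I URF PX-1 product": (15, 69),
}


def percent_identity(matches, compared):
    """Percentage of compared residues that are identical, rounded to an integer."""
    return round(100 * matches / compared)


for label, (matches, compared) in COMPARISONS.items():
    print(f"{label}: {matches}/{compared} = {percent_identity(matches, compared)}%")
```

As the text notes, stretches this short are too brief for the percentages to be statistically significant on their own.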

[Figure 6: alignment of the 76 base repeat units against repeat no. 3. Identity and position columns:]

RPT NO.   IDENTITY   POSITION
3         76/76      965-1040
5         75/76      1117-1192
2         74/76      889-964
4         74/76      1041-1116
1         73/76      813-888
6         73/76      1193-1268
7         70/76      1269-1344
10        69/76      1497-1572
8         67/76      1345-1420
9         67/76      1421-1496
11        66/76      1573-1646
12        49/59      1647-1705
15        36/48      1839-1886
14        27/33      1783-1815
13        18/21      1735-1755

Figure 6. Heterogeneity in the 76 base repeat of TOC1. Repeat sequences have been compared to repeat no. 3. Positions of identity are indicated by dots. Base substitutions are indicated by the single letter code for bases. Deletions are indicated by a dash and insertions by an arrow. The boundaries of the truncated versions of the 76 base repeats (repeat nos. 12, 13, 14 and 15) are marked by arrow-heads. The identity column indicates the number of matches scored over the number of bases compared. The numbering scheme for bases is relative to base 1 of the left 217 base repeat of TOC1. The DNA sequence shown is from the top strand of TOC1.1.
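The identity column of Figure 6 can be tallied to cross-check the total of 83 differences reported in the text. The sketch below makes two assumptions: that each identity score pairs with a position range as read from the figure columns, and that the number of differences for a repeat is the bases compared minus the matches. The tuple layout and names are hypothetical.

```python
# (repeat no., matches, bases compared, first base, last base), read from Figure 6.
FIGURE6_ROWS = [
    (3, 76, 76, 965, 1040), (5, 75, 76, 1117, 1192), (2, 74, 76, 889, 964),
    (4, 74, 76, 1041, 1116), (1, 73, 76, 813, 888), (6, 73, 76, 1193, 1268),
    (7, 70, 76, 1269, 1344), (10, 69, 76, 1497, 1572), (8, 67, 76, 1345, 1420),
    (9, 67, 76, 1421, 1496), (11, 66, 76, 1573, 1646), (12, 49, 59, 1647, 1705),
    (15, 36, 48, 1839, 1886), (14, 27, 33, 1783, 1815), (13, 18, 21, 1735, 1755),
]

# Assumption: differences per repeat = bases compared - matches.
total_differences = sum(compared - matches
                        for _, matches, compared, _, _ in FIGURE6_ROWS)

# Breakdown given in the text: transitions, transversions, deletions, insertions.
reported_breakdown = {"transitions": 56, "transversions": 24,
                      "deletions": 2, "insertions": 1}


def expected_start(repeat_no, array_start=813, unit=76):
    """First base of a full-length repeat unit if all upstream units are 76 bases."""
    return array_start + unit * (repeat_no - 1)


print(total_differences, sum(reported_breakdown.values()))
```

Both tallies agree under these assumptions. Repeat 11 spans only 74 bases in the position column, consistent with the two single-base deletions mentioned in the text, which is why repeat 12 starts two bases earlier than the simple 76 base spacing would predict.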


Repeated sequences in TOC1

TOC1.1 contains a 217 base repeat at each end and two copies of a 237 base repeat at its right end. No sequence differences are found between the two copies of each repeated sequence [15]. TOC1.1 also contains 11 3/4 copies of a tandemly repeated 76 base sequence that starts with guanine at base 813 of TOC1, adjacent to the unique BamHI site in TOC1 (see Figures 2 and 4); note that the 76 base sequence can also be considered to start with cytosine at base 812. Three truncated versions (repeat nos. 13, 14 and 15) of this 76 base sequence are found to the right of the tandem 76 base repeat array (Figure 4). Figure 6 is a comparison of the repeat units of the 76 bp tandem array and the 3 adjacent truncated versions with the third 76 bp repeat unit. Nucleotide substitutions allow all the 76 base repeat units to be distinguished from each other. In general, repeat units located further away from repeat three show a higher degree of sequence divergence than the more proximal repeat units. Of 83 differences between repeat no. 3 and the other 76 bp repeat units, 56 were transitions, 24 were transversions, two were single base deletions and one was an insertion of a single base.

Are the 76 base tandem repeats found outside TOC1 elements?

The 76 base tandemly repeated sequences are present on a 1.3 kbp SphI fragment. Southern blots of SphI digests of DNA from a number of C.reinhardtii strains (cc407, cw15+, FUD44-R2, cw15-) and DNA from cc1373 (C.smithii), a species interfertile with C.reinhardtii, produce a multicopy 1.3 kbp band when probed with a hybridization probe specific for the 76 base repeat (Figure 7A). Larger bands are found in the cc1952, cc1373,


Figure 7. The 76 base repeat is not found outside TOC1 elements. Southern blot of a 1% (w/v) agarose gel containing SphI digested total DNA from the indicated strains run alongside copies/genome lanes consisting of dilutions of a plasmid containing a full-length copy of TOC1.1 digested with SphI. For copies/genome estimations the haploid C.reinhardtii genome size was taken as 100,000 kbp, which is an approximation because the nuclear genome is 70,000 kbp [62] and total DNA can contain up to 15% (by mass) of chloroplast DNA. The locations of the probes used in A (756 bp), B (233 bp) and C (349 bp) are shown below the map of TOC1. The blot probed in A was reprobed in B and C. Six months passed before the blot was reprobed with probe B. In the intervening period the blot was hybridized with a TOC1 left junction probe which gives a different hybridization pattern from that given by probe A (not shown). The hybridized probe B was stripped off before probing with probe C. Wash conditions: in A, 0.1×SSC, 0.1% SDS, 60°C; in B and C, 0.1×SSC, 0.1% SDS at 50°C (1×SSC is 0.15 M NaCl, 0.015 M trisodium citrate, pH 7.5).
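The copies/genome lanes in Figure 7 rest on a mass ratio between plasmid and genomic DNA. The sketch below shows one way such a reconstruction could be calculated; the plasmid size and DNA load in the usage lines are hypothetical values chosen for illustration, since the legend does not give them, and the function names are not from the paper. It also shows why 100,000 kbp is described as an approximation: correcting the 70,000 kbp nuclear genome for 15% chloroplast DNA by mass gives a somewhat smaller figure.

```python
ASSUMED_GENOME_KBP = 100_000          # value used in the Figure 7 legend
NUCLEAR_GENOME_KBP = 70_000           # nuclear genome size cited in the legend
MAX_CHLOROPLAST_MASS_FRACTION = 0.15  # upper bound cited in the legend


def effective_genome_kbp(nuclear_kbp, chloroplast_fraction):
    """kbp of total DNA that carries one nuclear genome when a given
    mass fraction of the total DNA is chloroplast DNA."""
    return nuclear_kbp / (1 - chloroplast_fraction)


def plasmid_mass_ng(copies, plasmid_kbp, total_dna_ng,
                    genome_kbp=ASSUMED_GENOME_KBP):
    """Plasmid mass equivalent to `copies` per genome in a lane
    loaded with total_dna_ng of total DNA."""
    return copies * plasmid_kbp / genome_kbp * total_dna_ng


# Hypothetical example: a 10 kbp plasmid and 1000 ng of total DNA per lane.
print(plasmid_mass_ng(copies=10, plasmid_kbp=10.0, total_dna_ng=1000.0))
print(round(effective_genome_kbp(NUCLEAR_GENOME_KBP,
                                 MAX_CHLOROPLAST_MASS_FRACTION)))
```

Because the assumed genome size is larger than the corrected one, copy numbers read from the standards would be overestimated by roughly a fifth under these assumptions.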

5'-TGGTGGCACCCGTGGG-3'  TOC1 [123 base sequence]
3'-ACCACCGUCGCCACCC-5'  Bovine leucine tRNA

Figure 8. A model for retrotransposition of TOC1. Step 1. Initiation of transcription is shown in the left 217 bp repeat. Polyadenylation takes place in the far-right 237 base sequence. Step 2. The primer for minus-strand synthesis is postulated to bind to the 123 base sequence. Possible binding sites include a stretch of 16 bases from base 39 of the 123 base sequence which are complementary (with two mismatches) to the 3' end of bovine leucine tRNA [63]. Step 3. The primer is extended to the end of the 237 base repeat. Step 4. The first jump of newly synthesized minus-strand DNA to the 3' end of the polyadenylated RNA; complementarity is provided by sequences of the 237 base repeat. Step 5. The minus strand is extended on the RNA template to the left 217 bp repeat. Step 6. Although plus-strand synthesis is shown to begin within the 123 base sequence, initiation of plus-strand synthesis anywhere left of the far-right 217 base repeat is compatible with this model. Plus-strand synthesis continues to include the end of the 217 base repeat in the minus strand. Step 7. Second jump involving the plus strand to the 3' region of the minus strand. Minus-strand synthesis continues to the 5' end of the plus strand and elongation of the plus strand takes place to a region close to or including the primer of minus-strand synthesis to produce the linear double-stranded DNA molecule shown in step 8. This linear molecule is shown as the substrate for integration. Only the junctions between the outermost 237 and 217 bp repeats direct integration.
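The complementarity invoked in step 2 can be checked base by base. The sketch below uses the two 16 base sequences as read from Figure 8, with the characters that were ambiguous in the printed figure interpreted as A; that reading is an assumption, as are the names used. Only Watson-Crick pairs are counted as matches.

```python
# Sequences as read from Figure 8 (ambiguous characters interpreted as A).
TOC1_SITE = "TGGTGGCACCCGTGGG"   # 123 base sequence, written 5'->3'
TRNA_3_END = "ACCACCGUCGCCACCC"  # bovine leucine tRNA, written 3'->5'

# Allowed (DNA base, RNA base) Watson-Crick pairs.
WATSON_CRICK = {("A", "U"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatch_positions(dna_5to3, rna_3to5):
    """1-based positions where the aligned DNA and RNA bases do not pair."""
    return [i for i, pair in enumerate(zip(dna_5to3, rna_3to5), start=1)
            if pair not in WATSON_CRICK]


positions = mismatch_positions(TOC1_SITE, TRNA_3_END)
print(len(TOC1_SITE), positions)
```

With this reading, 14 of the 16 positions pair, in line with the two mismatches stated in the legend.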

cc407, cw15+, FUD44-R2 and cw15- lanes. A smaller 1.2 kbp band is found in cc1373. Our previous restriction mapping studies have shown that a number of restriction enzyme sites conserved in TOC1 elements are not present in the two TOC1-related DNA sequences in strain cc1952 [53]. Therefore, the absence of the 1.3 kbp SphI band in this strain is not surprising (Figure 7A). Non-TOC1.1 SphI bands of 2.3 and 5.2 kbp in cc1952, and 2.0, 2.2, 2.7 and 4.7 kbp in the other strains, that hybridize to probe A also hybridize to probe B (Figure 7B). Since probe B is adjacent to probe A in TOC1, the bands that are larger than the 1.3 kbp SphI band must be flanked at their left end by TOC1 sequences. This suggests that these SphI bands are derived from diverged TOC1 elements that have lost the left SphI site of TOC1.1 rather than from isolated 76 bp repeats unlinked to TOC1. A 0.35 kbp SphI fragment (probe C) that is located to the immediate right of the 1.3 kbp SphI fragment is present in all the strains tested except cc1952 (Figure 7C). This indicates that the SphI site that flanks the right end of the tandem array of 76 base repeats is conserved in most TOC1 elements. Probe C contains 34 bases of a truncated 76 bp repeat sequence (repeat no. 15 in Figure 6) and as a consequence hybridizes to the 1.3 kbp SphI fragment. The larger faint bands of 1.5, 3.3 and 4.4 kbp in the 10 and 25 copies/genome lanes, which contain dilutions of a SphI digest of plasmid pTOC1.1, are due to contamination of probe C with other TOC1 sequences.

DISCUSSION

TOC1 was devoid of large ORFs. Although none of the ORFs that were found showed strong similarity to the functional proteins involved in DNA- or RNA-mediated transposition, some contained short amino acid stretches that exhibited a weak resemblance to polypeptides encoded by retroviruses. The significance of these resemblances is unclear: they could be fortuitous, or they could indicate previously functional genes encoding retrovirus-like polypeptides that have diverged since becoming defective. Some of the TOC1-encoded ORFs may be functional and only poorly related to the previously identified transposition enzymes of DNA- and RNA-mediated transposons. Non-functionality of TOC1 ORFs cannot be deduced from their unusual codon usage, which does not conform to the pattern of other C.reinhardtii nuclear genes [39], or from their lack of identification as protein coding regions according to Fickett's criteria [40]. The codon usage of the copia and 1731 retrotransposons of Drosophila melanogaster is different from that of other D.melanogaster genes [54], and although a positive Fickett's analysis score is usually conclusive, a classification of a protein-coding region as non-coding or no-opinion can result from unusual codon preferences [40]. Our analysis of TOC1.1 putative protein coding regions was relatively simple: we did not consider ribosome binding sites or initiation codons, nor did we raise the possibilities of splicing, frameshifts and suppression in the generation of open reading frames. Suppression of any one of the three nonsense codons does not reduce the number of ORFs in TOC1 to the large gag and pol ORFs found in retrotransposons. In common with virus-like retroposons, TOC1 elements are repeat units of a discrete size bounded by LTRs which contain signals for initiation and termination of a full-length transcript [21, 53].
However, no sequences immediately adjacent to the 217 or 237 bp repeats were found that resembled the priming sites for first and second strand synthesis in the replication cycle of retroviruses [5]. Furthermore, the internal organization of TOC1 elements does not resemble that of virus-like retroposons and is distinguished by a lack of large open reading frames and the presence of a tandem array of 76 base repeats located 656 bases from the 5' end of the TOC1 transcript (813 bases from the left end of TOC1). These features are associated with the L1 family of non-viral retroposons found in mammals [6, 55]. L1-related elements have been found in fruit flies [56], trypanosomes [57] and maize [14]. The 5' region of L1 elements in the African bushbaby, a prosimian primate, was found to contain six to eight tandemly repeated units of 73 bp starting at 730 bases from their 5' end [55]. Although there is no sequence similarity between the 76 bp repeat of TOC1 and the 73 bp repeat of the L1 elements of the African bushbaby, the correspondence between their sizes and locations is striking. Since neither 76 nor 73 is a multiple of three, this region cannot represent the repeated motif of a polypeptide. However, similar polypeptides encoded by ORFs that span three adjacent 76 bp repeat units are found in TOC1, namely ORFs 2, 3, 4, 14, 15, 16 and 25. It is interesting that these ORFs, and ORF16 in particular, show weak similarity with the gag region of HTLV I and II.

The 76 bp repeat sequence is present in all TOC1 elements and the size of the 76 bp tandem array is conserved in the majority of TOC1 elements. If the tandem array arose by duplication of internal TOC1 sequences, the 76 bp repeat units must have been identical at their inception. Unequal crossing over together with replication slippage within the 76 bp tandem array of TOC1 would preserve the homogeneity of repeat units and give rise to tandem arrays of different length [58, 59]. Since the length of the 76 bp tandem array has been preserved in the majority of TOC1 elements from different C.reinhardtii strains and C.smithii, either the length of the tandem array is important in TOC1 amplification or the general mechanisms which change the length of a tandem array occur at a lower frequency than TOC1 amplification. Selection against unequal crossing over and replication slippage may be expected to lead to sequence divergence of the 76 bp repeats through mutation. Replication slippage that preserves the length of the tandem array would be allowed, and provides an explanation for the observation that adjacent 76 bp repeat units are in general more similar than non-adjacent 76 bp repeat units.

A model for TOC1 transposition

The model in Figure 1 provides one explanation for the evolutionary origin of TOC1 elements. However, our sequence analysis of the internal organization of TOC1 does not support this model and we think it unlikely that all present-day TOC1 elements are products of hypothetical elements with intact LTRs. We believe that TOC1 elements with their split LTR structure are capable of transposition. Our view is supported by the observation that the majority of TOC1 elements in a number of strains contain a split LTR structure [15, 53].
As a working model we consider that TOC1 transposes by an RNA intermediate. This is consistent with the observation that TOC1 elements produce a near full-length transcript that is initiated in the left 217 bp sequence and polyadenylated in the far-right 237 bp sequence [21]. The reverse transcriptase involved in TOC1 transposition could be provided by an element related to TOC1 or by a completely unrelated locus. Other retroposons have not been identified in C.reinhardtii, and a second group of transposable elements in C.reinhardtii resembles the DNA-mediated transposons of higher plants [60]. Although an ORF encoding a reverse transcriptase-like protein has been identified in C.reinhardtii, it is located in the mitochondrial genome [61] and is unlikely to participate in transposition in the nucleus. Simple retroposition models based on the analysis of pseudogenes [1] and L1-related elements [14] would not regenerate a full-length TOC1 element from the near full-length TOC1 RNA, which lacks 5' and 3' terminal sequences, nor do these models provide a mechanism for maintaining the homogeneity of TOC1 LTR sequences. In order to regenerate a full-length TOC1 element from a near full-length TOC1 transcript we propose the model shown in Figure 8. Although this model is speculative, it predicts replication intermediates and primers whose reality can be addressed by future experiments.

Recent reports [64, 65] show that the PAT family of transposable elements in the nematode Panagrellus redivivus resembles TOC1 elements in terminal repeat organization. The split LTR arrangement is therefore not unique to TOC1 elements, which raises the possibility that PAT and TOC1 elements represent a new group of transposable elements that share a similar transposition mechanism.


ACKNOWLEDGEMENTS

We thank Dr Yves de Chastonay (Fribourg) for communicating results prior to publication, Otto Jenni for artwork and photography, Ken Johnson (Oxford) for photography and Drs E.H.Harris and P.A.Lefebvre for sending us C.reinhardtii strains. This work was supported by grant 3328086 from the Swiss National Foundation to J.-D.R.

REFERENCES

1. Rogers, J.H. (1985) Int. Rev. Cytol. 93, 187-279.
2. Weiner, A.M., Deininger, P.L. and Efstratiadis, A. (1986) Ann. Rev. Biochem. 55, 631-661.
3. Finnegan, D.J. (1989) Trends Genet. 5, 103-107.
4. Boeke, J.D., Garfinkel, D.J., Styles, C.A. and Fink, G.R. (1985) Cell 40, 491-500.
5. Varmus, H.E. (1983) Retroviruses. In Shapiro, J.A. (ed.), Mobile Genetic Elements. Academic Press, NY, pp. 411-503.
6. Loeb, D.D., Padgett, R.W., Hardies, S.C., Shehee, W.R., Comer, M.B., Edgell, M.H. and Hutchison, C.A. (1986) Mol. Cell. Biol. 6, 168-182.
7. Haberd, N.P., Flavell, R.B. and Thompson, R.D. (1987) Mol. Gen. Genet. 209, 326-332.
8. Voytas, D.F. and Ausubel, F.M. (1988) Nature 336, 242-244.
9. Grandbastien, M.A., Spielmann, A. and Caboche, M. (1988) Nature 337, 676-680.
10. Jin, Y-K. and Bennetzen, J.L. (1989) Proc. Natl. Acad. Sci. USA 86, 6235-6239.
11. Johns, M.A., Babcock, M.S., Fuerstenberg, S.M., Fuerstenberg, S.I., Freeling, M. and Simpson, R.B. (1989) Plant Mol. Biol. 12, 633-642.
12. Smyth, D.R., Kalitsis, P., Joseph, J.L. and Sentry, J.W. (1989) Proc. Natl. Acad. Sci. USA 86, 5015-5019.
13. Camirand, A., St-Pierre, B., Marineau, C. and Brisson, N. (1990) Mol. Gen. Genet. 224, 33-39.
14. Schwarz-Sommer, Z., Leclercq, L., Gobel, E. and Saedler, H. (1987) EMBO J. 6, 3873-3880.
15. Day, A., Schirmer-Rahire, M., Kuchka, M.R., Mayfield, S.P. and Rochaix, J-D. (1988) EMBO J. 7, 1917-1927.
16. Gilboa, E., Mitra, S.W., Goff, S. and Baltimore, D. (1979) Cell 18, 93-100.
17. Brown, P.O., Bowerman, B., Varmus, H.E. and Bishop, J.M. (1987) Cell 49, 347-356.
18. Fujiwara, T. and Mizuuchi, K. (1988) Cell 54, 497-504.
19. Eichinger, D. and Boeke, J.D. (1988) Cell 54, 955-966.
20. Colicelli, J. and Goff, S.P. (1986) J. Virol. 57, 674-677.
21. Day, A. and Rochaix, J.-D. (1991) J. Mol. Biol. 12, in press.
22. Gross, C.H., Ranum, L.P.W. and Lefebvre, P.A. (1988) Current Genetics 13, 503-508.
23. Davies, D.R. and Plaskitt, A. (1971) Genet. Res. 17, 33-43.
24. Gorman, D.S. and Levine, R.P. (1965) Proc. Natl. Acad. Sci. USA 54, 1665-1669.
25. Khandjian, E.W. (1987) Biotechnology 5, 165-167.
26. Southern, E.M. (1979) Anal. Biochem. 100, 319-323.
27. Hong, G.F. (1982) J. Mol. Biol. 158, 539-549.
28. Feinberg, A.P. and Vogelstein, B. (1983) Anal. Biochem. 132, 266.
29. Yanisch-Perron, C., Veira, J. and Messing, J. (1985) Gene 33, 103-119.
30. Hanahan, D. (1985) In Glover, D.M. (ed.), DNA Cloning: A Practical Approach. IRL Press, Oxford, Vol. 1, pp. 109-135.
31. Short, J.M., Fernandez, J.M., Sorge, J.A. and Huse, W.D. (1988) Nucleic Acids Res. 16, 7583-7600.
32. Messing, J. (1983) Methods Enzymol. 101, 20-79.
33. Dente, L., Sollazzo, M., Baldari, C., Cesareni, G. and Cortese, R. (1985) In Glover, D.M. (ed.), DNA Cloning: A Practical Approach. IRL Press, Oxford, Vol. 1, pp. 101-107.
34. Maxam, A.M. and Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
35. Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor University Press, Cold Spring Harbor.
36. Kanehisa, M. (1982) Nucleic Acids Res. 10, 183-196.
37. Kanehisa, M. (1984) Nucleic Acids Res. 12, 203-213.
38. Sager, R. and Ishida, M.R. (1963) Proc. Natl. Acad. Sci. USA 50, 725-730.
39. Rochaix, J.-D. (1987) FEMS Microbiol. Rev. 46, 13-34.
40. Fickett, J.W. (1982) Nucleic Acids Res. 10, 5303-5318.
41. Fuetterer, J. and Hohn, T. (1987) Trends Biochem. Sci. 12, 92-95.
42. Doolittle, R.F., Feng, D-F., Johnson, M.S. and McClure, M.A. (1989) Quart. Rev. Biol. 64, 1-30.
43. Xiong, Y. and Eickbush, T.H. (1990) EMBO J. 9, 3353-3362.
44. Klug, A. and Rhodes, D. (1987) Trends Genet. 12, 464-469.
45. Kuck, U. (1988) Nucleic Acids Res. 16, 4729.
46. Covey, S.M. (1986) Nucleic Acids Res. 14, 623-633.
47. Argos, P. (1988) Nucleic Acids Res. 21, 9909-9916.
48. Seiki, M., Hattori, S., Hirayama, Y. and Yoshida, M.C. (1983) Proc. Natl. Acad. Sci. USA 80, 3618-3622.
49. Shimotohno, K., Takahashi, Y., Shimuzu, N., Gojobori, T., Golde, D.W., Chen, I.S.Y., Miwa, M. and Sugimura, T. (1985) Proc. Natl. Acad. Sci. USA 82, 3101-3105.
50. Willey, R.W., Ruthledge, R.A., Dias, S., Folks, T., Theodore, T.S., Buckler, C.E. and Martin, M.A. (1986) Proc. Natl. Acad. Sci. USA 83, 5038-5042.
51. Sanchez-Pescador, R., Power, M.D., Barr, P.J., Steimer, K.S., Stempien, M.M., Brown-Shimer, S.L., Gee, W.W., Renard, A., Randolph, A., Levy, J.A., Dina, D. and Luciw, P.A. (1985) Science 227, 484-492.
52. Doolittle, R.F. (1990) Meth. Enzymol. 183, 99-110.
53. Day, A. and Rochaix, J.-D. (1991) Gene, submitted.
54. Fourcade-Peronnet, F., d'Auriol, L., Becker, J., Galibert, F. and Best-Belpomme, M. (1988) Nucleic Acids Res. 16, 6113-6125.
55. Lloyd, J.A. and Potter, S.S. (1988) Nucleic Acids Res. 16, 6147-6156.
56. Fawcett, D.H., Lister, C.K., Kellet, E. and Finnegan, D.J. (1986) Cell 47, 1007-1015.
57. Kimmel, B.E., Ole-Moiyoi, O.K. and Young, J.R. (1987) Mol. Cell. Biol. 7, 1465-1475.
58. Smith, G.P. (1976) Science 191, 528-535.
59. Albertini, A.M., Hofer, M., Calos, M.P. and Miller, J.H. (1982) Cell 29, 319-328.
60. Ferris, P.J. (1989) Genetics 122, 363-377.
61. Boer, P.H. and Gray, M.W. (1988) EMBO J. 7, 3501-3508.
62. Wells, R. and Sager, S. (1971) J. Mol. Biol. 58, 611-622.
63. Pirtle, R., Kashdan, M., Pirtle, I. and Dudock, B. (1980) Nucleic Acids Res. 8, 805-815.
64. de Chastonay, Y., Link, C., Muller, F. and Tobler, H. (1990) Worm Breeders Gazette 11, 32.
65. de Chastonay, Y., Felder, H., Link, C., Muller, F. and Tobler, H. (1990) Worm Breeders Gazette 11, 59.
