Cell, Vol. 6, 29-39, September 1975, Copyright © 1975 by MIT

DNA Sequence Organization in the Mollusc Aplysia californica

Robert C. Angerer, Eric H. Davidson, and Roy J. Britten*
Division of Biology
California Institute of Technology
Pasadena, California 91125

Summary

The sequence organization of the DNA of the mollusc Aplysia californica has been examined by a combination of techniques. Close-spaced interspersion of repetitive and single copy sequences occurs throughout the majority of the genome. Detailed examination of the DNA of this protostome reveals great similarities to the pattern observed in the two deuterostome organisms previously examined in detail in this laboratory, Xenopus laevis and Strongylocentrotus purpuratus. Labeled and unlabeled Aplysia DNA were prepared from developing embryos and sheared to a fragment length of 400 nucleotides. The kinetics of reassociation were studied by means of hydroxyapatite chromatography, single-strand-specific S1 nuclease, and optical methods of assay. Aplysia DNA of this fragment length contains at least five resolvable kinetic fractions. One classification of these fractions, listed with their reassociation rate constants (l M⁻¹ sec⁻¹), is: single copy (0.00057), slow (0.047), fast (2.58), very fast (4000), and foldback (> 10⁵). Sequence arrangement was deduced from: the kinetics of reassociation of DNA fragments of length 400 or 2000 nucleotides; the hyperchromicity of reassociated fragments containing duplex regions; the size of duplex regions resistant to S1 nuclease; and the reassociation of labeled fragments of various lengths with short driver fragments. More than 80% of the single copy DNA sequences are interspersed with repetitive sequences. The maximum spacing of the repeats is about 2000 nucleotides, and the average less than 1000. The very fast fraction does not show interspersion with single copy sequences or with other kinetic fractions. The foldback fraction sequences are fairly widely interspersed. The slow fraction sequences are interspersed with the fast fraction, and possibly also with the single copy DNA. The fast fraction is the dominant interspersed repetitive fraction. Its sequences are adjacent to the great majority of the single copy sequences and have an average length of about 300 nucleotides.

*Also Staff Member, Carnegie Institution of Washington, Washington, D.C. 20015.

Introduction

A major fraction of the DNA of many organisms has been found to consist of short repetitive sequences (about 300 nucleotides long) alternating with single copy sequences. Many of the single copy sequences extend for less than about 2000 nucleotides before terminating in repeated sequence elements. This pattern is termed short-period interspersion. Short-period interspersion accounts for a large fraction of the DNA in a toad, Xenopus laevis (Davidson et al., 1973), and a sea urchin, Strongylocentrotus purpuratus (Graham et al., 1974), and is probably important in mammalian DNA (Britten and Smith, 1970; Bonner et al., 1974; Pearson, Wilkes, and Bonner, unpublished data). The DNA of the toad and the sea urchin includes in addition some longer single copy sequences (up to 5000 nucleotides or more) interspersed with repetitive sequences. The pattern of sequence arrangement in Drosophila apparently differs strikingly. Here repetitive sequences ranging from 1000 to 10,000 nucleotide pairs are apparently linked to single copy sequences at least 10,000 nucleotides long (Manning, Schmid, and Davidson, 1975).

The work reported in this paper was undertaken to extend our knowledge of sequence organization to the other main branch of the animal kingdom, the protostomes, by extensive measurements which permit estimation of the length distribution of the interspersed single copy sequences. In parallel work, a series of less extensive measurements has been made which show the general occurrence of short-spaced interspersion in the DNA of a number of species. The quantitative features of all of these measurements have been compared (Davidson et al., 1975b). Among other features, this comparison shows that, in all cases except Drosophila, more than 70% of DNA fragments of length 2000-3000 nucleotides contain both repetitive and single copy sequences.
The general occurrence of such a pattern of sequence organization indicates that it has selective value in evolution and therefore carries out a significant role. We return to the issue of its significance at the conclusion of this report.

Results and Discussion

Reassociation Kinetics of Aplysia DNA

The reassociation kinetics of Aplysia DNA have been examined in several ways. Figure 1 presents the results of three of these, extending over a factor of 10¹¹ in Cot. The reassociation of 400 nucleotide long fragments assayed by hydroxyapatite (middle curve) shows that Aplysia DNA contains sequences with a broad range of repetition frequency. The curve represents the least squares fit to these data


Table 1. Kinetic Analysis of Aplysia DNA Reassociation

Cot Curve         Component         Fraction of   kᶜ             Cot1/2 = 1/k   # copies      RMS Deviation
                                    Fragments                                                 (%)
400 nucleotide    foldback          0.07                                                      2.7
                  very fastᵃ        0.165         4.0 x 10³      2.5 x 10⁻⁴     7.1 x 10⁶
                  fast              0.210         2.6            0.39           4.6 x 10³
                  slow              0.158         4.7 x 10⁻²     21             85
                  single copy       0.263         5.6 x 10⁻⁴ ᵇ   1.8 x 10³      1ᵉ
2000 nucleotide   foldback          0.357                                                     3.5
                  very fast         0.192         1.2 x 10⁴      8.3 x 10⁻⁵     9.8 x 10⁶ ᵈ
                  fast              0.329         6.6            0.150          5.4 x 10³ ᵈ
single copy       residual repeat   0.059         0.25           4.0            340           2.0
                  single copy       0.638         7.3 x 10⁻⁴     1.4 x 10³      1ᵉ

ᵃA better fit to the data (RMS = 2.4%) is obtained if the very fast fraction is divided into two components representing 13.1% and 4.7% of the genome and having k's of 6.9 x 10⁵ and 8.2 x 10³ l M⁻¹ sec⁻¹, respectively.
ᵇThe single copy rate constant is fixed at 5.6 x 10⁻⁴ l M⁻¹ sec⁻¹ as predicted from the genome size of 1.8 pg.
ᶜRate constant in whole DNA, l M⁻¹ sec⁻¹.
ᵈBased on a theoretical single copy rate constant of 1.2 x 10⁻³ l M⁻¹ sec⁻¹ for 2000 nucleotide fragments.
ᵉAssumed value of 1 for the single copy component.

Figure 1. Hydroxyapatite Reassociation Kinetics of Aplysia DNA
(Horizontal axis: Equivalent Cot.)

Aplysia DNA of fragment length 400 (circles) or 2000 (triangles) nucleotides was reassociated as described in Experimental Procedures. The fraction of fragments containing duplex regions was measured at the indicated values of equivalent Cot by hydroxyapatite column chromatography. The curve for 400 nucleotide fragments represents values determined by optical density or by radioactively labeled DNA using 3 different tracer preparations. Undiluted tracer at a concentration of 0.02 µg/ml was used for the low Cot points. For higher Cots, the tracer was mixed with unlabeled DNA up to an unlabeled DNA/labeled DNA ratio of 38 and a total DNA concentration of 4.4 mg/ml. For the 2000 nucleotide fragment curve, DNA concentrations were from 0.03 µg/ml to 132 µg/ml using unlabeled to labeled DNA ratios of 0 to 200. The rate of reassociation of isolated single copy sequences is also shown (right-hand curve). For this analysis, the fraction of 400 nucleotide tracer not bound to hydroxyapatite at Cot 2000 (21% of the DNA) was mixed with a 330 fold mass excess of total unlabeled driver DNA. The lines drawn represent computer fits by least squares analysis yielding the second order components listed in Table 1.
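The relations among the columns of Table 1 can be illustrated with a short calculation. The sketch below assumes ideal second order kinetics for each component and treats foldback as bound from zero time; the function names are hypothetical, and the code is not the least squares program used for the fits.

```python
# Sketch only: ideal second-order components for the 400 nucleotide curve,
# using fractions and rate constants as listed in Table 1.
# All names here are illustrative assumptions, not the authors' program.

SINGLE_COPY_K = 5.6e-4  # l M^-1 sec^-1, fixed from the 1.8 pg genome size

# (component name, fraction of fragments, rate constant k)
COMPONENTS_400 = [
    ("very fast", 0.165, 4.0e3),
    ("fast", 0.210, 2.6),
    ("slow", 0.158, 4.7e-2),
    ("single copy", 0.263, SINGLE_COPY_K),
]
FOLDBACK_400 = 0.07  # assumed to bind at essentially zero Cot


def cot_half(k):
    """Cot at which a second-order component is half reacted (Cot1/2 = 1/k)."""
    return 1.0 / k


def copies_per_genome(k, k_single=SINGLE_COPY_K):
    """Repetition frequency taken as the ratio of rate constants."""
    return k / k_single


def fraction_bound(cot, components=COMPONENTS_400, foldback=FOLDBACK_400):
    """Fraction of fragments bound to hydroxyapatite at a given Cot.

    Each component contributes f * k*Cot / (1 + k*Cot).
    """
    bound = foldback
    for _name, f, k in components:
        bound += f * (k * cot) / (1.0 + k * cot)
    return bound


if __name__ == "__main__":
    for name, f, k in COMPONENTS_400:
        print(f"{name:12s} Cot1/2 = {cot_half(k):.3g}  copies = {copies_per_genome(k):.3g}")
    for cot in (1e-4, 1e-2, 1.0, 100.0, 1e4):
        print(f"Cot {cot:g}: fraction bound ~ {fraction_bound(cot):.3f}")
```

With these inputs the ratio of rate constants gives roughly 85, 4600, and 7 x 10⁶ copies for the slow, fast, and very fast components, in line with the "# copies" column.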


with four second order components. In this analysis, only the rate constant of the single copy component was fixed at 0.00055 l M⁻¹ sec⁻¹, the value expected from the genome size of 1.8 pg (Hinegardner, 1974). Table 1 lists the quantities and rate constants of the components of this least squares solution. At this fragment length, many of the single copy sequences are present on the same fragments as repeated sequences and are thus bound to the hydroxyapatite at low Cot, reducing the apparent size of the single copy fraction.

The rate constant for the single copy sequences was determined by reassociating with total Aplysia DNA a tracer preparation highly enriched for single copy sequences. This tracer was the fraction of 400 nucleotide fragments not bound to hydroxyapatite after reassociation to Cot 2000 (21% of the DNA). The right-hand curve of Figure 1 shows the kinetics of reassociation of the tracer DNA with total Aplysia DNA. The curve shown is the least squares solution for two second order components. Table 1 shows the best solution for quantity and rate of both components. The single copy component has a reassociation rate constant of 0.00073 l M⁻¹ sec⁻¹. If the rate constant is fixed at the value expected from the genome size (0.00055 l M⁻¹ sec⁻¹), the root mean square error rises from 2.0% to 2.3%. This is hardly a significant increase in RMS, and thus there is good agreement between the predicted and observed single copy rates.

The 400 nucleotide fragment reassociation curve shows a wide range of reassociation rates. The measurements imply that the Aplysia genome contains at least three major classes of repeated DNA sequence. We have identified these as the slow, fast, and very fast fractions, with sequences present an average of 85, 4600, and 7,000,000 times per haploid genome. A better fit to the data is obtained

Figure 2. S1 Nuclease and Optical Reassociation Kinetics of Aplysia DNA
(Horizontal axis: Equivalent Cot.)

Aplysia DNA of fragment length 450 or 2000 nucleotides (circles) was reassociated as indicated in Experimental Procedures. The fraction of the DNA in duplex was measured by digestion of single-stranded regions with single-strand-specific S1 nuclease and collection of duplex regions on hydroxyapatite. The curve through the 450 nucleotide data represents a least squares computer analysis to two components of the appropriate form,

$$\frac{C}{C_0} = \left[\frac{1}{1 + k\,\mathrm{Cot}}\right]^{0.44}$$

(Morrow, 1974; Britten, Smith, and Davidson, submitted for publication). For this analysis the rate constant of the single copy component was fixed at 5.56 x 10⁻⁴ l M⁻¹ sec⁻¹ as predicted from the genome size of 1.8 pg, yielding a component accounting for 0.45 of the genome. The repetitive component comprised 0.285 of the genome with a rate constant of 9.41 x 10⁻² l M⁻¹ sec⁻¹. The final fraction single stranded was 0.08, and the RMS deviation for the fitted curve is 1.97%. The RMS deviation does not vary appreciably for a single copy component ranging from 0.4 to 0.55 of the genome. The dashed line through the 2000 nucleotide data does not represent a computer fitted curve. The retardation seen at high Cots may be due to large network structures formed from fragments containing more than one interspersed repetitive element. The rate of duplex formation for 450 nucleotide fragments was also measured by optical hypochromicity (triangles). The right-hand scale indicates the measured A260 nm relative to the A260 nm at 98°C. This scale has been adjusted to make the data directly comparable to % S1 nuclease resistance shown on the left-hand scale as follows: 25% hypochromicity (relative A260 = 0.75) is taken to correspond to 100% S1 nuclease resistance. The zero percent nuclease resistance value is aligned with the 2.5% hypochromicity due to single-strand collapse.
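As an illustration of the functional form quoted in this legend, the sketch below evaluates a two component curve with exponent 0.44 and converts hypochromicity to the S1 resistance scale. How the components and the final single-stranded fraction are summed is an assumption made here for illustration (the remainder of the DNA is taken to be in duplex from the earliest Cot); the function names are hypothetical.

```python
# Sketch only: S1-nuclease reassociation form, [1 / (1 + k*Cot)] ** 0.44.
# The way the terms are combined below is an illustrative assumption.

S1_COMPONENTS = [
    (0.45, 5.56e-4),   # single copy: (fraction of genome, k in l M^-1 sec^-1)
    (0.285, 9.41e-2),  # repetitive
]
FINAL_SINGLE_STRANDED = 0.08


def s1_single_stranded(cot, components=S1_COMPONENTS, final_ss=FINAL_SINGLE_STRANDED):
    """Fraction of the DNA still sensitive to S1 nuclease at a given Cot."""
    return final_ss + sum(f * (1.0 + k * cot) ** -0.44 for f, k in components)


def hypochromicity_to_resistance(hypo_percent):
    """Map % hypochromicity onto % S1 resistance as described in the legend:
    2.5% hypochromicity -> 0% resistance, 25% hypochromicity -> 100%."""
    return 100.0 * (hypo_percent - 2.5) / (25.0 - 2.5)


if __name__ == "__main__":
    for cot in (0.0, 1.0, 10.0, 1000.0, 1e5):
        resistant = 1.0 - s1_single_stranded(cot)
        print(f"Cot {cot:g}: fraction S1 resistant ~ {resistant:.3f}")
```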


if the very fast fraction is divided into two components, as mentioned in the footnote to Table 1. In this case, one of the components appears to have a complexity of less than 10 nucleotide pairs. Such simple sequences are known in satellite DNAs. However, neutral CsCl density gradient analysis did not reveal a density satellite in Aplysia DNA. Measurements of the reassociation of the very fast fraction were made at several different DNA concentrations varying by a factor of 30. The data indicate that a collision dependent reassociation process is involved, since the observed hydroxyapatite binding depends on Cot, rather than depending separately on the time of incubation. As shown in Figure 2, the very fast component appears smaller when reassociation is monitored by S1 nuclease. These sequences may show wide divergence, and the reassociated duplexes could be particularly sensitive to S1 nuclease. Since this fraction does not appear to be interspersed with extensive single copy stretches, we have not further investigated its somewhat paradoxical properties.

Interspersion Studied by Reassociation Kinetics Using Hydroxyapatite and S1 Nuclease

Figure 1 (left-hand curve) shows the reassociation of 2000 nucleotide long DNA fragments as measured by hydroxyapatite binding and provides clear evidence for the interspersion of rapidly reassociating sequences with single copy DNA sequences. Nearly all of the 2000 nucleotide fragments are bound as a result of reassociation of the repetitive regions. Table 1 shows the best least squares solution for these measurements using two kinetic components. The rates appear to correspond to the fast and very fast components of the solution for the 400 nucleotide fragment measurements. If the fragments containing the very fast component contain no other sequences, then the rate corrected for fragment length would be (2000/400)^(1/2) x 4 x 10³ = 9 x 10³ l M⁻¹ sec⁻¹, which is approximately what is observed (1.2 x 10⁴ l M⁻¹ sec⁻¹).
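The length correction used above is a square root scaling of the rate constant with fragment length. A minimal sketch, with a hypothetical function name, reproduces the arithmetic for the very fast component and for the single copy rate at 2000 nucleotides.

```python
import math


def length_corrected_rate(k, length_from, length_to):
    """Scale a rate constant by the square root of the fragment length ratio."""
    return k * math.sqrt(length_to / length_from)


if __name__ == "__main__":
    # Very fast component: 4 x 10^3 at 400 nucleotides, scaled to 2000.
    print(length_corrected_rate(4.0e3, 400, 2000))   # close to 9 x 10^3
    # Single copy rate: 5.6 x 10^-4 at 400 nucleotides, scaled to 2000.
    print(length_corrected_rate(5.6e-4, 400, 2000))  # close to 1.2 x 10^-3
```

The second value is consistent with the theoretical single copy rate constant quoted for 2000 nucleotide fragments in the footnotes to Table 1.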
We cannot easily calculate the effect of length on the rate of reassociation of the interspersed repetitive sequences. The accuracy of the determination of the rate constants for the components of the curve for the 2000 nucleotide fragments is therefore limited. Such calculations only show that the solution is consistent, and we depend on other evidence for the proof of interspersion. There seems no doubt, however, that the reaction of the 2000 nucleotide fragments between Cots 0.02 and 2 is due principally to the presence on these fragments of sequences of the fast component. The reaction is 90% complete by Cot 10, and the residual 10% is probably due to an unreassociating impurity

Figure 3. Thermal Denaturation of Reassociated Aplysia DNA
(Horizontal axis: Temperature, °C, from 60 to 95.)

DNA fragments 400 or 1800 nucleotides long were reassociated to Cot 10. The 400 nucleotide fragments were isolated by binding to hydroxyapatite (41% of the genome). The 1800 nucleotide fragments included the whole reaction mixture, since these fragments form extensive networks which cannot be eluted from hydroxyapatite except by denaturation. Thermal denaturation profiles of these fractions and of native Aplysia DNA were determined in 0.12 M PB. The curve for the 1800 nucleotide fragments has been corrected for the contribution of purely single-stranded fragments (16.8% of the DNA) to the thermal denaturation profile. This contribution was measured by means of an optical thermal denaturation profile of a single-stranded Aplysia DNA fraction (400 nucleotide, not bound to hydroxyapatite at Cot 2000).

rather than slow fraction DNA sequences. Most or all of the single copy DNA is therefore present on fragments that contain either foldback or repetitive sequences. At this stage it is reasonable to believe that single copy DNA sequences are adjacent to both the fast fraction and to either or both of the very fast and foldback fractions. Measurements reported later in this paper will serve to identify the repetitive components that are interspersed.

The reassociation kinetics of Aplysia DNA have been assayed both by means of the single-strand-specific nuclease S1 and by optical hypochromicity, as shown in Figure 2. Both methods show that the 2000 nucleotide fragments bound to hydroxyapatite by Cot 10 are predominantly single stranded, as expected if short repetitive sequences are interspersed with longer nonrepetitive sequences. For example, at Cot 3, 85% of 2000 nucleotide fragments is bound to hydroxyapatite, but only 30% of the DNA is resistant to S1 nuclease. Figure 3 shows the optical melting curve for 2000 nucleotide fragments which have been incubated to Cot 10. The hyperchromicity from 60°C to 95°C is 9% after correction for the fraction of the DNA in free single strands (0.17). The hyperchromicity of purely single-stranded DNA is about 2.5%, and of native DNA is 25.5%. Thus the fraction double-stranded is about (9-2.5)/(25.5-2.5) = 0.28, which is in close agreement with the S1 measurement. Measurements of the duplex content for two fragment lengths after renaturation to Cot 10 are shown in Table 2. There it is seen that by both S1 nuclease assay and measurement of optical hyperchromicity only about 26-28% of the fragments containing base paired regions are actually in duplex.

At this stage we may estimate the fraction of the total single copy DNA in the genome which is interspersed with repetitive sequences. Figure 1 shows that by hydroxyapatite assay at least 90% of 2000 nucleotide fragments contain repetitive sequences which reassociate by Cot 10. Figure 2 shows that by S1 nuclease assay about 50% of the genome is made up of single copy sequences in a short period pattern. Thus a possible 0.1/0.50 = 20% of the single copy DNA is not interspersed. In fact, much of the 10% which is not bound to hydroxyapatite at Cot 10 may be an impurity which does not reassociate. Thus we may conclude that between 80% and 100% of Aplysia single copy sequences are interspersed with repetitive sequences with spacings of less than about 2000 nucleotides.
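The two estimates in this passage are simple ratios, and a short sketch makes the arithmetic explicit. The function names are hypothetical; the numerical inputs are those quoted in the text.

```python
def duplex_fraction(hyperchromicity, native=25.5, single_strand=2.5):
    """Fraction of the DNA in duplex from observed % hyperchromicity.

    native: % hyperchromicity of native DNA
    single_strand: % hyperchromicity due to single-strand collapse
    """
    return (hyperchromicity - single_strand) / (native - single_strand)


def max_non_interspersed(unbound_fraction, single_copy_fraction):
    """Upper limit on the share of single copy DNA that is not interspersed,
    assuming every unbound fragment is single copy DNA."""
    return unbound_fraction / single_copy_fraction


if __name__ == "__main__":
    print(round(duplex_fraction(9.0), 2))             # about 0.28
    print(round(max_non_interspersed(0.10, 0.50), 2))  # about 0.20
```

The second function gives an upper limit only: as noted above, part of the unbound 10% may be a nonreassociating impurity rather than single copy DNA.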

Interspersion of Foldback Sequences

Aplysia DNA has a particularly large amount of foldback or zero-time binding DNA, accounting for 7% of 400 nucleotide fragments. When labeled 2000 nucleotide fragments are passed over hydroxyapatite with a Cot of less than 10⁻⁵, almost 40% binds. Under the same conditions, about 9% of the genome is resistant to S1 nuclease. At least 75% of these S1 resistant foldback sequences will bind to hydroxyapatite after S1 treatment and a second denaturation (Cot less than 10⁻⁵). We do not know what sort of DNA sequences are responsible for the foldback fraction, but if there are tandem reverse repeats, few of them are separated by single-stranded loops sensitive to S1 nuclease. Such structures would not bind in the second hydroxyapatite test.

The fraction of fragments containing foldback sequences was measured for a variety of lengths by denaturing labeled DNA fragments and passing them over hydroxyapatite to an effective Cot estimated to be 4 x 10⁻⁶. Table 3 shows the increase in binding with length, and demonstrates that many of the foldback sequences are interspersed with other classes of DNA. Measurements of the kinetic characteristics of the Aplysia DNA sequences adjacent to foldback (not presented here) show that all classes of sequences are present and that the foldback sequences are interspersed throughout the genome. The length of the stretches of other kinds of sequence between the foldback sequences appears to be quite long, since the fraction of DNA bound at Cot 4 x 10⁻⁶ rises linearly with fragment length out to 4000 nucleotides. The data indicate that 50% binding occurs at 3500 nucleotides, and this suggests that the average length of the intervening sequences is 7000 nucleotides. The length distribution of the foldback sequences is indicated in Figure 5.
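The estimate of 50% binding near 3500 nucleotides can be checked against the very low Cot binding values listed in Table 3. The sketch below fits an ordinary least squares line to those values; the fitting routine and names are illustrative assumptions, not the procedure used for the paper, and the doubling in the last step simply mirrors the estimate made in the text.

```python
# Sketch only: straight-line fit to foldback binding versus fragment length,
# using the Cot 4 x 10^-6 column of Table 3.

LENGTHS = [199, 435, 524, 945, 1374, 2089, 2487, 3919]   # nucleotides
FOLDBACK_BOUND = [0.023, 0.034, 0.058, 0.10, 0.18, 0.30, 0.43, 0.53]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def length_at_binding(target, slope, intercept):
    """Fragment length at which the fitted line reaches the target fraction."""
    return (target - intercept) / slope


slope, intercept = fit_line(LENGTHS, FOLDBACK_BOUND)
half_length = length_at_binding(0.5, slope, intercept)
mean_spacing = 2 * half_length  # doubling follows the estimate in the text

if __name__ == "__main__":
    print(f"50% binding at about {half_length:.0f} nucleotides")
    print(f"implied average spacing of about {mean_spacing:.0f} nucleotides")
```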

The Fraction of Fragments Containing Repeated Sequences as a Function of Length

The fraction of fragments containing repeated sequence elements was measured by hydroxyapatite binding of tracer fragments in the presence of a great excess of short fragment (400 nucleotide) driver DNA. This method has previously been used for Xenopus and sea urchin DNA (Davidson et al., 1973; Graham et al., 1974). The measurements were made at driver Cot 10⁻² for the very fast fraction and Cot 10 for the fast fraction. The results are shown in Table 3 and Figure 4. Since the foldback sequences give increased binding as a function of fragment length, a correction has been made for the binding at very low Cot using the following formula: $R = \dfrac{F_{(\mathrm{Cot})} - Z}{0.9 - Z}$, where $F_{(\mathrm{Cot})}$ is the fraction of fragments binding at the selected Cot, and $Z$ is the fraction binding at very low Cot (Davidson et al., 1973). The formula also corrects for the

Table 2. Duplex Content of Aplysia DNA Fragments at Cot 10

Fragment Length   Assay         % Hyperchromicityᵃ   % Genome in   % Genome
(nucleotides)                                        Fractionᵇ     Base Paired
400               Optical       16.2                 41            27ᶜ
400               S1 nuclease                                      27ᵈ
1800              Optical       9.2                  84            28ᶜ
2000              S1 nuclease                                      26ᵈ

ᵃCalculated as (A260 at 96°C - A260 at 60°C)/A260 at 96°C.
ᵇMeasured fraction of DNA fragments bindable to hydroxyapatite.
ᶜCalculated as (H-2.5)/(25.5-2.5), where H is the observed hyperchromicity, 25.5 is the hyperchromicity of native DNA, and 2.5 is the hyperchromicity due to single strand collapse.
ᵈCalculated as (fraction of DNA bound to hydroxyapatite after S1 nuclease digestion) x (% genome in fraction).


observation that the reassociation of these DNA preparations does not exceed 90%.

The very fast fraction (reassociating between Cots 10⁻⁶ and 10⁻²) shows very little or no increase in binding with fragment length. The lower curve on Figure 4 is the least squares solution and indicates an increase of about 7%, which is not significant. Therefore we conclude that this fraction is not significantly interspersed with any other sequence classes of Aplysia DNA. The sequence complexity of the very fast fraction can be estimated to be about 23 nucleotides (see Table 4) from its average rate of reassociation, although as discussed above it could be heterogeneous. The lack of apparent increase in binding in the curve of Figure 4 shows that the physical sequence length is greater than 4000 nucleotides. We can therefore conclude that the very fast fraction consists of tandemly duplicated sequences, and is analogous in this sense to previously observed "satellite" fractions, though as mentioned above, it was not resolved in CsCl density gradients.

The fast fraction (reassociating between Cots 0.01 and 10) shows a great deal of interspersion with other sequences in the genome. The curve for binding as a function of fragment length on Figure 4 rises to 90 or 95% (after the corrections mentioned above). We can conclude therefore that sequences of the fast fraction occur throughout the majority of the genome with spacings of less than a few thousand nucleotides. We have been unable to determine the spacing of the fast and single copy interspersion in the neighborhood of the foldback sequences, since these regions are bound to hydroxyapatite at very low Cot. However, there is evidence that some fast fraction sequences occur in these regions from the reassociation kinetics of sheared foldback fraction DNA (Angerer, unpublished data). It should be pointed out that the foldback fraction accounts for only 17% of fragments 1400 nucleotides long (Table 3). The interspersion of the fast fraction shown in Figure 4 is therefore representative of at least 83% of the genome.

The upper curves on Figure 4 show that a major portion of Aplysia DNA is present as short period interspersed repetitive and single copy sequences. In fact, the principal part of the genome is included in this pattern. Comparison with the curves representing the Xenopus and sea urchin measurements (Davidson et al., 1973; Graham et al., 1974) shows that a larger fraction of the Aplysia genome is in the short period spacing pattern. In fact, the data of Figure 4 are not sufficient to determine whether or not Aplysia contains a small portion of the genome with a pattern of long period interspersion of repeated and single copy DNA sequences. We return to the question of the length of the repeated and single copy sequence elements below.

It is clear from the data of Table 3 and Figure 4 that almost all of the single copy DNA is interspersed with the fast fraction of repetitive sequences. This conclusion is, of course, consistent with that drawn from the measurements of the kinetics of reassociation of 2000 nucleotide fragments assayed by hydroxyapatite. There remains the question of the arrangement in the genome of the slow fraction DNA sequences. They are obviously interspersed with the fast fraction sequences, but without additional measurements we cannot determine if there are regions which contain short sequence elements of fast, slow, and single copy sequence all interspersed with each other, or if the fast and slow are interspersed in some regions, and fast and single copy are interspersed in other regions.

Table 3. Fraction of Fragments Containing Repetitive Sequences

                   Fraction Bound (F)ᵇ at                    Binding Due to Repetitive
                                                             Sequence (R)ᵈ at
Fragment Length    Cot 4 x 10⁻⁶ ᶜ   Cot 10⁻²   Cot 10        Cot 10⁻²   Cot 10
(nucleotides)ᵃ
199                0.023            0.26       0.38          0.27       0.40
435                0.034            0.28       0.50          0.28       0.53
524                0.058            0.26       0.63          0.23       0.67
945                0.10             0.32       0.71          0.28       0.76
1374               0.18             0.43       0.80          0.35       0.86
2089               0.30             0.48       0.78          0.30       0.81
2487               0.43             0.61       0.88          0.37       0.95
3919               0.53             0.65       0.88          0.23       0.93

ᵃAverage of three determinations, each relative to two known markers.
ᵇAverage of two or more determinations, except for 199 nucleotide tracer, which is a single value.
ᶜCalculated maximum Cot based on the total time required for incubation and passage through the hydroxyapatite column.
ᵈ$R = \dfrac{F_{(\mathrm{Cot})} - Z}{0.9 - Z}$, where $Z$ is the fraction of the DNA bound as foldback at very low Cot. The values for R are normalized to a value of 90% tracer reassociation, which is the observed maximum value.
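The correction for foldback binding can be checked against a few rows of Table 3. The sketch below applies R = (F - Z)/(0.9 - Z) to three of the Cot 10 entries; the function name and the selection of rows are illustrative choices, and small differences from the tabulated R values are expected because the listed F and Z are rounded.

```python
def repeat_binding(f_bound, z, max_reaction=0.9):
    """Binding due to repetitive sequence, corrected for foldback (Z) and
    for the observed 90% maximum reassociation of the tracer."""
    return (f_bound - z) / (max_reaction - z)


# (fragment length, Z at Cot 4 x 10^-6, F at Cot 10, R at Cot 10 as tabulated)
TABLE3_ROWS = [
    (199, 0.023, 0.38, 0.40),
    (945, 0.10, 0.71, 0.76),
    (2089, 0.30, 0.78, 0.81),
]

if __name__ == "__main__":
    for length, z, f_bound, r_table in TABLE3_ROWS:
        r_calc = repeat_binding(f_bound, z)
        print(f"{length:5d} nt: R calculated {r_calc:.2f}, tabulated {r_table:.2f}")
```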


Repetitive Sequence Lengths Estimated with Single-Strand Specific Nuclease

Aplysia DNA fragments of about 1800 nucleotides were incubated to Cots 10⁻⁵, 10⁻², and 10, and then digested with S1 nuclease as described in Experimental Procedures. The resistant duplex regions were collected on hydroxyapatite and chromatographed on agarose A-50. The extent of digestion with nuclease was sufficient to remove almost all of the single-strand tails as indicated by the hyperchromicity of the duplex regions. This was 87% of that of native DNA. Under the conditions used, relatively few strand scissions or duplex cuts occur (Davidson et al., 1974; Chamberlin, Britten, and Davidson, 1975; Britten, Graham, Eden, and Davidson, manuscript in preparation), even in the divergent reassociated regions with melting temperatures 10°C to 15°C less than that of perfectly paired DNA.

Figure 5 shows the agarose A-50 chromatography results. Under the conditions used, the left-hand peak consists of simple duplexes greater than about 1500 nucleotide pairs or of hyperpolymers consisting of multiple fragments paired in such a way as to be resistant to S1 nuclease. The foldback fraction shown in curve (a) consists of many long nuclease-resistant sequences, and this differs from foldback fractions observed in other animals which are shorter (for example, see Wilson and Thomas, 1973). Each distribution of repetitive sequence lengths shown in Figure 5 includes all sequences reassociating by the indicated Cot. We have attempted to calculate the actual size distribution of the very fast and fast kinetic fractions by difference.

Figure 4. Fraction of Aplysia DNA Containing Repetitive Sequence Elements as a Function of Fragment Length
(Horizontal axis: Fragment length (nucleotides), from 0 to 4000.)

Labeled Aplysia DNA fragments of various lengths determined on alkaline sucrose gradients were mixed with unlabeled DNA driver of 400 nucleotide fragment length and reassociated to a driver Cot of 10⁻² or 10. The mass ratio of driver to tracer was 1200 in the former case and 2300 in the latter. The fraction of DNA fragments containing duplex regions (F) was measured by hydroxyapatite chromatography. Values of F were corrected for the fraction of zero-time binding sequences (Z) and for 10% nonreactability of the tracer by the formula

$$R = \frac{F_{(\mathrm{Cot})} - Z}{0.9 - Z}$$

as described in the text and Table 3. The lines drawn through the experimental points were generated by a least squares computer program essentially as described in Graham et al. (1974). The two curves for the fast fraction binding (Cot 10) represent best solutions to the data (with approximately equivalent RMS deviations) for two different conditions. One of these is a solution with a major component with single copy interspersion lengths ranging from 0 to 1300 nucleotides and a minor component with very long period interspersion. The other solution is for a distribution of single copy lengths ranging from 200 to 1600 nucleotides, with no long period interspersed component. The curve for the very fast fraction (Cot 10⁻²) is a solution containing only a small (7%) interspersed component with a single copy length distribution centering around 1300 nucleotides. For comparison, least squares solutions (dashed curves) are included for similar data obtained for Xenopus laevis (Davidson et al., 1973, Figure 6, curve 2) and for the sea urchin, Strongylocentrotus purpuratus (Graham et al., 1974, Figure 3).


The lower curves in Figures 5b and 5c show these results. From the previous data indicating tandem repetition of the very fast fraction (Figure 4), one might expect that these sequences would yield only long resistant regions, whereas the data show more than half of the fragments to be less than 1000 nucleotides long. We do not know the explanation of this size reduction. It may be due to an excessive degree of sequence divergence in the very fast fraction, or may indicate an interspersion of other sequences on a very short scale, such as has been indicated for the mouse satellite (Rice, 1975).

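[Figure 5, panels (a) and (b): agarose A-50 elution profiles with the positions of the exclusion marker (Ex), 300 nucleotide fragments, and the inclusion marker (Inc) indicated. Panel (a): Cot 10⁻⁵, 8% of genome. Panel (b): Cot 10⁻⁵ to Cot 10⁻², 8% of genome.]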

The distribution of sequence lengths of the fast fraction is displayed in Figure 5c. The curve corrected for the Cot 10⁻² contribution shows a significant peak at a size of about 300 nucleotides. This indicates that many of the interspersed fast fraction sequences are about 300 nucleotides long. This result is very similar to that obtained in the cases of Xenopus and the sea urchin (Davidson et al., 1973; Chamberlin et al., 1975; Graham et al., 1974). There is an obvious risk that reassociated divergent repetitive DNA would contain structures with mispaired regions which could be digested by S1 nuclease. In fact, under some conditions it is possible with very extensive digestion to cut at single mispaired bases (Shenk et al., 1975). However, with the light enzyme treatment and the salt and temperature conditions used here, such strand scissions are minimized. With an even more extensive S1 nuclease digestion than used here, measurements with agarose A-50 of the size distribution of Xenopus interspersed repetitive sequences gave exactly the same mode size (Davidson et al., 1974) as did measurements with the electron microscope (Chamberlin et al., 1975), which did not involve the use of any nuclease. We conclude that although there is probably a modest background of fragments due to strand scission at divergence points, the 300 base pair peak in Figure 5c is actually a good measure of the length of fast fraction interspersed repetitive sequences.

Figure 5. Length Distribution of Aplysia Repeated DNA Sequences
(Horizontal axis: Fraction number. Panel (c): Cot 10⁻² to Cot 10, 14% of genome; positions of the exclusion marker (Ex), 300 nucleotide fragments, and the inclusion marker (Inc) are indicated.)

Total Aplysia DNA of 1800 nucleotide fragment length was reassociated to Cot 10⁻⁵ (a), 10⁻² (b), or 10 (c) as indicated in Experimental Procedures. The single-stranded regions were removed by means of S1 nuclease and the duplexes recovered on hydroxyapatite and eluted with 0.4 M PB. The percent of the DNA present in each fraction is indicated in the figure. The samples were chromatographed on a column of agarose A-50 with long native DNA as an exclusion marker and ³²PO₄ as an inclusion marker. The position of 300 nucleotide fragments in the column profile is indicated. The lower curve of (b) indicates the distribution of very fast sequences reassociating between Cot 10⁻⁵ and Cot 10⁻². This distribution has been calculated by correcting the experimental curve shown in (b) for the contribution of foldback sequences shown in (a). The lower curve of (c) shows the distribution of fast sequences reassociating between Cot 10⁻² and Cot 10. This curve has been calculated by subtracting the contribution of sequences reassociating by Cot 10⁻² whose distribution is shown in (b).

The Fraction of the Genome in an Interspersed Pattern

Table 1 summarizes the reassociation kinetic measurements and the frequencies of the repetitive fractions that we are using to describe Aplysia DNA. Due to interspersion of the various sequence elements, the fraction of DNA fragments in the kinetic components does not accurately represent the fraction of the genome in these components. Correcting for interspersion and using the S1 measurements shown in Figure 2 and Figure 5 leads to the values for fraction of the genome in each component shown in Table 4. In every case except that of the single copy sequences, the frequencies are average values for what may in reality be a range of frequencies. Measurements have not been made of the homogeneity of the repeated components. Thus there is a considerable uncertainty in the complexity figures, since a minor amount of a more slowly reassociating component would not be recognized and could drastically alter the total amount of different sequences present. This caveat applies to most estimates of repetitive sequence complexity. The measurements with S1 nuclease shown in Figure 5 indicate that about two thirds of the fast fraction is made up of short interspersed se-

4. Comoonents

of Aolvsia

Genome

Component

Fraction of Genome

Complexitye

foldback very fast fast slow single copy

0.07 0.05 0.11 0.12 0.45

23 5.1 x 104 3.3 x 106 6.4 x 108

- 0.10 - 0.16b -0.17c - 0.229 - 0.6Od

OThe lower limit is the fraction of the genome bound to hydroxyapatite after incubation to Cot 10-5, digestion with Sl nuclease, denaturation, and reincubation to Cot 10-S. The upper limit is the fraction of the genome bound to hydroxyapatite after two incubations to Cot IO-5 and subsequent digestion with Sl nuclease. bThe lower limit is taken from the Sl nuclease assay of reassociation of 2000 nucleotide fragments shown in Figure 2. The upper limit is taken from the data of Figure 1 and Table 1 for 400 nucleotide fragments. CThe lower limit is the average total Cot 10 duplex content of 3036, given in Table 2, minus the average values for foldback and very fast fractions given in this table. The upper limit is from the 400 nucleotide fragment data of Table 1 corrected for the effect of interspersion. Since the interspersed repeats are 300 nucleotides in length, about 0.75 of the fragments of this fraction represent duplexes of the fast fraction, and the remainder is single copy (or slow) tails. The upper limit is therefore 0.22 x (0.79, or 17% of the genome. dThe upper limit is calculated as follows: The data of Table 2 and Figure 2 show that about 70% of the DNA is not in duplex at Cot 10. This fraction corresponds to all of the single copy DNA and 0.67 of the slow fraction (which has a Cot, of 20 and is therefore two thirds unreacted at Cot lo), as well as 10% nonreassociable tracer. The estimate for the single copy content is the range of estimates with approximately equal RMS deviation from the computer fitted curve of Figure 2 for 450 nucleotide fragments. We calculate the maximum size of the slow component as 70% (45% + 10%)/0.67 = 23%. The lower limit is calculated using the value of 0.16 for the fraction of slow component in 400 nucleotide fragments from Table 1. Assuming this component is interspersed and of length similar to 300 nucleotides, then the minimum value is 0.16 (0.75) = 0.12. 
ecomplexity values are calculated using the number of copies from Table 1 and the mean of the estimates in this Table for fraction of the genome in each component.

quences. This amounts to about 10% of the total DNA of Aplysia. The single copy fraction amounts to 45-60% of the genome as shown in Table 4. Most of these sequences are interspersed with the fast fraction, and we may estimate as a lower limit that at least 55% of the genome is in this interspersion pattern. Another estimate can be made from the fact that 90% of fragments 2000 nucleotides long are bound to hydroxyapatite at Cot 5, as shown in Figure 1. Twenty two percent of the genome is in Sl resistant foldback and very fast fractions and noninterspersed fast sequences. The difference is 68%, which is a reasonable upper limit calculation. We conclude that between 55% and 70% of the DNA is made up of interspersed fast, single copy, and slow sequences. The Length and Spacing of the Interspersed Repetitive Sequences The upper curves on Figure 4 represent two least squares solutions to the measurements of the variation with length of the fraction of fragments that contain fast fraction repetitive sequences. These curves were fitted as described previously (Graham et al., 1974). In one case, it was assumed that there is a long period pattern of interspersion which gives the later slope, and in the other case it was assumed that there is only a short period pattern of interspersion. The root mean square deviations are essentially the same for both curves. It is clear that the short period pattern is dominant and uncertain whether there are any long single copy sequences without intervening repetitive sequences. Both solutions make use of a wide range of lengths of spacing in the short period pattern: from 0 to 1300 nucleotides in one case and 200 to 1600 in the other. A solution using a narrow range of spacings (not shown) gives a slightly higher root mean square deviation, but the difference is not significant. 
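The limits quoted in the Table 4 footnotes, and the 55-68% estimate for the interspersed portion of the genome, rest on a few lines of arithmetic that can be retraced as a check. The sketch below uses only numbers quoted in the text and footnotes. The variable names are ours, the reading of the 55% lower limit as a sum of the short fast repeats and the single copy minimum is our interpretation, and the ideal second-order expression used for the slow fraction is the standard textbook form, assumed here rather than taken from the authors' curve fitting.

```python
def unreacted_fraction(cot, cot_half):
    """Fraction still single-stranded at a given Cot for an ideal
    second-order reassociation with half-reaction value cot_half."""
    return 1.0 / (1.0 + cot / cot_half)


# Slow fraction: Cot1/2 of 20, evaluated at Cot 10 (footnote d).
slow_unreacted = unreacted_fraction(10.0, 20.0)          # two thirds

# Share of a 400 nucleotide fragment occupied by a 300 nucleotide repeat.
repeat_share = 300 / 400                                 # 0.75

fast_upper = 0.22 * repeat_share                         # footnote c, about 17%
slow_lower = 0.16 * repeat_share                         # footnote d, 0.12

# Footnote d: 70% unreacted at Cot 10, less single copy (45%) and
# nonreassociable tracer (10%), divided by the unreacted share of slow DNA.
slow_upper = (0.70 - (0.45 + 0.10)) / slow_unreacted     # about 0.22-0.23

# Interspersed portion of the genome.
interspersed_lower = 0.10 + 0.45   # short fast repeats + minimum single copy
interspersed_upper = 0.90 - 0.22   # bound at 2000 nt, less non-interspersed DNA
```

Using exactly two thirds rather than the rounded 0.67 moves the slow upper limit only in the third decimal place, which is well inside the uncertainty the authors describe.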
The average short period spacing of the repeated sequences is 940 nucleotides in the solution which has an additional long period component, and 1190 for the case in which only the short period component is included. It can be concluded that the average spacing of the short period fast fraction interspersed repeated sequences is about 1000 nucleotides, with an uncertainty of about 200 nucleotides. There is no significant difference between this conclusion and the length of the single copy sequence calculated for the short period interspersion patterns of Xenopus and sea urchin (Davidson et al., 1973; Graham et al., 1974). In the DNA of most of the other species examined, it has been possible to state that the spacing between the repetitive sequences was the length of the interspersed single copy sequences. However, in this case the slow fraction is also interspersed with fast sequences.
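The two spacing estimates and the quoted summary value can be compared directly. The sketch below is illustrative only: the dictionary keys and function name are ours, and the strictly periodic arrangement used to estimate how many repeats fall on a fragment is an idealization introduced here, not the distribution of spacings fitted in Figure 4.

```python
REPEAT_LENGTH = 300  # nucleotides, mode from the S1 sizing in Figure 5

# Average short period spacings from the two least squares solutions.
spacings = {
    "with long period component": 940,
    "short period only": 1190,
}

mean_spacing = sum(spacings.values()) / len(spacings)

# Both solutions fall inside the quoted 1000 +/- 200 nucleotide range.
within_quoted_range = all(abs(s - 1000) <= 200 for s in spacings.values())


def repeats_per_fragment(fragment_length, spacing, repeat_length=REPEAT_LENGTH):
    """Expected number of repeat elements on a fragment, assuming repeats
    of fixed length separated by a fixed spacing (an idealization)."""
    return fragment_length / (spacing + repeat_length)


# A 2000 nucleotide fragment with 1000 nucleotide spacing.
per_2000 = repeats_per_fragment(2000, 1000)
```

Under this idealization a 2000 nucleotide fragment carries more than one repeat on average, which is in line with the observation that nearly all fragments of that length contain fast fraction sequences.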


We indirectly know that the slow fraction is interspersed from the fact that no slow component is seen in the reassociation kinetics of 2000 nucleotide DNA as shown in Figure 1. Thus the sequences of the slow fraction are almost entirely adjacent to fast fraction sequences on fragments of 2000 nucleotide length. This observation is suggestive of an interesting pattern from a functional point of view. The complexity of the slow fraction is apparently immense as demonstrated in Table 4, and thus many similar fast sequences must be near different slow sequences.

There are two sources of information on the lengths of the dominant fast interspersed repeated sequences. These are hyperchromicity measurements and size determination after S1 digestion. The S1 digestion measurements shown in Figure 5 indicate a mode value for fast fraction lengths of about 300 nucleotides. This number is remarkably similar to that observed in a number of other cases (Davidson et al., 1974; Goldberg et al., 1975). The hyperchromicity measurements do not permit as accurate a determination, but within a factor of two are consistent with this value.

Conclusion

We have shown that about two thirds of the genome of the mollusc Aplysia californica is made up of a highly ordered pattern of interspersed repeated and single copy DNA sequences. This pattern is quantitatively similar to those determined in detail for Xenopus laevis and Strongylocentrotus purpuratus DNA and shows considerable similarity to the less closely examined patterns in five other species scattered over many phylogenetic orders (Goldberg et al., 1975). These observations indicate that close-spaced interspersion of repeated and single copy DNA sequences is very widespread. We have recently presented a comparison of the patterns observed and a discussion of their phylogenetic generality (Davidson et al., 1975b). The general occurrence of this pattern of sequence organization indicates that it carries out a significant role.
The following argument suggests that this interspersion pattern is important to the regulation of genetic activity. While regulatory processes occur at many levels of cellular metabolism, the control of transcription is probably fundamental to the establishment of the pattern of gene expression of each cell type or state. In turn, transcriptional control very probably depends on the recognition by regulatory molecules of specific DNA sequences adjacent to or within the structural gene regions of the DNA. Specific sequence recognition in other regions of the genome is probably involved in the synthesis of the regulatory molecules themselves and thus in the integration of the whole regulatory process. It appears likely that a great many of the DNA sequences involved in the specific recognition processes occur in more than one place in the genome and therefore are repetitive sequences.

Most of the structural gene sequences expressed in sea urchin gastrula (Goldberg et al., 1973) and HeLa (Klein et al., 1974) polysomal message are single copy DNA sequences. Spradling et al. (1974) have drawn a similar conclusion for Aedes, rat, and HeLa cells. Furthermore, a number of individual structural genes have been shown to be single copy sequences (reviewed in Davidson and Britten, 1973). Since many structural genes are single copy sequences, it is technically feasible to test for adjacent repeated sequences. Clearly, if the interspersed repetitive sequences do carry out a regulatory function such as the control of transcription, it would be reasonable to expect that they would be adjacent to most, if not all, structural genes. This proposal has recently been tested and it has been shown that at least 80% of the structural genes expressed in sea urchin gastrulae are adjacent (within a few hundred nucleotides) to repeated DNA sequences in the sea urchin genome (Davidson et al., 1975a). This result is, of course, merely consistent with a regulatory role and in no way shows how the adjacent repetitive sequences function. However, this observation does add significance to the measurements reported in this paper, which demonstrate the general interspersion of repetitive and single copy sequences in the genome of the gastropod Aplysia californica.

Experimental Procedures

Maintenance of Animals and Growth of Embryos

Aplysia californica was collected locally and maintained in filtered sea water at ambient temperature of 14-18°C. Egg masses were collected within a day after deposition, chopped or broken into approximately 0.5 cm pieces, washed repeatedly with Millipore filtered sea water, and cultured at 15°C or 22°C in Millipore filtered sea water (containing 30 µg/ml streptomycin sulfate and 50 units/ml penicillin) as a 10-20% suspension with vigorous aeration. The egg masses were washed every few days in order to reduce contamination. Just before hatching (about 21 days at 15°C, or 7-9 days at room temperature) the egg masses were washed repeatedly with Millipore filtered sea water, briefly with several changes of distilled water and with SEDTA (0.075 M NaCl, 0.05 M EDTA, pH 8.0), and frozen with powdered dry ice. Embryos were stored at -70°C until use for DNA isolation.

Preparation of DNA

Frozen embryos were pulverized with dry ice in a Waring blender, suspended in a minimum volume of SEDTA (about 2 ml final volume/gm egg mass), brought to room temperature, and the nuclei lysed by the addition of 1% SDS. When a reasonably clear suspension was obtained, an equal volume of SEDTA-saturated phenol was added, and the phases were gently mixed for 2-3 hr and then separated by centrifugation. The aqueous phase was briefly reextracted with phenol:IAC (1:1) and IAC (24:1 chloroform:isoamyl alcohol), and the DNA spooled from the final aqueous phase after the addition of 2 vol of 95% ethanol. DNA was further purified by reprecipitation, treatment with 25 µg/ml RNAase A (chromatographically pure, Worthington Biochemical Corp.), 100 µg/ml predigested pronase (B grade, CalBiochem), extraction with phenol and IAC, and reprecipitation. An average yield was 150 µg DNA/gm egg mass using egg masses from smaller animals, which have a considerably higher embryo content/gm. When using egg masses from larger individuals, the large amount of capsule material made an additional purification step desirable. In this case, the DNA was bound to hydroxyapatite at 60°C after shearing, and washed with 0.12 M PB. This step resulted in marked increases in the 260/230 nm and 260/280 nm absorbance ratios, and in the optical hyperchromicity.

To prepare radiolabeled DNA, 5-6 day embryos (15°C) were suspended in enough Millipore filtered sea water (containing penicillin and streptomycin) to cover them, and the suspension was made 50 µCi/ml in ³H-thymidine (Nuclear Dynamics, 24 Ci/mMol). The embryos were grown with aeration for 4 days, washed repeatedly, and frozen. The culture supernatant was checked for contamination by microorganisms by measuring the counts retained by a 0.45 µ Millipore filter, and by precipitating an aliquot of the sea water with 5% TCA. These values calculated for the total culture supernatant were less than 0.1% and 0.05% of the counts incorporated into the isolated DNA, respectively. A typical specific activity attained was 2.2 x 10⁵ cpm/µg.

Preparation of DNA Fragments

DNA fragments of desired sizes were produced by homogenization in a Virtis 60K homogenizer as previously described (Britten et al., 1974) or by preparative centrifugation on alkaline sucrose gradients prepared as described below.

Agarose A-50 Chromatography

The size distribution of reassociated DNA duplexes was determined by agarose A-50 chromatography using a column 2.2 x 100 cm. The gel bed was poured around a support of 6 mm glass beads. Samples were chromatographed in 0.12 M PB, using long native DNA and ³²PO₄³⁻ as exclusion and inclusion markers, respectively.

Optical Melting

DNAs were melted in 0.12 M PB in water-jacketed cuvettes, and the hyperchromicity monitored at 260 nm with an ACTA Mark III spectrophotometer modified for automatic data collection. Hyperchromicity was calculated according to the formula H = (W - C)/(W - B), where H is hyperchromicity, C is the A260 at 60°C, W is the A260 at the highest temperature, and B is the A260 of the buffer.

Counting Methods

Radioactive samples were counted in mixtures of 0.06 M PB and "Instagel" (Packard) in the proportions 1:1.25.
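The hyperchromicity formula given under Optical Melting, H = (W - C)/(W - B), translates directly into a small helper. This is a minimal sketch: the function and argument names are ours, and the absorbance readings in the usage line are invented for illustration and are not measurements from this study.

```python
def hyperchromicity(a260_high, a260_60c, a260_buffer):
    """Return H = (W - C) / (W - B).

    a260_high   -- W, the A260 at the highest temperature
    a260_60c    -- C, the A260 at 60 degrees C
    a260_buffer -- B, the A260 of the buffer alone
    """
    denominator = a260_high - a260_buffer
    if denominator <= 0:
        raise ValueError("A260 at the highest temperature must exceed the buffer A260")
    return (a260_high - a260_60c) / denominator


# Hypothetical readings, for illustration only.
h = hyperchromicity(a260_high=1.25, a260_60c=1.00, a260_buffer=0.05)
```

Subtracting the buffer reading in the denominator expresses the absorbance rise relative to the DNA's own fully melted absorbance rather than to the raw cuvette reading.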

Measurement of Fragment Lengths

The single-stranded fragment lengths of sheared DNA samples and fractions from the preparative gradients were determined on isokinetic alkaline sucrose gradients (Noll, 1967): V_mix = 9.64 ml, C_res = 43% w/v, C_flask = 16% w/v in 0.1 N NaOH. Gradients were centrifuged at 41,000 rpm for 20-24 hr at 20°C in the Beckman SW41 rotor. The weight average fragment length was determined in duplicate or triplicate with reference to two known markers using the equations of Studier (1965).

DNA Reassociation Techniques

DNA samples were reassociated in 0.12 M PB at 60°C or 0.4 M PB at 64°C. Values in the latter solution were corrected to "equivalent Cot" in 0.12 M PB at 60°C by multiplying by a factor of 4.9. The fraction of molecules containing duplex regions was determined by hydroxyapatite column chromatography (Britten, Graham, and Neufeld, 1974). When using long DNA fragments, the double-stranded fraction was eluted with 0.12 M PB at 100°C in order to attain complete recovery.

The fraction of base pairs in duplex in a reassociated DNA sample was quantitated by means of single-strand-specific S1 nuclease (Ando, 1966; Vogt, 1973). DNA samples to be assayed by this method were reassociated in 0.3 M NaCl, 0.01 M PIPES (pH 6.7) at 64°C, or in 0.15 M NaCl, 0.005 M PIPES at 60°C. After reassociation, the reaction mixture was adjusted to final concentrations of 0.15 M NaCl, 0.005 M PIPES, 0.025 M sodium acetate, 0.1 mM ZnSO₄ (pH 4.4). To this solution was added 0.05 vol of 0.5 M β-mercaptoethanol and sufficient S1 nuclease to remove all single-stranded DNA regions. The samples were digested for 45 min at 37°C, and the digestion was terminated by adjusting the samples to 0.12 M PB. Duplex material was then collected on hydroxyapatite.

The rate of DNA reassociation was monitored in some experiments by measuring the hypochromicity of denatured DNA samples at 260 nm in 0.12 M PB in a water-jacketed cell using an ACTA Mark III spectrophotometer (Beckman).

This research was supported by a grant from the National Science Foundation and by grants from the National Institutes of Health. RCA was the recipient of a fellowship from the American Cancer Society.

Received May 8, 1975; revised June 18, 1975

References

Ando, T. (1966). Biochim. Biophys. Acta 114, 158-168.

Bonner, J., Garrard, W. T., Gottesfeld, J., Holmes, D. S., Sevall, J. S., and Wilkes, J. (1974). Cold Spring Harb. Symp. Quant. Biol. 38, 303-310.

Britten, R. J., and Smith, J. (1970). Carnegie Inst. Wash. Year Book 68, 378-386.

Britten, R. J., Graham, D. E., and Neufeld, B. R. (1974). In Methods in Enzymology, 29E, L. Grossman and K. Moldave, eds. (New York: Academic Press), pp. 363-418.

Chamberlin, M. E., Britten, R. J., and Davidson, E. H. (1975). J. Mol. Biol., in press.

Davidson, E. H., and Britten, R. J. (1973). Quart. Rev. Biol. 48, 565-613.

Davidson, E. H., Hough, B. R., Amenson, C. S., and Britten, R. J. (1973). J. Mol. Biol. 77, 1-23.

Davidson, E. H., Graham, D. E., Neufeld, B. R., Chamberlin, M. E., Amenson, C. S., Hough, B. R., and Britten, R. J. (1974). Cold Spring Harb. Symp. Quant. Biol. 38, 295-301.

Davidson, E. H., Hough, B. R., Klein, W. H., and Britten, R. J. (1975a). Cell 4, 217-238.

Davidson, E. H., Galau, G. A., Angerer, R. C., and Britten, R. J. (1975b). Chromosoma, in press.

Goldberg, R. B., Galau, G. A., Britten, R. J., and Davidson, E. H. (1973). Proc. Nat. Acad. Sci. USA 70, 3516-3520.

Goldberg, R. B., Crain, W. R., Ruderman, J. V., Moore, G. P., Barnett, T. R., Higgins, R. C., Gelfand, R. A., Galau, G. A., Britten, R. J., and Davidson, E. H. (1975). Chromosoma, in press.

Graham, D. E., Neufeld, B. R., Davidson, E. H., and Britten, R. J. (1974). Cell 1, 127-137.

Hinegardner, R. (1974). Comp. Biochem. Physiol. 47A, 447-460.

Klein, W. H., Murphy, W., Attardi, G., Britten, R. J., and Davidson, E. H. (1974). Proc. Nat. Acad. Sci. USA 71, 1785-1789.

Manning, J. E., Schmid, C. W., and Davidson, N. (1975). Cell 4, 141-155.

Morrow, J. F. (1974). Ph.D. Thesis, Stanford University, Stanford, California.

Noll, H. (1967). Nature 215, 360-363.

Rice, N. R. (1975). Carnegie Inst. Wash. Year Book 73, 1088-1094.

Shenk, T. E., Rhodes, C., Rigby, P. W. J., and Berg, P. (1975). Proc. Nat. Acad. Sci. USA 72, 989-993.

Spradling, A., Penman, S., Campo, M. S., and Bishop, J. O. (1974). Cell 3, 23-30.

Studier, F. W. (1965). J. Mol. Biol. 11, 373-390.

Vogt, V. M. (1973). Eur. J. Biochem. 33, 192-200.

Wilson, D., and Thomas, C. A. (1973). Biochim. Biophys. Acta 331, 333-340.
