Original Contributions © Springer-Verlag New York Inc. 1992

Mammalian Genome 3: 661-668, 1992

Distribution of Alu and L1 repeats in human YAC recombinants

Benoit Arveiler and David J. Porteous

MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK

Received May 25, 1992; accepted July 22, 1992

Abstract. Evidence is accumulating that the two major families of interspersed repeated human DNA sequences, Alu and L1, are not randomly distributed. However, only limited information is available on their relative long-range distribution. We have analyzed a set of randomly selected, human Chromosome (Chr) 11-specific YAC recombinants constituting a total length of about 2 Mbp for the local and global distribution of Alu and L1 repeats. The data show a strong asymmetry in the distribution of these two repeat classes and give weight, at the long-range molecular level, to previous studies indicating their partition in the human genome. They also suggest a strong tendency for L1 repeats to cluster, with a higher proportion of full-length elements than expected.

Introduction

The nature and origin of human interspersed repeated sequences have been addressed and reviewed (Scott et al. 1987; Labuda and Striker 1989; Moyzis et al. 1989). The Alu DNA sequence family has a relatively GC-rich consensus sequence, with a tandem repeat structure, archetypically about 280 bp in length, and an estimated reiteration frequency of complete and truncated versions totalling 500,000 to 900,000 in the human genome. This gives an average density of about 1 per 4 kbp of DNA and an expectation, borne out by consistent observations, that all but a few per cent of human λ or cosmid recombinants will contain at least one Alu repeat. The L1 (formerly KpnI) repeat family is relatively AT-rich and present at an estimated 15,000 full-length (6.4 kbp) copies, with 50,000 or more copies truncated at the 5' end, giving an overall density of about 1 per 70 kbp. Thus, the Alu and the L1 sequences account for approximately equal net lengths of DNA, summing to about 10% of the total genome.

Offprint requests to: B. Arveiler

In situ hybridization with cloned Alu or L1 sequences as probes suggests preferential hybridization of Alu to Reverse and L1 to Giemsa-positive bands (Korenberg and Rykowski 1988). This observation has been used to argue further that the partition of these repeats may reflect their evolutionary origin and mode of dispersal, with Alu sequences being preferentially associated with relatively GC-rich, undermethylated and gene-dense genomic regions, and conversely for L1 repeats (Korenberg and Rykowski 1988; Bickmore and Sumner 1989), thus reflecting the organization of the human genome into isochores of different GC richness (Bernardi et al. 1985; Bernardi 1989). It has also been suggested that the partition of these two classes of repeated DNA sequence families might underlie, or at least in part account for, the as yet unexplained phenomenon of chromosome banding, as revealed by various dyes (Korenberg and Rykowski 1988; Bickmore and Sumner 1989). In situ hybridization results with oligonucleotide probes for the core Alu consensus sequence (Moyzis et al. 1989) and by primed in situ (PRINS) labeling (Gosden et al. 1991) have confirmed and extended the suggestion that the overall distribution of Alu sequences accurately reflects the characteristic R-banding of human chromosomes. It is interesting, from both an evolutionary and a structural point of view, to note that long and short interspersed repeat elements have also been shown to be partitioned in the mouse genome (Bennett and Hastie 1984; Boyle et al. 1990). Complementary evidence comes from the statistical analysis of DNA sequence databases (Moyzis et al. 1989), which provides strong evidence for a nonrandom distribution of Alu and L1 sequences. However, interpretation of these data is constrained by the inherent bias of a very limited database of human DNA sequences in which gene-associated sequences are disproportionately represented.
YAC recombinants provide an ideal resource to examine the distribution of Alu and L1 repeats in extended contiguous tracts of DNA. However, the high level of co-ligation events observed in many current YAC libraries constructed from total human DNA (Green et al. 1991) compromises the interpretation of a simple analysis of the repeat distribution in independent recombinants. Here we analyzed YAC recombinants from a YAC library constructed from a somatic cell hybrid, 1W1.LA4.9, which retains Chr 11 and the short arm of the X Chr as the sole human components (Porteous et al. 1986). The human DNA content amounts to approximately 1% of the total. Therefore, even if the frequency of co-ligation were as high as 40% (as has been estimated for some widely used YAC libraries; Green et al. 1991), the frequency of co-ligation of two non-syntenic human fragments would be only 0.4%. We can, therefore, use repeat probe hybridization and Alu-PCR fingerprint analysis of human YAC recombinants isolated from this library to address directly the question of the relative distribution of Alu and L1 sequences in randomly selected tracts of human DNA. Our data from 21 YAC recombinants, representing nearly 2 Mbp of human DNA, show, at the molecular level, the partition of Alu and L1 repeats in the human genome and indicate a high degree of clustering of L1 repeats in certain regions.
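The 0.4% figure quoted above is the product of the co-ligation frequency and the human fraction of the hybrid DNA. The following is a minimal sketch of that arithmetic, assuming that the second fragment in a co-ligation event is drawn in proportion to the DNA composition of the hybrid; the function name is illustrative and not taken from the paper.

```python
def human_human_coligation_rate(coligation_rate, human_fraction):
    """Chance that a clone is a co-ligation AND its partner fragment is human.

    Assumes the partner fragment is sampled in proportion to the share of
    human DNA in the hybrid cell line (an assumption of this sketch).
    """
    return coligation_rate * human_fraction


# Values quoted in the text: 40% co-ligation, about 1% human DNA.
rate = human_human_coligation_rate(0.40, 0.01)
print(f"{rate:.1%}")
```

By the same reasoning, almost all chimeric clones in such a library are expected to join a human fragment to a mouse fragment, which is what makes chimerism detectable with a total mouse DNA probe.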

Materials and methods

Origin of the YAC recombinants

A library with an average insert size of 150 kbp was constructed in pYAC4 (Burke et al. 1987) from partial EcoRI-digested 1W1.LA4.9 somatic cell hybrid (Porteous et al. 1986) DNA, as described elsewhere (Arveiler et al. 1991). Randomly picked yeast clones were propagated in microtiter wells, and replica filters of 96 colonies were hybridized with Cot1 human DNA (Gibco BRL) to identify recombinants of human origin. These were picked and agarose plugs prepared for PFGE analysis on a CHEF electrophoresis system (Maule and Green 1990). DNA transfer analysis of the gels was performed with Cot1 human DNA and pBluescribe probes sequentially, both to check that the clones were human and contained a single YAC and to estimate the insert sizes by comparison with standard size markers.

B. Arveiler and D.J. Porteous: YACs partition of Alu and L1 repeats

Southern blot analysis

DNA was prepared (Sherman et al. 1983), digested with PstI and HindIII, fractionated on a 1% agarose gel, and transferred to Hybond-N nylon membranes (Amersham International PLC). Probes used were: Cot1 human DNA, total mouse DNA, pBluescribe as a vector arm specific probe, the cloned Alu repeat BLUR8 (Deininger et al. 1981), and the cloned L1 repeat probes BK(1.8)26, BK(1.5)54, BK(1.2)11, and BK(1.8)11 (Shafit-Zagardo et al. 1982a). [For reasons of ease, we will henceforth refer to these probes as BK1.8(5'), BK1.5, BK1.2, and BK1.8(3'), respectively.] All hybridizations were carried out in the buffer recommended by Amersham for Hybond-N. Washes were performed with a 1× SSC-0.1% SDS solution at 65°C. Exposures were with unflashed Kodak XAR films. Densitometric analyses were done with a custom-built device (Elder et al. 1986).

Detection of chimerism and estimation of its extent

The somatic hybrid cell line 1W1.LA4.9 used for constructing the YAC library from which the YACs are derived (Arveiler et al. 1991) contains an intact Chr 11 and the short arm of the X Chr as its sole human component on a mouse genetic background. The human material therefore constitutes about 1-2% of the DNA, and most (>99%) of the chimeric YACs that contain human DNA will result from coligation to a mouse DNA fragment. The blots of PstI-digested YAC DNA were hybridized with a total mouse DNA probe. The detection of hybridizing fragments different from those identified with the Cot1 human DNA probe was diagnostic of the coligation of human and mouse fragments. The size of the mouse component was calculated as follows: in a separate study, blots of PstI-digested DNA from mouse YACs of known sizes were hybridized with a total mouse DNA probe; from the autoradiographic data thus obtained we calculated that one autoradiographic band corresponded to 30 kbp of mouse DNA on average. This enabled us to calculate the extent of chimerism by counting the number of bands positive with the total mouse DNA probe and attributing 30 kbp of mouse genomic DNA to each band.

Alu-PCR analysis of human YACs

PCR amplification of YAC recombinant DNA was carried out with two novel primers directed to the Alu consensus sequence in a manner similar to that described by Nelson and colleagues (1989). 100 ng of total YAC DNA (i.e., containing the whole yeast genome) was used per reaction. The composition of the PCR mix was as follows: 10 mM Tris-HCl pH 8.3, 50 mM KCl, 0.01% gelatin, 1.5 mM MgCl2, 0.1% Nonidet NP40, 0.01% Tween 20, 0.1% Triton X100, 200 µM of each dNTP, 250 ng of either or both 5' and 3' consensus Alu-repeat primers (Nos. 450 and 451, respectively). The primers' sequences were as follows:

450: 5' AAAGTGCTGGCATTACAGG 3'
451: 5' GTGAGCCGAGATCGCGCCACTGCACT 3'

They correspond respectively to bases 37 to 55 (bottom strand) and 228 to 253 (top strand) of the Alu consensus sequence (Bains 1986). The reactions were carried out in 50 µl with 1 unit of Cetus AmpliTaq enzyme on a Perkin Elmer Cetus Thermal Cycler under the following cycling conditions:

Segment 1: 94°C, 3 min; 60°C, 1 min; 72°C, 1 min (1 cycle)
Segment 2: 92°C, 45 s; 60°C, 1 min; 72°C, 1 min, with an increment of 6 s/cycle (35 cycles)
Segment 3: 92°C, 45 s; 60°C, 1 min; 72°C, 10 min (1 cycle)

No amplification was observed under these conditions with either mouse or yeast (strain AB1380) DNA as a template (not shown).

Results

Rationale of the analysis and calculation of repeat copy numbers

DNA prepared from 22 human YAC recombinants was digested with PstI and HindIII, fractionated on a 1% agarose gel, transferred to Hybond-N nylon membranes (Amersham International PLC), and hybridized sequentially with a Cot1 fraction of human DNA (Gibco BRL), with inserts from cloned Alu repeat BLUR8 (Deininger et al. 1981), and the cloned L1 repeat probes BK1.8(5'), BK1.5, BK1.2, and BK1.8(3') (Shafit-Zagardo et al. 1982a; Fig. 1). The number of Alu and L1 repeats represented by each hybridizing restriction fragment and the relative hybridization to the two repeats were estimated for each recombinant by scanning the autoradiograms with a custom-built device (Elder et al. 1986) and taking into account the maximum number of tandem or closely spaced repeats which any one restriction fragment can theoretically accommodate. For each type of repeat we chose a weakly hybridizing band as a copy number standard corresponding to one copy. We chose the 2.5-kbp PstI fragment detected in recombinant C12 by probe BLUR8 as a copy number standard for Alu repeats, assuming that it corresponded to a single copy of Alu (Fig. 1). For the L1 repeats we chose the 4.5-kbp PstI fragment from clone B10 detected by probe BK1.2. An internal control accounting for differences both in DNA loading and YAC copy number between samples was provided by hybridization with a pBluescribe probe, which cross-reacts with the YAC arms (data not shown).

For the calculation of the copy number of L1 repeats we performed, in addition to the densitometric analysis, a detailed DNA transfer analysis of both HindIII and PstI digests with the four probes BK1.8(5'), BK1.5, BK1.2, and BK1.8(3'), which together constitute the full-length L1 repeat (Fig. 2A). This included both the identification of bands in common to the four L1 repeat probes in HindIII and PstI digests (Fig. 2B) and a restriction mapping analysis. Indeed, the restriction enzyme HindIII is known to cut once in most 1.2-kbp and 1.8-kbp KpnI fragments detected by probes BK1.2 and BK1.8(5') respectively (sites H1 and H2 in Fig. 2A), whereas no HindIII site was found in any cloned versions of the 1.5-kbp and the 3' 1.8-kbp KpnI fragments, such that 80% of the full-length L1 repeats contain a 1.9-kbp HindIII fragment (Manuelidis and Biro 1982; Shafit-Zagardo et al. 1982a and b). This fragment hybridizes strongly to the BK1.5 probe and weakly to probes BK1.2 and BK1.8(5'), as typically seen in clone A1 (Fig. 2B). The detailed analysis and copy number calculation of L1 repeats in clones A1 and B10 is presented below in the section "Organization of the L1 repeats in the YACs."

Fig. 1. Repeat probe analysis of 21 human YAC recombinants. The YAC DNAs were digested with PstI and fractionated through a 1% agarose gel. Probes are indicated above each panel. The positions of size markers (in kbp) are indicated on the left. Autoradiographs corresponding to different exposures are presented for the different clones in order to maximize the visual presentation of the data.
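The densitometric part of the copy number estimate can be illustrated with a short sketch. It assumes a simple linear model that the paper does not spell out: each band signal is divided by the pBluescribe control signal of its lane, the result is expressed relative to the one-copy standard band, and the estimate is capped by the number of repeats the fragment could hold. The function name, the rounding rule, and all signal values are hypothetical.

```python
def estimate_copies(band, control, std_band, std_control, fragment_bp, repeat_bp):
    """Estimate the repeat copy number of one restriction fragment.

    band, control         : densitometric signals of the band and of the
                            vector-arm control in the same lane
    std_band, std_control : the same two signals for the one-copy standard
    fragment_bp, repeat_bp: fragment and repeat lengths, used for the cap
    All of this is an illustrative model, not the authors' stated procedure.
    """
    normalized = band / control            # corrects for loading and YAC copy number
    one_copy = std_band / std_control      # signal equivalent to a single copy
    raw = normalized / one_copy
    cap = max(1, fragment_bp // repeat_bp)  # most repeats the fragment can hold
    return min(max(1, round(raw)), cap)


# Hypothetical signals for a 2.5-kbp fragment probed for ~280-bp Alu repeats.
copies = estimate_copies(band=9.0, control=1.5, std_band=2.0, std_control=1.0,
                         fragment_bp=2500, repeat_bp=280)
capped = estimate_copies(band=100.0, control=1.0, std_band=2.0, std_control=1.0,
                         fragment_bp=2500, repeat_bp=280)
print(copies, capped)
```

The cap reflects the statement in the text that the maximum number of tandem or closely spaced repeats a fragment can accommodate was taken into account.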

DNA transfer analysis with repeat probes

Figure 1 shows PstI digests of DNA from the 21 randomly selected YAC recombinants sequentially hybridized with (a) the Cot1 fraction of human DNA, (b) the cloned Alu repeat BLUR8 (Deininger et al. 1981), and (c) the cloned L1 repeat BK(1.2)11 (Shafit-Zagardo et al. 1982a). The absolute and relative distribution of these two repeats varies significantly between independent YAC recombinants. Some clones contain many (A11, B6, E6) or very few Alu repeats (A3, C8), and some show several L1-positive fragments (A1, B10). Notably, 10 out of 21 clones, covering 735 kbp of human genomic DNA (i.e., 38% of the total cloned length), do not contain any detectable L1 repeat. Significant differences in both the number and intensity of hybridizing restriction fragments were observed between clones with probe BLUR8, clearly indicating substantial variations in the density of Alu repeats between clones. Similar observations were made with the L1-specific probes. In order to maximize the visual content, the autoradiographs shown in Fig. 1 correspond to different exposures for the different clones, and hence the between-sample variations are less obvious. However, the densitometric analysis (see below) illustrates these variations very clearly. Major effects due to loading or YAC copy number variations were ruled out by analysis with the pBluescribe probe (not shown). Variations in the intensity of hybridization may reflect a combination of (a) sequence divergence from the cloned version of the repeat used as a hybridization probe, (b) the portion of the repeat contained within the particular restriction fragment, and (c) the copy number of the repeat within the particular restriction fragment. In fact, in the case of the Alu repeats at least, the effect of sequence divergence seems moderate, since the relative strengths of the autoradiographic signals obtained when using either a human Cot1 DNA probe or the Alu probe BLUR8 are very similar (Fig. 1).

Fig. 2. Analysis of the L1 repeat structure in selected representative YAC recombinants. (A) Simplified map of HindIII (H) and KpnI (K) restriction sites of the consensus full-length L1 repeat. The relative positions of the four probes used in this analysis are indicated at the top. (B) Autoradiographs of blots of fractionated HindIII and PstI digests hybridized sequentially with probes BK1.2, BK1.5, and BK1.8(5') (as indicated above each panel). Data are shown for three representative clones. Sizes are indicated in kbp on the left of each panel.

Partition of Alu and L1 repeats in the human genome

For each clone, the copy number of the two repeat families was calculated by a combination of densitometric and restriction mapping analyses (see above). The total number of each repeat scored in the total length of 2260 kbp of human DNA analyzed is close to the predicted average (29 L1 repeats observed, 32 expected; 618 Alu repeats observed, 565 expected). From the length of each human insert, we calculated the relative density of L1 and Alu repeats per unit length of DNA for each recombinant. The unit lengths are taken as the average spacing for L1 and Alu repeats estimated for the genome as a whole, i.e., 1 per 70 kbp and 1 per 4 kbp, respectively. The data and calculations are summarized in Table 1 and the relative densities plotted in Fig. 3. It is clear that the majority of cloned tracts of human DNA do not correspond to the average (one Alu per 4 kbp, one L1 per 70 kbp). Rather, this analysis provides evidence for regions that are relatively L1-rich/Alu-poor. Some clones are remarkably L1-rich: there are at least seven copies of the most 3' region, detected by probe BK1.8(3'), in clone A1 (150 kbp; densitometric analyses indicate there could be up to 12 copies in this clone), and four copies in clones A3 (100 kbp) and D1 (120 kbp), whereas the Alu repeats are underrepresented (only four copies in A3). Most (8 out of 11) L1 repeat-containing clones are Alu-poor. Although the small size of some of the YACs analyzed constrains the interpretation of the data in the case of L1 negativity, it appears that Alu-rich clones tend to be L1-poor: for instance, clone E5 (120 kbp; density of Alu repeats: 1.56 per 4 kbp) is devoid of L1 repeats, and clone G2 (200 kbp; density of Alu repeats: 2.2 per 4 kbp) contains a single copy of L1. Our data also indicate the existence of regions in the human genome depleted for both repeats: clone C8 contains only 12 copies of Alu and one L1 repeat in the 190 kbp of cloned DNA (density of Alu repeats: 0.28 per 4 kbp; density of L1 repeats: 0.36 per 70 kbp).
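The relative densities used here rescale each copy count by the genome-wide average spacing, so that a value of 1 means "average". The sketch below shows that conversion for copy counts quoted in the text for clones A3 and B10; the function and constant names are illustrative.

```python
ALU_UNIT_KBP = 4.0   # genome-wide average spacing of Alu repeats
L1_UNIT_KBP = 70.0   # genome-wide average spacing of L1 repeats


def relative_density(copies, insert_kbp, unit_kbp):
    """Number of repeats per unit length, where one unit is the average spacing."""
    return copies * unit_kbp / insert_kbp


# Clone A3: four Alu copies and four L1 copies in 100 kbp.
a3_alu = relative_density(4, 100, ALU_UNIT_KBP)
a3_l1 = relative_density(4, 100, L1_UNIT_KBP)
# Clone B10: six L1 copies in 175 kbp.
b10_l1 = relative_density(6, 175, L1_UNIT_KBP)
print(round(a3_alu, 2), round(a3_l1, 2), round(b10_l1, 2))
```

These three values correspond to the Table 1 entries for A3 (0.16 and 2.80) and B10 (2.40).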

Table 1. Density of Alu and L1 (3' 1.8-kbp KpnI fragment) repeats in each YAC recombinant analyzed.

Clone   Size of human      No. of Alu    No. of L1
        component (kbp)    per 4 kbp     per 70 kbp
A1      150                0.80          3.26
A2       80                1.64          0
A3      100                0.16          2.80
A11      50                3.08          0
A12      85                0.96          0
B6       75                3.73          0
B10     175                1.02          2.40
B11     190                0.63          0.37
C2       40                0.62          0
C8      190                0.28          0.36
C9      100                0.32          1.40
C12      90                0.54          1.55
D1      120                0.72          1.75
D11      90                0.92          0
D12     110                0.48          0.63
E5      120                1.56          0
E6       75                2.67          0
F4       70                1.32          0
F5       50                1.60          0
F11     100                0.60          0.70
G2      200                2.20          0.35
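As an illustration of how Table 1 maps onto the four domain types plotted in Fig. 3, the sketch below assigns each clone to a quadrant. It assumes a cut-off of exactly 1.0 on both axes (the genome-wide average), which is implied by the figure legend rather than stated as a rule; the clone labels and values are as read from Table 1, and the function name is hypothetical.

```python
# (clone, human insert in kbp, Alu per 4 kbp, L1 per 70 kbp), read from Table 1
TABLE_1 = [
    ("A1", 150, 0.80, 3.26), ("A2", 80, 1.64, 0.0), ("A3", 100, 0.16, 2.80),
    ("A11", 50, 3.08, 0.0), ("A12", 85, 0.96, 0.0), ("B6", 75, 3.73, 0.0),
    ("B10", 175, 1.02, 2.40), ("B11", 190, 0.63, 0.37), ("C2", 40, 0.62, 0.0),
    ("C8", 190, 0.28, 0.36), ("C9", 100, 0.32, 1.40), ("C12", 90, 0.54, 1.55),
    ("D1", 120, 0.72, 1.75), ("D11", 90, 0.92, 0.0), ("D12", 110, 0.48, 0.63),
    ("E5", 120, 1.56, 0.0), ("E6", 75, 2.67, 0.0), ("F4", 70, 1.32, 0.0),
    ("F5", 50, 1.60, 0.0), ("F11", 100, 0.60, 0.70), ("G2", 200, 2.20, 0.35),
]


def quadrant(alu_density, l1_density, threshold=1.0):
    """Label a clone by comparing both densities with the assumed cut-off."""
    alu = "Alu rich" if alu_density > threshold else "Alu poor"
    l1 = "L1 rich" if l1_density > threshold else "L1 poor"
    return f"{alu}-{l1}"


counts = {}
for clone, size_kbp, alu, l1 in TABLE_1:
    label = quadrant(alu, l1)
    counts[label] = counts.get(label, 0) + 1

total_kbp = sum(size_kbp for _, size_kbp, _, _ in TABLE_1)
print(counts, total_kbp)
```

With this cut-off only one clone (B10, with an Alu density of 1.02) falls in the Alu rich-L1 rich quadrant, and it does so marginally; a slightly different threshold would move it.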


Alu-PCR analysis of the YACs

We have also compared the YAC recombinants for the relative distribution of Alu repeats by Alu-PCR, using a set of primers designed to recognize consensus regions at the 5' and 3' ends of the repeat. Using single primers, only the inter-Alu fragments between inversely oriented repeats and within the capacity of the PCR will be amplifiable. Representative results are shown in Fig. 4. The amplified fragments vary in size from approximately 100 bp to 4 kbp. We find that the number of bands varies significantly between independent human YAC recombinants, but for each cloned tract of human DNA, this is in close accord with the findings from direct hybridization analysis shown in Fig. 1. Also, Alu-poor YACs display very little Alu-PCR amplification, whereas Alu-rich clones show high levels of amplification. Extreme examples are the rather Alu-poor A1 and almost Alu-depleted A3 (0.16 copy of Alu per 4 kbp) clones (lanes 7, 8, 9 and 4, 5, 6, respectively), which show no or very little amplification with the Alu primers used here. The three Alu-rich clones E5, E6, and G2 all show several amplification products; this suggests close proximity of inversely oriented Alu repeats in Alu-dense regions.
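The constraint that a single primer amplifies only between inversely oriented repeats can be made concrete with a simplified model. In the sketch below each Alu is reduced to a position and a strand, a primer reads outward from one end of the repeat, and a product forms only where two primer copies converge within a maximum length. The 4-kbp limit is taken from the largest fragments reported here; the data structure, the function name, and the example coordinates are assumptions made for illustration.

```python
from itertools import combinations


def single_primer_products(alus, primer_end, max_product_bp=4000):
    """List pairs of Alu repeats that a single outward-facing primer could bridge.

    alus       : list of (position_bp, strand), strand being '+' or '-'
    primer_end : '3prime' if the primer reads out of the Alu 3' end,
                 '5prime' if it reads out of the 5' end
    Repeats are treated as points, so repeat length is ignored in this sketch.
    """
    def extends_right(strand):
        # Out of the 3' end: rightwards on '+', leftwards on '-'.
        # Out of the 5' end: the opposite.
        return (strand == "+") == (primer_end == "3prime")

    products = []
    for (p1, s1), (p2, s2) in combinations(sorted(alus), 2):
        converging = extends_right(s1) and not extends_right(s2)
        if converging and p2 - p1 <= max_product_bp:
            products.append((p1, p2))
    return products


# Hypothetical arrangement: two direct repeats followed by two inverted ones.
example = [(1000, "+"), (1800, "+"), (2500, "-"), (9000, "-")]
from_3prime = single_primer_products(example, "3prime")
from_5prime = single_primer_products(example, "5prime")
direct_only = single_primer_products([(0, "+"), (500, "+")], "3prime")
print(from_3prime, from_5prime, direct_only)
```

In this model two directly oriented repeats never yield a product with one primer, however close they are, which is why the number of single-primer bands reports on inverted pairs rather than on Alu density as such.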

Fig. 3. Partition of Alu and L1 repeat elements in the human genome. The densities of both Alu (abscissae) and L1 (3' 1.8 kbp KpnI) (ordinates) repeat elements are plotted for all 21 YACs analyzed (see Table 1 for coordinates). One copy of Alu repeat per 4 kbp and one copy of L1 repeat per 70 kbp represent the overall average densities of these repetitive elements in the human genome. Plotting the data as presented here allows assignment of each YAC to one of the four types of genomic domains: Alu poor-L1 rich (top left quadrant), Alu rich-L1 rich (top right), Alu poor-L1 poor (bottom left), and Alu rich-L1 poor (bottom right). The absence of any point close to coordinate (1,1) indicates that no cloned insert corresponds to the average.

Fig. 4. Alu-PCR analysis of representative human YAC recombinants. DNA from clones A2 (tracks 1, 2, 3), A3 (4, 5, 6), A1 (7, 8, 9), E6 (10, 11, 12), E5 (13, 14, 15), and G2 (16, 17, 18) was amplified with primer 450 alone (tracks 1, 4, 7, 10, 13, 16), primer 451 alone (2, 5, 8, 11, 14, 17), or 450 and 451 in combination (3, 6, 9, 12, 15, 18). M: size marker (1 kbp ladder, Gibco BRL).

Organization of the L1 repeats in the YACs

Southern blot analysis of HindIII and PstI digests of the YAC DNA with four probes covering the full-length L1 repeat (Fig. 2A) both assisted in calculating the copy number of L1 elements and allowed us to analyze the organization (i.e., the extent of truncation and the degree of clustering) of these repeats in the YACs. Figure 2B shows autoradiographic data for three representative clones. Data obtained with probe BK1.8(3') are not shown, since the patterns observed fully overlapped with the set of fragments detected by probe BK1.2; additional bands were occasionally detected, as indicated in the text below.

The analysis of clone A1 shows two restriction fragments common to probes BK1.8(5'), BK1.5, and BK1.2 in the PstI digest, both corresponding to full-length L1 elements. This is corroborated by the observation of a 1.9 kbp HindIII fragment that hybridizes strongly to probe BK1.5 and weakly to both flanking probes BK1.8(5') and BK1.2 (see above for rationale). This common fragment may account for the two copies of full-length elements. One fragment is common to BK1.8(5') and BK1.2 [and BK1.8(3'), not shown], best explained by the immediate proximity of two different L1 elements. Another three bands are detected with BK1.2 [and BK1.8(3'), not shown] in both HindIII and PstI digests (the 6-kbp fragments appeared as a doublet on a shorter exposure). One more copy was detected with probe BK1.8(3') only (not shown) in both digests. We thus estimate that clone A1 contains six to seven copies of L1. As mentioned above, the densitometric analysis suggests as many as 12 copies may be present; some of the larger fragments may, therefore, contain more than one L1 element.

The presence in clone B10 of a full-length L1 repeat is also obvious from the analysis of both HindIII and PstI digests: bands common to the three probes are present at 3.7 kbp and about 14 kbp in the HindIII and PstI tracks respectively. Again, a band in common to BK1.8(5') and BK1.2 [and BK1.8(3'), not shown] is observed in both digests. Three more copies are detected with the BK1.8(3') probe (not shown), bringing the number of L1 elements to a total of 6.

Although we cannot exclude short truncations of the very 5' ends, the majority of the L1 repeats appear to be full-length units, and at least extend 5' to the HindIII site within the 5' 1.8 kbp KpnI fragment (site H1 in Fig. 2A). Moreover, some clones (A1, A3, and D1) contained two or three full-length units: this result was not expected considering the estimated average spacing of full-length L1 repeats in mammalian genomes (estimated at 200-300 kbp) and the relatively small size of the YACs (A1, 150 kbp; A3, 100 kbp; D1, 120 kbp). In some of the YACs (clones A1, B10, D1) we found fragments common to the 3' and 5' end probes BK1.2, BK1.8(3'), and BK1.8(5') in both HindIII and PstI digests, but which were not detected by the central probe BK1.5 (see the 8.5 kbp HindIII and 9 kbp PstI fragments in clone B10, Fig. 2B). These fragments also hybridized to a probe specific for the YAC left arm (data not shown) and thus, the most likely interpretation is that only the 5' end of a full-length repeat [detected by probe BK1.8(5')] is present right at the end of the clone, next to a 5' truncated L1 element detected by probes BK1.2 and BK1.8(3').
Taken together, these data demonstrate the existence of a high degree of clustering of L1 repeats in certain regions of the human genome.
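To see why two or three full-length units per YAC were unexpected, it helps to work out how many would be predicted for inserts of these sizes at one full-length repeat per 200-300 kbp. The sketch below does this for the three clones named in the text, assuming a uniform distribution along the genome; the function name is illustrative.

```python
def expected_full_length(insert_kbp, spacing_kbp):
    """Expected number of full-length L1 units if they were evenly spaced."""
    return insert_kbp / spacing_kbp


# Insert sizes quoted in the text for the clones with two or three full-length units.
clone_sizes_kbp = {"A1": 150, "A3": 100, "D1": 120}

expected = {
    clone: (expected_full_length(size, 300), expected_full_length(size, 200))
    for clone, size in clone_sizes_kbp.items()
}
for clone, (low, high) in expected.items():
    print(f"{clone}: {low:.2f} to {high:.2f} expected")
```

Under this assumption fewer than one full-length repeat would be expected in each of the three clones, against the two or three observed.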

Discussion

Most studies of the distribution of repetitive sequences in mammalian genomes published so far have relied either on the statistical analysis of data obtained by hybridization of colony lifts (λ or cosmid clones) or on chromosomal in situ hybridization painting with repeat-specific probes. By contrast, we have undertaken an analysis at the molecular level over large stretches of contiguous DNA cloned as YACs. We present evidence for the partition of the genome into relatively Alu-rich/L1-poor and Alu-poor/L1-rich regions from the analysis of 21 randomly selected YAC recombinants with an average human insert size of 100 kbp. We estimate that there are 29 L1 and 618 Alu repeats in the 2260 kbp of DNA cloned and analyzed, giving an average spacing of 78 kbp and 3.65 kbp respectively, close to the global estimates. This suggests that, when taken together, the recombinants analyzed constitute a representative sample. However, none of the clones contain the average density of one Alu per 4 kbp and one L1 per 70 kbp, and the range of under- and overrepresentation of L1 and Alu repeats varies by more than an order of magnitude between independent YACs. Notably, we observe an inverse relationship between the L1 and the Alu repeat distribution (Fig. 3); all the most Alu-rich clones are L1-depleted, and the majority of L1-positive clones are Alu-poor. Many of the L1-negative YACs are smaller than or just approximate the overall average spacing (70 kbp) of L1 repeats in the human genome, and this constrains the analysis when the clones are taken individually. Statistical analysis of the L1 content of the smallest 10 YACs (smaller than 100 kbp) as a group, however, shows that they contain fewer L1 repeats than would be expected. Indeed, the sum of the sizes of these YACs is 705 kbp (Table 1), and one would therefore expect, considering that the average spacing between L1 repeats is 78 kbp in our sample, to find 9 L1 repeats, whereas only 2 are observed. When submitted to a chi-square test, these data give the value of 6.8 (p < 0.01). This indicates that L1 repeats are significantly underrepresented in these YACs and that this is unlikely to occur by chance. Hence, the size of the clones cannot be held responsible for the depletion in L1 repeats. It is notable that in the same 10 smallest YACs, a total of 288 copies of Alu elements are observed for 193 expected (χ² = 67.6; p < 0.001), thus showing a marked overrepresentation of these repeats in this group of L1-depleted YACs. We also find evidence for extensive tracts of the genome being relatively depleted for both classes of interspersed repeats (clone C8).
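The average spacings and the expected counts for the ten smallest YACs follow directly from the totals given in the text. The sketch below reproduces that arithmetic only; it does not attempt the chi-square test itself, since the paper does not state how the statistic was constructed. Variable names are illustrative.

```python
TOTAL_KBP = 2260        # total human DNA analyzed
L1_OBSERVED = 29        # L1 repeats scored in the whole sample
ALU_OBSERVED = 618      # Alu repeats scored in the whole sample
SMALL_YACS_KBP = 705    # summed size of the ten smallest YACs

# Average spacing of each repeat in the sample, in kbp.
l1_spacing = TOTAL_KBP / L1_OBSERVED
alu_spacing = TOTAL_KBP / ALU_OBSERVED

# Expected counts in the ten smallest YACs, using the spacings quoted in the text.
expected_l1 = SMALL_YACS_KBP / 78
expected_alu = SMALL_YACS_KBP / 3.65

print(f"L1 spacing {l1_spacing:.1f} kbp, Alu spacing {alu_spacing:.2f} kbp")
print(f"expected L1 {expected_l1:.1f}, expected Alu {expected_alu:.1f}")
```

These expectations (about 9 L1 and about 193 Alu repeats) are the values against which the observed counts of 2 and 288 are compared.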
Some stretches of DNA are heavily packed with Alu repeats, as suggested by the intensity of some bands detected with probe BLUR8 (clone B6, for instance, gave very dark autoradiographic signals). Although one could argue that strong signals can be accounted for by a stronger homology of individual repeats to the BLUR8 probe used in this study, this does not seem to be the case, since identical relative intensities were observed when human Cot1 DNA or the Alu-specific BLUR8 clone were used as probes. Rather, our observations are in accord with those recently described for the human glucose-6-phosphate dehydrogenase (G6PD) gene, where 12 Alu repeat units were found within less than 10 kbp by sequencing (Chen et al. 1991). Statistical analysis of the current human DNA database provides strong evidence for a bias towards head-to-tail orientation of closely spaced Alu repeats (Moyzis et al. 1989), whereas only inversely oriented Alu repeats can promote a PCR in single primer reactions. Thus, the close agreement between the number of BLUR8 hybridizing bands and the number of Alu-PCR fragments for independent clones merits further comment. The apparent conflict can be resolved if we make the reasonable assumption that at least a portion of adjacent Alu repeats are inversely oriented. The data presented by Chen et al. (1991) from the sequence analysis of 20 kbp of genomic DNA in the human G6PD locus support this model: although all the closest Alu repeats are oriented in a head-to-tail fashion, seven pairs of inversely oriented Alu elements separated by distances compatible with PCR amplification can be defined. The alternative explanation that extra Alu-PCR bands are generated by skipping over and through closely adjacent, directly oriented Alus is not consistent with the general observation that the majority of inter-Alu PCR fragments behave essentially as single copy fragments (Nelson et al. 1989; Brooks-Wilson et al. 1990; our unpublished data).

We present evidence for a very high density of L1 repeats in some regions of the human genome, as exemplified by clones A1 (at least seven copies per 150 kbp), A3 (four copies per 100 kbp) and B10 (six copies per 175 kbp). As a corollary to this clustering of L1 repeats in some regions, we observed that 735 kbp of the analyzed DNA, i.e., 38% of the total, were L1 free. We also found an unexpectedly high proportion (>50%) of full-length elements. Notably, two or three full-length repeats were found within 100-150 kbp of cloned DNA in several instances (YACs A1, A3, and D1). This compares with the overall predicted average of one full-length unit per 200-300 kbp. Our results support the hypothesis (Korenberg and Rykowski 1988; Bickmore and Sumner 1989) that there are long Alu/GC- and L1/AT-rich stretches in the human genome; this may reflect preferential insertion into different regions, the result either of base composition or of the coordination of chromosome replication and retroposition of these two classes of repeats. We are currently localizing each YAC recombinant by fluorescent in situ hybridization (Breen et al. 1992).
It will be interesting to see whether the Alu-rich YACs map consistently to Reverse bands and the L1-rich recombinants to Giemsa bands. This would lend credence to the attractive model built from global analysis (Korenberg and Rykowski 1988; Boyle et al. 1990), but which currently lacks molecular precision. YAC recombinants such as these will also be useful for analyzing other discontinuous features of the mammalian genome, including the distribution of other repeated DNA sequences and motifs, methylation status, and transcriptional organization, and will offer the possibility of integrating the distribution of repeat elements with the compositional maps of chromosomes (Gardiner et al. 1990), based on the subdivision of the genome into isochores (Bernardi et al. 1985; Bernardi 1989).

Acknowledgments. We thank Dr. Nick Hastie for valuable insight, encouragement, and criticism; Darryl Green for help and advice with gel scanning; Sheila Christie and Isobel Murray for expert technical support; Dr. Andrew Carothers for help with the statistics; and Sandy Bruce, Norman Davidson, and Douglas Stuart for excellent art contributions. This work was supported by the Medical Research Council, the UK Human Genome Mapping Program, and by a Wellcome Fellowship to B. Arveiler.

References
Arveiler, B., Murray, I., Stevenson, B., and Porteous, D.J.: Construction of a library enriched for human Chromosome 11 and Xp YAC recombinants. Mammalian Genome 1: 265-2615, 1991.
Bains, W.: The multiple origin of human Alu sequences. J Mol Evol 23: 189-199, 1986.
Bennett, K.L. and Hastie, N.D.: Looking for relationships between the most repeated dispersed DNA sequences in the mouse: small R elements are found associated consistently with long MIF repeats. EMBO J 3: 467-472, 1984.
Bernardi, G.: The isochore organisation of the human genome. Annu Rev Genet 23: 637-661, 1989.
Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M., and Rodier, F.: The mosaic genome of warm-blooded vertebrates. Science 228: 953-958, 1985.
Bickmore, W.A. and Sumner, A.T.: Mammalian chromosome banding: an expression of genome organization. Trends Genet 5: 144-148, 1989.
Boyle, A.L., Gwyn Ballard, S., and Ward, D.C.: Differential distribution of long and short interspersed element sequences in the mouse genome: chromosome karyotyping by fluorescence in situ hybridization. Proc Natl Acad Sci USA 87: 7757-7761, 1990.
Breen, M., Arveiler, B., Murray, I., Gosden, J.R., and Porteous, D.J.: YAC mapping by FISH using Alu-PCR-generated probes. Genomics 13: 726-730, 1992.
Brooks-Wilson, A.R., Goodfellow, P.N., Povey, S., Nevalinna, H.A., de Jong, P.J., and Goodfellow, P.J.: Rapid cloning and characterization of new chromosome 10 DNA markers by Alu element mediated PCR. Genomics 7: 614-620, 1990.
Burke, D.T., Carle, G.F., and Olson, M.V.: Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236: 806-812, 1987.
Chen, E.Y., Cheng, A., Lee, A., Kuang, W.J., Hillier, L., Green, P., Schlessinger, D., Ciccodicola, A., and D'Urso, M.: Sequence of human glucose-6-phosphate dehydrogenase cloned in plasmids and a yeast artificial chromosome. Genomics 10: 792-800, 1991.
Deininger, P.L., Jolly, D.J., Rubin, C.M., Friedmann, T., and Schmid, C.W.: Base sequence studies of 300 nucleotide renatured repeated human DNA clones. J Mol Biol 151: 17-33, 1981.
Elder, J.K., Green, D.K., and Southern, E.M.: Automatic reading of DNA sequencing gel autoradiographs using a large format digital scanner. Nucleic Acids Res 14: 417-424, 1986.
Gardiner, K., Aissani, B., and Bernardi, G.: A compositional map of human chromosome 21. EMBO J 9: 1853-1858, 1990.
Gosden, J., Hanratty, D., Starling, J., Fantes, J., Mitchell, A., and Porteous, D.: Oligonucleotide primed in situ DNA synthesis (PRINS): a method for chromosome mapping, banding and investigation of sequence organization. Cytogenet Cell Genet 57: 100-104, 1991.
Green, E.D., Riethman, H.C., Dutchik, J.E., and Olson, M.V.: Detection and characterization of chimeric yeast artificial chromosome clones. Genomics 11: 658-669, 1991.
Korenberg, J.R. and Rykowski, M.C.: Human genome organization: Alu, Lines, and the molecular structure of metaphase chromosome bands. Cell 53: 391-400, 1988.
Labuda, D. and Striker, G.: Sequence conservation in Alu evolution. Nucleic Acids Res 17: 2477-2491, 1989.
Manuelidis, L. and Biro, P.A.: Genomic representation of the HindIII 1.9 kb repeat DNA. Nucleic Acids Res 10: 3221-3229, 1982.
Maule, J.C. and Green, D.K.: Semi-conductor controlled contour-clamped homogeneous electric field apparatus. Anal Biochem 191: 390-395, 1990.
Moyzis, R.K., Torney, D.C., Meyne, J., Buckingham, J.D., Wu, J.R., Burks, C., Sirotkin, K.M., and Goad, W.B.: The distribution of interspersed repetitive sequences in the human genome. Genomics 4: 273-289, 1989.
Nelson, D.L., Ledbetter, S.A., Corbo, L., Victoria, M.F., Ramirez-Solis, R., Webster, T.D., Ledbetter, D.H., and Caskey, C.T.: Alu polymerase chain reaction: a method for rapid isolation of human-specific sequences from complex DNA sources. Proc Natl Acad Sci USA 86: 6686-6690, 1989.
Porteous, D.J., Morten, J.E.N., Cranston, G., Fletcher, J.M., Mitchell, A., van Heyningen, V., Fantes, F.A., Boyd, P.A., and Hastie, N.D.: Molecular and physical arrangements of human DNA in HRAS1-selected, chromosome mediated transfectants. Mol Cell Biol 6: 2223-2232, 1986.
Scott, A.F., Schmeckpepper, B.J., Abdelrazik, M., Theisen Comey, C., O'Hara, B., Pratt Rossiter, J., Colley, T., Heath, P., Smith, K.D., and Margolet, L.: Origin of the human L1 elements: proposed progenitor genes deduced from a consensus DNA sequence. Genomics 1: 113-125, 1987.

Shafit-Zagardo, B., Brown, F.L., Maio, J.J., and Adams, J.W.: KpnI families of long, interspersed repetitive DNAs associated with the human β-globin gene cluster. Gene 20: 397-407, 1982a.
Shafit-Zagardo, B., Maio, J.J., and Brown, F.L.: KpnI families of long, interspersed DNAs in human and other primate genomes. Nucleic Acids Res 10: 3175-3193, 1982b.
Sherman, F., Fink, G.R., and Hicks, J.B.: Methods in yeast genetics: laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1983.
