© 1991 Oxford University Press

Nucleic Acids Research, Vol. 19, No. 21 5839-5842

W (A or T) sequences as probes and primers suitable for genomic mapping and fingerprinting

Radoje Drmanac+, Dean Nizetic, Gregory G. Lennon, Anita Beitverda and Hans Lehrach*
Genome Analysis Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK

Received September 19, 1991; Accepted October 2, 1991

ABSTRACT

A limitation to the use of oligonucleotide probes as tools for genetic and physical mapping has been the low positive hybridization frequency obtained with oligonucleotides long enough to hybridize preferentially to cloned insert DNA (and not to host E. coli genomic DNA). Both computer and experimental results now indicate that oligonucleotide probes composed of W (A or T) sequence are preferentially found in eukaryotic DNA, and can be used to provide high frequency, discriminative hybridization. Such W sequences may be useful as either probes or PCR primers in molecular diagnostic applications as well as in genetic and physical mapping.

INTRODUCTION

The generation and integration of physical and genetic maps of complex genomes is an important goal of genome analysis. To that end, a variety of restriction mapping and hybridization approaches are in use (1-3). With the advent of the polymerase chain reaction (4), additional approaches have become possible, using either unique primers to generate sequence tagged sites (5), species specific primers based on repetitive sequences as in Alu-PCR or B1-PCR amplification (6-8), or even completely random primer sequences (9).

Approaches to fingerprint genomic clones can be divided into gel-based approaches, which accumulate complete fingerprinting data on individual clones sequentially (10-15), and hybridization-based approaches, which simultaneously generate bits of fingerprinting data on many clones (16-19). Because the latter approaches are relatively insensitive to genome size (20), they are likely to be more suitable for large genomes such as mammalian genomes. To maximize the utility of hybridization approaches, one would ideally minimize clone preparation steps while maximizing the number of clones simultaneously hybridized, and have the positive hybridization frequency of each probe used approach 30-50% (16).
* To whom correspondence should be addressed
+ Present address: Argonne National Laboratories, Argonne, IL 60439, USA

Clone preparation is easiest for hybridization filters prepared by in situ lysis (21,22) of robotically spotted filters; this does leave, however, the E. coli host genome present as a background to the cosmid DNA. As for the number of hybridizations, simulations indicate that it would be possible to order human genome size libraries using about 150 probes if each hybridized to about 30% of the cosmids (H. Lehrach, unpublished). Such probes could consist of pools of clones or of relatively short oligonucleotides. Pools of clones, however, are limited for technical reasons (the presence of high and moderately abundant repeats) in their maximum pool size, which in turn limits the frequency of positives they can achieve. On the other hand, short random oligonucleotides, for example 9mers, should hybridize to 30% of the cosmids; unfortunately, the presence of significant numbers of copies of the 9mer sequence within the background E. coli genome makes the use of oligonucleotides this short impossible. For example, random 9mers, 8mers, and 7mers occur 40, 160, and 560 times in the E. coli genome, which is in the range of copy numbers for cosmids (10-50 copies/bacterium) or even plasmids (100-500 copies/bacterium). Longer oligonucleotides, 11-15 nucleotides long, can be used successfully (17,21) but will yield much lower positive hybridization frequencies.

One alternative is to increase the positive hybridization frequency of oligonucleotides of such length by basing their sequence on repetitive sequences. Oligonucleotides based on simple tandem repeats (22), transposon-like sequences, LINEs and other repeats, and VNTRs have so far given rise to about 25 useful probes with frequencies ranging from 4 to 40%. However, there is a limit to the number of different useful probes that can be generated from known (abundant) repetitive sequences.

In this report we present computational and experimental evidence for high frequency, discriminative oligonucleotide probes based on W (A or T) sequences. These provide the potential to generate large numbers of different hybridization fingerprinting probes, and may also be used as a source of PCR primers for genetic and physical mapping.
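The copy-number argument for short random oligonucleotides can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes equal base frequencies, counts both strands, and uses an E. coli genome size of about 4.7 Mb and a cosmid insert of about 40 kb, neither of which is stated in the text. Under those assumptions it gives expected copy numbers of the same order as the 40, 160 and 560 quoted above, and a positive fraction for a random 9mer somewhat below 30%.

```python
import math

# Assumed values (not given in the paper): genome and insert sizes in bp.
ECOLI_GENOME_BP = 4.7e6
COSMID_INSERT_BP = 40_000


def expected_occurrences(k: int, target_bp: float, strands: int = 2) -> float:
    """Expected exact matches of one random k-mer in a random target.

    Assumes equal base frequencies; `strands=2` counts both strands.
    """
    return strands * target_bp / 4 ** k


def positive_fraction(k: int, insert_bp: float, strands: int = 2) -> float:
    """Poisson estimate of the fraction of inserts with at least one match."""
    return 1.0 - math.exp(-expected_occurrences(k, insert_bp, strands))


for k in (9, 8, 7):
    copies = expected_occurrences(k, ECOLI_GENOME_BP)
    print(f"{k}mer: about {copies:.0f} expected copies in the E. coli genome")

print(f"9mer positive fraction on cosmids: {positive_fraction(9, COSMID_INSERT_BP):.2f}")
```

Changing the assumed genome size or counting a single strand shifts the numbers proportionally, which is why only the order of magnitude should be compared with the values in the text.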

MATERIALS AND METHODS

Computational Methods

Contiguous sequence from four human genomic loci (listed below) and 1.7 Mb of E. coli sequence were searched using the University of Wisconsin Genetics Computer Group software package (23). The four loci as represented in GenBank were: 1) human phosphoribosyl transferase (HPRT) (56736 bp), 2) human growth hormone (HGH) (66495 bp), 3) human factor IX (FIXG) (38059 bp), and 4) human beta hemoglobin (HB) (73326 bp).

Clone preparation and hybridization

Cosmid cultures were grown to saturation in microtiter plate wells in liquid medium containing kanamycin as described (22). Small amounts of culture were transferred robotically (G. Zehetner, unpublished) by sterilized metal pins onto Hybond N+ membranes (Amersham) prewet in medium. Membranes were incubated on agar plates overnight at 37°C until colonies were 1-3 mm in diameter. The colonies were then lysed in situ, and their DNA was denatured and bound to the membrane as described (21). Hybridizations were performed as described (24) with some modifications: membranes were preincubated in 4×SSC, 7% sarkosyl at room temperature overnight without shaking. Kinased oligonucleotide probe was added to 0.6 picomoles/ml in the same solution, and the membrane was then incubated for 12 hours at 4°C in a plastic box with moderate shaking. Membranes were rinsed several times with the same solution (without probe). Stringent washes were performed in the same solution, with the duration and temperature varying between probes (exact conditions are given in the figure legends).

RESULTS

W probes and their frequency in 235 kb of known human sequence

Eukaryotic genomes contain more A and T than G and C nucleotides. In addition to an overall average value of 60-70% A+T, there are specific regions with very high A+T content (25-29). Presumably, genomic sequence consisting of A+T runs of over 10 nucleotides must therefore be common. If this is true, and such W sequences are not abundant in the E. coli genome, one would expect that W probes of over 11 nucleotides would provide sufficient discrimination against the E. coli genome while being quite common in eukaryotic genomes. We tested these predictions by a computer analysis of the four longest contiguous human sequences, representing approximately 235 kb, and the E. coli genomic sequence of 1.3 million bp as found in GenBank.

The total content of W sequences was determined first. Searches were conducted using the FIND function of the UWGCG software (23), entering different lengths of W sequences with S (C or G) at both ends. All lengths of W stretches longer than 11 nucleotides were tested. The number of finds for each W length in the four human loci is shown in Table 1. An average number of finds per million bp of human DNA was calculated from these results and is shown in column 6. Similar calculations were performed on the E. coli sequence (column 7) and as predicted for random sequence (last column). In both E. coli and human sequence, W stretches are more abundant than in random sequence. W12 stretches are 1.83 and 7.33 fold more abundant for E. coli and human, respectively, while for longer W sequences the difference is even larger (10 and 50 fold). Human sequences are very rich in W stretches longer than 18 nucleotides. The total content of W12 and longer stretches is 2.2% of the sequence searched for human, and 5 fold less for E. coli (about 14 and 3 fold more than predicted for random sequences). The average distance between neighbouring W stretches in human sequences was 0.7 kb, theoretically sufficient to allow PCR using different pools of W primers (see Discussion).

Most Alu elements contain A+T rich sequence at their 3' ends and, as such, represent a significant source of W stretches. Yet the amount of W sequence need not correlate directly with the number of Alu elements. For example, in the FIX locus there are only 4 Alu elements, yet there are 65 W stretches longer than 11 nucleotides. On the other hand, in the almost 2 fold longer HGH locus there are fewer W stretches (only 53) while there are 20 Alu elements. Analysis of the structure of the W sequences revealed that certain W sequences predominated. The most abundant types of W sequences were determined and are summarized in Table 2.
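The search described above (W stretches of a given length with S at both ends) can be sketched with a regular expression. This is an illustration only, not the UWGCG FIND function: the function names are hypothetical, and the random-sequence estimate assumes equal base frequencies on a single strand, which reproduces values close to the "Random" figures quoted for Table 1 (about 60 per Mb for W12, halving with each extra nucleotide).

```python
import re
from collections import Counter


def w_run_lengths(seq: str, min_len: int = 12) -> Counter:
    """Count maximal A/T runs of at least `min_len` flanked by C or G.

    Lookbehind and lookahead are used so that a flanking S shared by two
    neighbouring runs is not consumed by the first match.
    """
    pattern = re.compile(r"(?<=[CG])[AT]{%d,}(?=[CG])" % min_len)
    return Counter(len(m.group()) for m in pattern.finditer(seq.upper()))


def expected_random_per_mb(run_len: int) -> float:
    """Expected number of S-flanked W runs of exactly `run_len` per 1 Mb.

    Assumes independent positions with P(W) = P(S) = 1/2.
    """
    return 1e6 * 0.25 * 0.5 ** run_len


# Toy sequence: one W12 run, one W13 run, and a W5 run that is too short.
toy = "C" + "A" * 12 + "G" + "T" * 13 + "C" + "A" * 5 + "G"
print(w_run_lengths(toy))
print(round(expected_random_per_mb(12), 1))
```

Dividing an observed count per Mb by `expected_random_per_mb` gives a fold enrichment comparable to the ratios discussed in the text, subject to the rounding used there.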
Table 1. Content of W sequences in human and E. coli genomic DNA

Length (W)   FIX      HPRT     HGH      HB       Human     E. coli   Random
             38 kb    56.7 kb  66.5 kb  73.3 kb  per 1 Mb  per 1 Mb  per 1 Mb
12           18       30       16       39       440       110       60
13           9        28       16       23       320       70        30
14           12       18       4        19       230       40        15
15           3        23       2        12       170       30        7.5
16           3        12       4        9        120       30        3.7
17           2        6        4        11       100       20        2
18           2        3        1        4        40        10        1
19           1        6        1        3        40        5         0.5
20           1        3        1        1        20        5         0.25
>20          5        22       6        10       180       6         0.5
Total (%)    2.2      2.8      1.24     2.64     2.2       0.44      0.16
/Coli        5        6.4      2.8      6        5         1         0.36
/Random      13.75    17.5     7.75     16.5     13.75     2.75      1
Av. Dist.    0.7      0.4      1.4      0.6      0.7       3         8

Table 2. The most abundant types of W sequences

Type  Sequence                                          Length  Num. of prob.  Aver. Dist.  % of +Cos
1     (A)7T(W)5, (T)7A(W)5                              13      64             396          10.1
2     (A)6T(W)6, (T)6A(W)6                              13      128            528          7.6
3     (A)7T(W)6, (T)7A(W)6                              14      128            940          4.3
4     (A)6T(W)7, (T)6A(W)7                              14      384            1432         2.8
5     (A)3T(W)5T(A)3, (T)3A(W)5A(T)3                    13      64             1671         2.4
6     (AT)4T(W)4, (W)4T(TA)4, (AAT)3T(W), (W)3T(TAA)3   13      48             705          5.7
7     (T)9SNNWW, (A)9SNNWW                              14      256            955          4.2

The W sequences containing a long A (or T) tail were most frequent, being 50 fold higher than in random sequence. For the first two groups in Table 2, one would expect up to 10% positive hybridization frequencies with human cosmids, equal to the frequency expected with a random 9mer. The average number of finds for one W13 probe in E. coli is about 0.3 copies/genome, over 100 fold less than for random 9mers. W sequences with interrupted A or T tails (type 5) are, however, 5 fold less frequent than types 1 or 2. W simple repeats (for example, (A)n, (AT)n, (AAT)n) were not included in this analysis. If anything, they are too frequent, and form very long stretches that may cause hybridization inconsistencies. Combinations of (AT)4 and (AAT)4 and parts of type 5 W sequences also have very high frequencies.
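The probe counts listed for several types in Table 2 follow from the number of degenerate positions in each formula, and this can be checked by expanding the formulas. The sketch below writes a few of the formulas as IUPAC strings (W = A or T, S = C or G, N = any base); the helper names are hypothetical. The `percent_positive` function rests on an inference rather than a statement in the paper: the last column of Table 2 appears to equal 40 divided by the average distance, which would be consistent with distances in kb and a cosmid insert of about 40 kb.

```python
from itertools import product

# IUPAC codes used in the probe formulas.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "S": "CG", "N": "ACGT"}


def expand(pattern: str) -> list[str]:
    """List every concrete sequence matching an IUPAC pattern."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in pattern))]


def family_size(patterns: list[str]) -> int:
    """Number of distinct probes covered by a group of patterns."""
    return len({s for p in patterns for s in expand(p)})


def percent_positive(avg_dist_kb: float, insert_kb: float = 40.0) -> float:
    """Assumed relation: insert size over average distance, as a percentage."""
    return 100.0 * insert_kb / avg_dist_kb


TYPE1 = ["A" * 7 + "T" + "W" * 5, "T" * 7 + "A" + "W" * 5]  # (A)7T(W)5, (T)7A(W)5
TYPE2 = ["A" * 6 + "T" + "W" * 6, "T" * 6 + "A" + "W" * 6]  # (A)6T(W)6, (T)6A(W)6
TYPE7 = ["T" * 9 + "SNNWW", "A" * 9 + "SNNWW"]              # (T)9SNNWW, (A)9SNNWW

for name, fam, dist in (("1", TYPE1, 396), ("2", TYPE2, 528), ("7", TYPE7, 955)):
    print(f"type {name}: {family_size(fam)} probes, "
          f"about {percent_positive(dist):.1f}% positive cosmids")
```

The same `expand` helper can be used to enumerate the individual probe sequences of a family when choosing oligonucleotides to synthesize.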

Taking the high frequencies of (A)n stretches into account, one may design high frequency probes with some G+C nucleotides included. In theory, 128 13mers of sequence WNN(C/G)(T)9 and WNN(C/G)(A)9 could have an average positive cosmid clone frequency of about 8%. Combining such probes and type 1 and 2 probes (Table 2) yields about 300 probes that can be generated with the potential to hybridize on average to about 10% of cosmids with human inserts.

Fingerprinting of yeast and human cosmid clones with W probes

To confirm the predicted high frequencies and discrimination of W oligonucleotide probes, 40 such probes were tested on membranes containing either 9,216 Schizosaccharomyces pombe or 1,536 human X chromosome cosmid clones. In most cases, a large number of positive cosmids was observed with a range of signal intensities. Sample hybridization results are shown in Figures 1 and 2.

Figure 1. S. pombe cosmid fingerprinting using a W probe. Two identical membranes (A and B) were hybridized with probes of sequence as indicated. Washes were in 4×SSC, 7% sarkosyl at 16°C for 90 minutes, then 22°C for 60 minutes, then a wash of 15°C for 8 hours followed by an exposure of 15 hours (A), or 30°C for 90 minutes and an exposure of 15 hours (B).

Figure 2. Human X chromosome cosmid fingerprinting using a W probe. Two identical membranes (A and B) were hybridized with probes of sequence as indicated. Washes were in 4×SSC, 7% sarkosyl at 4°C for 4 hours, then either another wash at 10°C for 2 hours and an exposure of 15 hours (A), or a wash at 4°C for 3 hours, then 22°C for 4 hours, followed by 37°C for 5 hours and exposure of 36 hours (B).

Twenty nine probes were designed based on the (A)7T(W)5 or (T)7A(W)5 formulas (type 1, Table 2) or slightly longer variants thereof such as (A)8T(W)5 and (A)9T(W)5. The average positive frequency among human and S. pombe cosmids was 12.7% and 7.9%, respectively. Similarly, five probes of sequence formula (A)7T(W)6 or (A)8T(W)6 (type 3, Table 2) gave an average frequency of 14.8% among human cosmids and 11.3% among S. pombe cosmids. Four type 6 (Table 2) probes gave slightly lower frequencies of 7.1% among human cosmids. It is apparent that the range of signal intensities is greater with human than with S. pombe clones, consistent with known structural features of human genomic DNA such as the uneven distribution of Alu elements and isochore organization (30). About 20% of the probes tested gave high hybridization backgrounds with E. coli DNA (Figure 2B), presumably because these particular sequences have numerous full or mismatched targets in the E. coli genome. Not all W probes are therefore likely to be useful; however, the majority may even be successful in fingerprinting clones from other eukaryotic organisms. For example, dot blot hybridizations to Drosophila melanogaster genomic DNA indicate a similar content of W sequences as in human genomic DNA (not shown).

DISCUSSION

Computer analyses (Tables 1 and 2) and hybridization results (Figures 1 and 2) reveal a high content of W sequence in human DNA. Hybridization results additionally indicate that these sequences are abundant in the S. pombe and D. melanogaster genomes. Eukaryotic genomes therefore seem particularly rich in W sequences. This feature can form the basis for the application of at least two strategies for the physical and genetic mapping of these genomes.

First, highly informative probes in the form of oligonucleotides designed as W sequences can, as demonstrated, be used in the direct fingerprinting of cosmid clones as contained within host E. coli colonies. With improved colony lysis procedures (21) and low temperature hybridization and washing conditions (24), these oligonucleotides can form sufficient numbers of hybrids and give good hybridization discrimination. Given the number of permutations possible of 13 or more W nucleotides, there are hundreds of potential fingerprinting probes that could, together with simple tandem and other repeat probes, complete the battery of probes needed for genomic library ordering by oligonucleotide hybridization fingerprinting (16,20,22).

Second, PCR primers based on W sequences could be used for amplification of regions of whole genomic DNA or of DNA from individual chromosomes separated by flow cytometry. Unlike Alu primers, however, W primers would not be species specific. As the total number of all W sequences of a certain length L is only 2^L, the total complexity of sequences in the informative range (12-16 nucleotides) is also limited. Combinations of W primers of different complexities could therefore be chosen for different amplification targets.

Finally, as there are several reports of the association of A or T rich sequence stretches with useful polymorphic genetic markers, it might be possible to generate a set of W probes designed specifically to locate polymorphic sites. We therefore believe that the further study and use of W probes will provide an important set of tools for molecular diagnostics as well as physical and genetic mapping.
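As a small worked illustration of the limited complexity of W sequences, the following sketch tabulates the number of distinct W sequences (two choices per position) next to the number of all sequences of the same length (four choices per position) over the 12-16 nucleotide range mentioned in the Discussion. The function names are hypothetical.

```python
def w_complexity(length: int) -> int:
    """Number of distinct sequences of `length` built only from A and T."""
    return 2 ** length


def all_complexity(length: int) -> int:
    """Number of distinct sequences of `length` built from A, C, G and T."""
    return 4 ** length


for length in range(12, 17):
    w, n = w_complexity(length), all_complexity(length)
    print(f"L={length}: {w} W sequences out of {n} (fraction {w / n:.1e})")
```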

REFERENCES

1. Smith, C.L., Economou, A., Schutt, S., Klco, S. & Cantor, C.R. (1987) Science 236:1448-1452.
2. Poustka, A., Lehrach, H., Williamson, R. & Bates, G. (1988) Genomics 2:337-345.
3. Herrmann, B.G., Barlow, D.P. & Lehrach, H. (1987) Cell 48:813-825.
4. Saiki, R., Scharf, S., Faloona, F., Mullis, K.B., Horn, G.T., Erlich, H.A. & Arnheim, N. (1985) Science 230:1350-1354.
5. Olson, M., Hood, L., Cantor, C. & Botstein, D. (1989) Science 245:1434-1435.
6. Nelson, D.L., Ledbetter, S.A., Corbo, L., Victoria, M.F., Ramirez-Solis, R., Webster, T.D., Ledbetter, D.H. & Caskey, C.T. (1989) Proc. Natl. Acad. Sci. USA 86:6686-6690.
7. Monaco, A.P., Lam, V.M.S., Zehetner, G., Lennon, G.G., Douglas, C., Nizetic, D., Goodfellow, P.N. & Lehrach, H. (1991) Nucleic Acids Res. 19(12):3315-3318.
8. Cox, R.D., Copeland, N.G., Jenkins, N.A. & Lehrach, H. (1991) Genomics 10:375-384.
9. Williams, J.G.K., Kubelik, A.R., Livak, K.J., Rafalski, A.J. & Tingey, S.V. (1990) Nucleic Acids Res. 18:6531-6535.
10. Kohara, Y., Akiyama, K. & Isono, K. (1987) Cell 50:495-508.
11. Coulson, A., Sulston, J., Brenner, S. & Karn, J. (1986) Proc. Natl. Acad. Sci. USA 83:7826-7830.
12. Olsen, M.V., Dutchik, J.E., Graham, M.Y., Brodeur, G.M., Helms, C., Frank, M., MacCollin, M., Scheinman, R. & Frank, T. (1986) Proc. Natl. Acad. Sci. USA 83:7826-7830.
13. Carrano, A.V., Lamerdin, J., Ashworth, L.K., Watkins, B., Branscomb, E., Slezak, T., Raff, M., De Jong, P.J., McBride, L., Meister, S. & Kronick, M. (1989) Genomics 4:129-136.
14. Harrison-Lavoie, K.J., John, R.M., Porteous, D.J. & Little, P.F. (1989) Genomics 5:501-509.
15. Stallings, R.L., Torney, D.C., Hildebrand, C.E., Longmire, J.L., Deaven, L.L., Jett, J.H., Doggett, N.A. & Moyzis, R.K. (1990) Proc. Natl. Acad. Sci. USA 87:6218-6222.
16. Michiels, F., Craig, A.G., Zehetner, G., Smith, G.P. & Lehrach, H. (1987) CABIOS 3:203-210.
17. Craig, A.G., Nizetic, D., Hoheisel, J.D., Zehetner, G. & Lehrach, H. (1990) Nucleic Acids Res. 18:2653-2660.
18. Hochgeschwender, U., Sutcliffe, G.J. & Brennan, M.B. (1989) Proc. Natl. Acad. Sci. USA 86:8482-8486.
19. Evans, G.A. & Lewis, K.A. (1989) Proc. Natl. Acad. Sci. USA 86:5030-5034.
20. Lehrach, H., Drmanac, R., Hoheisel, J., Larin, Z., Lennon, G., Monaco, A.P., Nizetic, D., Zehetner, G. & Poustka, A. (1990) In: Genome Analysis Volume 1: Genetic and Physical Mapping. Cold Spring Harbor Laboratory Press, Cold Spring Harbor (NY), pp. 39-81.
21. Nizetic, D., Drmanac, R. & Lehrach, H. (1991) Nucleic Acids Res. 19:182.
22. Nizetic, D., Zehetner, G., Monaco, A.P., Gellen, L., Young, B.D. & Lehrach, H. (1991) Proc. Natl. Acad. Sci. USA 88:3233-3237.
23. Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res. 12(1):387-395.
24. Drmanac, R., Strezoska, Z., Labat, I., Drmanac, S. & Crkvenjakov, R. (1990) DNA and Cell Biol. 9:527-534.
25. Caddie, M.S., Lussier, R.H. & Heintz, N.H. (1990) J. Mol. Biol. 211:19-33.
26. Boerwinkle, E., Xiong, W., Fourest, E. & Chan, L. (1989) Proc. Natl. Acad. Sci. USA 86:212-216.
27. Chebloune, Y., Pagnier, J., Trabuchet, G., Faure, C., Verdier, G., Labie, D. & Nigon, V. (1988) Proc. Natl. Acad. Sci. USA 85:4431-4435.
28. Oka, K., Tkalcevic, G.T., Nakano, T., Tucker, H., Ishimura-Oka, K. & Brown, W.V. (1990) Biochim. Biophys. Acta 1049:21-26.
29. Bowcock, S., Anuradha, R., Ehrlich, H. & Sehgal, P.B. (1989) Nucleic Acids Res. 17:6855-6864.
30. Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M. & Rodier, F. (1985) Science 228:953-958.
