Summary
During the past decade, it has become apparent that it is within our grasp to understand fully the development and functioning of complex organisms. It is widely accepted that this undertaking must include the elucidation of the genetic blueprint - the genome sequence - of a number of model organisms. As a prelude to the determination of these sequences, clone-based physical maps of the genomes of a number of multicellular animals and plants are being constructed. Yeast artificial chromosome (YAC) vectors, by virtue of their relatively unbiased cloning capabilities and their capacity to carry large inserts, have come to play a central role in the construction of these maps. The application of YACs to the physical map of the Caenorhabditis elegans genome has enabled cosmid clone ‘islands’ to be linked together efficiently. The resulting long-range continuity has improved the linkage between the genetic and physical maps, greatly increasing the map's utility. Since the genome can be represented by a relatively small number of YACs, it has been possible to make replica filters of genomically ordered YACs available to the community at large.

Introduction
Charles Cantor has a slide with a very simple message: genome maps are very easy to start; genome maps are very hard to finish. This applies equally to ‘top-down’ macrorestriction maps and ‘bottom-up’ clonal reconstructions. In the case of global cosmid maps in particular, it very much understates the case. Cosmid maps are impossible to finish! It has become clear that while partial, and very useful, cosmid maps of eukaryotic genomes can be constructed, regions (sometimes apparently quite extensive) of these genomes are not clonable in available cosmid vectors. While many of these regions are apparently clonable, and therefore potentially mappable, in lambda vectors (especially using particular permissive hosts(1)), the generalised mapping of, say, a 100 megabase genome in these vectors, with their relatively small inserts (less than 20 kb, as opposed to over 40 kb in a cosmid), is a daunting prospect. Thus the development of yeast artificial chromosome vectors(2), with their large insert capacity and relatively unbiased genome representation, has had an enormous impact on ‘bottom-up’ genome mapping projects(3). Their advent has made a complete clonal map of any genome a possibility and, for the Caenorhabditis elegans genome, close to a reality.
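The contrast in insert sizes drawn above is, at root, simple arithmetic. The sketch below is a minimal illustration, assuming a 100 Mb genome and nominal inserts of 20 kb (lambda), 40 kb (cosmid) and 200 kb (YAC); the function names are hypothetical, and the second function uses the standard Clarke-Carbon library-size formula, which is not part of the original discussion.

```python
import math

GENOME_KB = 100_000  # assumed genome size: ~100 Mb, as estimated for C. elegans


def clones_for_coverage(fold: float, insert_kb: float, genome_kb: float = GENOME_KB) -> int:
    """Clones needed for a given average ('fold') coverage: N = c * G / L."""
    return math.ceil(fold * genome_kb / insert_kb)


def clones_for_probability(p: float, insert_kb: float, genome_kb: float = GENOME_KB) -> int:
    """Clarke-Carbon estimate of library size.

    Number of random clones needed for any given sequence to be present
    with probability p: N = ln(1 - p) / ln(1 - L / G).
    """
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - insert_kb / genome_kb))


if __name__ == "__main__":
    # Nominal insert sizes in kb; the YAC value is an illustrative mean.
    for vector, insert_kb in [("lambda", 20), ("cosmid", 40), ("YAC", 200)]:
        print(f"{vector:>7}: 2-fold = {clones_for_coverage(2, insert_kb):6d} clones, "
              f"99% probability = {clones_for_probability(0.99, insert_kb):6d} clones")
```

Under these assumptions, 2-fold coverage needs about 10 000 lambda clones, 5000 cosmids or 1000 YACs, before any allowance is made for uneven representation of the genome in a library.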

Cosmid Mapping
The genome of the nematode C. elegans is thought to be about 100 Mb(4). The six chromosomes are essentially unbanded, and 17% of the DNA consists of repetitive elements. There are probably some 5000-10 000 genes, so the genome is considerably more information-dense than the human genome, particularly in the gene-rich central regions of the chromosomes, which fortunately seem to clone more easily in cosmids than the outlying regions. The physical mapping of the genome has been approached globally; no attempt has been made to separate the rather similar chromosomes. Construction of this map began with the analysis of randomly selected cosmid clones by a high-resolution ‘fingerprinting’ technique(5). Computer matching of the digitised images generated from double-digest restriction patterns (run on polyacrylamide sequencing-type gels) was used to assemble overlapping sets of cosmids known as contigs. The expectation is that, as increasing numbers of cosmids are analysed, the number of contigs rises to a maximum and then falls (in an ideal world, in this case, to 6!), as joins between contigs come to be made more frequently than new contigs are created. In the C. elegans project, the number of contigs fell to 700 (having passed through a maximum of 940) after the analysis of 17 000 cosmids. (Interestingly, these figures concur rather well with those obtained from a similar project to map the Arabidopsis genome(6), also probably about 100 Mb.) Contigs of up to about 1 Mb had been generated, and some 30% of the DNA had been aligned with the genetic map. This alignment is crucial to the usefulness of the map, both for cloning genes defined through mutation and for positioning molecularly identified fragments on the genetic map; it was achieved primarily through the cooperation of the C. elegans research community and its generosity in supplying clones. Contigs were also mapped by in situ hybridisation techniques(7).
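The rise and fall in contig number described above is what a random-clone model predicts. The sketch below uses the Lander-Waterman expression for the expected number of ‘islands’ (contigs, singletons included), a model the original does not cite. The overlap-detection threshold theta = 0.3 is an assumed value, as are the function names.

```python
import math


def expected_contigs(n_clones: int, insert_kb: float, genome_kb: float, theta: float) -> float:
    """Lander-Waterman expected number of islands for uniformly random clones.

    theta is the fraction of a clone's length that two clones must share
    before fingerprint comparison can detect the overlap (assumed here).
    """
    coverage = n_clones * insert_kb / genome_kb  # c = N L / G
    return n_clones * math.exp(-coverage * (1.0 - theta))


def clones_at_peak(insert_kb: float, genome_kb: float, theta: float) -> float:
    """Clone count at which the expected number of islands is largest.

    Setting d/dN [N exp(-N L (1 - theta) / G)] = 0 gives N* = G / (L (1 - theta)).
    """
    return genome_kb / (insert_kb * (1.0 - theta))


if __name__ == "__main__":
    G, L, THETA = 100_000, 40, 0.3  # ~100 Mb genome, ~40 kb cosmids, assumed theta
    for n in (1_000, 3_000, 6_000, 10_000, 17_000):
        print(n, round(expected_contigs(n, L, G, THETA)))
    print("peak near", round(clones_at_peak(L, G, THETA)), "clones")
```

With these assumptions the model peaks at roughly 1300 islands near 3600 cosmids and predicts only about 150 after 17 000 cosmids, well below the 700 contigs observed. The figures are sensitive to theta, but for modest values the shortfall is consistent with the non-random representation of the genome in cosmid libraries.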
However, the distribution of clones did not accord with that expected from a truly random model library. This observation applied to cosmid libraries in a variety of vectors, containing inserts generated by a variety of partial restriction digests. Some improvement in distribution was seen when using the lorist series of vectors(8-10), which have a lambda-based replication system as opposed to the ColE1 replicon employed by pJB8, but the deviation from randomness was still excessive. There is now considerable direct evidence that some regions of the genome are completely unrepresented in cosmid libraries. For instance, attempts to probe for cosmids to fill a particular gap (using a YAC known to span it) produced only cosmids terminating at precisely the same positions as those already known. Consequently, it is not surprising that attempts to extend contigs by selective probing of libraries gave rather poor returns.

The Advent of YACs
While it was immediately apparent that YAC vectors could carry much larger pieces of foreign DNA (up to 1 Mb) than any previously developed cloning vehicle, there was also the hope that they would not be prone to frequent rearrangement(11) and, most importantly, that they would represent the genome more randomly than cosmids. In the nematode genome, although some (apparently very infrequent) problems with major rearrangements have been encountered, there is no question that the distribution of YAC clones is remarkably uniform.

Initial attempts to map YACs by fingerprinting, using the same scheme as that used to map the cosmids (a necessity if the YACs were to be related directly to the cosmid database), were unsuccessful. YACs of much over 120 kb yielded too many fingerprint bands for the pattern to be interpretable, so fingerprinting would have meant losing one of the potential advantages of YACs: their size. Furthermore, because YACs have to be purified away from the host yeast chromosomes by pulsed field gel electrophoresis (from which they are physically indistinguishable other than by size), sufficient material for fingerprinting is difficult to obtain. In contrast, preparation of sufficient material for probing experiments is relatively straightforward. Consequently, although some YACs were mapped by fingerprinting, a probing approach was adopted(12). This was greatly aided by the fortuitous fact that the YAC vectors (all the YAC libraries used in this project have been in pYAC4(2) or pYACRC(13)) and the lorist vectors (in which more than 50% of the analysed cosmids were cloned) have no significant homology. Probing between libraries in these vectors can therefore be done without the need to purify vector-free fragments. The use of probing techniques for mapping the C. elegans YACs has depended heavily on the ability to make large numbers of reusable duplicate ‘slave’ copies of master colony grids(23) (see box).
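The 960-clone filter format described in the box (ten blocks of 12 × 8 clones) can be addressed much like a stack of 96-well microtitre plates. The mapping below is a hypothetical sketch: the block numbering, row letters and function names are assumptions for illustration, and the physical pin geometry (2.5 mm spacing, offset lines) is not modelled.

```python
from typing import NamedTuple

BLOCKS, ROWS, COLS = 10, 8, 12      # ten 12 x 8 blocks = 960 clones per filter
PER_FILTER = BLOCKS * ROWS * COLS
ROW_LABELS = "ABCDEFGH"             # hypothetical microtitre-style row labels


class GridAddress(NamedTuple):
    block: int  # 1..10
    row: str    # 'A'..'H'
    col: int    # 1..12


def address(index: int) -> GridAddress:
    """Hypothetical mapping from a 0-based clone index on one filter to a grid address."""
    if not 0 <= index < PER_FILTER:
        raise ValueError(f"index {index} outside a {PER_FILTER}-clone grid")
    block, within = divmod(index, ROWS * COLS)
    row, col = divmod(within, COLS)
    return GridAddress(block + 1, ROW_LABELS[row], col + 1)


def index_of(addr: GridAddress) -> int:
    """Inverse of address(): recover the 0-based clone index."""
    return (addr.block - 1) * ROWS * COLS + ROW_LABELS.index(addr.row) * COLS + (addr.col - 1)


def filters_needed(n_clones: int) -> int:
    """Number of 960-clone filters needed to array n_clones (ceiling division)."""
    return -(-n_clones // PER_FILTER)
```

For example, `filters_needed(6650)` returns 7, matching the seven filters that carry the YAC working collection, and `address(959)` gives the last position, block 10, row H, column 12.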


Box: Library grids
The colony grids used in this project were initially made by toothpicking selected clones onto agar plates in a grid pattern, using a template viewed through the underside of the Petri dish. Gridding has since been made less laborious and more accurate by the use of a device based on that described by Mackenzie et al.(*~). This device, which incorporates a drilled template block through which inoculated pins are inserted, also permits the easy duplication of master filters. The favoured grid format is 960 clones (arrayed in ten 12×8-clone blocks with 2.5 mm spacing in alternately offset lines) on 9.5×8.5 cm membranes. This size makes it relatively straightforward to probe large numbers of filters in microtitre plate lids(”). Cosmids for the grids were selected to represent the pre-existing contigs in an essentially non-overlapping manner; characterised but solitary cosmids were also included. The total set of 1930 cosmids was thus arrayed on two 9.5×8.5 cm filters, in alphabetical rather than contig-related order. The YAC libraries involved in this project now total some 9000 clones, mostly in pYAC4; this set has expanded as the project has proceeded. The mean insert size is 200-250 kb, up to a maximum of 700 kb. Currently, 6650 of these are available as a working collection on a set of seven 8.5×9.5 cm filters.

YAC Mapping
Basically, two approaches were taken for the initial mapping of the YACs. First, larger YACs (selected from the random set on the basis of their mobility on the preparative pulsed field gels) have been used to probe the cosmid grids. Secondly, the cosmid grids have been probed with YACs pre-selected as associated with cosmid contig ends, by probing the YAC grids with contig-terminal cosmids. While the latter approach might have been expected to link existing cosmid contigs simply and efficiently (although with a greater possibility of error caused by repeat sequences), complete coverage of the map by YACs has been seen as a desirable goal, both to check the construction of the cosmid contigs and, ultimately, to produce ‘minimal set’ filter grids of YACs. The analysis of the larger YACs proved to be a rapid way of approaching this end. As a result of some 2500 probings of the cosmid grids by YACs, 1780 YACs now appear on the map. (The shortfall is caused by a number of factors, primarily uninterpretable probings caused by repetitive sequences. We have not been successful in attempts to eliminate this problem by competitive hybridisation. In any case, repetitive sequence problems seem to be insuperable in rather few places in the genome; generally a combination of overlapping YACs can be found to overcome them. YACs positive for only a single cosmid are not included in the total of mapped YACs, and neither are chimaeric YACs, which can give rise to apparently paradoxical results during probing. The exact number of these co-ligation events is difficult to extract from the data, but is thought to be less than 5%. Simple technical failures also contribute to the shortfall.) Of these 1780 YACs, 900 derive from the analysis of randomly selected large YACs (>230 kb), and the remaining 880 from preselection by probing of YAC grids with cosmids. Confidence in links generated between cosmid contigs by this probing approach varies according to the exact nature of the data (Fig. 1). In about one third of cases, a previously undetected fingerprint overlap

[Fig. 1: contig diagram of cosmids and linking YACs; horizontal scale in HindIII sites]

Fig. 1. An example of linkage by nested YACs. Previous cosmid-cosmid matching had established the four contigs delineated by vertical dashed lines. Cosmids (containing 35-40 kb inserts) are indicated by light horizontal lines, of lengths proportional to the number of HindIII sites (see scale to left); asterisks indicate the presence of additional, similar clones in the database. Probing of the YAC grid with K03A5 resulted in the selection of four YACs (indicated by heavy horizontal lines; estimated insert sizes: Y1H1 230 kb, Y12C1 290 kb, Y18A6 180 kb, Y19C10 170 kb). The positions of these YACs, and consequently the linkage of the cosmid contigs, were established by probing of the cosmid grid (in which these contigs are represented by the cosmids indicated by hatched lines). Three sorts of join are apparent. The centre one is known to be correct because of a small revealed fingerprint overlap between F51G6 and SB2. The left-hand one is strongly confirmed by the logical nesting of the YACs to multiple adjacent cosmids. The right-hand one is not yet confirmed, because it might have resulted from a fusion of unrelated sequences in the cloning of Y19C10; note, however, that the hybridisation to two cosmids in each contig indicates that we are not being misled by dispersed repeats. For the time being, the gaps between cosmid contigs have been arbitrarily set at five HindIII sites. (Reprinted by permission from Nature vol. 335 p. 185. Copyright © 1988 Macmillan Magazines Ltd.)

between cosmids was found. (Small overlaps are hard to extract from the fingerprint data and, in any case, when one takes a necessarily conservative approach, they would not be incorporated without corroborative evidence.) However, where no fingerprint overlap is detected, as is most frequently the case, any bridge spanned by a single YAC must be treated with suspicion, given the possibility of a chimaeric YAC. Furthermore, any joins based on hybridisation to a single terminal cosmid of a contig are open to doubt, because repeat sequences could be responsible. Positive hybridisations to two or more non-overlapping cosmids are therefore desirable. Ultimately, of course, given the refinement of the C. elegans genetic map, concordance between the physical and genetic maps is the best corroboration. This approach led to a rapid reduction in the number of contigs, from the initial 700 cosmid contigs to fewer than 200, representing more than 90% of the genome. At this stage, some 70% of the coverage was physically mapped in 45 large contigs (the average contig size being more than 500 kb), and YAC coverage of all contigs was essentially complete (see Fig. 2).

Genomically Ordered Grids
Fewer than 1000 selected YACs from a library with an average insert size of over 200 kb are required to give redundant (~2-fold) coverage of the C. elegans genome. A grid of such YACs (958 clones), genomically ordered at the time of selection, has been prepared in the postcard-sized array described in the box. Many replicas have been made for distribution and, in conjunction with interrogation of the electronic map (regularly updated world-wide at a number of nodes), it is now in principle straightforward for any laboratory to position any probe DNA on the physical map to an average resolution of about 100 kb (Fig. 3). These grids have come to be referred to as ‘polytene filters’, because the information obtained by their use is analogous to that obtained by in situ hybridisation of probes to insect polytene chromosomes.

[Fig. 2: contig plot for chromosome I on an arbitrary 0-100 scale, with a histogram of YAC coverage beneath]

Fig. 2. A plot showing the distribution of contigs on chromosome I. The length scale assigned to the plot is arbitrary (the actual length of the chromosome being unknown). The total length of contig coverage (continuously overlapped cosmids and YACs, indicated by horizontal solid lines with attached locations of physically mapped markers) is about 13 megabases. The histogram beneath the contig plot is a 1000-point analysis of YAC distribution, indicating the number of YACs covering each point. YAC coverage of these contigs can be seen to be essentially complete (rrn-1 is the ribosomal RNA cluster; the extent of YAC coverage of this cluster is unknown).

Fig. 3. Use of the YAC polytene grid in mapping a plasmid clone (IH4F5). This clone had too few HindIII sites (two) for unequivocal positioning when its fingerprint was compared to the whole-genome database. Probing of the polytene grid showed positive hybridisation to two YACs, Y39E7 and Y48C4 (autoradiogram; the grid format is as described in the box). These are indicated by heavy lines in the contig diagram. YACs from this region that are present on the polytene grid but showed no hybridisation (Y53B1, Y60G2, Y41A12 and Y54H4) are indicated by underhatching. The location of IH4F5 having thus been limited to less than 200 kb, comparison of the IH4F5 fingerprint with cosmid fingerprints from this region allowed precise positioning. (IH4F5 and autoradiogram courtesy of I. Hope, MRC LMB.)

Map Closure and Status
From the earliest days of C. elegans genetics(16), it has been apparent that on each autosome the genes are clustered when plotted by recombination mapping. This clustering is not related to a functional centromere, the chromosomes being holocentric. One of the interesting conclusions that can be drawn from the physical mapping observations is that much of this apparent clustering is accounted for by the suppression of recombination within the cluster(17). The ratio between the physical and genetic metrics has been observed to fluctuate locally between 50 (possibly as low as 10) kb/map unit outside the clusters and 5000 kb/map unit within the clusters. The regions outside the clusters on the autosomes also appear to be genuinely gene-poor, as well as recombination-proficient. Fortunately, mapping has proved to be most straightforward in the gene-rich regions. (In fact, all C. elegans genes cloned and mapped to date have ultimately proved to clone in cosmids, even those in regions where cosmid cloning in general seems particularly difficult.) Thus, the regions of the genome currently mapped and aligned may well contain 95% of all genes. Mapping in the gene-poor regions in pursuit of particular genes has sometimes proved especially problematic. For instance, the cloning of the genes tra-1(18) and tra-3(19) (both several map units outside the somewhat arbitrarily delineated clusters on LGs III and IV respectively) each required individualistic approaches, involving lambda cloning and walking, and polymorphism mapping.

The straightforward reciprocal probing strategies ultimately ceased to be effective, cosmid probes having been exhausted. Another technique was therefore needed to approach final closure of the map. The bulk of the map consisted of contigs with protruding YACs as the terminal clones, and there was also a class of small contigs lacking associated YACs for a variety of reasons. In the case of the YAC ends, a means of directly detecting YAC-YAC overlaps was required; the problems were how to isolate specific probes and how to confirm overlaps indicated by hybridisation. Clearly, probes are most effective if isolated from the insert ends. We have adopted what is essentially a sequence tagged site (STS) approach(20). The sequence of the insert termini is generated by a direct sequencing method(21) based on thermal-cycle linear amplification sequencing. Using universal vector-specific primers, 100-150 nucleotides of sequence can be obtained from either end of the insert(22). Oligonucleotide pairs derived from this sequence are synthesised, and the resulting polymerase chain reaction (PCR) product is used to probe the YAC grids. Hybridising YACs are then tested with the same primer pair to confirm the overlap by PCR. Because joins made by this method frequently involve bridges formed by single YACs, they have to be viewed with some caution, for two reasons: firstly, the possibility that the linking YAC may be chimaeric (although the frequency of such YACs is low in the C. elegans libraries), and secondly, the possibility that the linkage is based on a rare repeat sequence. Problems due to the latter could be resolved by Southern blotting, given a reasonable degree of overlap between YACs. Again, sensible concordance with the genetic map is ideal confirmation.

The small contigs lacking associated YACs were of a variety of types. Some were not represented on the cosmid grids because they consisted wholly of cosmids in the pJB8 vector, which, unlike the lorist series, has sequences in common with the pYAC4 vector. Others contained repeated sequences that had led to ambiguous positioning, and others had simply failed to detect YACs on hybridisation to YAC grids. Again, this class of contigs was tackled by direct sequencing (in this case of cosmids), synthesis of oligonucleotide primer pairs for PCR fragment generation, and PCR confirmation of probe-positive YACs. Apart from the considerable number of small contigs that have been assigned a location in this way, fewer than a dozen appear to contain at least some DNA that is not C. elegans-derived and have therefore been set aside. Together, these efforts have reduced the contig number to 90. These joins, and the continuing assignment of genetically characterised fragments to the physical map, have led to 8.5 Mb of DNA being aligned with the genetic map. Thus the physical map of the C. elegans genome is proceeding, somewhat laboriously and, one hopes, inexorably, towards completion.
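The cautions applied to contig joins in this section and in the Fig. 1 legend amount to a small decision rule. The sketch below encodes them; the class, its field names and the rating labels are hypothetical, and treating genetic-map concordance as equivalent to a fingerprint overlap is a simplifying assumption rather than a rule stated in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JoinEvidence:
    fingerprint_overlap: bool     # small overlap revealed between the flanking cosmids
    bridging_yacs: int            # independent YACs spanning the gap between contigs
    hits_left: int                # non-overlapping cosmids of the left contig that hybridise
    hits_right: int               # the same for the right contig
    genetic_map_concordant: bool  # join agrees with the genetic map


def rate_join(e: JoinEvidence) -> tuple[str, list[str]]:
    """Hypothetical rating of a contig join; returns (rating, caveats)."""
    # Direct fingerprint overlap or genetic-map agreement settles the join.
    if e.fingerprint_overlap or e.genetic_map_concordant:
        return "confirmed", []
    caveats = []
    # A bridge resting on one YAC could be a co-ligation artefact.
    if e.bridging_yacs < 2:
        caveats.append("single-YAC bridge: the YAC may be chimaeric")
    # A hit to only one cosmid on either side could be a dispersed repeat.
    if min(e.hits_left, e.hits_right) < 2:
        caveats.append("single-cosmid hit: a repeat may be responsible")
    return ("strong" if not caveats else "provisional"), caveats
```

Entered this way, the centre join of Fig. 1 rates as confirmed, and a join like the right-hand one through Y19C10 (one YAC, two cosmids hybridising in each contig) rates as provisional with a single caveat: a possible chimaera, but not a repeat.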
Although in some respects a reasonable analogy, the construction of a ‘bottom-up’ physical map does not wholly resemble a jigsaw puzzle: physical maps do not become easier towards completion! Missing pieces, duplicate pieces, broken pieces, pieces from somebody else's puzzle and misconstructed regions can all be very tiresome. It is perhaps not unreasonable to say that the existence of a clonal physical map has revolutionised the way in which the C. elegans community approaches the genome of the organism. The three concrete entities of the map (the computer-accessible database, the ‘polytene’ YAC filters and the availability of YACs and cosmids on request), in addition to the ‘genomic communication’ engendered by the map, have made the cloning of the majority of genes a far less daunting task than was previously the case. Extensive physical contiguity and genetic linkage are central to this utility, and would have been infinitely more laborious to attain without YACs. Furthermore, the ability to construct these relatively low-resolution physical maps has facilitated the initiation of projects to determine the ultimate physical map - the DNA sequence - of a number of model organisms, including C. elegans. These genome sequences will produce, if not a paradigm shift, at least a major change of attitude and approach by experimental biologists over the next decade.

References
1 WYMAN, A. R. AND WERTMAN, K. F. (1989). Host strains that alleviate underrepresentation of specific sequences: overview. In Methods in Enzymology 152, 173-180.
2 BURKE, D. T., CARLE, G. F. AND OLSON, M. V. (1987). Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science 236, 806-812.
3 JORDAN, B. (1990). YAC power. BioEssays 12, 183-187.
4 SULSTON, J. E. AND BRENNER, S. (1973). The DNA of Caenorhabditis ...
5 ..., J., BRENNER, S. AND KARN, J. (1986). Toward a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl Acad. Sci. USA 83, 7821-7825.
6 HAUGE, B. M., HANLEY, S., GIRAUDAT, J. AND GOODMAN, H. M. (1991). Mapping the Arabidopsis genome. In Molecular Biology of Plant Development (ed. G. Jenkins). In press.
7 ALBERTSON, D. G. (1985). Mapping muscle protein genes by in situ hybridisation using biotin-labelled probes. EMBO J. 3, 1227-1234.
8 LITTLE, P. F. R. AND CROSS, S. H. (1985). A cosmid vector that facilitates restriction enzyme mapping. Proc. Natl Acad. Sci. USA 82, 3159-3163.
9 GIBSON, T. J., COULSON, A. R., SULSTON, J. E. AND LITTLE, P. F. R. (1987). Lorist2, a cosmid with transcriptional terminators insulating vector genes from interference by promoters within the insert: effect on DNA yield and cloned insert frequency. Gene 53, 275-281.
10 GIBSON, T. J., ROSENTHAL, A. AND WATERSTON, R. H. (1987). Lorist6, a cosmid vector with BamHI, NotI, ScaI and HindIII cloning sites and altered neomycin phosphotransferase gene expression. Gene 53, 283-286.
11 WADA, M., LITTLE, R. D., ABIDI, F., PORTA, G., LABELLA, T., COOPER, T., VALLE, G. D., D'URSO, M. AND SCHLESSINGER, D. (1990). Human Xq24-Xq28: approaches to mapping with yeast artificial chromosomes. Am. J. Hum. Genet. 46, 95-106.
12 COULSON, A., WATERSTON, R., KIFF, J., SULSTON, J. AND KOHARA, Y. (1988). Genome linking with yeast artificial chromosomes. Nature

YACs and the C. elegans genome.
