Plant Molecular Biology 8:497-507 (1987) © Martinus Nijhoff Publishers, Dordrecht - Printed in the Netherlands


Characterization of repetitive elements in several Cucurbita species

Robert Frank Leclerc* and Albert Siegel

Department of Biological Sciences, Wayne State University, Detroit, MI 48202, USA (*current address: USDA ARS, Insect Attractants, Behavior, and Basic Biology Laboratory, 1600-1700 S.W. 23rd Drive, Gainesville, FL 32604, USA)

Received 11 December 1986; in revised form 19 February 1987; accepted 20 February 1987

Key words: Cucurbita, repetitive DNA, sequence divergence

Abstract

About 1.3 × 10^5 copies of a tandemly arranged, 351 base pair element constitute from 4 to 8% of the Cucurbita pepo genome. A homologous, high copy number repetitive element is present in the genomes of C. moschata and C. foetidissima, but not in C. maxima or Cucumis sativus. Array lengths of at least 65 members have been detected for C. pepo, and 140 for C. moschata. The nucleotide sequences of six C. pepo repeat units were determined and each was found to be unique, differing from the others at from 7 to 23 positions. The element is composed of highly modified DNA, as evidenced by the fact that only one of its two Hpa I restriction sites is subject to digestion and by the observation that all internal Cs and about half of the external Cs of its single Hpa II site are methylated. At least 5 cellular RNA species contain sequences complementary to one of the DNA strands of the element. These have discrete sizes of 3.0, 2.3, 1.8, 1.3 and 0.5 kb. Such transcripts are detected in C. pepo, C. moschata and C. foetidissima stem and/or leaf RNA, but not in root RNA, and are probably not polyadenylated. No transcripts were found which contain sequences complementary to the other DNA strand of the element.

Introduction

Most plant and animal genomes contain DNA in excess of that estimated to be required for coding of structural genes and regulatory functions. This excess DNA is present as both unique and repetitive sequences, with the reiteration frequency of some sequences in excess of 10^6 copies per genome. The role of the non-coding repetitive DNA is not well understood, although in some plant genomes it accounts for greater than 70% of the total DNA. Repetitive DNA is present in the centromeric and telomeric regions of the chromosome, as well as in chromosome arms. Repetitive sequences can be as short as a dinucleotide pair, or several kilobases in length. They may be present as tandem arrays, as dispersed elements, or as displaced members of tandem arrays [14].

Functions that have been attributed to the non-coding repetitive DNA include regulation, transposition, determination of nuclear size, maintenance of chromosomal 3-dimensional conformation and involvement in chromosome pairing. Alternatively, the DNA may have no role at all and may be essentially parasitic or junk [reviewed in 14]. Since the role of much of this repetitive DNA remains obscure, its further characterization may serve to elucidate its biological significance.

A plant repetitive sequence present in the genome of Cucumis melo has been described and characterized as a 352 base pair tandem element which occurs in ca. 1.8 × 10^5 copies [7, 8, 10, 21].

This communication describes and compares repetitive elements of the same length present in several species of Cucurbita, another genus in the Cucurbitaceae family. Of these, that in C. pepo was most intensively studied, with sequences of several individual repeat units determined in order to gain insight into the diversification and homogenization processes which result in maintenance of an element family.

Materials and methods

Seeds of Cucurbita pepo L. cv. Small Sugar, C. maxima Duchesne cv. Golden Hubbard, and C. moschata Poir cv. Waltham Butternut were obtained from Stokes Seeds Inc. C. foetidissima HBK seeds were a gift from William Bemis of the Department of Horticulture, University of Arizona, Tucson. Leaf DNA was isolated as previously described [22].

Hind III digested C. pepo DNA fragments of ca. 400 nt (Fig. 1) were eluted from an agarose gel [29] and ligated into the Hind III digested replicative form of phage M13mp9, and this construct was used to transform competent Escherichia coli JM103 cells [15]. The replicative form of recombinant M13 clones was isolated, digested with either Hind III or Hind III plus Hpa I and size fractionated on a 5% acrylamide gel. Clones containing the proper fragment lengths were selected for further analysis.

Electrophoresis of plant nuclear DNA was through a 1% agarose gel (BRL) using TAE buffer (40 mM Tris, 10 mM EDTA, 50 mM acetic acid, pH 8.2) or, for high molecular weight DNA, through a 0.25% agarose gel (SeaKem GTG, FMC) at 10 °C using BE buffer (40 mM boric acid, 1 mM EDTA, pH 7.4). The DNA was partially depurinated [27] and the electrophoretically separated DNA transferred to GeneScreen (New England Nuclear) or nitrocellulose (Schleicher and Schuell) for Southern analysis [24].

RNA was extracted from leaves 4 to 6 cm in diameter as previously described [11, 23], and the poly A(+) RNA fractionated using an oligo-dT-cellulose column [3]. The RNA was size fractionated on a 1% agarose-BE gel containing 5 mM methyl mercuric hydroxide [4], and transferred to diazotized paper [1] for Northern analysis.

Labeled strand specific M13 hybridization probes having a specific activity of 1 × 10^7 to 1 × 10^8 cpm/µg were prepared as described [5] using a hybridization probe primer (P-L). RNA transcripts of the 351 nucleotide Hind III fragment, inserted in the pSP65 vector in both orientations, were prepared for use as probes as described by Promega Biotech for their Riboprobe system. Prehybridization and hybridization conditions and washes for Northern and Southern blots were as described [18] except that the final high stringency filter wash was 0.125 × SSC. The blots were exposed to Fuji Rx X-ray film for 2 hours to 2 weeks using DuPont Cronex Quanta III intensifying screens.

The dideoxynucleotide chain termination procedure of Sanger et al. [19] using 32P-dCTP was used to sequence the DNA, with each cloned insert being sequenced from both ends. Analysis of the element was performed using the MicroGenie sequence analysis program from Beckman.

Fig. 1. (A) An agarose electropherogram of Hind III digested C. pepo DNA. Nuclear DNA was digested to completion with Hind III and size separated on a 0.8% agarose gel (right lane). Arrows mark the position of the ca. 400, 800 and 1200 nt bands. Dots mark the position of the ribosomal DNA bands [22a]. The left lane contains lambda DNA digested with Hind III. (B) Hind III and Hpa I analysis of C. pepo DNA using acrylamide gel electrophoresis. C. pepo genomic DNA digested with Hind III (lane 1), Hpa I (lane 3) or Hind III plus Hpa I (lane 2) was size separated on a 5% acrylamide gel. Molecular weight bands in lane 4 are pBR322 digested with Hpa II.

Results

Identification and arrangement of a repetitive element in C. pepo

A prominent band of ca. 400 base pairs, probably representing a highly repetitive element, was observed on agarose electropherograms of Hind III digested C. pepo DNA (Fig. 1). Better definition of this band was obtained by electrophoresis of the digested DNA through a 5% acrylamide gel (Fig. 1B). Digestion of C. pepo DNA with Hpa I also resulted in the appearance of a prominent ca. 400 nt band. Disappearance of the 400 nt band upon digestion with both enzymes (Fig. 1B, lane 2), with the concurrent generation of two shorter bands of ca. 300 and 100 nt, indicated that the same element was revealed by both enzymes, and that the two restriction sites are about 100 nt apart.

Tandem arrangement of the elements in a head to tail fashion was demonstrated by the results of a time course digestion with Hind III (Fig. 2). A ladder of multimers is seen early in the digestion period, degenerating to a 400 bp band with continued digestion.

An attempt was made to estimate the number of elements within an array. Genomic DNA was digested with a combination of restriction enzymes (Bam HI, Bgl II, Eco RI, Rsa I and Sst I) for which there were no sites within the element (see Fig. 6) and whose sites lacked a CG. The digest was subjected to electrophoresis through an agarose gel, and the DNA was immobilized on nitrocellulose and challenged with a labeled RNA transcript specific for the element. The probe reacted with DNA which had the same mobility as undigested DNA, which is slightly more rapid than the 23 kb fragment of Hind III digested lambda DNA (Fig. 3). Although it is difficult to estimate the size of such large DNA with accuracy, our judgement is that the arrays are at least 23 kb long and composed, at a minimum, of 65 elements. C. moschata DNA, which contains a homologous element (see below), was digested with the same battery of enzymes but separated on a 0.25% agarose gel. This revealed arrays of at least 50 kb in length, representing ca. 140 tandem elements.
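The element counts quoted for the arrays follow from dividing the estimated array length by the length of one repeat unit. The sketch below reproduces that arithmetic, assuming the 351 bp unit length reported from the sequence data; the function name `min_elements` is illustrative and does not come from the original study.

```python
ELEMENT_BP = 351  # repeat unit length reported in the text


def min_elements(array_kb: float, element_bp: int = ELEMENT_BP) -> int:
    """Number of whole repeat units that fit in an array of the given length."""
    return int(array_kb * 1000) // element_bp


# Array length estimates taken from the text (lower bounds).
print(min_elements(23))  # C. pepo, at least 23 kb     -> 65
print(min_elements(50))  # C. moschata, at least 50 kb -> 142
```

The C. moschata figure of 142 is consistent with the rounded value of ca. 140 given in the text. Both numbers are lower bounds, because the gel systems used could not resolve the true upper size of the arrays.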

Fig. 2. Tandem arrangement of the ca. 400 nt repetitive element in C. pepo. Five µg of C. pepo nuclear DNA were incubated with 3.5 units of Hind III. Five hundred ng of DNA were removed for analysis at the times indicated (in hours). Electrophoresis was on a 0.8% agarose gel. The marker lane contains lambda DNA digested with Hind III.

Copy number determination of the repetitive element

Two independent methods were used to obtain a quantitative estimate of the proportion of the C. pepo genome represented by this family of repetitive elements. The first approach was a determination of the proportion of the total area represented by the peaks of the repeat element and its multimers upon densitometric scanning of a moderately exposed electropherogram of C. pepo DNA digested to completion with either Hind III or Hpa I (Fig. 4). Between 4 and 7% of the genomic DNA was represented by these peaks.

The second approach used to estimate the copy number was a comparison of signal intensities obtained by the hybridization of an element-specific probe to dilutions of either immobilized genomic DNA or an immobilized M13mp9 clone, 4.6% of which is a repetitive element (Fig. 5). An estimate was made from densitometric scans that 66 ng of C. pepo DNA yields the same signal as 86 ng of the standard. Thus, we calculate that between 4% and 8% of the genome is composed of the repetitive element. This is in close agreement with the densitometric estimate, and represents between 90000 and 180000 copies per haploid genome, assuming a haploid genome content of 7.8 × 10^5 kb [26].

Fig. 3. Estimation of array length of the C. pepo tandem elements. (Lane b) C. pepo nuclear DNA was digested with endonucleases for which sites are lacking in the element (Bam HI, Bgl II, Eco RI, Rsa I and Sst I) and size separated on a 0.8% agarose gel. (Lane c) The digested DNA was immobilized on GeneScreen and challenged with an RNA probe specific for the 351 nt tandem repetitive element. Lane a contains lambda DNA digested with Hind III.
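The slot blot comparison and the copy number range can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: equal hybridization signal is taken to mean equal mass of element DNA on the filter, and the function names are hypothetical. All numerical inputs come from the text.

```python
def genome_fraction(standard_ng: float, genomic_ng: float,
                    standard_element_fraction: float) -> float:
    """Fraction of genomic DNA that is element, given the masses of standard
    and genomic DNA that produce the same hybridization signal."""
    return standard_ng * standard_element_fraction / genomic_ng


def copy_number(fraction: float, genome_kb: float = 7.8e5,
                element_bp: int = 351) -> float:
    """Copies per haploid genome for a given genome fraction."""
    return fraction * genome_kb * 1000 / element_bp


# 86 ng of the M13mp9 standard (4.6% element) matches 66 ng of C. pepo DNA.
frac = genome_fraction(standard_ng=86, genomic_ng=66,
                       standard_element_fraction=0.046)
print(f"slot blot estimate: {frac:.1%}")

# Copy number at the two ends of the 4 to 8% range quoted in the text.
for f in (0.04, 0.08):
    print(f"{f:.0%} of genome: about {copy_number(f):,.0f} copies")
```

The slot blot point estimate comes out near 6%, inside the 4 to 8% range, and the range endpoints give roughly 89000 and 178000 copies, which round to the 90000 and 180000 stated above.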

Nucleotide sequence variability of the element

An indication that the element might display sequence variability was given by the observation that bands representing dimers and trimers of the monomeric unit remain evident even after exhaustive digestion with Hind III (Fig. 1). This may be due either to sequence variation which results in the occasional absence of a Hind III site, or to base modification. An estimate obtained by densitometric scanning of gel photographs indicated that after complete digestion with Hind III, 70% of the elements were present as monomers, 21% as dimers and 9% as trimers (Fig. 4). These results indicate that approximately 16% of the Hind III sites are absent or modified so as to make them resistant to the restriction enzyme.

Fig. 4. Estimation of the repetitive element content in C. pepo DNA by densitometric scanning. Five hundred ng of C. pepo nuclear DNA was digested with 16 units of Hind III for 2.5 h, followed by an additional 8 units of Hind III for a further 2 h. A densitometric tracing was made of a 1% agarose gel electropherogram. Peaks representing the ca. 400, 800 and 1200 nt bands were excised and weighed on an analytical balance to determine the proportion of the total represented by the peaks. Monomer, dimer and trimer peaks are labeled in the scan. Peaks labeled with an * represent rDNA [22a].

Six elements cloned into the Hind III site of M13mp9 were sequenced in order to assess the extent of nucleotide sequence heterogeneity among repeat units. The data are presented in Fig. 6. Five of the elements contain 351 nucleotides, and the sixth, one nucleotide less. They differ from each other at from 7 to 23 positions (2 to 6.5%). Nucleotide heterogeneity among the six elements and their complements results in variability in the length of the open reading frames. Only one open reading frame of 18 codons is common to the six elements. The longest open reading frame in all the sequences is 56 codons, and one of the sequenced elements contains a reading frame with two initiation codons and no termination codon. All other reading frames contain between 1 and 9 stop codons. Because of the degree of nucleotide heterogeneity present, we conclude that the element does not code for functional polypeptides.

Due to the runs of As and Ts within the sequence, the element has a greater than expected frequency of AA and TT dinucleotides, and a lower than expected frequency of AT and TA dinucleotides. It is interesting to note that although the observed and expected CG dinucleotide frequencies within the element are not significantly different, 4.1% observed vs 3.4% expected, the CG dinucleotide is generally present in lower than expected frequency in higher eukaryotic DNAs [20].

Fig. 5. Estimation of the ca. 400 nt repetitive element content in C. pepo by slot blot. Two fold serial dilutions of C. pepo nuclear DNA and of M13mp9 containing a cloned monomer of the ca. 400 nt repetitive element were prepared in duplicate and immobilized on nitrocellulose using a slot blotter (BRL). An RNA probe specific for the ca. 400 nt repetitive element was used to challenge the immobilized DNA and the proportion of repetitive element present in C. pepo DNA determined by comparing its signal intensity to the standard.

Fig. 6. The nucleotide sequence and consensus sequence of six cloned repetitive elements. Nucleotides identical to the consensus sequence are represented by dashes, differences denoted by the relevant nucleotide, and the deletion at position 159 of element 5 by an asterisk. Hind III, Hpa I, Hpa II and Msp I restriction endonuclease sites are indicated.
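The estimate that approximately 16% of the Hind III sites resist digestion, given earlier in this section, can be recovered from the monomer, dimer and trimer proportions. The sketch below assumes that the quoted percentages are the fractions of elements found in each size class, and that a long tandem array carries one Hind III site per element, so that every fragment released corresponds to one cut site. The function name is illustrative.

```python
def uncut_site_fraction(element_fractions: dict[int, float]) -> float:
    """Fraction of restriction sites left uncut after a complete digest.

    element_fractions maps multimer size n to the fraction of all elements
    recovered in n-mers. An n-mer holds n elements but is a single fragment,
    so the number of fragments per element equals the fraction of sites cut.
    """
    total_elements = sum(element_fractions.values())
    fragments = sum(frac / n for n, frac in element_fractions.items())
    return 1 - fragments / total_elements


# Proportions reported in the text: 70% monomers, 21% dimers, 9% trimers.
resistant = uncut_site_fraction({1: 0.70, 2: 0.21, 3: 0.09})
print(f"resistant Hind III sites: {resistant:.1%}")
```

Under these assumptions the result is about 16.5%, in line with the approximately 16% stated in the text.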

Base modification in the element

The following observations indicate that the element has a considerable proportion of modified bases. The sequence data indicate that all six analyzed repeat units contain two Hpa I sites. Hpa I digestion of genomic DNA, however, gives rise to a single band of monomeric length (Fig. 1B), rather than two shorter fragments as might be expected. This indicates that one of the Hpa I sites, at position 105, is readily cleaved while the site at position 67 is routinely modified.

The element contains a single 5'-CCGG-3' sequence (Fig. 6), a restriction site for both Hpa II and Msp I. Hpa II will not cleave the site when the internal C is methylated, and Msp I will not cleave when the 5' C is methylated. The arrays were resistant to Hpa II digestion (Fig. 7, lane d), indicating that essentially all of the Cs preceding the G are methylated. Msp I digestion (lane e) generated a multimer ladder, indicating partial methylation of the CNG sequences. Complete digestion of the genomic DNA is indicated by the fact that plasmid DNA added to the genomic digests resulted in the expected pattern for a complete digest (lanes a, b and f). Gel photographs of complete Msp I digests were densitometrically scanned and the areas under the peaks integrated to yield the relative mass of each multimeric band. From these results we estimate that between 40 and 50% of the cytosines in the CNG position of the tandem array Hpa II sites are methylated.

Fig. 7. Detection of methylation at the 5'-CCGG-3' site of the 351 nt C. pepo element. C. pepo DNA (lanes d and e) or C. pepo with pBR322 DNA (lanes b and f) were completely digested with either Hpa II (lanes b and d) or Msp I (lanes e and f), and size separated on a 1% agarose gel. Lane a shows pBR322 DNA digested with Hpa II. Lane c is the marker lane.

Distribution of the tandem repetitive element in other plant species

A study was undertaken to determine whether the sequence of the tandem element is similar to DNA sequences of viral, prokaryotic or eukaryotic origin. A search of the NIH GenBank using the MicroGenie program (Beckman, 1975) revealed 19 short regions of 17 to 31 nucleotides having between 77 and 100% similarity. These regions were complementary to about 66% of the repeat element, but to no region in particular. For example, 100% similarity exists between nucleotides 234 and 250 of the C. pepo element and the 17 base sequence 5'-GAAAGAAACAAAGGTTT-3' present between nucleotides 1004 and 1020 of rat α-lactalbumin mRNA. The significance of these similarities is not apparent and may only be coincidental.

The sequence of the C. pepo element was also compared to the sequence of 5S RNA and to tandem repetitive elements of similar size detected in both plants and animals [2, 6, 9, 16, 28]. Only short regions of 10 to 15 nucleotides having between 75 and 85% similarity could be detected scattered throughout the repeat unit. Particular attention was paid to a comparison with a tandem repetitive element in Cucumis melo, a plant of the same family as C. pepo but in a different genus. Its sequence is known and is one nucleotide longer than the C. pepo element [8]. Only short regions of 12 to 17 nt having between 75 and 85% similarity could be detected.

Next, distantly related plant DNAs were examined in order to determine if these might contain a homologous element. Southern hybridization analysis failed to detect homology between the C. pepo element and DNA of barley (Hordeum vulgare), potato (Solanum tuberosum), sunflower (Helianthus annuus), kale (Brassica oleracea) and spinach (Spinacia oleracea) (data not shown).

Of the Cucurbitaceae species examined, the DNAs of Cucurbita moschata and C. foetidissima exhibit a ca. 350 nt band when digested with Hind III (Fig. 8), whereas the DNA of Cucurbita maxima fails to do so. Nuclear DNA from the Cucurbita spp. and from Cucumis sativus (cucumber) was blotted onto nitrocellulose and annealed to a radioactive cloned C. pepo element in order to determine if homologous elements were present in these plants (Fig. 9). Low stringency wash conditions resulted in prominent signals to C. pepo, C. moschata and C. foetidissima DNA, and barely detectable signals to the DNA of C. maxima and Cucumis sativus. Analysis of these signals indicates that, per mass of DNA added to the filter, C. foetidissima contained about the same number of homologous elements and C. moschata about 3 to 4 times as many copies as C. pepo, and that C. maxima and Cucumis sativus contained few, if any, homologous elements.

Washing filters under more stringent conditions resulted in undetectable hybridization to C. maxima and Cucumis sativus DNA, and decreased hybridization to C. foetidissima DNA, while the signals to C. pepo and C. moschata remained unchanged. These results indicate that there is less homology between the C. pepo and C. foetidissima elements than between those of C. pepo and C. moschata, and little if any homology between the elements of C. pepo and C. maxima or Cucumis sativus.

Southern blot analysis of partial Hind III digests of C. moschata and C. foetidissima DNAs resulted in hybridization to only the ca. 350 nt band and multimeric members, indicating that these elements, like the ones in C. pepo, are also tandemly arrayed. The 190 and 570 base pair bands of C. foetidissima represent a family of repetitive elements unrelated to the ca. 350 nt family.

Fig. 8. Detection of repetitive elements in several species of Cucurbita. Electropherogram of Hind III digests of C. pepo (lane 2), C. maxima (lane 3), C. moschata (lane 4) and C. foetidissima (lane 5) DNAs, size separated on a 0.8% agarose gel. Bands of ca. 190, 400, 570 and 800 nt are indicated by arrows. Ribosomal DNA bands are indicated by dots [22a].

Fig. 9. Detection of C. pepo homologous elements in several Cucurbita species. 300 ng of genomic DNA was immobilized to nitrocellulose and challenged with an RNA probe specific for the C. pepo repetitive element. The high stringency wash was performed at 60 °C in 2 ×, 1 ×, 0.5 × and 0.25 × SSC, with 0.1% SDS; the low stringency wash was performed at 42 °C in 5 × SSC, and 5 × SSC containing 30% formamide.

Transcription products

Labeled transcripts of an element inserted in both orientations into the pSP65 vector were synthesized with the aid of SP6 RNA polymerase and used to probe for genomic RNA transcripts in plant extracts. Two major RNA transcripts of 3.0 and 2.3 kb and two minor transcripts of 1.8 and 1.3 kb were detected in stem and leaf RNA extracts of C. pepo and leaf RNA extract of C. moschata with a probe of one orientation. Transcripts of 0.5 and 1.3 kb were the major species detected in C. foetidissima leaf RNA extracts with the same probe, while the 3.0 and 2.3 kb transcripts were present as minor species (Fig. 10). No hybridization signal was obtained with a probe of the opposite orientation. Transcripts of the pSP65 vector containing no insert and cleaved at the Pvu II site did not hybridize to genomic RNA, nor was there any detectable hybridization to a 32P-labeled cDNA probe prepared to turnip root RNA. The genomic RNA transcripts are from the DNA strand that is complementary to the one presented in Fig. 6.

A test was performed to determine whether the transcripts containing the repetitive element are polyadenylated. To this end, the poly A(+) and poly A(-) fractions of a leaf RNA extract were probed with a properly oriented transcript of the element. As seen in Fig. 10, a stronger signal was obtained to the poly A(-) fraction than to the poly A(+) fraction, indicating that the hybridizing transcripts are probably not polyadenylated. The weak hybridization signal detected in the poly A(+) fraction is likely to be due to contaminating poly A(-) RNA, evident in the photograph of the gel.

Discussion

We report here the presence and characteristics of a tandemly arrayed highly repetitive 351 base pair element in the Cucurbita pepo genome. Similar elements have been reported in the genomes of other species [2, 6, 9, 16, 28]. It is difficult to assess

Fig. 10. Detection of R N A transcripts showing homology to the C. pepo repetitive element in tissue extracts of several Cucurbita species. Five tzg of root (lanes r), stem (lanes s) and leaf (lanes 1) R N A s from C. pepo (lanes 1-4), C. moschata (lanes 5 - 7 ) and C. foetidissima (lanes 8 - 1 0 ) were size separated on a 1°70 agarose 5 m M methyl mercuric hydroxide gel, transferred to A P T paper and challenged with an R N A probe specific for the C. pepo tandem element. Lanes labeled A + and A - represent poly A ( + ) enriched or depleted C. pepo leaf RNA. A photograph of an ethidium bromide stained gel is shown on the left. A radioautograph of the same probed Northern blot is shown on the right. Open circles mark the position of the 25S and 18S ribosomal R N A bands.

505 whether such tandemly arrayed elements are important features of higher eukaryotic genomes because ready detection of such elements usually depends on their having a digestible site for a restriction enzyme to which genomic DNA happens to have been exposed. Such elements can also be detected as a density satellite component, as was the case with Cucumis melo [7], but such detection depends on a base ratio appreciably different from the bulk of genomic DNA which, as seen in the example of C. pepo [26], is not always the case. The function of the element, if indeed it has one, does not depend on its nucleotide sequence in any obvious way because similar elements in the gehomes of even closely related plants have quite different sequences. The sequences determined for the C. pepo element bear no similarity to the published sequence for a similar-sized element in Cucumis melo, a member of the same family as C. pepo. Of the several plants tested, there is enough homology only to elements present in C. moschata and C. foetidissima to permit cross-hybridization. We note that both the Cucurbita and Cucumis repetitive elements are the same size as the group of plant-RNA symbionts composed of viroids, virusoids and satellite RNAs [25] and we present a speculation that the latter may have arisen from the former. The repetitive elements are transcribed and they vary within the species and even more so between species, in a manner that suggests that size, not sequence, is under selective constraint. Two things would be required for conversion of a repetitive element transcript to an independently replicating entity. One would be the chance acquisition of a nucleotide sequence that is recognized as a substrate for replication by an existing enzymatic apparatus. The second, that it gain a sequence with self-cleaving capability, like that demonstrated to be present in viroids and satellite RNA [17]. 
Considering the disparate sequences of the Cucurbita and Cucumis elements, it seems within the realm of possibility that chance variants have arisen with the requisite requirements. Virusoids and satellite RNAs would have to have gained an additional property, that of being packageable in their helper virus capsids. Little is known about the specificity of the packaging requirement in those viruses

which package virusoids and satellite RNAs but it has been demonstrated, at least in the case of tobacco mosaic virus, that the packaging specificity is not absolute for virion RNA. A number of host RNA species also contain short specific sequences which cause them to become encapsidated with viral protein [18]. We have observed the presence of leaf and stem RNA species which hybridize to the element. We do not know whether these species are composed partially or entirely of an element array, contain a single detached element (or part of one) inserted into other DNA, or perhaps contain a short chance sequence with enough similarity to a region of the element to permit hybridization. We do not consider this last fortuitous alternative likely, but its possibility is demonstrated by the perfect 17 base pair match between nucleotides 1004 and 1020 of rat a-lactalbumin mRNA and nucleotides 234 and 250 in the C. pepo element (as numbered in Fig. 6). Several other apparent matches, almost as good, have been found to other regions of the element. For instance, 18 out of 19 nucleotides between position 175 and 192 of the element match nucleotides 907 to 924 in the mouse IG kappa gene V-167. The element proves to be highly modified. It contains one restriction site recognized by Hpa III and Msp I, but proves to be completely resistant to Hpa II (Fig. 8, lane d) indicating that the internal C is methylated in almost all cases. Msp I digestion generates a ladder indicating that about half of the external Cs are methylated. Why some are methylated and others are not is not immediately apparent since the region (nucleotides 130-153) on either side of the restriction site (nucleotides 138-141) is invariant for the six elements that have been sequenced. The element contains two Hpa I sites with one being resistant and the other sensitive to digestion. 
It may be significant that the resistant site (nucleotides 67-72) is followed by a G whereas the sensitive site (nucleotides 105-110) is not, suggesting that the 3' terminal C of only the resistant site, which forms a CG dinucleotide with that G, is methylated. Methylation of this C, however, is reported not to affect sensitivity of the site to cleavage with Hpa I. Rather, it is methylation of the A at position 5 that is reported to protect the site [13].
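
The reasoning behind the Hpa II resistance and the Msp I ladder can be illustrated with a minimal sketch. It assumes a simplified sensitivity model (Hpa II cuts CCGG only when neither C is methylated; Msp I is blocked only by methylation of the external C) and one CCGG site at the same position in every 351 base pair unit. The function names and the example methylation pattern are hypothetical and are not data from this study.

```python
REPEAT_LEN = 351  # repeat unit length in base pairs, as given in the text


def hpa2_cuts(internal_me, external_me):
    # Assumed rule: Hpa II cuts CCGG only if neither C is methylated.
    return not internal_me and not external_me


def msp1_cuts(internal_me, external_me):
    # Assumed rule: Msp I tolerates internal-C methylation
    # but is blocked when the external C is methylated.
    return not external_me


def internal_fragments(site_states, enzyme_cuts):
    """Lengths (bp) of fragments lying between consecutive cut sites.

    site_states holds one (internal_me, external_me) pair per repeat unit.
    """
    cut_units = [i for i, (internal_me, external_me) in enumerate(site_states)
                 if enzyme_cuts(internal_me, external_me)]
    return [(b - a) * REPEAT_LEN for a, b in zip(cut_units, cut_units[1:])]


# Hypothetical seven-unit array: every internal C methylated,
# roughly half of the external Cs methylated.
example_array = [(True, False), (True, True), (True, False), (True, False),
                 (True, True), (True, True), (True, False)]

msp1_ladder = internal_fragments(example_array, msp1_cuts)
hpa2_ladder = internal_fragments(example_array, hpa2_cuts)
print(msp1_ladder)  # multiples of 351 bp
print(hpa2_ladder)  # no cuts, so no internal fragments
```

Under these assumptions every Msp I fragment is a whole multiple of the repeat length, which is the ladder pattern described above, while Hpa II leaves the array uncut.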

The problem of how repeated sequences retain their similarity to each other has been the subject of speculation and discussion. It appears that little divergence has occurred in some repetitive elements where sequence is thought not to be under selective pressure, whereas in other cases the rectification mechanism cannot keep up with diversity generating processes. The 59 base pair Fok I element of Vicia faba [12] is an example of a stable element; the 9 repeat units that have been sequenced are all exactly alike. The element reported here might be considered intermediate in degree of nucleotide sequence conservation; the units differ from each other in from 2 to 6.5% of their positions. In contrast, the 250 base pair Bam HI repetitive unit of Allium cepa has greater sequence diversity, the units differing in from 14 to 18% of their positions [6]. Precisely which factors are important in determining integrity maintenance when a repetitive sequence is not subject to obvious nucleotide sequence selection awaits further analysis.
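
The divergence range quoted for the C. pepo element follows from the counts given in the Abstract (7 to 23 differing positions in a 351 base pair unit). The short sketch below reproduces that arithmetic and shows how differences between two aligned units could be counted. The helper names and the two toy sequences are hypothetical and are not taken from the sequenced elements.

```python
REPEAT_LEN = 351  # length of the C. pepo repeat unit in base pairs


def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_divergence(n_differences, length=REPEAT_LEN):
    """Express a difference count as a percentage of the unit length."""
    return 100.0 * n_differences / length


# Range reported for the six sequenced units: 7 to 23 differing positions.
low = percent_divergence(7)
high = percent_divergence(23)
print(f"{low:.2f}% to {high:.2f}%")

# Toy example with two made-up 10-base sequences.
toy_differences = count_differences("ACGTACGTAC", "ACGTTCGTAA")
print(toy_differences, percent_divergence(toy_differences, length=10))
```

The computed bounds come out close to the 2 to 6.5% stated in the text.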

Acknowledgement

This work was supported in part by NSF Grant PCM 8104996.

References

1. Alwine JC, Kemp DJ, Stark GR: Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci USA 74:5350-5354 (1977).
2. Appels R: Chromosome structure in cereals: the analysis of regions containing repeated sequence DNA and its application to the detection of alien chromosomes introduced into wheat. In: Kosuge T, Meredith CP, Hollaender A (eds) Genetic Engineering of Plants: An Agricultural Perspective. Basic Life Sciences, Vol 26, pp. 229-256. New York: Plenum Press (1983).
3. Aviv H, Leder P: Purification of biologically active globin messenger RNA by chromatography on oligothymidylic acid-cellulose. Proc Natl Acad Sci USA 69:1408-1412 (1972).
4. Bailey JM, Davidson N: Methylmercury as a reversible denaturing agent for agarose gel electrophoresis. Anal Biochem 70:75-85 (1976).
5. Barker JM, McInnes JL, Murphy PJ, Symons RH: Dot-blot procedure with [32P] DNA probes for the sensitive detection of Avocado Sunblotch and other viroids in plants. J Virol Meth 10:87-98 (1985).
6. Barnes SR, James AM, Jamieson G: The organisation, nucleotide sequence, and chromosomal distribution of a satellite DNA from Allium cepa. Chromosoma 92:185-192 (1985).
7. Bendich AJ, Taylor WC: Sequence arrangement in satellite DNA from the muskmelon. Plant Physiol 59:604-609 (1977).
8. Brennicke A, Hemleben V: Sequence analysis of the cloned Cucumis melo highly repetitive satellite DNA. Z Naturforschung 38c:1062-1065 (1983).
9. Carroll D, Garrett JE, Lam BS: Isolated clusters of paired tandemly repeated sequences in the Xenopus laevis genome. Mol Cell Biol 4:254-259 (1984).
10. Grisvard J: Different methylation pattern of melon satellite DNA sequences in hypocotyl and callus tissues. Plant Science 39:189-193 (1985).
11. Hari V: The RNA of tobacco etch virus: Further characterization and detection of protein linked to RNA. Virology 112:391-399 (1981).
12. Kato A, Yakura K, Tanifuji S: Sequence analysis of Vicia faba repeated DNA, the Fok I repeat element. Nucleic Acids Res 12:6415-6426 (1984).
13. McClelland M: The effect of site specific methylation on restriction endonuclease cleavage (update). Nucleic Acids Res 11:r169-r173 (1983).
14. Miklos GLG: Localized highly repetitive DNA sequences in vertebrate and invertebrate genomes. In: MacIntyre RS (ed) Molecular Evolutionary Genetics, pp. 241-321. New York: Plenum Press (1985).
15. Morrison DA: Transformation and preservation of competent bacterial cells by freezing. Methods Enzymol 68:326-331 (1979).
16. Novak U: Structure and properties of a highly repetitive DNA sequence in sheep. Nucleic Acids Res 12:2343-2350 (1984).
17. Prody GA, Bakos JT, Buzayan JM, Schneider IR, Bruening G: Autolytic processing of dimeric plant virus satellite RNA. Science 231:1577-1580 (1986).
18. Rochon D, Kelly R, Siegel A: Encapsidation of 18S rRNA by tobacco mosaic virus coat protein. Virology 150:140-148 (1986).
19. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463-5467 (1977).
20. Setlow P: Nucleic acids 2. In: Fasman GD (ed) Handbook of Biochemistry and Molecular Biology, pp. 312-318. Boca Raton, Florida: CRC Press (1976).
21. Shmookler Reis R, Timmis JN, Ingle J: Divergence, differential methylation and interspersion of melon satellite DNA sequences. Biochem J 195:723-734 (1981).
22. Siegel A: Gene amplification in plants. In: Markham R, Davies DR, Hopwood DA, Horne RW (eds) Modification of the Information Content of Plant Cells, pp. 15-26. New York: American Elsevier (1975).
22a. Siegel A, Kolacz K: Heterogeneity of pumpkin ribosomal DNA. Plant Physiol 72:166-171 (1983).
23. Siegel A, Hari V, Montgomery I, Kolacz K: A messenger RNA for capsid protein isolated from tobacco mosaic virus-infected tissue. Virology 73:363-371 (1976).
24. Southern EM: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-517 (1975).
25. Symons RH, Haseloff J, Visvader JE, Keese P, Murphy PJ, Gill DS, Gordon KHJ, Bruening G: On the mechanism of replication of viroids, virusoids, and satellite RNAs. In: Maramorosch K, McKelvey JJ Jr (eds) Subviral Pathogens of Plants and Animals: Viroids and Prions, pp. 235-263. New York: Academic Press, Inc. (1985).
26. Thornburg W, Siegel A: Characterization of the rapidly reassociating deoxyribonucleic acid of Cucurbita pepo L. and the sequences complementary to ribosomal and transfer ribonucleic acids. Biochemistry 12:2759-2765 (1973).
27. Wahl GM, Stern M, Stark GR: Efficient transfer of large DNA fragments from agarose gels to diazobenzyloxymethyl-paper and rapid hybridization by using dextran sulfate. Proc Natl Acad Sci USA 76:3683-3687 (1979).
28. Yakura K, Kato A, Tanifuji S: Length heterogeneity of the large spacer of Vicia faba rDNA is due to the differing number of 325 bp repetitive sequence elements. Mol Gen Genet 193:400-405 (1984).
29. Yang RC-A, Lis J, Wu R: Elution of DNA from agarose gels after electrophoresis. Methods Enzymol 68:176-182 (1979).
