Plant Molecular Biology 11:805-820 (1988) © Kluwer Academic Publishers, Dordrecht - Printed in the Netherlands


Molecular cloning, genomic organization, expression and evolution of 12S seed storage protein genes of Arabidopsis thaliana

Patty P. Pang, Robert E. Pruitt 1 and Elliot M. Meyerowitz*

Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA; 1 Present address: Department of Genetics and Cell Biology, University of Minnesota, St. Paul, MN 55108-1095, USA (*author for correspondence)

Received 29 August 1988; accepted in revised form 12 September 1988

Key words: abscisic acid, Arabidopsis, developmental expression, seed storage protein gene sequences

Abstract

We have identified a number of genes of the flowering plant Arabidopsis thaliana that are abundantly expressed during embryogenesis. In this paper we discuss four of these genes, which comprise a gene family: the complete genomic nucleotide sequence of two of the genes and partial sequence of the other two show that they are all homologous to the 12S globulin seed storage protein genes of other angiosperms. The four genes fall into three subfamilies, as defined by cross-hybridization. One subfamily contains two genes in the Landsberg erecta strain, but only a single gene in the Columbia strain of Arabidopsis. The other two 12S gene subfamilies contain only single genes in both strains. Thus, the seed storage protein gene family in Arabidopsis appears much simpler than that in other higher plants. These genes are expressed during the latter half of embryogenesis, a period in which abscisic acid (ABA) is thought to play a role in gene regulation, and is known to play a role in seed physiology. We observed no significant difference in the expression profiles of these four genes in ABA-deficient and ABA-insensitive mutants of Arabidopsis, except that the onset of detectable expression of all of the transcripts is slightly delayed in both types of mutants.

Introduction

During the development of the seeds of dicotyledonous plants, embryos develop from a single diploid cell into a plantlet composed of two cotyledons and the rudiments of both shoot and root. During this process the embryo increases in size and exhibits differentiation of specialized tissues: the maturing embryo contains distinct epidermal and vascular tissues as well as root and shoot apices. At the end of embryogenesis the entire seed undergoes desiccation, allowing the embryo to withstand a wide variety of conditions for an extended period without loss of viability.

During the latter half of embryogenesis, the most abundant messenger RNAs in embryos are those encoding the seed storage proteins. Storage proteins are located in membrane-bound protein bodies in the embryo; they accumulate to high levels in the seed and are rapidly degraded during seed germination. They are thought to serve as a source of nitrogen for the developing seedling. Reviews on the structure and storage of seed storage proteins in various species have been published [19, 32]. Storage protein genes are of interest because they represent abundantly transcribed genes that are controlled in a precise developmental manner. RNA transcribed from these genes is detectable only in developing seeds and only in the latter half of embryogenesis.

The plant hormone abscisic acid (ABA) has been shown, in some species, to be important for the expression of certain genes during late embryogenesis, including the seed storage protein genes [4, 9, 13, 14, 17]. ABA also plays an important role in inducing seed dormancy in Arabidopsis [20, 22]. In a number of species, environmental factors such as temperature and plant nutrition also affect the accumulation of storage protein [19].

Most plants produce several different storage proteins in their seeds, each of which may have its own particular temporal pattern of accumulation, and many of which are encoded by large gene families (for a review see [5]). 11S or 12S globulin storage proteins are common in seeds of dicots. These proteins consist of two polypeptides, designated α and β, which are produced by post-translational cleavage of a single precursor polypeptide [19]. This precursor also contains a hydrophobic amino-terminal leader peptide that is not present in the mature proteins [24] and that may be necessary for localization of the storage proteins in the protein bodies [1].

11S or 12S globulin storage protein genes from many plant species have been isolated and characterized. In Fabaceae (legume family), for example Glycine [16, 26, 36], Pisum [12, 24], and Vicia [40], there are at least two 11S storage protein (legumin) subfamilies (with about 50% or less amino acid identity between members of different subfamilies), with each subfamily containing many genes. In Brassicaceae, one of the B. napus 12S protein (cruciferin) genes has been characterized and has been shown to be homologous with legumins from Pisum sativum [37]. The 12S storage proteins of A. thaliana have been analyzed [18], and they are similar in molecular weight and sedimentation velocity to the 12S seed storage proteins of B. napus.
In this paper, we report the isolation of four genes coding for Arabidopsis thaliana 12S seed storage proteins by use of radioactively labeled cDNA made from mRNA of developing seeds. Complete nucleotide sequences for two, and partial sequences of two other, seed storage protein genes have been determined. Overall, the gene family containing these genes appears to be much smaller and simpler than in other higher plants whose homologous genes have been studied. We also report the temporal and tissue-specific expression of the four seed-specific genes in wild type strains and in abscisic acid mutant strains of A. thaliana.

Materials and methods

Plant culture

The strains of Arabidopsis thaliana used were: Landsberg erecta, obtained from E J. Braaksma, Department of Genetics, Biology Centre, Haren, Netherlands; and Columbia, obtained from A. Kleinhofs, Program in Genetics, Washington State University, Pullman, WA 99164. The Landsberg strain used is homozygous for the recessive mutation, erecta, which makes the plants more compact and easier to culture in large numbers. The ABA-deficient (aba) and ABA-insensitive (abi-3) mutations are in the Landsberg erecta background, and were obtained from M. Koornneef, Department of Genetics, Agricultural University, 53 Generaal Foulkesweg, 6703 BM Wageningen, Netherlands [21, 22]. Plant growth conditions have been described previously [33]. Seed RNA was obtained from Landsberg erecta plants. Plants which were used for determining the time of expression during seed development were grown at a density of five plants per 5.5 cm diameter pot at 25 °C, 70% relative humidity and 7 000 lux constant illumination.

General DNA and recombinant DNA techniques

Preparation of bacteriophage and plasmid DNA, and restriction digestions were performed as described by Davis et al. [10]. 32P-labeled DNA was prepared by nick translation using the method described by Rigby et al. [34]. Poly(A)+ RNA and 32P-labeled cDNA were prepared and used to screen recombinant lambda libraries as described previously [31]. Two different libraries were used in our experiments. The first library contained an Mbo I partial digest of genomic DNA from the Landsberg erecta strain that was cloned in the Bam HI site of the vector λEMBL4 [6]. The second library contained genomic DNA prepared from the strain Columbia that was partially digested with Eco RI and size-selected prior to insertion in the λ vector Sep6 [23]. Arabidopsis DNA preparations were made following the procedure of Shure et al. [38] for maize DNA and Arabidopsis RNA was prepared by the method we have previously described [33].

DNA subclones were made in the vector pUC18, unless stated otherwise, and are indicated by the prefix "n". Other vectors used were DOA3.8 (a plasmid containing a polylinker cloning site adjacent to an SP6 promoter; C. Chang, unpublished), indicated by an "s" prefix; and the lambda vectors EMBL4, indicated by the prefix "λf", and Sep6, indicated by the prefix "λb". pSP64 and pSP65 were obtained from Promega Biotec. Clones with inserts in these vectors are designated by the prefixes "k" and "j", respectively. In vitro transcriptions using SP6 or T7 RNA polymerase were performed as recommended by the manufacturer (Promega Biotec).

Nucleotide sequencing

Nucleotide sequencing was performed using both the dideoxy-chain termination method of Sanger et al. [35] and the chemical method of Maxam and Gilbert [27]. Labeling of 3' protruding ends was done either with terminal transferase and α-32P-dideoxy ATP as described by the commercial supplier (Amersham) or using T4 polynucleotide kinase as described below.

Plasmid DNA (10 µg) was digested to completion with restriction enzymes in a volume of 50 µl. 1.2 units of T4 DNA polymerase were added to the reaction and the incubation continued for an additional 90 s at 37 °C. The reaction tube was then immediately moved to a preheated 70 °C heat block and incubated for 10 minutes to denature the T4 DNA polymerase. 0.5 units of calf intestinal alkaline phosphatase were added to the mixture, which was incubated at 37 °C for 30 minutes, followed by addition of 5 µl of 0.5 M EDTA. The reaction was then phenol-extracted, chloroform-extracted and ethanol-precipitated. The resulting DNA was labeled with T4 polynucleotide kinase, digested with a second restriction enzyme and used for DNA sequencing. Sau 3A and Alu I fragments of the 4.4 kb Sal I fragment in nAt4011 were subcloned into M13mp18, white plaques were picked and ssDNA templates prepared for dideoxy-chain termination sequencing using 35S-dATP from Amersham. Synthetic oligonucleotide primers were used to sequence across regions of the 4.4 kb Sal I fragment not represented in the Sau 3A and Alu I subclones.

Electrophoresis, filter binding and hybridization of nucleic acids

Electrophoresis, nucleic acid transfer and hybridization of nucleic acids have been described previously [8, 31] except for hybridization of single-stranded RNA probes to RNA bound to filters. Hybridizations between RNA probes and filter-bound RNA were performed in 50% (v/v) formamide, 5× SSPE, 100 µg/ml sonicated and denatured salmon sperm DNA, 1× Denhardt's solution, and 0.1% SDS at 60 °C. After hybridization these filters were washed in 0.05× SSPE and 0.1% SDS at 55 °C.

Tissue in situ hybridization to embryos

A procedure for RNA in situ hybridization in plant tissue has been described [29]. Seed pods 7-8 days old were removed from the plant and immediately fixed in glutaraldehyde solution. SP6 RNA transcripts were made with 3H-CTP from sAt2101 and sAt2105.

Results

Cloning of genes encoding abundant seed RNA species

Fig. 1. Restriction maps of the regions containing the 12S storage protein genes, CRA1, CRA2, CRB and CRC from the Arabidopsis Landsberg erecta strain. The CRA gene in the Columbia strain is also shown. The arrows beneath the maps indicate the locations of the storage protein genes and the direction of transcription. The hatched boxes represent the regions sequenced. The locations of the fragments contained in the various subclones described in the text are also indicated. Scale bar: 1.0 kb. (a) CRA1 and CRA2. The locations of the genes were determined by hybridization of a cDNA clone from Brassica napus to the cloned DNA and by DNA sequencing. (b) CRB. The location of the gene was determined by hybridization of cDNA probe from seed pods and by DNA sequencing. (c) CRC. The location of the gene was determined by hybridization of cDNA probe from seed pods and by DNA sequencing.

RNA was prepared from seed pods of A. thaliana 7-8 days after pollination and used as a template for synthesis of 32P-labeled cDNA. The cDNA was used as a hybridization probe in screening a recombinant lambda library containing A. thaliana genomic DNA from the Landsberg erecta strain. Four to six genome equivalents of recombinant phage were screened on duplicate filters. 18 plaques showed a positive hybridization signal and 13 gave positive signals when rescreened. Preliminary restriction digestions and cross-hybridization experiments indicated that the 13 clones recovered fall into four classes, representing four different genomic regions. This screen is by no means complete, but we have recovered multiple clones representing each genomic region, suggesting that these four regions encode the majority of abundantly expressed RNAs in seeds 7-8 days old. When these clones were used to probe RNA blots, representatives of three of the clone classes hybridized to 1700 nucleotide RNAs, and representatives of the fourth class hybridized to an RNA of 650 nucleotides. These messages can only be detected in developing seeds and not in the seed pod tissue surrounding the developing seeds, or in leaves or stems (data not shown). Nucleotide sequence data, which will be discussed later, indicate that the three classes of clones that code for the 1700 base transcripts contain 12S seed storage protein (or 12S globulin) genes. In this paper we describe these three classes of clones, and the genes they represent.
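The screening depth quoted above (four to six genome equivalents) can be related to the chance that any given single-copy region is represented among the plaques screened. The following is a minimal Python sketch of that standard estimate, assuming random and unbiased cloning so that the number of clones covering a region is approximately Poisson-distributed; the function name is hypothetical and the calculation is illustrative, not part of the original analysis.

```python
import math


def coverage_probability(genome_equivalents: float) -> float:
    """Probability that a given single-copy region is present at least once
    when `genome_equivalents` of library have been screened.

    Assumes random, unbiased cloning: the number of clones covering the
    region is then roughly Poisson with mean equal to the number of genome
    equivalents, so P(at least one) = 1 - exp(-mean).
    """
    if genome_equivalents < 0:
        raise ValueError("genome_equivalents must be non-negative")
    return 1.0 - math.exp(-genome_equivalents)


if __name__ == "__main__":
    # The two ends of the range quoted in the text.
    for c in (4, 6):
        print(f"{c} genome equivalents: P = {coverage_probability(c):.4f}")
    # 4 genome equivalents: P = 0.9817
    # 6 genome equivalents: P = 0.9975
```

Under these assumptions a region is very unlikely to be missed by chance alone, which is consistent with each of the four genomic regions being recovered more than once. The estimate says nothing about cloning bias or about genes whose transcripts are too rare to give a signal with the cDNA probe.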

Characterization of the genomic regions containing the 12S storage protein genes of Arabidopsis

Members of one of the three classes of 12S protein gene-containing clones gave a positive signal when probed with a cDNA clone (pC1) representing a 12S storage protein (cruciferin) gene of Brassica napus, which is, like Arabidopsis, a member of the mustard family. This class is represented by three lambda clones, which were found to be identical by restriction mapping. The clone used in subsequent analyses is λfAt1505. This clone shares common restriction sites with two recombinant phage (λbAt1501 and λbAt1502) that were isolated from a different library made with DNA from the Arabidopsis thaliana strain Columbia [23] using the B. napus pC1 cDNA clone as probe. The restriction maps of these clones suggest that they represent homologous genomic regions, but that a portion of the sequence found in the Columbia strain is present twice, as a tandem duplication, in the Landsberg erecta genome (Fig. 1a). The 1.6 kb Xho I fragment in λbAt1501 DNA cross-hybridizes to pC1 DNA, indicating that λbAt1501 contains one 12S protein gene, while λfAt1505, in which the homologous Xho I fragment is duplicated, contains two, arrayed in tandem.

This Xho I fragment was subcloned (sAt2105) and used to determine the copy number of 12S storage protein genes that cross-hybridize with the gene represented by the 2105 fragment in the genome of the Landsberg and Columbia strains. Genome blots with DNA from both strains, digested with Eco RI and Hind III, were probed with 32P-labeled sAt2105 plasmid DNA. The restriction fragments which hybridized to the probe were of the same sizes as those in λbAt1501 in Columbia strain genomic DNA, and λfAt1505 in DNA derived from the Landsberg strain, with the exception of those fragments which are located on the boundary of the lambda clones, and would therefore not be expected to correspond with the size of homologous fragments contained in the genome. This suggests that there are no additional DNA fragments that hybridize with sAt2105 in either the Landsberg or Columbia strains, other than those represented in the lambda clones analyzed.

We designate this subfamily of genes CRA, for cruciferin A. The two genes in the Landsberg strain are designated CRA1 (the downstream one in nAt1512) and CRA2. The directions of transcription of these genes were determined by hybridizing single-stranded RNA probes to RNA blots. RNA filters were hybridized with 32P-labeled RNA transcribed by SP6 RNA polymerase from either sAt2101 or sAt2105. These two plasmids contain identical 1.6 kb Xho I fragments in opposite orientations downstream from an SP6 RNA polymerase promoter.
The RNA filter hybridized with 32P-labeled RNA prepared from sAt2105 showed a single band corresponding to an RNA species of approximately 1700 nucleotides, while no hybridization was apparent on the filter probed with 32P-labeled RNA from sAt2101. This indicates that the genes are transcribed in the directions indicated in Fig. 1a.

The second of the three classes of storage protein gene-containing clones is comprised of three separate lambda clones. The inserts in clones λfAt1508 and λfAt1517 overlap with each other. The third clone is apparently identical to λfAt1517. The restriction map of part of the genomic region represented by these clones is shown in Fig. 1b. 32P-labeled cDNA derived from seed RNA hybridizes with a 4.4 kb Sal I fragment and two Eco RI fragments (1.9 kb and 2.2 kb respectively) of nAt4011 (a 6 kb Bam HI fragment subclone of λfAt1508) on DNA gel blots, indicating that this genomic region contains a gene abundantly transcribed in seeds; DNA sequencing shows it to be a 12S seed storage protein gene (see below). This gene is designated CRB. When hybridization is done at 65 °C in 5× SSPE, and filters washed in 2× SSPE at 55 °C, the CRB-containing clones do not cross-hybridize to the pC1 12S protein gene from B. napus, nor to the CRA genes. The nucleotide sequence of the region between the Sal I site and Pvu I site has been determined (see below), and shows a single gene with a high degree of sequence similarity to both CRA and the cruciferin gene of B. napus represented by pC1.

Genome blot experiments with Landsberg erecta strain DNA digested with Eco RI, Sal I, and Bgl II show bands of the same size as the corresponding restriction fragments in λfAt1508 when probed with 32P-labeled λfAt1508 DNA. This suggests that CRB is present in single copy in the haploid genome of the Landsberg erecta strain. Similar genome blot experiments with Columbia strain DNA digested with Eco RI, Hind III, Ava II and Xba I indicate that CRB is also present in single copy in the Columbia strain. The direction of transcription of CRB was determined by a method similar to that for CRA. The 1.9 kb Eco RI-Sal I fragment of nAt4011 was subcloned into pSP64 and pSP65 to give kAt4021 and jAt4021 respectively (Fig. 1b). SP6 transcripts made from kAt4021 hybridized to a single band on the poly(A)+ RNA blot.
The direction of transcription is thus as indicated in Fig. 1b.

The third class of clones containing 12S protein genes is comprised of five overlapping recombinant clones, λfAt1507, λfAt1509, λfAt1514, and two additional clones identical in restriction map to λfAt1507. Figure 1c shows the restriction map of nAt4003, a 9.4 kb Eco RI fragment of λfAt1514 that was subcloned into pUC18. cDNA from seed RNA hybridizes to two adjacent Bgl II fragments (of 256 and approximately 800 bp) in the middle of this DNA, as shown in the figure, implying that this clone contains a single 12S protein gene. This gene is designated CRC. Genome blot analysis using 32P-labeled λfAt1507 as probe, hybridized to Landsberg erecta strain DNA digested with Eco RI, Hind III, Xba I, Sac I and Bgl II, and Columbia strain DNA restricted with the same enzymes, shows that this gene is present in one copy in both the Landsberg erecta and Columbia strains. The direction of transcription has not been directly determined. However, partial DNA sequence data indicate that it is as shown in Fig. 1c.
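The logic of the strand-specific probe experiments used above to orient CRA and CRB can be illustrated with a small model. The following Python sketch is only a toy, with a hypothetical fragment sequence and hypothetical function names: the same fragment is placed in either orientation behind a promoter, and only the transcript that is complementary to the message (the antisense transcript) can base-pair with filter-bound mRNA.

```python
# Complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(dna: str) -> str:
    """Write a DNA strand as the RNA of identical sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def sp6_transcript(coding_strand_dna: str, orientation: str) -> str:
    """RNA made from a fragment cloned behind a promoter.

    'forward' gives RNA identical to the coding strand (sense);
    'reverse' gives its reverse complement (antisense).
    """
    rna = to_rna(coding_strand_dna)
    if orientation == "forward":
        return rna
    if orientation == "reverse":
        return reverse_complement_rna(rna)
    raise ValueError("orientation must be 'forward' or 'reverse'")


def hybridizes(probe_rna: str, target_rna: str) -> bool:
    """True if the whole probe can pair with some stretch of the target."""
    return reverse_complement_rna(probe_rna) in target_rna.upper()


if __name__ == "__main__":
    fragment = "ATGGCTCGAGTCTCTTCTCTTC"        # hypothetical coding-strand DNA
    mrna = "AAA" + to_rna(fragment) + "GGG"    # hypothetical message on the filter

    for orientation in ("forward", "reverse"):
        probe = sp6_transcript(fragment, orientation)
        print(orientation, hybridizes(probe, mrna))
    # forward False
    # reverse True
```

In the experiments described above, the probe from sAt2105 behaves like the 'reverse' (antisense) case and the probe from sAt2101 like the 'forward' (sense) case, which is what fixes the direction of transcription relative to the restriction map.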

Nucleotide sequences of two A. thaliana 12S storage protein genes

We have determined the complete genomic nucleotide sequence of CRA1 and CRB. The DNA sequences with deduced amino acid sequences are presented in Figs. 2 and 3. By comparison of the sequences of the A. thaliana genes with the sequence of the B. napus cDNA clone [37], it was possible to locate the likely translation initiation codons, the translation termination codons and also the locations of probable intron sequences. There are three probable introns in both the CRA1 and CRB genes. Each of these putative introns is present in the A. thaliana sequence and absent in the homologous B. napus cDNA sequence, and each begins with the nucleotides GT and ends with the nucleotides AG. Comparison of the positions of these introns with those in a legumin gene (legA) of P. sativum [24] indicates that they are in precisely the same positions relative to the protein sequence. The sequence upstream of the initiating ATG contains a putative TATA box beginning at positions 140 and 378 for CRA1 and CRB, respectively. Putative polyadenylation signals are also found downstream of the termination codons of the two genes as indicated in Figs. 2 and 3. The nucleotide sequence identity between CRA1 and CRB is limited to the coding region only, where it is 75%. There is no detectable homology between any of the three intron pairs nor in the 3' untranslated region. There is some sequence homology in the 5' untranslated region and the putative TATA boxes are highly conserved (8 out of 10 nucleotides); however, sequence homology was not detected further upstream from the TATA boxes. The introns in CRB are all larger than the corresponding ones in CRA1.

Fig. 2. The nucleotide sequence of the CRA1 gene of Arabidopsis. The nucleotide sequences were determined on both complementary strands except for 500 bases at the 5' end of CRA1. Putative protein coding sequences are denoted by upper case letters and non-coding portions by lower case letters. The deduced amino acid sequence is represented with the single letter code underneath the corresponding codons. The nucleotide position is shown on the left of the nucleotide sequences, and the deduced amino acid position is shown on the right of the amino acid sequence. The underlined DNA sequences are the putative TATA box and polyadenylation sites, based on their homology to the consensus sequences [11, 28].

Fig. 3. The nucleotide sequence of the CRB gene of Arabidopsis. The format of presentation of the nucleotide sequences is described in the legend for Fig. 2.

The second CRA gene, CRA2, was partially sequenced. The nucleotide sequence of 200 bp of 5' flanking region and 400 bp at the 5' end of the CRA2 gene (including the first intron) was determined (data not shown). The region downstream from the putative initiation ATG codon of CRA2 is conserved perfectly with the corresponding region of CRA1, except for 6 base pairs in the first intron. However, the overall sequence homology between CRA1 and CRA2 in the 200 bp 5' flanking regions is only about 75%, with the most conserved regions around the TATA box and the region immediately downstream, where transcription initiation presumably occurs. This is in sharp contrast to the 5' flanking regions of three P. sativum legumin genes which show complete sequence identity for more than 300 nucleotides upstream of the translation initiation site [25].

About 1000 bp of the 3' end of the third 12S gene, CRC, has also been determined (data not shown). This 1000 bp region includes the entire coding sequence of the CRC β-peptide and about 250 bp of 3' untranslated sequences. The deduced CRC β-peptide shares about 60% amino acid identity with both CRA1 and CRB β-peptides. By comparing the deduced amino acid sequence to that of CRA1 and CRB, the location of the third intron in CRC appears to be at the same position as in the other two 12S genes. The complete nucleotide sequence of CRC has not been determined, but based on this partial sequence, we conclude that this CRC gene represents a third 12S globulin gene in A. thaliana.

To summarize, we have isolated four 12S seed storage protein genes from the Landsberg strain of A. thaliana.
Two of them are closely related (the nucleotides of the first exon are identical) and cross hybridize, while the other two show only 6 0 - 7 5 % nuclei acid identity to each other and to the first two, and do not cross hybridize with any of the other genes. The four genes thus comprise three subfamilies of the 12S storage protein gene family. Since the 12S

seed storage protein subfamilies are non-crosshybridizing, and since members of each of the three subfamilies are hybridized by 32p-labeled cDNA made from seed RNA, there are at least three functional seed storage protein genes: at least one of the CRA genes, the CRB gene and the CRC gene.
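The grouping of the four genes into three subfamilies can be illustrated with a short sketch. The Python below is only an illustration and was not part of the analysis reported here: it treats each reported cross-hybridization as a link between two genes and collects linked genes into one group. The function name group_into_subfamilies and the data layout are assumptions introduced for this example; the gene names and the single cross-hybridizing pair are taken from the text.

```python
# Genes isolated from the Landsberg strain, as named in the text.
genes = ["CRA1", "CRA2", "CRB", "CRC"]

# The only pair reported to cross-hybridize.
cross_hybridizing_pairs = [("CRA1", "CRA2")]


def group_into_subfamilies(genes, pairs):
    """Group genes so that genes linked by cross-hybridization share a group.

    Hypothetical helper for illustration; it is a plain union-find over gene names.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links up to the representative of g's group.
        while parent[g] != g:
            g = parent[g]
        return g

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


subfamilies = group_into_subfamilies(genes, cross_hybridizing_pairs)
print(subfamilies)
```

With the single CRA1/CRA2 link, the sketch returns three groups, matching the three subfamilies described above. Adding a further pair to cross_hybridizing_pairs would merge the corresponding groups.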

In situ localization of the seed storage protein messages in embryos

Seed storage proteins are stored in protein bodies in embryos. To determine the cell types expressing one of the 12S globulin genes in Arabidopsis, seed pods 7-8 days old were fixed and sectioned [29]. These tissue sections were then hybridized with 3H-labeled RNA transcribed by SP6 RNA polymerase from sAt2105 (anti-sense RNA for the CRA1 gene). The hybridization is limited to the cotyledons and hypocotyl (Fig. 4a). There is considerably less signal from the procambial region in the center of the hypocotyl and the cotyledons. There is no evident signal from the endosperm, seed coat or silique tissue. This is consistent with previous observations that seed storage proteins are found in both the cotyledons and the embryo axis in Brassica species. No signal was observed with 3H-labeled transcripts from sAt2101 (sense RNA for the CRA1 gene) (Fig. 4b).

That the product of the CRB gene is also localized to embryo and axis has been shown in experiments in which the upstream regulatory sequences of CRB were fused to a reporter gene (Arabidopsis alcohol dehydrogenase), the fusion construct was introduced into the genome of an Arabidopsis strain with no endogenous ADH activity by Ti plasmid-mediated transformation, and the product of the fusion gene was assayed by histochemical staining for ADH activity. These and related experiments will be reported elsewhere (Pang and Meyerowitz, work in progress).

Fig. 4. Location of CRA seed storage protein messages in developing seeds 7-8 days old. (A) Anti-sense strand 3H-labeled RNA of CRA gene from an SP6 promoter (sAt2105) as probe. Positions of the cotyledons (C), hypocotyl (H), procambium (PC), silique tissues (S) and seed coat (SC) are indicated. (B) Sense strand 3H-labeled RNA of CRA gene from an SP6 promoter (sAt2101) as probe.

Expression of the seed-specific genes during the development of embryos

We have examined the expression of the A. thaliana seed-specific genes during the course of embryogenesis. A. thaliana seeds develop over a period of slightly more than two weeks: by day 16 or 17 (days after pollination) the embryos are mature and have started the desiccation process. Total RNA was isolated from seed pods at various stages of development. These RNA samples were electrophoretically separated through a formaldehyde-agarose gel. Three identical filters were prepared for hybridization with three 12S globulin gene probes (nAt1512, nAt4012 and nAt4003) representing the CRA1, CRB and CRC genes. Under the hybridization conditions used (5× SSPE, 50% formamide at 42 °C, wash at 55-60 °C with 0.1× SSPE), no cross-hybridization is detected between RNA probes synthesized from each of the genes and cloned DNA of the other genes. Figure 5 shows the results of these experiments. The developmental time courses of expression of the three 12S protein gene transcripts are identical, and the transcripts appear equally abundant. RNA homologous with nAt1512, nAt4012 and nAt4003 is detected from day 6 post pollination (or anthesis) to day 17, the last time point. By day 17, the seeds are brown and desiccated.

Abscisic acid has been suggested as playing an important role in seed development. Endogenous ABA is required for the induction and maintenance of seed dormancy [39] and also for the induction of synthesis of certain proteins during embryogenesis, including seed storage proteins in some species [4, 14]. Abscisic acid-deficient mutants and abscisic acid-insensitive mutants in Arabidopsis are known [20, 21, 22]. We performed several experiments to see if these mutations affect the expression of the 12S storage protein genes. The mutant strains used in the following experiments are aba3 (ABA-deficient) and abi-3 (ABA-insensitive); they both have lost their seed dormancy. The ABA-deficient mutants were isolated as revertants of a non-germinating gibberellin-requiring line of Arabidopsis thaliana [21]. The ABA-insensitive mutants were isolated by selecting for germination in the presence of 10 µM ABA, which totally suppresses germination of wild type A. thaliana [22]. The levels of endogenous ABA in the ABA-deficient mutant and the ABA-insensitive mutant are about 25% and 150% of wild type, respectively [20, 22]. The ABA-insensitive mutant is slightly smaller in size than wild type, and the aba3 mutant plant is considerably smaller and very sensitive to water loss, resulting in withered siliques and stems.

The expression of 12S storage protein genes was analyzed in these mutants using an RNA blot procedure similar to that mentioned above. Again, three filters were made and probed with the various seed-specific genes. The expression of the three 12S globulin genes is very similar in both the ABA-deficient and ABA-insensitive mutants. The abundance of the 12S globulin messages in the two mutant strains is comparable to that in the wild type. The only observed difference between mutant and wild type is that the onset of the expression is delayed by 1 day in the ABA-deficient strain and by 2 days in the ABA-insensitive strain (Fig. 5).

As a comparison, we also analyzed the accumulation of the 650 base transcripts that are homologous with the fourth class of genomic clones we isolated. This class of clones codes for a gene family expressed in seed coats of developing Arabidopsis seeds. The characterization of the genomic organization of the seed coat genes will be reported elsewhere (M. Yanofsky and E. Meyerowitz, unpublished results). The 650 base transcripts can be detected as early as 4 days after pollination, and transcripts are virtually absent by day 12 (Fig. 5). It is clear that the levels of accumulated RNA for the two different seed-specific gene families follow different temporal patterns during embryogenesis, which suggests that they may be under different regulatory controls. However, the 650 base transcript is also delayed 1-2 days in its expression in aba3 (ABA-deficient) and abi-3 (ABA-insensitive) mutants. Thus, the effect of the mutations appears to be a general effect on developmental rate, and not a specific effect on seed storage proteins.
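The reasoning behind this conclusion can be written out as a small sketch. The Python below is an illustration only, not a procedure used in this work: it computes the onset delay of the 12S transcripts in each mutant and checks whether that delay falls within the range reported for the unrelated seed coat transcript. The function names and the dictionary layout are assumptions made for the example. The onset days come from the text; the per-mutant delay of the seed coat transcript is not stated, so only the reported 1-2 day range is used.

```python
# Onset of detectable 12S transcripts (days after pollination), from the text:
# day 6 in wild type, 1 day later in aba3, 2 days later in abi-3.
onset_12s = {"wild type": 6, "aba3": 6 + 1, "abi-3": 6 + 2}

# The 650 base seed coat transcript is reported as delayed "1-2 days" in both
# mutants; the value per mutant is not given, so only the range is recorded.
seed_coat_delay_range = (1, 2)


def delay_relative_to_wild_type(onset):
    """Return the onset delay (in days) of each mutant relative to wild type."""
    baseline = onset["wild type"]
    return {
        strain: day - baseline
        for strain, day in onset.items()
        if strain != "wild type"
    }


def delay_is_shared(delays, other_range):
    """True if every mutant delay lies within the range seen for the other gene family."""
    low, high = other_range
    return all(low <= d <= high for d in delays.values())


delays_12s = delay_relative_to_wild_type(onset_12s)
shared = delay_is_shared(delays_12s, seed_coat_delay_range)
print(delays_12s)
print(shared)
```

A delay that is shared by two gene families with different temporal patterns is what one would expect from slower seed development in general; a delay seen only for the 12S transcripts would instead have pointed to a specific effect on storage protein genes.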

Discussion

The 12S globulin gene family of Arabidopsis, as so far characterized, consists of four members in one strain, and three in another. In both strains the genes form three non-cross-hybridizing subfamilies. The genes in the Landsberg erecta strain of A. thaliana, which is the strain with four genes, are designated CRA1, CRA2, CRB and CRC. CRA1 and CRA2 cross-hybridize, and are located adjacent to each other in the Landsberg strain of A. thaliana. These genes represent a tandem duplication with the two genes approximately 2.5 kb apart. This type of tandem duplication of closely related seed storage protein genes has also been observed in pea, in which case the gene 1.3 kb 3' of the other is a pseudogene [3]. There are only two genes in the CRA gene subfamily in the Landsberg erecta strain genome, and we know that at least one of these two copies is expressed. However, we do not know if both of these genes are expressed. The CRA clones that we have obtained from the Columbia strain of A. thaliana appear to lack the tandem duplication present in Landsberg. The restriction sites which are unique to the 5' end of CRA2 and the sites unique to the 3' end of CRA1 in Landsberg are present in the single gene found in the Columbia strain. The simplest explanation is that the tandem duplication present in the Landsberg strain has been eliminated in a progenitor of the Columbia strain by an unequal crossover event.

The other two 12S seed storage protein genes, CRB and CRC, fail to cross-hybridize with either CRA gene or with each other. They are present in single copy in both the Columbia and Landsberg strains, and thus each constitutes a single-gene subfamily.

The complete nucleotide sequences of the CRA1 and CRB genes have been determined, and partial sequencing of CRA2 and CRC has been accomplished. Members of the different subfamilies share no obvious restriction sites; however, the predicted amino acid sequences of CRA1 and CRB share 73% identity. The 3' end of the CRC gene coding for the β-peptide shares about 60% amino acid identity with that of CRA1 and CRB. Computer-generated pairwise amino acid sequence alignments between the deduced proteins coded by the two completely sequenced A. thaliana 12S genes, and other 12S storage proteins from various other dicot species, have allowed calculation of the percentage amino acid identity between the different proteins, as shown in Table 1. The pC1 12S globulin (cruciferin) from B. napus is 86% identical in deduced amino acid sequence with the CRA1 gene of A. thaliana. CRB is only 74% identical, in deduced amino acid sequence, with the Brassica gene. This suggests that the divergence of the CRA and CRB ancestral genes preceded the establishment of the separate lineages that led to Arabidopsis and Brassica. A similar evolutionary pattern is also found in the legume family [5, 40], where members of different subfamilies in the same species share less homology than certain of the genes compared between different species. In legumes, in contrast to Arabidopsis, there are numerous cross-hybridizing members of each of the two characterized subfamilies [5]. We may not have yet identified all of the different subfamilies in Arabidopsis; nonetheless, at the present time the organization of the 12S genes of Arabidopsis seems much simpler than that of the other plants where similar information is available. This provides an important advantage in studying the regulation of the expression of seed storage protein genes: we can be sure that the genes isolated are expressed, without having to isolate and sequence the corresponding cDNA clones or proteins, and the problem of cross-hybridizing RNA species in RNA blots is eliminated. A simple gene family organization has also been observed for some, but not all, of the other Arabidopsis genes which have been isolated [30].

The structures of the genes coding for the 12S seed storage proteins in Arabidopsis and in the other dicots where 12S seed storage protein genes have been studied show clear evolutionary homology: the locations of the introns in the A. thaliana genes are in identical positions relative to the protein sequence as those found in the legA legumin gene of P. sativum [24]. A partial sequence of a genomic clone for the soybean (Glycine max) 12S storage protein gene also demonstrates the presence in this gene of an intron in the same position as the third intron of the A. thaliana and P. sativum genes [16]. In addition, all of the genes encode a single polypeptide apparently processed to mature α and β polypeptides at a conserved cleavage site, as previously described for storage protein genes from seven different species by Borroto and Dure [2].

Table 1. Percentage amino acid identity between various globulin seed storage proteins. Best-fit alignments were made using a computer program written by R. E. Pruitt (unpublished result), and the percentage amino acid identity was calculated based on these alignments. Sources for the amino acid sequences for B. napus cruciferin and P. sativum legA are listed in the legend of Fig. 6. Sequences for P. sativum pCD40, V. faba legA and legB, and G. hirsutum legumins A and B are from Domoney et al. [12], Wobus et al. [40] and Chlan et al. [7] respectively.

                    A. thaliana  A. thaliana  B. napus     P. sativum   P. sativum   V. faba      V. faba      G. hirsutum
                    CRA1         CRB                       legA         pCD40        legA         legB         c-94
A. thaliana CRB     73
B. napus            86           74
P. sativum legA     43           41           43
P. sativum pCD40    41           39           41           42
V. faba legA        46           41           44           86           43
V. faba legB        40           37           39           43           92           43
G. hirsutum c-94    46           46           38           40           45           43           44
G. hirsutum c-134   47           44           41           39           46           41           45           43
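For readers who wish to reproduce values of the kind given in Table 1, the calculation can be sketched as follows. The alignment program used here is unpublished, so the Python below is not that program: it is a minimal sketch which assumes that identity is counted only over positions where neither sequence has a gap, which may differ from the convention actually used. The function name percent_identity and the toy sequences are hypothetical. The second part applies the comparison made in the Discussion to three values copied from Table 1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of gap-free aligned positions at which two sequences are identical.

    Assumption: columns with a gap in either sequence are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Toy alignment with hypothetical sequences (not taken from Fig. 6).
toy_identity = percent_identity("MARV-SLLS", "MAKVSSL-S")
print(round(toy_identity, 1))

# Percent identities copied from Table 1.
table1 = {
    ("CRA1", "CRB"): 73,       # two genes within A. thaliana
    ("CRA1", "B. napus"): 86,  # A. thaliana CRA1 versus B. napus cruciferin
    ("CRB", "B. napus"): 74,   # A. thaliana CRB versus B. napus cruciferin
}

# If a gene is closer to a gene of another species than to a gene of its own
# species, the within-species divergence is inferred to be the older event.
divergence_predates_lineage_split = (
    table1[("CRA1", "B. napus")] > table1[("CRA1", "CRB")]
)
print(divergence_predates_lineage_split)
```

The toy alignment has seven gap-free columns, six of them identical. The final comparison restates the argument made above: CRA1 is more similar to the Brassica cruciferin (86%) than to CRB (73%), which is why the CRA and CRB ancestral genes are inferred to have diverged before the Arabidopsis and Brassica lineages separated.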

Comparison of deduced A. thaliana seed storage proteins with the deduced pC1 cruciferin protein from B. napus and the deduced legA protein of P. sativum is shown in Fig. 6. The CRA1 protein has an insertion of six amino acids at the presumed junction of the leader peptide and the amino terminus of the mature peptide. The alignment of the deduced CRA1 protein with the cruciferin protein of B. napus suggests that these 6 amino acids are part of the mature CRA1 α-peptide. There are only two major insertions or deletions in the coding sequence of each gene relative to the other. The first is an insertion in the B. napus gene of a sequence of 28 amino acids, largely glutamine and glycine, located following position 122 in the B. napus protein sequence. The second is an insertion in the P. sativum sequence, characterized by an abundance of glutamic acid and aspartic acid, located near the carboxyl terminus of the α peptide. This region is absent in both Arabidopsis and Brassica 12S globulins. Sequence data from a non-legA cDNA clone of Pisum (with about 42% amino acid sequence identity to legA) and of two legumin genes from Vicia faba also indicate a similar glutamic acid and aspartic acid-rich region [12, 40]. In addition, the homologous cotton genes described by Chlan et al. [7] also contain such a region. Large insertions and deletions of the types seen in the α peptide are not seen in the β peptide coding portions of the genes compared.

Fig. 6. Amino acid sequence alignments of 12S storage protein genes (rows: CRA, CRB, B. napus and P. sativum legA). The sequences of Brassica napus and Pisum sativum legA are from Simon et al. [37] and Lycett et al. [24] respectively. The numbers on the right reflect the positions of the last amino acid shown in each line. The amino acid sequence of the CRA1 protein is presented in its entirety, with gaps introduced in order to accommodate putative insertions in other sequences. A blank indicates sequence identity with CRA1. A dash (-) represents a gap introduced into the protein sequence during alignment. The putative leader sequences of the precursor 12S proteins from Arabidopsis and B. napus are bracketed. The putative α-β cleavage sites of the four proteins are indicated by v. The best-fit alignments were generated by a computer program written by R. E. Pruitt of this laboratory (unpublished results).

Accumulation of the RNA hybridized by the CRA genes is stimulated by exogenous abscisic acid in excised embryos of Arabidopsis [15], similarly to ABA induction of the pC1 cruciferin gene in excised embryos of B. napus [14]. However, we found no drastic changes in the level of the seed storage protein messages in abscisic acid-deficient and abscisic acid-insensitive A. thaliana mutant plants. The onset of expression of the three non-cross-hybridizing 12S protein genes is delayed by 1-2 days in the mutants. This difference in onset of expression probably is the result of slower seed development in the mutants. One indication that this is the case is the observation that RNA from a gene family expressed in seed coats of developing Arabidopsis seeds is also delayed 1 to 2 days in its expression in these mutants. Our results suggest that these mutations, aba3 and abi-3, although clearly affecting induction of seed dormancy during embryogenesis of Arabidopsis, have no major effect on expression of genes in three different subfamilies of 12S seed storage protein genes, though quantitative effects of several-fold are not excluded.

Acknowledgements

We would like to thank John Bowman for the tissue in situ hybridization results and members of the Meyerowitz laboratory for technical suggestions and comments on the manuscript. We thank M. L. Crouch for providing us the cruciferin cDNA clone, pC1. This work was supported by grant number PCM-8703439 from the National Science Foundation to E.M.M. P.P.P. is a Monsanto Fellow of the Life Science Research Foundation.

References

1. Blobel G, Dobberstein B: Transfer of proteins across membranes I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J Cell Biol 67: 835-851 (1975).
2. Borroto K, Dure III L: The globulin seed storage proteins of flowering plants are derived from two ancestral genes. Plant Mol Biol 8: 113-131 (1987).
3. Bown D, Levasseur M, Croy RRD, Boulter D, Gatehouse JA: Sequence for a pseudogene in the legumin gene family of pea (Pisum sativum L.). Nucl Acids Res 13: 4527-4538 (1985).
4. Bray EA, Beachy RN: Regulation by ABA of β-conglycinin expression in cultured developing soybean cotyledons. Plant Physiol 79: 746-750 (1985).
5. Casey R, Domoney C: The structure of plant storage protein genes. Plant Mol Biol Reporter 5: 261-281 (1987).
6. Chang C, Meyerowitz EM: Molecular cloning and DNA sequence of the Arabidopsis thaliana alcohol dehydrogenase gene. Proc Natl Acad Sci USA 83: 1408-1412 (1986).
7. Chlan CA, Pyle JB, Legocki AB, Dure III L: Developmental biochemistry of cottonseed embryogenesis and germination XVIII. cDNA and amino acid sequences of members of the storage protein families. Plant Mol Biol 7: 475-489 (1986).
8. Crosby MJ, Meyerowitz EM: Drosophila glue gene Sgs-3: Sequences required for puffing and transcriptional regulation. Dev Biol 118: 593-607 (1986).
9. Crouch ML, Sussex IM: Development and storage-protein synthesis in Brassica napus L. embryos in vivo and in vitro. Planta 153: 64-74 (1981).
10. Davis RW, Botstein D, Roth JR: Advanced Bacterial Genetics. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1980).
11. Dhaese P, De Greve H, Geilen J, Seurinck J, Van Montagu M, Schell J: Identification of sequences involved in the polyadenylation of higher plant nuclear transcripts using Agrobacterium T-DNA genes as models. EMBO J 2: 419-426 (1983).
12. Domoney C, Barker D, Casey R: The complete deduced amino acid sequences of legumin β-polypeptides from different genetic loci in Pisum. Plant Mol Biol 7: 467-474 (1986).
13. Eisenberg AJ, Mascarenhas JP: Abscisic acid and the regulation of synthesis of specific seed proteins and their messenger RNAs during culture of soybean embryos. Planta 166: 505-514 (1985).
14. Finkelstein RR, Tenbarge KM, Shumway JE, Crouch ML: Role of ABA in maturation of rapeseed embryos. Plant Physiol 78: 630-636 (1985).
15. Finkelstein R, Somerville C: Analysis of the mechanisms of abscisic acid action using ABA-insensitive mutants of Arabidopsis. In: Third International Meeting on Arabidopsis, Michigan State University, East Lansing, 1987. Abstract 138.
16. Fischer RL, Goldberg RB: Structure and flanking regions of soybean seed protein genes. Cell 29: 651-660 (1982).
17. Galau GA, Huges DW, Dure L III: Abscisic acid induction of cloned cotton late embryogenesis-abundant (Lea) mRNAs. Plant Mol Biol 7: 155-170 (1986).
18. Heath JD, Weldon R, Monnot C, Meinke DW: Analysis of storage proteins in normal and aborted seeds from embryo-lethal mutants of Arabidopsis thaliana. Planta 169: 304-312 (1986).
19. Higgins TJV: Synthesis and regulation of major proteins in seeds. Ann Rev Plant Physiol 35: 191-221 (1984).
20. Karssen CM, Brinkhorst-van der Swan DLC, Breekland AE, Koornneef M: Induction of dormancy during seed development by endogenous abscisic acid: studies on abscisic acid deficient genotypes of Arabidopsis thaliana (L.) Heynh. Planta 157: 158-165 (1983).
21. Koornneef M, Jorna ML, Brinkhorst-van der Swan DLC, Karssen CM: The isolation of abscisic acid (ABA) deficient mutants by selection of induced revertants in nongerminating gibberellin sensitive lines of Arabidopsis thaliana (L.) Heynh. Theor Appl Genet 61: 385-393 (1982).
22. Koornneef M, Reuling G, Karssen CM: The isolation and characterization of abscisic acid-insensitive mutants of Arabidopsis thaliana. Physiol Plant 61: 377-383 (1984).
23. Leutwiler LS, Meyerowitz EM: Structure and expression of three light harvesting chlorophyll a/b-binding protein genes in Arabidopsis thaliana. Nucl Acids Res 14: 4051-4064 (1986).
24. Lycett GW, Croy RRD, Shirsat AH, Boulter D: The complete sequence of a legumin gene from pea (Pisum sativum L.). Nucl Acids Res 12: 4493-4506 (1984).
25. Lycett GW, Croy RRD, Shirsat AH, Richards DM, Boulter D: The 5'-flanking regions of three pea legumin genes: comparison of the DNA sequences. Nucl Acids Res 13: 6733-6743 (1985).
26. Marco YA, Thanh VH, Tumer NE, Scallon BJ, Nielsen NC: Cloning and structural analysis of DNA encoding an A2B1a subunit of glycinin. J Biol Chem 259: 13436-13441 (1984).
27. Maxam AM, Gilbert W: Sequencing end-labeled DNA with base-specific chemical cleavages. Meth Enzymol 65: 499-560 (1980).
28. Messing J, Geraghty D, Heidecker G, Hu N-T, Kirdl J, Rubenstein I: Plant gene structure. In: Kosuge T, Meredith CP, Hollaender A (eds) Genetic Engineering of Plants, pp. 221-227. Plenum, New York (1983).
29. Meyerowitz EM: In situ hybridization to RNA in plant tissue. Plant Mol Biol Reporter 5: 242-250 (1987).
30. Meyerowitz EM: Arabidopsis thaliana. Ann Rev Genet 21: 93-111 (1987).
31. Meyerowitz EM, Martin CH: Adjacent chromosomal regions can evolve at very different rates: evolution of the Drosophila 68C glue gene cluster. J Mol Evol 20: 251-264 (1984).
32. Pernollet J-C: Protein bodies of seeds: ultrastructure, biochemistry, biosynthesis and degradation. Phytochemistry 17: 1473-1480 (1978).
33. Pruitt RE, Meyerowitz EM: Characterization of the genome of Arabidopsis thaliana. J Mol Biol 187: 169-183 (1986).
34. Rigby PWJ, Dieckmann M, Rhodes C, Berg P: Labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J Mol Biol 113: 237-251 (1977).
35. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74: 5463-5467 (1977).
36. Scallon B, Thanh VH, Floener LA, Nielson NC: Identification and characterization of DNA clones encoding group II glycinin subunits. Theor Appl Genet 70: 510-519 (1985).
37. Simon AE, Tenbarge KM, Scofield SR, Finkelstein RR, Crouch ML: Nucleotide sequence of a cDNA clone of Brassica napus 12S storage protein shows homology with legumin from Pisum sativum. Plant Mol Biol 5: 191-201 (1985).
38. Shure M, Wessler S, Fedoroff N: Molecular identification and isolation of the Waxy locus in maize. Cell 35: 225-233 (1983).
39. Walton DC: Biochemistry and physiology of abscisic acid. Ann Rev Plant Physiol 31: 453-489 (1980).
40. Wobus U, Bäumlein H, Bassüner R, Heim U, Jung R, Muntz K, Saalbach G, Weschke W: Characteristics of two types of legumin genes in the field bean (Vicia faba L. var. minor) genome as revealed by cDNA analysis. FEBS Lett 201: 74-80 (1986).
