Planta (1990)180:461-470

P l a n t a 9 Springer-Verlag1990

Pea convicilin: structure and primary sequence of the protein and expression of a gene in the seeds of transgenic tobacco Edward J. Newbigin 1'2., Ben O. deLumen 1.*, Peter M. Chandler 1, Alan Gould 1.**, Robert J. Blagrove 3, John F. March 4, Alex A. Kortt 3, and Thomas J.V. Higgins 1 **** 1 CSIRO Division of Plant Industry, GPO Box 1600, Canberra, A.C.T. 2601, Australia, 2 Department of Botany, Australian National University, GPO Box 4, Canberra, A.C.T. 2601, Australia, 3 CSIRO Division of Biotechnology, 343 Royal Pde, Parkville, Victoria 3052, Australia, and 4 John Innes Institute, Colney Lane, Norwich NR4 7UH, UK

Abstract. Convicilin, a trimeric globulin of pea (Pisum sativum L.) seeds, is closely related to vicilin and com-

posed of polypeptides of 68.2 kilodaltons. A partial copy D N A (cDNA) clone encoding convicilin was isolated, sequenced, and used to select a convicilin gene f r o m a pea genomic library. A part of the genomic clone was sequenced to obtain the coding sequences missing in the c D N A clone and a further 1 kilobase 5' to the start of transcription were also obtained. The entire sequence of convicilin was deduced f r o m the combined genomic and c D N A sequences. The complete gene encoding convicilin was transferred to tobacco (Nicotiana tabacum L.) and the characteristics of its expression in the seeds of transgenic plants were studied. An unprocessed polypeptide, which was found only in the seeds of the transgenic plants, was identical in size to pea convicilin, and was recognized by vicilin antibodies. Convicilin, which does not undergo posttranslational cleavage in peas, was partially processed to polypeptides of a relative molecular mass (Mr) of approx. 50000 in transgenic tobacco seeds. There was a twofold variation in the level of convicilin accumulated by the mature seeds of a n u m b e r of transgenic plants and this was well correlated with the n u m b e r of gene copies incorporated in the different transformants. In the seeds of tobacco plants that contained a single copy of the transferred gene it was estim a t e d that convicilin comprised up to 2 % of the seed protein. Thus, using a combination of gene sequencing and expression in a heterologous host we believe we Present addresses." * Department of Biochemistry and Biophysics,

University of California, Davis, CA 95616, USA 9* Department of Nutritional Sciences, University of California, Berkeley, CA 94720, USA 9** United Agriseeds, Inc., PO Box 4011, Champaign, IL 61820, USA 9*** To whom correspondence should be addressed Abbreviations." bp = base pairs; cDNA= copy DNA; DAF = days after flowering; kb = kilobase pair; kDa = kil0dalton; Mr = relative molecular mass; SDS-PAGE=sodium dodecyl sulfate-polyacrylamide gel electrophoresis; SSC = standard saline citrate

have characterized the gene corresponding to the Cvc locus, whereas the gene described by D. Bown et al. (1988, Biochem J., 251, 717 726) p r o b a b l y encodes a minor convicilin-related protein.

Key words: Agrobacterium - Convicilin - Nicotiana (transgenic) bacco

Pisum (convicilin gene) - Transgenic to-

Introduction Vicilin constitutes a m a j o r fraction of the protein in a pea seed. Native vicilin is a trimeric 7S protein that contains a n u m b e r of polypeptides, some of which are glycosylated (for review, see Casey et al. 1986). Vicilin can be divided into two classes o f polypeptides on the basis of the size of the primary translation product f r o m which the polypeptides are derived. Those of sizes o f relative molecular mass (Mr) 50 000 or less, result f r o m cleavage, after translation, of a family of Mr-50000 vicilins at one, both or neither o f two internal processing sites (Gatehouse et al. 1983; Spencer et al. 1983). Some members of this class are also glycosylated (Badenoch-Jones et al. 1981 ; Chrispeels et al. 1982b). The second class, containing only polypeptides variously estimated at Mr approx. 70000 (Croy et al. 1980) or Mr approx. 75000 (Thomson and Schroeder 1978) and which was shown to be 68 kilodaltons (kDa) (this paper), has been called convicilin (Croy et al. 1980). Convicilin is not glycosylated (Croy et al. 1980) and does not appear to undergo any further processing such as endoproteolytic cleavage (Chrispeels et al. 1982a). The two classes of vicilin are antigenically related (Croy et al. 1980) but encoded by distinct gene families that are not genetically linked (Ellis et al. 1986) and are active at different times of seed development (Chandler et al. 1984). The Mr-50000 vicilins are encoded by approx. 18 genes (Casey et al. 1986) that are transcribed early in the deposition o f storage reserves.

462 Convicilin, o n the o t h e r h a n d , is e n c o d e d b y a single gene ( D o m o n e y a n d C a s e y 1985; Ellis e t a l . 1986) alt h o u g h s u b s e q u e n t d a t a i n d i c a t e the presence o f a t least one o t h e r r e l a t e d sequence ( B o w n e t a l . 1988; this p a p e r ) . B o t h o f these genes a r e active d u r i n g the late stages o f seed d e v e l o p m e n t ( C h a n d l e r et al. 1984; B o w n et al. 1988). W h e n this w o r k was initiated, the precise r e l a t i o n s h i p b e t w e e n convicilin a n d vicilin was n o t k n o w n a n d t h e r e was s o m e u n c e r t a i n t y a b o u t the o l i g o m e r i c n a t u r e o f convicilin ( C r o y et al. 1980; C a s e y a n d S a n g e r 1980). In this p a p e r , we r e p o r t the o l i g o m e r i c s t r u c t u r e o f c o n vicilin, the c o m p l e t e p r i m a r y sequence, the i s o l a t i o n o f t h e convicilin gene, a n d f e a t u r e s o f its e x p r e s s i o n w h e n t r a n s f e r r e d to t o b a c c o .

Methods Isolation of a convicilin gene and sequencing of cDNA and genomic fragments. A genomic library of pea (Pisum sativum L. cv. Greenfeast), constructed in the phage 2 (Higgins et al. 1988), was screened with the copy DNA (eDNA) pPS15-28 that had earlier been characterized as encoding convicilin (Chandler 1982; Chandler et al. 1984). A single positive genomic clone was purified and characterized by restriction mapping. A 7.5-kilobase (kb) EcoR1 fragment that hybridized on Southern blots (Southern 1975) to pPS15-28 was subcloned into pUC8 (p5.1) for further study. An EcoR1 to Pstl fragment (2.1 kb) that hybridized to a part of pPS 15-28 encoding the N-terminal end of the coding region was subsequently cloned into pUC8 (pEN6). The inserts of pEN6 and pPS15-28 were further characterized by sequencing using the dideoxy chain termination method (Bankier and Barrell 1983) and the chemical sequencing method of Maxam and Gilbert (1980), respectively. Purifeation, oligomeric structure and N-terminal sequence of convieilin. Convicilin was extracted from dried, mature seeds of Pisum sativum, cv. Birte, by the method of Casey and Sanger (1980). The protein was purified by a combination of ammonium-sulfate fractionation, zonal isoelectric precipitation and diethylaminoethyl-cellulose chromatography (Casey 1979; Casey and Sanger 1980). Immediately prior to molecular-weight measurements in the ultracentrifuge, the sample was fractionated by chromatography over Sepharose CL-6B (Pharmacia LKB, Uppsala, Sweden). Methods for ultracentrifuge analysis, measurement of circular dichromic spectra and chemical cross-linking with dithiobis (succinimidyl propionate) were as reported in Blagrove et al. (1984). N-terminal amino-acid sequencing was carried out on an Applied Biosystems (Foster City, Cal., USA) 470 A gas-phase sequencer. The isolated phenylthiohydantoin-amino acids were identified by high-performance liquid chromatography on a Zorbax (DuPont, Wihnington, Del., USA) octodecyl silane column. Transfer of convicilin gene into tobacco. The plasmids BDL5 and pAG1 were made by ligating the 7.5-kb EcoRl fragment of p5.1 into the unique EcoR1 sites in shuttle vectors Bin19 (Bevan 1984) and pGA492 (An 1986), respectively. These plasmids were transferred from Escherichia eoli (HB101 or JM83) to Agrobacterium tumefaciens using the tri-parental mating method of Ditta et al. (1980). Transconjugants were selected on Luria broth agar containing rifampicin (50 gg/ml), streptomycin (200 lxg/ml) together with either tetracycline (5 gg/ml) for pAG1 plasmid or kanamycin (50 pg/ml) for the BDL5 plasmid, and the presence of the gene in Agrobacterium was confirmed by Southern blotting. Transformation of tobacco (Nicotiana tabacum L. cv. Wisconsin 38) leaf pieces and regeneration of transgenic plants was done as described in Higgins et al. (1988). Flowers were tagged at anthesis.

E.J. Newbigin et al. : Expression of a pea convicilin gene Protein detection by Western blotting. Total protein was isolated from fresh tissue as described in Higgins et al. (1988) and protein concentration was determined by the method of Bradford (1976). Proteins were fractionated either by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) using 12.5-25% polyacrylamide gradients as in Spencer et al. (1980) or by two-dimensional gels (adapted from the method of Hughes and Galau 1985). The proteins were then electroblotted onto nitrocellulose membrane as described in Spencer et al. (1983) and convicilin was detected using primary antibodies generated in rabbits against total pea vicilins (including convicilin) followed by a second antibody (goat, anti-rabbit immunoglobulin G) linked to alkaline phosphatase (Bio-Rad, Richmond, Cal., USA) for colorimetric detection. Isolation of DNA and Southern blotting of genomic DNA. DNA was isolated from young tobacco leaves or pea shoots by the method of Sutton (1974). After digestion with EcoR1 (Boehringer, Mannheim, FRG), the DNA was fractionated on 1% agarose gels and transferred to nylon membrane (Zeta Probe; Bio-Rad) prior to ultraviolet treatment as described in Khandjian (1987) except that the gel was washed in 0.25 M HC1 for 5 min before denaturation with NaOH. Hybridization was carried out using the 2.1-kb insert of pEN6. This was labeled by nick translation using a kit (BRESA, Adelaide, S.A., Australia). Hybridizations were done as described by Khandjian (1987). The membranes were then given four /5-rain washes at 65~ in 2 x SSC (standard saline citrate: 0.15 M sodium chloride, 0.015 M sodium citrate) containing 0.1% SDS and 0.1% sodium pyrophosphate and then four 15-rain washes at 50~ C in 0.1 • SSC containing 0.1% SDS and 0.1% sodium pyrophosphate. Blots were then fluorographed at - 70~ C using a Hi-plus (Dupont) intensifier for 4 d. Isolation of RNA, S1 nuclease mapping and densitometry. Total RNA was isolated from about 500 mg of frozen tobacco seeds from the transgenic plants as described by Higgins et al. (1988). Total RNA was isolated from pea cotyledons as described in Chandler et al. (1983). $1 nuclease mapping was done as described by Weaver and Weissman (1979) using a single-stranded probe (Hudson et al. 1985). The probe extended at least 800 bases upstream of position + 197 where + 1 is the start of transcription (see Fig. 2). The protected fragments were separated on urea-acrylamide gels using 32P-labeled Mspl fragments of pBR322 as size markers. Densitometry of fluorographs and negatives was done as previously described (Higgins et al. 1988). In quantifying the level of convicilin in the transgenic seeds, the data on all polypeptides between Mr 50 000 and 68 000 (excluding the cross-reacting species), were summed. A standard curve was obtained from Western blots loaded with known amounts of pea convicilin.

Results Isolation and strueture o f a pea convicilin gene. A convicilin c D N A ( p P S 1 5 - 2 8 ) ( C h a n d l e r 1982; C h a n d l e r et al. 1984) was u s e d to screen a p e a g e n o m i c l i b r a r y cons t r u c t e d in 2 1059. A p o s i t i v e c l o n e was m a p p e d w i t h r e s t r i c t i o n e n d o n u c l e a s e s a n d a 7.5-kb E c o R l f r a g m e n t t h a t h y b r i d i z e d to pPS15-28 was s u b c l o n e d into p U C 8 resulting in clone p 5 . / . T h e o r i e n t a t i o n o f the gene in p5.1 was m a p p e d b y S o u t h e r n b l o t t i n g . This i d e n t i f i e d a n E c o R 1 to P s t l f r a g m e n t o f 2.1 k b as e n c o d i n g the N - t e r m i n u s o f the p r o t e i n t o g e t h e r w i t h a b o u t 1 k b o f sequence 5' to the c o d i n g region. A r e s t r i c t i o n m a p o f the g e n o m i c clone p5.1 s h o w i n g the p o s i t i o n o f the gene is given in Fig. 1. T h e 5' E c o R 1 to P s t l f r a g m e n t (2.1 kb) was s u b c l o n e d into p U C 8 a n d n a m e d p E N 6 . T h e se-

E.J. Newbigin et al. : Expression of a pea convicilin gene P

E

X

\

, Ikb~

P

463 E

I

I

\ \ \ \ \

\ \ \ \ \

\ \ \ \ \ \

\

\ \

02kb,

~

9

! INTRON

1

Fig. 1. The r e s t r i c t i o n m a p o f a 7.5-kb E c o R t f r a g m e n t c o n t a i n i n g

the gene for convicilin from pea. The upper line shows the fulllength insert in p5.1. The lower line shows the 2.1-kb EcoRl to Pstl fragment (pEN6), which was sequenced (see Fig. 2) with the coding region shown by the heavy line. The open region shows the position of the first intron. An arrow indicates the direction of transcription. The open arrowhead shows the beginning of the convicilin cDNA clone, pPS15-28. The solid arrowhead marks the position at which convicilin and vicilin begin to show sequence similarity. The letters E, P and X refer to the restriction enzymes, EcoR1, Pstl and Xbal, respectively

quences of pPS15-28 and pEN6 were both determined and are shown in Fig. 2. The sequence of the encoded protein was deduced by the presence in the genomic clone of a long, open reading frame with a methionine at its 5' end. The first 29 amino acids were followed by a sequence which was almost identical to the chemically determined N-terminal sequence N Y D E G SEP(R)VPAQ(R)E of purified convicilin from pea seeds. The first 29 amino acids were characteristic o f eucaryotic signal sequences (von Heijne 1983). The sequence of the c D N A overlapped the genomic sequence starting at a position 313 bp (base pairs) downstream of the initiator methionine. The presence of an intron o f 151 bp found 805 bp from the beginning of the reading frame in the genomic clone was inferred by comparison with the nucleotide sequence of the c D N A clone. There was an additional 84 bp past the end o f the intron which were shared between the genomic and c D N A sequences before the Pstl site which marks the end of the determined genomic sequence. Over the entire 576 bp c o m m o n to the two clones, there were four nucleotide differences which resulted in two aminoacid changes. Both amino-acid changes were from arginine in the genomic sequence to lysine in the cDNA. The genomic sequence was compared to another partial c D N A sequence for convicilin, pCD59 (Casey et al. 1984), and four nucleotide differences were found in the 321 bases compared (data not shown). These changes resulted in two amino-acid differences, i.e. Arg and Asn in the convicilin gene to Lys and Ser in pCD59. One of these amino-acid changes, the Arg to Lys change, was the same in the c D N A clone pPS15-28.

The genomic clone had 1046 bp o f sequence 5' to the coding region. A " T A T A " box (Proudfoot 1979) was found 66 bp upstream of the start of the coding region. The approximate start point of transcription was determined by $1 nuclease protection. Four protected bands of sizes about 197, 202, 206, and 210 bases were observed with the band o f 197 bases being the most intense (see Fig. 8, lane 8). The base at this position was numbered + 1. A sequence was found at - 8 3 which shows similarity to the " C A A T " box (Benoist et al. 1980) found in many animal and plant genes transcribed by R N A polymerase II. Upstream of the start o f transcription, starting at - 1 3 7 with the nucleotides G C C A C C and continuing for about 40 bp 3' of this position, was a sequence named the "vicilin b o x " that has been observed in all the 7S storage-protein genes from legumes so far examined (Gatehouse et al. 1986; Bown et al. 1988; Higgins et al. 1988). The nucleotides AAT A A A (a putative polyadenylation signal) were repeated six times between the translation termination site and the polyadenylation site at the end of the c D N A clone. Convicilin sequence and structural comparisons with vicilin. The amino-acid sequence deduced from the complete

c D N A and genomic nucleotide sequences resulted in a single reading frame of 607 amino acids. Removal of the first 29 amino acids, that presumably act as a signal peptide, would yield a polypeptide of 578 residues with a molecular weight of 68.2 kDa. Thus, the previous assessments of the size of this polypeptide (Mr approx. 70000 in Croy et al. 1980 and Mr approx. 75000 in Thomson and Schroeder 1978) were overestimates. Starting at position 629 in Fig. 2, the C-terminal sequence of convicilin showed similarity to the Mr-50000 vicilins, as has been previously noted (Casey et al. 1984, 1986; Bown etal. 1988). The N-terminal 166amino acids are unique to convicilin and account for the size difference between the Mr-50000 vicilins and convicilin. This region was highly charged and hydrophilic with 60% of the residues being arginine, lysine or glutamate. It contained two direct duplications, one of 32 and the other of 15 amino acids. The larger duplication had three differences and the smaller had one difference at the amino-acid level between the two occurrences. The second occurrence of the first duplication overlapped the first occurrence of the second duplication by five amino acids (see Fig. 2). The high ratio of charged to apolar residues in the N-terminal region of convicilin generally would lead to a predicted rod-shaped domain consistent with the presence o f an extended conformation (possibly e-helical). Structural predictions (Garnier et al. 1978) based on the deduced amino-acid sequence of this region anticipates helix levels of 4 0 % - 6 0 % depending on the decision constants chosen for the analysis. If these predictions were accurate, convicilin should show a measurable increase in the helix content (from 5 % - 1 0 % to 1 5 % - 2 0 % c~helix) compared with the Mr-50000 vicilin molecule (assuming that the conformation of the C-terminal vicilinlike region is not perturbed by the additional sequence).

464

E.J. Newbigin et al. : Expression of a pea convicilin gene

a

GAATTCGGACGGT CAT T TAGGCC TATAT T TGAATATATAAATCTATAAATAAGATTAGAGAAAAAAGAGAAAGAATACCATGATATATGATCGTATAAAAGTT CAATCACACACA T TAAC

-899

a

TTTTAGGAACAT T T TATAGCT TCGTATCT TGCT TAAT T TAAAAAGGAAACTCCCAAATAAACCCTAAACATAAAGGTGAATTGGCATCTAGTGTATATATATACACAATATAGTA

TGTTA

-779

a

T TT GAAAGCAAAACCT T TCCCTGAGACATTATGTAT

TCCGTCCCT CTCCCGCCCCC T TAT

-659

a

ACTGT TACAGATTCGGAAGACGATCATATATCCTAGAAAAATATCAT

TCCCTAT TAT T TCTATAATATATTAAATT

TGAAACGGGGGAT CATGAACACATATCCTCT

TAATAATT T TTTGGTACATTAAACC

a

TAT TCAAGGT T CACATAAAGGAAAT

a

AATATGCATTAAT TTATGCATAAAAGCCAACGAATGGTGAATGATGGTGACAAACAATTTAATTAAGGAAT

a

ATGGT CATGATCACCGCATCCAT

TACAAT TTAGTACAAACTTTT TCATAATAGGGAGAAAGTAC

GGAATGT T T TATT TGTT TCTCCTAGTTT T T TAATCCTT TGATGCTTACGAAT TATATTCATATTCATAATTATAT

GTAAATCTAAAAGAACT

T TCCAGGTTCATCAT

TAAT TAC

CAT AT TCTAGT GCCT TCATT

TCATCAAAT T CAAACAAAAAAGTTGACATGAATC

GGACCCCAAGCTCCATGCATGAAAACAAGTACAATCCCAATGTGTCACACATGCAGCTCAAAATAAACAACAAC

V

-419 -299

TCAACC

-179

C

a

CGCAAGCTCATCCACACCTTTC~-AACAGTTACAACTCAGTTGCCACCTCTATTTTGTTcATTTcAAcACTCGTCATGCTACATGGCACAATGTGGCcAAATGTCTAT~CACGTTATTCAT

a c

C TAGCTT TATGTTCTA TCACT ATAAATATTCT T CAT CT TACTCT TT TAT T TCAT CAT CTA~CT 9

a c

TG T TGT TGCT GGGAAT T ATAT TCCTGGCCTCTGT TGTT TCTGTCACTTAT GCCAATTACGATGAAGGTTCAGAACC TAGGGTACCGGCACAAAGAGAGAGAGGTCGCCAAGAGGGTGAGA L L L L G I I F L A S V V $ V T Y A ~& Y D E G $ E P R V P IA Q R E K G R Q E G E

a c

AAGAGGAAA K E E K

-59

a

GGGGAC CT T CGTATGAAAAGCAAGAAGATGAAGAAGAGAAGCAGAAATATAGATAT•AACGTGAAAAGGAAGATGAAGAAGAGAAGCAGAAATATCAATATCAACGTGAAAAGAAGGAAC

T TTAACTAGTGAAATACAAATCATGGCTACCACTATCAAATCACQATTTCCACT M A T T I K S R F P

! CGCCAT GGAGAA TGGAGACCTTCCTACGAAAAGGAAGAAGATGAAGAAGAGGGACAAAGAGAAAGAGGT R H G E W R P S Y E K E E D E E EIIG Q R E R

b

-539

TT

182

CGCCAAGAGGGTGAAAAAGAGGAAAAACGCCACGGAGAAT R Q E G E K E E K R H G E

G

62

L

302

422

.................................................................................

a

AAAAGGAAGT T cAACCTGGACGTGAGAGATGG(3AAAGAGA(3GAA(3AT(3A(3(3AACAAGTAGACGAA(3A(3TGGA(3AGGAA(3TCAACGTC(3TGAAGATCcCGAAGAAA(3GGCAAG(3C TAAGGC

b c

........................................................................................... Q K E V Q P G R E R W E R E E D E E Q V D E E W R G S Q R

R

(3. . . . . . . . . . . . . . . . . . . . . . . . . . . . E D P E E R A R L R

d

.

.

.

a

A TAGAGAGGAGA(3AACAAAAAGAGACAGAcGCCATCAAC(3TGAAGGAGAGGAGGAAGAAAGATCATCA(3AGTCACAAGAAC(3AAGAAATCCCT

b c

..................... H R E E R T K

A .................................................................................................. R D R R H Q R E G E E E E R S S E S Q E R R N P F L F K S N K

d

.

K

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

CACTC T T T (3AAAACGAAAACGGCCACAT T CGT C TCC T TCAAAGGT TCGACAAACGT TCAGACT TAT TTGAAAATC T CCAAAACTATCGTCT ........................................... T L F E N E N (3 H I R L L Q

d

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

TGTGGAATATAGAGCCAGACC

t ............................................... D L F E N L Q N Y R L V E Y R

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

TCTTC•TTCCTCAACACATAGATGCTGACTTAATCCTTGTAGTCCTCAATGGTAATGTTTATCTTTcTTTATAATATTATAATAACAAATcTTTTGTTTTATA(3AAATATGACTATTTAA

...................................................

C

I

F

d

.

.

a

AATAAAAACAT TGTAAAGT TCAAT T TAACAATATT TC TTATAT TTCATAAT TAAATT T TGGT T TGGT TAATATGTAT TACAGGGAAAGCCATA

P .

Q .

H .

.

I

D

.

.

A

D

.

.

L .

.

~

I

L

.

.

V

V

L

CCACACCA

782

K

a

L

L

G ............ A R P H T

b

.

F

662

.

a

.

.

T TT TA TTTAAGT CTAACAAGTT TCTAA

b c

A ............... R F D K R S

.

542

902

INTRON 1

N

b c

~

T T(3ACAGTGT TGAG TCCCAA TGATAGA

1022

...................................... G K A I L T V L S P N D R

d a b c d

AATTCGTATAATCTTGAGCGC(3GTGATACCAT CAAACTTCCTGCAG .............................................. GAACAACCTCT TAT T TAGTTAACCAAGATGATGAAGAAGATCTTAGACTGGTAGATCTTGTAA N .

S .

Y N L E . . . . .

R .

G .

D

K .

L .

P

TACC CG T GAA T

b d

G(3GCCTGGTAAAT T TGAGGCT TT T GACC TAGCTAAAAATAAAAACCAATACTTACGAGGATTCAGCAAGAATAT (3 P G K F E A F D L A K N K N Q Y L R (3 F S K N

I

b d

CTCTTAGAAGAACAAGA GAAAGAACCACAACAAAGGAGA(3CAATAGTCAAAGTGTCAAGGGAACAAATTGAGGAATTGAGAAAGCT L L E E Q E K E P Q Q R R A I V K V S R E Q I E E L R K

b

GAATTTGAACCAATCAACTTGAGAAGCCACAAGCCAGAATATTCTAATAAGTTTGGCAAGTTGTTTGAGATTACTCCGGAAAAAAAATACCCTCAACTTCAAGATTTAGATTTATTTGTT

b d b d

F

E

P

I

N

L

R

S

I{

K

P

E

G

Y

S

N

AGT TGTGTGGAGA T TAACGAGGGAGC TCTAATGTTGCCACACTACAAT S

C

V

E

I

N

E

G

A

L

M

L

CAACAAGAGAGGGAAGATAGAAAAGAAAGAAACAA Q

Q

E

R

E

D

R

K

E

R

N

P

H

Y

N

T

K

T

F

S

(3

Y

K

L

L

V

F

N

E

I

T

D

E

E

D

L

R

R

A

I

V

V

L

L

V

V

D

L

V

TTTA(3AG(3CCTCC T A T A A C A C T A A A T A C G A G A C C A L E A S Y N T K Y E T

P

E

K

L

K

N

E

G

K

I

P

V

N

TAGAGAAG(3TT I E K V

1222

TGCAAAATCAA(3CTCAAAGAAAAGCTTACCC TC T A K S S S K K S L P S

1382

Y

P

TCAAG(3GCTA TAGTT(3TACTAT TGGT TAA TGAAGGAAAAGGAAACC S

L

G

N

Q

L

Q

D

L

D

1502 L

F

V

TTGAAC T TCTGGGT TTAAAAGATGAG L

E

L

L

G

L

K

D

E

V

Q

R

Y

E

A

R

L

$

P

G

D

V

V

I

I

P

A

(3

H

P

V

A

I

T

A

1622

E

T G A A G T G C A G A G ( 3 T A T G A A G C T A G A T T G T C T C C G G G T (3ACGT T G T T A T C A T T C C A G C A G G C C A T C C G G T T G C C A T T A C T G C T T C C N

1142

A D

E

.

I . .

Q

d

.

T

1742 S

b d

C C T A G A T C T G T T C A A G A G A T A A A T A ( 3 A C T A A T A A A G A A C C A A A A A C A A T C T C A C T T T G C A A A T G C T ( 3 A A C C T G A A C A A A A G G A G C A A G ( 3 A A ( 3 C C A A ( 3 G A A A A A G G A G T C C T C T(3TC T T C A P R S V Q E I N R L I K N Q K Q S H F A N A E P E Q K E Q G S Q G K R S P L S S

1982

b

AT T TTGGGCAC T T T TTACTAAGTAA TCAGTA T G A ~ T ~ T A C ' ~ C A C G T A T G T A T G A T G T ~ G T G T

2102

d b

I

L

G

T

F

T G G A ~

TA T G G A a ~ I ~ T A A G G A G G ~ T

G TATGCTGT?~GATATAG

Y

C T A G C T C A T A A T GA(3C T

A

A

~

G

(

3

TCATGTAACTC T TGCTTATTGAGTCCCACCT

TTCTACTAT~AAT6~----A~T

T TTTGTCT T T TT T TTAAAAAAAA

2222

E.J. Newbigin et al. : Expression of a pea convicilin gene Circular-dichroism measurements at p H 7.0 (results not shown) gave near identical spectra for vicilin and convicilin, indicative of proteins with a low helix content (less than 5% e-helix). In case the negative charge on the d o m i n a n t glutamate residues was inhibiting helix formation (poly-L-glutamic acid requires partial protonation at p H 4 to induce helix formation), the circulardichroism measurements were repeated at p H 4.0 with no appreciable change in the spectra, especially in the helix-sensitive region of 2 0 8 ~ 2 2 nm. While the N-terminal extension of convicilin m a y well f o r m an extended surface domain, this region was not e-helical in the manner predicted from amino-acid sequence. There is uncertainty in the literature as to the oligomeric form of native convicilin. Reports f r o m different groups have shown that convicilin can be arranged either as a trimer (Casey and Sanger 1980) or as a tetramer (Croy et al. 1980). Typically, 7S proteins are trimers (Casey et al. 1986). To clarify the quarternary structure of convicilin as extracted f r o m mature seeds with aqueous salt solutions at neutral pH, the Mr of the native protein was determined by a meniscus-depletion sedimentation equilibrium experiment. The m a j o r c o m p o n e n t o f convicilin in 0.5 M NaC1 at p H 6.8 proved to be a homogeneous protein of Mr 190000+_ 5000, a value close to that determined by Casey and Sanger (1980) and indicating a possible trimeric structure. In order to confirm the trimeric nature o f convicilin, chemical cross-linking experiments with dithiobis (succinimidyl propionate) were performed and the results analysed by S D S - P A G E (Fig. 3). Addition of the cross-linker led to the appearance of two further b r o a d bands of higher apparent m o lecular weight than convicilin itself. With increasing amounts of cross-linker, there was complete conversion to the largest species with little evidence of higher aggregates. The fully polymerized f o r m could be reduced with 2-mercaptoethanol to regain subunits with modified lysines. The results in Fig. 3 are similar to those reported (Blagrove et al. 1984) for phaseolin (the vicilin-like protein f r o m the seeds of Phaseolus vulgaris L.) although the n u m b e r of lysines available for potential cross-linking is substantially greater in convicilin. Therefore, these results indicate that convicilin is trimeric in contrast to

Fig. 2. The nucleotide sequences of parts of the convicilin genomic and cDNA clones and the deduced amino-acid sequence of convicilin. The genomic DNA sequence was determined by the method of Bankier and Barrell (1983). The cDNA sequence was determined by the method of Maxam and Gilbert (1980). Line a=genomic sequence of pEN6; b = convicilin cDNA sequence (pPS 15-28); e = protein sequence deduced from pEN6; d=protein sequence deduced from pPS15-28..=identity of bases or amino acids. The "vicilin" (V), "CAAT" (C), and "TATA" (T) boxes ( -- 137, -- 83 and -38, respectively, numbering from the 5' end) are overlined. An arrowhead indicates the base designated + 1. The mature Nterminus of the protein is indicated with an arrow. The two aminoacid repeats are underlined with vertical bars indicating the ends. The reiterated sequence, AATAAA, found in the 3' untranslated region of the cDNA, is boxed

465

9467--

P

~

O

43--

1

2

3

4

5

Fig. 3. Electrophoresis on SDS polyacrylamide gels of convicilin (0.2 mg of protein dissolved in 0.1 ml of 0.05 M Na-phosphate buffer pH 6.8) cross-linked with dithiobis (succinimidyl propionate). Lane 1 untreated control; 2 0.01 mg; 3 0.03 mg; 4 0.1 mg cross-linker; 5 as in 4 but the sample was treated with 2% 2mercaptoethanol before electrophoresis. Relative molecular masses (Mr x 10 -3) are marked on the vertical axis

a previous report (Croy et al. 1980) indicating a tetrameric structure based on empirical molecular weight measured by column c h r o m a t o g r a p h y . Presumably, the N-terminal extension of convicilin has not interfered with the strong trimeric associations of the vicilin-like C-terminal region. Expression o f p e a convicitin in tobacco. The complete 7.5-kb EcoR1 fragment containing the convicilin gene was ligated into the binary shuttle vectors, BIN19 (Bevan 1984) and pGA492 (An 1986), to m a k e BDL5 or p A G 1 , respectively. These constructions were used to transform tobacco leaf pieces using A g r o b a c t e r i u m - m e diated infection. Plants labeled " G " were transformed with BDL5 and those labeled " H " were transformed with p A G 1 . The numbers after these letters refer to individual plants that arose f r o m separate transformation events. Convicilin protein was assayed in seed extracts by the Western-blot procedure using antibodies prepared in rabbits against pea vicilins. O f e i g h t " H "-series plants examined, four (H1, H4, H6, H7) contained both the intact gene and had detectable convicilin in their seeds (data not shown). The remaining plants contained neither a copy of the intact gene nor detectable amounts of convicilin. A total of six " G " - s e r i e s plants were assayed and all had convicilin in their seeds. Two of the " H " - s e r i e s plants (H6 and H7) and four " G " - s e r i e s plants (G1 to G4) were selected for further work. G e n o m i c D N A isolated f r o m pea seedlings and the leaves of these six putative tobacco transformants was analyzed by Southern blotting followed by hybridization with nick-translated p E N 6 insert (Fig. 4). Pea D N A digested with EcoR1 (Fig. 4, lane 9) showed hybridization to a n u m b e r o f bands in agreement with those obtained

466

E.J. Newbigin et al. : Expression of a pea convicilin gene

--

23

--9.4

--6.7

[*

m4.3

68-~

~

w=.=~

~

J

'=~''

'=~=~

50-i 34~_ 3 0 - z~. 25/---,18-

--2.3 --2.0

14-

12-

1

1

2

3

4

5

6

7

8

9

Fig. 4. Copy-number analysis of the convicilin gene in pea and transgenic tobacco plants. DNA was extracted from leaves of tobacco or young pea shoots and purified. Tobacco plants with a " G " prefix were transformed with the plasmid BDL5 and those with an " H " prefix were transformed with pAG1. DNA (10 Ixg) was digested with EcoR1, fractionated by electrophoresis on agarose, and blotted onto nylon membrane. The membrane was then probed with 32P-labeled insert from pEN6. Lane i contains an amount of p5.1 insert equivalent to one gene copy per haploid genome. DNA in lanes 2-7 was, in the order given, from G1, G2, G3, G4, H6, and H7. DNA in lane 8 was from a plant transformed with the binary vector, pGA492 without an insert. Lane 9 contains pea shoot DNA. Numbers on the vertical axis show the size (in kb) of DNA markers (2 DNA digested with HindIII)

2

3 4 5 6

8

Fig. 5. Convicilin accumulation in mature seeds of pea and transgenic tobacco plants. Pea seed protein (5 gg) and tobacco seed protein (150 gg) was fractionated by SDS-PAGE and then electroblotted onto nitrocellulose membrane. Convicilin was detected by reacting the nitrocellulose with antiserum against total pea vicilins. Tobacco plants with the prefix " G " were transformed with the plasmid BDL5 and those labeled " H " were transformed with pAG1. Lane 1 is a pea seed extract. Lanes 2-7 are H6, H7, GI, G2, G3 and G4. Lane 8 contains protein from an untransformed tobacco plant (Wisconsin 38). Numbers on the vertical axis indicate Mr x t0- 3 of the polypeptides

"" by other workers using a convicilin c D N A clone (Dom o n e y a n d Casey 1985; Ellis et al. 1986). A m a j o r b a n d o f 7.5 kb was identical in size to the insert o f p5.1 (compare Fig. 4, lanes 1 and 9). Two weaker hybridizations o f approx. 14 kb and approx. 5.5 kb were also observed. T h e b a n d o f a p p r o x . 14 kb has been identified as containing a gene c o d i n g for a convicilin-related protein (Ellis et al. 1986). The b a n d at approx. 5.5 kb has n o t been characterized. D N A isolated f r o m the six putative t r a n s f o r m a n t s s h o w e d hybridization to an E c o R l b a n d identical in size to the m a j o r b a n d f o u n d in pea (Fig. 4, lanes 2 to 7), confirming t h a t the complete gene h a d been correctly transferred. A second, weaker h y b r i d i z a t i o n o f approx. 4.6 kb was also observed in all plants and was also f o u n d in a plant that h a d been t r a n s f o r m e d with the binary vector p G A 4 9 2 alone (fig. 4, lane 8). This b a n d was p r o b a b l y caused by hybridization between the pea convicilin p r o b e and an unidentified t o b a c c o sequence, as it was c o m m o n to plants t r a n s f o r m e d with the two different vectors. W h e n extracts o f seeds f r o m u n t r a n s f o r m e d t o b a c c o plants were subjected to Western blotting, a single polypeptide o f Mr approx. 47000 was detected (Fig. 5, lane 8). This p r e s u m a b l y was caused by the presence o f a cross-reacting 7S protein in t o b a c c o seeds (see Higgins et al. 1988). W h e n seed extracts f r o m t r a n s f o r m e d plants were similarly treated, a m a j o r protein identical in size to pea convicilin (68 k D a ) was observed ( c o m p a r e

7

. -68 .,.-

[ ,,,~.,

...................{ -50_47

Fig. 6. Two-dimensional analysis of convicilin in the mature seed of transgenic tobacco plant H7. Protein (150 9g) extracted from mature seeds was separated by isoelectric focusing (IEF) followed by SDS-PAGE. The proteins were electroblotted onto nitrocellulose and probed with anti-vicilin antiserum. The boxed polypeptides are those which were found in non-transformed tobacco seeds. Numbers on the vertiealaxis indicate Mr • 10 -3 of the polypeptides

Fig. 5, lane 1 with lanes 2-7). A n u m b e r o f other polypeptides, principally o f Mr approx. 50000, were also seen. Convicilin in pea undergoes little if a n y cleavage (Chrispeels et al. 1982a), thus this processing is unique to tobacco. To investigate these polypeptides further, seed proteins f r o m t r a n s f o r m a n t H 7 were analysed by two-dimensional electrophoresis before Western blotting (Fig. 6). The convicilin polypeptides, with an isoelectric point o f 5.6-5.8 ( C r o y et al. 1980), were well separated f r o m the cross-reacting t o b a c c o proteins (Fig. 5, boxed) which were m o r e basic. A b o u t 20 separate convicilin p o -

E.J. Newbigin et al. : Expression of a pea convicilin gene lypeptides were observed. Many of these occurred as different isoelectric forms of eight discrete size classes between Mr 68 000 and Mr 50 000. When the level o f convicilin accumulated in the seeds of the six transformed plants was compared by densitometry, a twofold range was found (Fig. 5, lanes 2 -7). Plants H6, H7, G1 and G3 all had one copy of the convicilin gene (Fig. 4) by comparison to pea and a reconstruction using purified p 5.1 insert loaded with an amount equivalent to one copy (Fig. 4, lane 1). G2 and G4 had two or more copies of the gene. Western blots of extracts o f mature seeds (Fig. 5) showed G2 and G4 had approximately double the a m o u n t of convicilin that was found in the other plants. Thus, there appeared to be a correlation between gene copy number and level of expression. An estimate of the a m o u n t of convicilin accumulated by plant H7 (a plant with a single copy o f the gene) was obtained by comparing densitometer tracings o f a concentration series of convicilin with a known a m o u n t of H7 total seed protein. These indicated that convicilin comprised 1-2% of the total seed protein in this plant. D e v e l o p m e n t a l regulation a n d organ specificity. In peas, convicilin is expressed in cotyledons during the late stages o f seed development (Chandler et al. 1984) and has not been found in any other organ studied (Higgins etal. 1988). We tested whether the same regulatory mechanisms were functional when the gene was transferred to a new host by examining its temporal expression and organ specificity in transgenic tobacco. Seeds were harvested at specified days after flowering (DAF) and the developmental profile o f convicilin accumulation was determined by Western-blot analysis (Fig. 7). The vicilin-related polypeptide found in untransformed tobacco seeds was observed at the earliest harvest (Fig. 7, lane 2). This polypeptide, as has been proposed previously (Higgins et al. 1988), appeared to be synthesized as a precursor of M, approx. 55 000 that was slowly processed to the mature form o f Mr approx. 47000. The 68-kDa band, characteristic of convicilin, first appeared at 15 D A F . Between 15 and 19 D A F there was little further accumulation although some processing to Mr approx. 50000 did occur. Most of the convicilin was accumulated between 19 and 24 D A F , by which time the level and pattern characteristic of the mature, fully dry seed was attained. The level o f convicilin m R N A throughout seed development was determined by the $1 protection assay (Fig. 8). Messenger R N A was first observed at 15 D A F and reached a maximum at 17 D A F . This level was held approximately constant to at least 21 D A F , a pattern which is also true o f convicilin m R N A accumulation in pea (Chandler et al. 1984). Convicilin m R N A was not detected in seeds of nontransformed plants nor in any organ, other than the seed, of plants transformed with the convicilin gene (data not shown). The size of the D N A fragments protected from Sl nuclease was similar in both pea and tobacco, indicating that the start sites of transcription used in these two species was near-

467

-~ .

.

.

.

.

~

--68

-

.

-50

X47

1

2 3 4 5 6

7

8

9

Fig. 7. The pattern of accumulation of convicilin polypeptides during seed development in transgenic tobacco. Seeds were harvested on the specified days after flowering, protein extracts were made and I50 gg of protein was fractionated by SDS-PAGE and then immunoblotted as described in the legend to Fig. 5. Lane 1 contains protein from mature, untransformed tobacco seeds. Lanes 2-8 contain protein from the seeds of plant H7 at 11, 13, 15, 17, 19, 21 and 24 DAF. Lane 9 contains protein from mature H7 seed. The numbers on the vertical axis indicate the Mr x 10 .3 of the polypeptides

210 206 202 197

w

1

2

3

4

5

6

7

8

Fig. 8. The level of convicilin mRNA in seeds of a transformed tobacco plant (H7) during seed development as determined by $1 nuclease protection assay. Lanes 2-7 contain 50 gg of total RNA isolated from developing seeds at 11, 13, 15, 17, 19 and 21 DAF. The RNA wa first hybridized to a single-stranded DNA probe, uniformly labeled with 32p. The probe contained 169 bases of the coding region and at least 800 bases of the 5' flanking region. The hybrids were then digested with $1 nuclease and the products fractionated on a 5% denaturing acrylamide gel before autoradiography to detect the protected fragments. Lane 1 contains a2P-labeled probe and 50 gg of transfer RNA digested with S1 nuclease. Lane 8 contains I gg of pea RNA isolated from seeds at 17 DAF. The numbers on the vertical axis refer to the sizes of the protected fragments in bases identical. However, the precise start site of transcription was not determined. The level of m R N A detected in tobacco seeds at 17 D A F was approx. 0.5% of the level detected in pea seeds o f the same age. The organ specificity of convicilin gene expression was determined by mixing approx. I mg of protein ex-

468 tracted from the leaves, roots or stems of pea or of tobacco transformant H7, and a lesser amount of seed extract (150 mg and 8 mg o f tobacco and pea, respectively) with anti-vicilin immunoaffinity beads (Chrispeels et al. 1982b). Protein (1 mg) extracted from H7 callus was treated similarly. After washing to remove unbound protein, the beads were loaded onto an SDS polyacrylamide gel and fractionated prior to Western blotting. Convicilin bands were only detected in the two seed extracts and not in any other organ of either pea or transgenic tobacco (data not shown).

Discussion

In this paper, we report the structure and deduced amino-acid sequence of pea convicilin as well as partial sequences of both genomic and c D N A clones encoding convicilin, and the expression o f the convicilin gene in transformed tobacco plants. Convicilin was shown to be a trimeric protein with a low e-helical content. Both of these physical characters are typical of 7S globulins (Blagrove et al. 1984). An earlier report indicating that convicilin was a tetramer (Croy et al. 1980) was based on the relatively low resolving power of gel-filtration chromatography. In the present study, both sedimentation equilibrium centrifugation and chemical crosslinking indicate that convicilin has a trimeric structure. The close serological relationship between vicilin and convicilin (Croy et al. 1980; Spencer et al. 1983) is explained by the fact that convicilin represents vicilin with an Nterminal extension (Casey et al. 1984; Bown et al. 1988). Such terminally extended proteins are also found in other seeds, such as the e' subunit of fl-conglycinin o f soya bean (Doyle et al. 1986) and the e-globulins of cotton (Chlan et al. 1987). Although the genomic and c D N A sequences reported here are not identical (four differences in 576 c o m m o n nucleotides), we suggest, for reasons given below, that this variation arises from the presence in our line of peas of at least two alleles for this gene rather than a family of closely related genes. The flanking regions of the convicilin gene show the features c o m m o n to R N A polymerase II transcription units from plants and animals such as the conserved " C A A T " and " T A T A " boxes upstream of the coding region and repeated polyadenylation signals 3' to the end of translation. A single intron of 151 bp was found in the partiallysequenced genomic clone at the corresponding position, relative to the deduced amino-acid sequence, as intron one of an Mr-50000 vicilin gene (Higgins et al. 1988). Typically, 7S-storage-protein genes are interrupted by five introns located in similar places in the sequence (Doyle et al. 1986). Another feature found in the 5' flanking regions that is c o m m o n to this class of genes is the "vicilin b o x " (Gatehouse et al. 1986; Bown et al. 1988; Higgins et al. 1988). It has been proposed that this sequence of about 40 bp, starting at position - 1 3 7 in the convicilin gene with the hexanucleotide GCCACC, determines tissue specificity of expression in

E.J. Newbigin et al. : Expression of a pea convicilin gene this gene family (Gatehouse et al. 1986). Although this sequence element may have a role in regulating expression of this gene family, it failed to bring about the expression in transgenic plants o f a reporter gene (chloramphenicol acetyl transferase) when used in a construction to test its ability to act as a seed promoter. Additional upstream sequences are required for activity (Newbigin and Higgins 1989). Two genes have now been identified in pea that hybridize to convicilin cDNAs (Bown etal. 1988; this paper). On Southern blots o f genomic D N A digested with EcoR1, these genes occur as bands of approx. 14 kb and 7.5-9 kb (depending on cultivar) (Domoney and Casey 1985; Ellis et al. 1986), with the lower band hybridizing more strongly than the upper band. These two bands segregate together in crosses and are therefore linked (Ellis et al. 1986; Casey et al. 1986). A comparison between the sequence presented here and that given by Bown et al. (1988) for the gene on the approx. 14-kb EcoR1 fragment shows that the two sequences are 95% identical after gaps are introduced to maximize similarity (data not shown). The largest difference between these two genes results from the fact that the sequences duplicated in the first exon of the gene on p5.1 are found only once in the sequence of Bown et al. (1988). This difference in the length of exon 1 accounts for the discrepancy in molecular weight predicted for the mature polypeptides (68.2 kDa here and 63.9 k D a in Bown et al. 1988). Bown et al. (1988) tentatively ascribe the gene contained on the approx. 14-kb fragment to the Cvc locus. The Cvc locus (Matta and Gatehouse 1982; presumably identical to the Vca locus defined by Thomson and Schroeder 1978) is the only locus so far defined for convicilin and was identified on the basis of variation in the mobility of the Mr-70000 polypeptides (Mr 75 000 in Thomson and Schroeder 1978) on SDS-polyacrylamide gels. Several lines of evidence indicate that the gene described by Bown et al. (1988) is unlikely to be the gene at the Cvc locus. The gene on p5.1 is at least 98% identical with the two cDNAs so far published that encode convicilin (pCD59 in Casey etal. 1984; pPS15-28 in this paper). Overall, the gene reported by Bown et al. (1988) and pCD59 are 94% identical. When p5.1 was integrated into the tobacco genome it gave rise to a protein o f identical size to convicilin in pea (Fig. 5). The gene described by Bown et al. (1988) encodes a protein that is smaller (63.9 kDa) than either the estimated size of convicilin (Mr approx. 70000; Casey and Sanger 1980; Croy et al. 1980) or the protein encoded on p5.1 (68.2 kDa). On these bases, we propose that the gene found on p5.1 is identical to the Cvc locus and that the gene described by Bown et al. (1988) is a linked gene coding for an as yet undescribed, convicilin-related protein. Similarly, since pPS15-28 and p5.1 resemble each other closely ( > 99% identical) it is probable that pPSI 5-28 represents the product of another allele of the Cvc locus rather than that of a distinct gene. When integrated into the tobacco genome, the gene on p5.1 showed all the characteristics that are typical

E.J. Newbigin et al. : Expression of a pea convicilin gene of its expression in pea. Its expression was confined to the seed and it was only detected during the later stages of cotyledon development (Fig. 7). Although the m R N A was present at an approximately constant level between 15 and 21 D A F as in pea (Chandler et al. 1984), convicilin protein was preferentially accumulated somewhat later in development. Similar transcription start points were used in both pea and tobacco (Fig. 8). A corresponding level of fidelity of expression has been shown for a number of other 7S-storage-protein genes when transferred to new host species (Sengupta-Gopalan et al. 1985; Beachy et al. 1985; Bray et al. 1987; Higgins et al. 1988). However, processing o f the protein once expressed is, in m a n y cases, different from that in the original host. For example, when the/%subunit of phaseolin (Sengupta-Gopalan e t al. 1985) or the c( subunit of flconglycinin (Beachy et al. 1985) were expressed in transgenic hosts (either tobacco or petunia), as much as 50% of the protein found in the mature seed that was derived from the introduced gene was smaller than the expected size. This was also true of convicilin which was partially processed in tobacco seeds from a protein of identical size to that found in pea seeds to polypeptides of lower molecular weight (principally Mr approx. 50000). Based on two-dimensional Western blots, as many as 20 separate isoelectric variants of eight different Mrs were detected. These differences in isoelectric point may reflect separate cleavage and modification events or may arise as an artefact of sample preparation and-or analysis. However, it is clear that they must all arise from a single primary translation product. Convicilin comprised up to 2% o f the total protein present in mature seeds of transgenic plants that contained a single copy of the convicilin gene, such as H7. This level is similar to that found by Beachy et al. (1985) using transgenic petunia containing an e',/~-conglycinin gene from soybean. When a gene encoding a Mr-50000 vicilin (pEN2) was transferred to tobacco (Higgins et al. 1988), the best-expressing plant (B3), which contained about 30 copies of the gene, had accumulated vicilin to about 0.5% o f the total seed protein by maturity. The difference in the level o f accumulation of the two types of vicilin in transgenic plants may reflect dissimilarities in the transcriptional strength of their promoters. The 68-kDa protein, convicilin, is encoded by a single gene and accounts for about 10% of the total vicilin in a mature pea seed. The remainder, made up o f the Mr50000 vicilins, is encoded by a multigene family of approx. 18 members. It is possible that the convicilin genomic clone, p5.1, has a sequence or sequences that the vicilin gene, used in the earlier study (Higgins et al. 1988), lacks which results in a higher level of expression of the convicilin gene compared to the vicilin gene. Further work will be required to identify the D N A sequences responsible for regulating the level of expression of these genes. The authors wish to thank Drs. Don Spencer and Christine Wandelt of the Division of Plant Industry, Canberra, for constructive reading of the manuscript and Dr. Rod Casey of the John Innes

469 Institute, Norwich, UK, for helping initiate this project. E.J.N. was supported by a Commonwealth Post-Graduate scholarship.

References An, G. (1986) Development of plant promoter expression vectors and their use for analysis of differential activity of nopaline synthase promoter in transformed tobacco cells. Plant Physiol. 81, 86-91 Badenoch-Jones, J., Spencer, D., Higgins, T.J.V., Millerd, A. (1981) The role of glycosylation in storage-protein synthesis in developing pea seeds. Planta 153, 201-209 Bankier, A.T., Barrell, B.G. (1983) Shotgun DNA sequencing. In: Techniques in life sciences, B5: Nucleic acid biochemistry, pp. 1-34, Flavell, R.A., ed. Elsevier/North Holland, Limerick, Ireland Beachy, R.N., Chen, Z.-L., Horsch, R.B., Rogers, S.G., Hoffman, N.J., Fraley, R.T. (1985) Accumulation and assembly of soybean /~-conglycinin in seeds of transformed petunia plants. EMBO J. 4, 3047-3053 Benoist, C., O'Hare, K., Breathnach, R., Chambon, P. (1980) The ovalbumin gene-sequence of putative control regions. Nucleic Acids Res. 8, 127-142 Bevan, M. (1984) Binary Agrobacterium vectors for plant transformation. Nucleic Acids Res. 12, 8711-8721 Blagrove, R.J., Lilley, G.G., Van Donkelaar, A., Sun, S.M., Hall, T.C. (1984) Structural studies of a french bean storage protein: phaseolin. Int. J. Biol. Macromol. 6, 137-141 Bown, D., Ellis, T.H.N., Gatehouse, J.A. (1988) The sequence of a gene encoding convicilin from pea (Pisum sativum L.) shows that convicilin differs from vicilin by an insertion near the Nterminus. Biochem. J. 251,717 726 Bradford, M. (I 976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248-254 Bray, E.A., Nait0, S., Pan, N.-S., Anderson, E., Dube, P., Beachy, R.N. (1987) Expression of the /~-subunit of/~-conglycinin in seeds of transgenic plants. Planta 172, 364-370 Casey, R. (1979) Immunoaffinity chromatography as a means of purifying legumin from Pisum (pea) seeds. Biochem. J. 177, 509-520 Casey, R., Sanger, E. (1980) Purification and some properties of a 7S seed storage protein from Pisum (pea). Biochem. Soc. Trans. 8, 658 Casey, R., Domoney, C., Stanley, J. (1984) Convicilin mRNA from pea (Pisum sativum L.) has homology with other legume 7S storage protein mRNA species. Biochem. J. 224, 661 666 Casey, R., Domoney, C., Ellis, T.H.N. (1986) Legume storage proteins and their genes. Oxford Surv. Plant Molec. Cell Biol. 3, 1-95 Chandler, P.M. (1982) The use of single-stranded phage DNAs in hybrid arrest and release translation. Anal. Biochem. 127, 9-16 Chandler, P.M., Higgins, T.J.V., Randall, P.J., Spencer, D. (1983) Regulation of legumin levels in developing pea seeds under conditions of sulfur deficiency. Plant Physiol. 71, 47-54 Chandler, P.M., Spencer, D., Randall, P.J., Higgins, T.J.V. (1984) Influence of sulfur nutrition on developmental patterns of some major pea seed proteins and their mRNAs. Plant Physiol. 75, 651-657 Chlan, C.A., Borroto, K., Kamalay, J.A., Dure, L. III (1987) Developmental biochemistry of cotton seed embryogenesis and germination. XIX. Sequences and genomic organization of the globulin (vicilin) genes of cotton seed. Plant Mol. Biol. 9, 533 546 Chrispeels, M.J., Higgins, T.J.V., Spencer, D. (1982a) Assembly of storage protein oligomers in the endoplasmic reticulum and

470 processing of the polypeptides in the protein bodies of developing cotyledons. J. Cell Biol. 93, 306-313 Chrispeels, M.J., Higgins, T.J.V., Craig, S., Spencer, D., (1982b) Role of the endoplasmic reticulum in the synthesis of reserve proteins and the kinetics of their transport to protein bodies in developing pea cotyledons. J. Cell Biol. 93, 5-14 Croy, R.R.D., Gatehouse, J.A., Tyler, M., Boulter, D. (1980) The purification and characterization of a third storage protein (convicilin) from the seeds of pea (Pisum sativum L.). Biochem. J. 191, 509-516 Ditta, G., Stanfield, S., Corbin, D., Helinski, D.R. (1980) Broad host range DNA cloning system for gram-negative bacteria: construction of a genebank of Rhizobium meliloti. Proc. Natl. Acad. Sci. USA 77, 7347-7351 Domoney, C., Casey, R. (1985) Measurement of gene number for storage proteins in Pisum. Nucleic Acids Res. 13, 687-699 Doyle, J., Schuler, M., Godette, W.D., Zenger, V., Beachy, R., Slightom, J. (1986) The glycosylated seed storage proteins of Glyeine max and Phaseolus vulgaris : structural homologies of genes and proteins. J. Biol. Chem. 261, 92284238 Ellis, T.H.N., Domoney, C., Castelton, W.C., Davies, D.R. (1986) Vicilin genes of Pisum. Mol. Gen. Genet. 205, 164-169 Gamier, J., Osguthorpe, D.J., Robson, B. (1978) Analysis of the accuracy and implications of simple methods for predicting secondary structure of globular proteins. J. Mol. Biol. 120, 97-120 Gatehouse, J.A., Lycett, G.W., Delauney, A.J., Croy, R.R.D., Boulter, D. (1983) Sequence specificity of the post-translational proteolytic cleavage of vicilin, a seed storage protein of pea (Pisum sativum L.). Biochem. J. 212, 427432 Gatehouse, J.A., Evans, I.M., Croy, R.R.D., Boulter, D. (1986) Differential expression of genes during legume seed development. Philos. Trans. R. Soc. Lond [Biol] 314, 367-384 Higgins, T.J.V., Newbigin, E.J., Spencer, D., Llewellyn, D.J., Craig, S. (1988) The sequence of a pea vicilin gene and its expression in transgenic tobacco plants. Plant Mol. Biol. 11,683-695 Hudson, G.S., Farrell, P.J., Barrell, B.G. (1985) Two related but differentially expressed potential membrane proteins encoded by the EcoR1 Dhet region of Epstein-Barr virus B95-8. J. Virol. 53, 528-535 Hughes, D.W., Galau, G.A. (1985) Addition of proteins to the cylindrical gel embedding medium for transverse molecularweight markers in two-dimensional electrophoresis. Anal. Biochem. 140, 320-325

E.J. Newbigin et al. : Expression of a pea convicilin gene Khandjian, E.W. (1987) Optimized hybridization of DNA blotted and fixed to nitrocellulose and nylon membranes. BioTechnology 5, 165-167 Matta, N.K., Gatehouse, J.A. (1982) Inheritance and mapping of storage protein genes in Pisum sativum L. Heredity 48, 383 392 Maxam, A.M., Gilbert, W. (1980) Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65, 499-559 Newbigin, E.J., Higgins, T.J.V. (1989) Regulation of expression of two 7S storage protein genes of pea (Pisum sativum L.) in a transgenic host. Proc. 8th Aust. Biotech. Conf., pp 104-109 Proudfoot, N.J. (1979) Eukaryotic promoters? Nature 279, 376 Sengupta-Gopalan, C., Reichert, N.A., Barker, R.F., Hall, T.C., Kemp, J.D. (1985) Developmentally regulated expression of the bean/~-phaseolin gene in tobacco seeds. Proc. Natl. Acad. Sci. USA 82, 3320 3324 Southern, E. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98, 503-517 Spencer, D., Higgins, T.J.V., Button, S.C., Davey, R.A. (1980) Pulse labeling studies on protein synthesis in developing pea seeds and evidence of a precursor form of legumin small subunit. Plant Physiol. 66, 510-515 Spencer, D., Chandler, P.M., Higgins, T.J.V., Inglis, A.S., Rubira, M. (1983) Sequence interrelationships of the subunits of vicilin from pea seeds. Plant Mol. Biol. 2, 259-267 Sutton, W.D. (1974) Some features of the DNA of Rhizobium bacteroids and bacteria. Biochem. Biophys. Acta 336, 1-10 Thomson, J.A., Schroeder, H.E. (1978) Cotyledonary storage proteins in Pisum sativum. II. Hereditary variation in components of the legumin and vicilin fractions. Anst. J. Plant Physiol. 5, 281-294 Von Heijne, G. (1983) Pattern of amino acids near signal-sequence cleavage sites. Eur. J. Biochem. 133, 17-21 Weaver, R.F., Weissman, C. (1979) Mapping of RNA by a modification of the Berk-Sharp procedure: the 5' terminus of 15S fl-globin mRNA precursor and mature 10S fl-globin mRNA have identical map coordinates. Nucleic Acids Res. 7, 11751193

Received 20 February; accepted 12 October 1989

Pea convicilin: structure and primary sequence of the protein and expression of a gene in the seeds of transgenic tobacco.

Convicilin, a trimeric globulin of pea (Pisum sativum L.) seeds, is closely related to vicilin and composed of polypeptides of 68.2 kilodaltons. A par...
2MB Sizes 0 Downloads 0 Views