[9] MAXIMIZING GENE EXPRESSION IN E. coli

[9] Overproduction of Proteins in Escherichia coli: Vectors, Hosts, and Strategies

By ASIS DAS

The purification of a protein is greatly accelerated if it can be isolated from cells that overproduce it. Our ability to overexpress genes in bacteria has improved enormously during the past few years. The rigor of our understanding of the various aspects of gene expression in Escherichia coli has allowed geneticists not only to engineer increasingly sophisticated expression vectors for maximizing gene expression, but also to create a growing number of host strains for minimizing turnover of gene products. As a consequence, today's molecular biologist is able to achieve very high-level synthesis of almost any gene product by utilizing and, when necessary, redesigning a minimum number of vectors and host bacteria. This chapter, aimed at newcomers to gene expression technology, outlines the general approaches involved and describes selected vectors and strains currently available for high-level expression of cloned genes. It is written with the assumption that the gene of interest has already been cloned and that the investigator is familiar with the standard techniques of gene cloning.1 A more detailed account of maximizing gene expression and its practical aspects has been compiled in a separate volume of this series.1a

To maximize expression, the cloned gene must be transcribed and translated as efficiently as possible. Moreover, both the mRNA and the protein product must be stabilized as much as possible. Since there is no universal solution applicable to all systems and needs, an investigator will have to deal with each of these steps separately, depending on the particular situation. A comprehensive review of the literature on major aspects of gene expression in E. coli has been compiled by Reznikoff and Gold.2

Optimizing Transcription

Genetic Fusion to Strong Promoters

The level of transcription of a gene depends mostly on the frequency with which the single RNA polymerase initiates transcription.3 This in turn is a function of the promoter sequence4-6 that is necessary and sufficient to allow transcription by RNA polymerase. A standard E. coli promoter is an approximately 40 base pair (bp) region having two consensus sequence elements centered around positions -10 and -35 from the transcription start site.3 The most convenient way to maximize transcription of a cloned gene is to place it downstream from a strong promoter. Numerous plasmid vectors that provide a promoter for transcription of cloned, promoter-less genes have been described; some of these are listed in Table I.7-23 With one exception (pMOB45), each of the vectors listed contains one of several strong, controllable promoters, namely pL, pT7, ptac, and plac. Regulatable promoters are particularly useful if the overexpressed protein is deleterious to the host cell.

pL Vectors. These plasmids contain the leftward promoter, pL, of phage lambda (λ), which is one of the most active promoters in vivo.24 An adjacent operator sequence, oL1, present in these plasmids allows repression of pL with λ repressor, the product of the λ cI gene.25 The repressor protein can be provided by a single-copy, autogenously regulated cI gene present in appropriate bacterial hosts.9,10 Some of the widely used strains (for example, N4830) contain a temperature-sensitive allele of cI, namely,

1 S. L. Berger and A. R. Kimmel, eds., this series, Vol. 152.
1a D. Goeddel, ed., this series, in preparation.
2 W. Reznikoff and L. Gold, eds., "Maximizing Gene Expression." Butterworth, Boston, Massachusetts, 1986.
3 W. R. McClure, Annu. Rev. Biochem. 54, 171 (1985).
4 J. Scaife and J. Beckwith, Cold Spring Harbor Symp. Quant. Biol. 31, 403 (1967).
5 P. Youderian, S. Bouvier, and M. M. Susskind, Cell 30, 843 (1982).
6 D. K. Hawley and W. R. McClure, Nucleic Acids Res. 11, 2237 (1983).
7 H. Shimatake and M. Rosenberg, Nature (London) 292, 128 (1981).
8 N. Rao and S. Rogers, Gene 3, 247 (1978).
9 A. Levine, P. L. Moreau, S. G. Sedgwick, R. Devoret, M. Gottesman, S. Adhya, and A. Das, Mutat. Res. 50, 29 (1978).
10 M. E. Gottesman, S. Adhya, and A. Das, J. Mol. Biol. 140, 57 (1980).
11 J. Mott, R. Grant, Y. S. Ho, and T. Platt, Proc. Natl. Acad. Sci. U.S.A. 82, 88 (1985).
12 A. Das, B. Ghosh, S. Barik, and K. Wolska, Proc. Natl. Acad. Sci. U.S.A. 82, 4070 (1985).
13 A. Abeles, J. Biol. Chem. 261, 3548 (1986).
14 W. Whalen, B. Ghosh, and A. Das, Proc. Natl. Acad. Sci. U.S.A. 85, 2494 (1988).
15 S. Tabor and C. C. Richardson, Proc. Natl. Acad. Sci. U.S.A. 82, 1074 (1985).
16 F. W. Studier and B. A. Moffatt, J. Mol. Biol. 189, 113 (1986).
17 A. H. Rosenberg, B. N. Lade, D. Chui, S. Lin, J. J. Dunn, and F. W. Studier, Gene 56, 125 (1987).
18 F. W. Studier, A. H. Rosenberg, and J. J. Dunn, this series, submitted.
19 C. Yanisch-Perron, J. Vieira, and J. Messing, Gene 33, 103 (1985).
20 H. de Boer, L. J. Comstock, and M. Vasser, Proc. Natl. Acad. Sci. U.S.A. 80, 21 (1983).
21 M. Bittner and D. Vapnek, Gene 15, 319 (1981).
22 B. E. Uhlin, S. Molin, P. Gustafsson, and K. Nordstrom, Gene 6, 91 (1979).
23 B. Ghosh and A. Das, Proc. Natl. Acad. Sci. U.S.A. 81, 6305 (1984).
24 U. Deuschle, W. Kammerer, R. Gentz, and H. Bujard, EMBO J. 5, 2987 (1986).
25 M. Ptashne, A. Jeffrey, A. D. Johnson, R. Maurer, B. J. Meyer, C. O. Pabo, T. M. Roberts, and R. T. Sauer, Cell 19, 1 (1980).

METHODS IN ENZYMOLOGY, VOL. 182
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

OPTIMIZATION OF STARTING MATERIALS [9]


TABLE I
SELECTED VECTORS FOR CONTROLLED, HIGH-LEVEL TRANSCRIPTION OF CLONED GENES

Plasmid | Replication origin | Cloning sites | Host | Selection | Ref.

A. Vectors with phage lambda pL promoter controlled by cI repressor
pKC30 | ColE1 | HpaI, BamHI, SalI | N4830 (cIts) or SA1512 (cI+) | AmpR | 7-11
pRK16F | ColE1 | EcoRI, BamHI | N4830 (cIts) or SA1512 (cI+) | AmpR | 12-14
pWW12 | ColE1 | EcoRI, SacI, KpnI, SmaI, SalI, HindIII, BglII | N4830 (cIts) or SA1512 (cI+) | AmpR | "

B. Vectors with phage T7 promoter controlled by T7 RNA polymerase
pT7-1, 2 | ColE1 | HindIII, SalI, XbaI, BamHI, SmaI, SacI, EcoRI | HMS273 (pGP1-2) | AmpR | 15
pET1-7 | ColE1 | BamHI (EcoRI, StuI)b | BL21 (DE3) | AmpR | 16-18

C. Vectors with E. coli lac promoter controlled by lac repressor
pUC18, 19 | ColE1c | EcoRI, SacI, KpnI, BamHI, XbaI, SalI, (AccI, HincII), PstI, SphI, HindIII | JM101 (lacIQ)d | AmpR | 19
pDL19 | ColE1c | BglII in between HindIII and SphI of pUC19e | JM101 (lacIQ)d | AmpR | "

D. Vectors with synthetic tac promoter controlled by lac repressor
ptacI | ColE1 | EcoRI | JM101 (lacIQ)d | AmpR | 20
ptacII | ColE1 | HindIII, XbaI, EcoRI | JM101 (lacIQ)d | AmpR | 20

E. Vectors with "run-away" replicon
pMOB45 | R1 (copts) | EcoRI, BamHI, HindIII, ClaI | DB1443 (hsdR) | CmR/TetR | 21-23
pMOB48 | R1 (copts) | BamHI | JM101 (lacIQ) | CmR | 21

a Plasmid pWW12 is a derivative of pWW10,14 constructed by W. Whalen (University of Connecticut, Farmington). Downstream of the BglII site, this plasmid contains the Rho-independent terminator T1T2 of the E. coli rrnB operon, followed by the E. coli galK cistron. This plasmid contains an intact nutL site between the pL promoter and the EcoRI site.
b EcoRI is unique to pET-5, and StuI is present in pET-6 and pET-7.18 Cloning at the StuI site will require the growth of vector in dcm mutant hosts, such as BL21.18
c pUC plasmids bear a mutation enhancing plasmid copy number (200-500 copies/cell).
d An effective down regulation will be achieved by cotransformation with pACYC177-lacIQ (see text).
e pDL19 is a derivative of pUC19,19 constructed by D. Lazinski (University of Connecticut, Farmington). Its polylinker region19 has the following sequence: 5'-AAGCTTGGAAGATCTTCCGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTC; the 12-mer oligonucleotide inserted in pUC19 (GGAAGATCTTCC, set in bold in the original) generates BglII and XmnI sites. There is one other XmnI site present within the bla gene of the plasmid.


cI857. Induction of transcription from pL is achieved by shifting the growth temperature from 30° to 42°.10 Alternatively, if low-temperature growth is desirable (see Ref. 11), strains with the wild-type cI allele have to be used. Induction of pL in these strains (for example, SA1512) is achieved by treating cultures with SOS-inducing agents, such as nalidixic acid,9,11 which mediate the activation of a highly specific protease encoded by the recA gene; the activated RecA protease cleaves, and thereby inactivates, λ repressor.26

pT7 Vectors. These plasmids carry the promoter of phage T7 gene 10, which allows selective high-level transcription of cloned genes.15-18 The T7 RNA polymerase specifically recognizes a 23-bp region conserved by T7 late promoters such as that of gene 10.27-29 Selective transcription of cloned genes is carried out in host strains expressing the T7 RNA polymerase. Expression of T7 polymerase in E. coli is lethal if the cell contains a T7 late promoter, presumably because transcription by T7 polymerase is so efficient that it acts as a sink for ribonucleotides, thereby drastically reducing overall transcription by the host polymerase.15 Two different sources for the T7 RNA polymerase have been described.

1. A compatible plasmid (pGP1-2), containing T7 gene 1 fused to the pL promoter, and a cI857 gene to control the expression of T7 polymerase15: When transcription of the cloned gene is desired, shifting growth temperature inactivates cI, and allows synthesis of T7 polymerase. This in turn allows transcription from the gene 10 promoter.

2. A λ phage recombinant (DE3) containing T7 gene 1, which is fused to the lac operator-promoter16: A strain lysogenic for this phage will allow transcription of the cloned gene from the gene 10 promoter when IPTG is added to the culture to inactivate lac repressor.

Vectors with lac and tac Promoters. These vectors contain either of these two promoters controlled by lac repressor. Of these, a widely used set of plasmids represented by pUC18 and pUC1919 contains the wild-type lac promoter, whose optimal functioning in E. coli is dependent on activation by cAMP and its receptor, CRP. The lac promoter can be controlled in two ways. Strains with a coresident compatible plasmid encoding the lac repressor can be induced by the addition of isopropyl thio-galactoside (IPTG). One such compatible plasmid is pACYC177-lacIQ (R. Kolodner, Dana Farber Cancer Center, Boston, personal communication), constructed by inserting a lacIQ fragment within the KanR plasmid pACYC177.30 A second control is possible if cya mutant strains are employed.31 These strains are defective in cAMP synthesis and, therefore, defective in transcription of catabolite-sensitive genes such as lac. Transcription of cloned genes in these strains can be induced by the exogenous addition of cAMP to the growth medium.31

In contrast, the tac promoter is a hybrid of trp and lac promoters; its -35 sequence element is derived from the trp promoter and its -10 element from lac.20 The tac promoter is not dependent on cAMP-CRP for function. However, the presence of a lac operator sequence adjacent to the tac promoter allows down regulation by the lacI gene, and activation of a cloned gene by the addition of IPTG.20 It is important to note that efficient down regulation of tac will require the use of a lacIQ gene coresident in the vector plasmid or a compatible, high-copy plasmid containing the lacI gene, such as pACYC177-lacIQ described above.

Increased Gene Dosage

A second effective way to increase the amount of specific mRNA in the cell is to amplify the cloned gene. This generally utilizes the gene's own promoter with the gene on a high-copy plasmid, such as pUC.19 Another method for amplifying genes utilizes a "run-away" replicon derived from plasmid R1 drd-19.21,22 This plasmid maintains about 15-20 copies per cell when cultures are grown at low temperature, but at 40° and above, the copy number is greatly increased (>100), causing over a 100-fold overexpression of cloned genes.23

A better approach to overexpress proteins by increasing gene dosage is to position multiple copies of a gene in tandem downstream of a strong controllable promoter. Concatemeric DNA inserts can be synthesized by a variety of methods utilizing a number of different vectors.32 One of these, utilizing the vector pDL19, is described in Fig. 1. pDL19 is a derivative of pUC19 that contains within the polylinker a BglII site in addition to the BamHI site (Fig. 1). Because the BamHI and BglII sites are unique and yet generate the same cohesive ends upon restriction, it is possible to easily engineer tandem multimers of a gene with pDL19 (see Fig. 1) and insert the multimer back into pDL19 or other expression vectors with BamHI, BglII, or BclI sites. A recA mutant host should be used to maintain the concatemeric insert; however, it is not necessary for expression purposes.

26 J. W. Little and D. W. Mount, Cell 29, 11 (1982).
27 M. Chamberlin, J. McGrath, and L. Waskell, Nature (London) 228, 227 (1970).
28 J. J. Dunn and F. W. Studier, J. Mol. Biol. 166, 477 (1983).
29 B. A. Moffatt, J. J. Dunn, and F. W. Studier, J. Mol. Biol. 173, 265 (1984).
30 A. C. Y. Chang and S. N. Cohen, J. Bacteriol. 134, 1141 (1978).
31 F. Warren and A. Das, Proc. Natl. Acad. Sci. U.S.A. 81, 3612 (1984).
32 B. Cohen and G. G. Carmichael, DNA 5, 339 (1986).
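The reasoning behind the BamHI/BglII concatemer strategy can be checked with a short sketch. The recognition sequences (BglII, A^GATCT; BamHI, G^GATCC) are standard; the toy ORF and all function names below are hypothetical and serve only as illustration. The sketch shows that a head-to-tail junction yields a hybrid GGATCT sequence that neither enzyme cuts, whereas inverted junctions regenerate a BamHI or BglII site and would be cleaved when the ligation products are redigested.

```python
# Recognition sequences (top strand, 5'->3'). Both enzymes cut after the first
# base and leave the same 5'-GATC cohesive end.
BGLII = "AGATCT"
BAMHI = "GGATCC"

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def excise(cloned: str) -> str:
    """Top strand of the BglII-ORF-BamHI fragment released by double digestion."""
    start = cloned.index(BGLII) + 1   # A^GATCT
    end = cloned.rindex(BAMHI) + 1    # G^GATCC
    return cloned[start:end]          # reads 5'-GATCT...G-3'

# Hypothetical toy ORF without internal BglII or BamHI sites.
orf = "ATGAAACCCTAA"
unit = excise(BGLII + orf + BAMHI)
inverted = revcomp(BGLII + orf + BAMHI)[1:-5]  # same fragment, opposite orientation

head_to_tail = unit + unit        # direct repeat: junction reads GGATCT
tail_to_tail = unit + inverted    # BamHI end joined to BamHI end
head_to_head = inverted + unit    # BglII end joined to BglII end

for name, seq in [("head-to-tail", head_to_tail),
                  ("tail-to-tail", tail_to_tail),
                  ("head-to-head", head_to_head)]:
    print(name, "BglII:", BGLII in seq, "BamHI:", BAMHI in seq)
```

Under these assumptions only the direct (head-to-tail) repeats survive redigestion with both enzymes, which is why large concatemers recovered after the second digestion are expected to carry the gene copies in the same orientation.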

[Figure 1: schematic. An ORF is cloned at the HincII site of pDL19, between the BglII and BamHI sites. The BglII-ORF-BamHI fragment (5'-GATCT ... G-3') is isolated; the fragment is ligated, restricted with BglII and BamHI, and the large concatemer is isolated; the concatemer is recloned using the BglII and BamHI sites. The resulting plasmid (ori, bla) gives rise to a polycistronic mRNA.]

FIG. 1. The construction of concatemeric inserts with plasmid pDL19.

General Method for Subcloning and Assessing Degree of Overexpression

1. Choose one of the vectors with pL, pT7, or lac/tac promoters listed in Table I, simply on the basis of the presence of convenient sites of cleavage by restriction enzymes.


2. Following standard protocols,33 isolate both the insert and the vector DNA upon digestion with appropriate restriction enzymes. If necessary, modify the ends of the insert or the vector by adding an appropriate linker.33 Ligate the vector and insert fragment under conditions favoring intermolecular joining.33

3. Transform a suitable host listed in Table I using standard protocols,33 and select transformants on the basis of resistance to the appropriate drugs listed in Table I, using agar plates and growth conditions specifically recommended for the respective host-vector system. Compare the frequency of transformants produced by the religated vector with that produced by the vector ligated to the target gene fragment. Similar frequencies would suggest that the cloning experiment has gone normally and that the subclone is most likely not a variant of the wild-type gene. Verify the subclone by fine restriction mapping.33

4. Measure overexpression of the cloned gene product by standard gel electrophoresis of the whole cell extract and staining with Coomassie Blue,33 by transferring to a nitrocellulose membrane to perform Western blots using an antibody raised against the protein,33 or by a biochemical assay of the known activity of the protein. Follow the conditions for culturing and induction of gene expression that have been described in detail for each vector.7,11,15,16,20,31 Optimize these procedures, with special emphasis on empirically determining the best conditions for overexpression, which can vary from one host-vector system to another, and also from gene to gene.
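Step 1 of the method above amounts to filtering the vectors by the restriction sites they offer. A minimal sketch of that lookup follows; the dictionary transcribes only a few rows of Table I, and the structure and the function name `candidate_vectors` are hypothetical conveniences, not part of any published tool.

```python
# A subset of Table I: promoter class and cloning sites for selected vectors.
TABLE_I = {
    "pKC30":  {"promoter": "pL",  "sites": {"HpaI", "BamHI", "SalI"}},
    "pWW12":  {"promoter": "pL",  "sites": {"EcoRI", "SacI", "KpnI", "SmaI",
                                            "SalI", "HindIII", "BglII"}},
    "pT7-1":  {"promoter": "pT7", "sites": {"HindIII", "SalI", "XbaI", "BamHI",
                                            "SmaI", "SacI", "EcoRI"}},
    "pUC18":  {"promoter": "lac", "sites": {"EcoRI", "SacI", "KpnI", "BamHI",
                                            "XbaI", "SalI", "PstI", "SphI",
                                            "HindIII"}},
    "ptacI":  {"promoter": "tac", "sites": {"EcoRI"}},
    "ptacII": {"promoter": "tac", "sites": {"HindIII", "XbaI", "EcoRI"}},
}

def candidate_vectors(enzymes, promoters=("pL", "pT7", "lac", "tac")):
    """Return vectors that carry every requested site and an allowed promoter."""
    wanted = set(enzymes)
    return sorted(
        name for name, info in TABLE_I.items()
        if info["promoter"] in promoters and wanted <= info["sites"]
    )

# Example: an insert flanked by EcoRI and HindIII ends.
print(candidate_vectors({"EcoRI", "HindIII"}))
print(candidate_vectors({"EcoRI", "HindIII"}, promoters=("pL",)))
```

A lookup of this kind only narrows the choice; whether a site is also present within the insert itself, and which host each vector requires, still has to be checked against Table I and the restriction map of the gene.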

Potential Problem with Toxic Genes and Available Methods for Efficient Repression

A given gene product, whether foreign or native, can be toxic to the cell when expressed in large quantities. Such a toxic gene must be cloned and maintained in the repressed state until its expression is desired for overproduction of the protein. In order to repress transcription from pL and pT7 promoters most efficiently, special vectors and hosts have been designed. These are described below.

A single-copy cI gene resident in the host chromosome is unable to completely down regulate multiple copies of the pL promoter encoded by a standard high-copy vector. To allow down regulation of pL to a minimum basal level, a simple solution is to cotransform the cell with a compatible high-copy plasmid (derivatives of pACYC177/184 with p15A origin or pNT52 with RSF1030 origin) that contains the native cI gene or cI fused to the lac promoter. One such plasmid, pJL23 (constructed by J. Lodge and T. Roberts, Harvard Medical School, Boston, MA), that contains the cI857 gene, is a derivative of the KanR plasmid pACYC184.30 Another solution to this problem is to utilize pL vectors containing the cI857 gene.34 Recently, Bujard and colleagues35,36 have engineered plasmids with a synthetic pL promoter that contains the lac operator sequence in the place of oL1. This plasmid system allows for the repression of pL with the lac repressor, the product of the lacI gene carried by a compatible plasmid. In this case, transcription from pL is induced by treating the culture with IPTG.

If the basal level expression from pL continues to be a problem for a highly toxic gene, transcriptional readthrough from minor plasmid promoters must be prevented by placing a strong terminator upstream of pL. The plasmid pWW12 contains a unique ClaI site upstream of the pL promoter. An AccI fragment encoding the λ tR' terminator14 may be easily inserted at the ClaI site. The correct orientation of the insert can be readily verified by restriction of the plasmid with XmnI, which cuts both bla and tR' once.

It is somewhat difficult to completely down regulate the expression of T7 polymerase. However, the activity of T7 polymerase expressed at a basal level could be inhibited by introducing a compatible plasmid, pLysE or pLysS.16,18 These CmR plasmids, derivatives of pACYC184, encode T7 lysozyme, which somehow inhibits T7 RNA polymerase.37 The expression of lysozyme itself is not toxic to the cell.18 When transcription is desired, IPTG is added to fully express T7 polymerase. Presumably, a constant amount of lysozyme produced by the coresident plasmid is not sufficient to inhibit fully induced T7 polymerase, and thereby allows efficient expression of the gene cloned downstream of the gene 10 promoter.

33 "Current Protocols in Molecular Biology." Wiley, New York, 1987.
A new series of repressible T7 promoter vectors has recently been engineered to allow fine tuning of transcription. In these vectors, the lac operator has been placed just downstream of the gene 10 promoter such that the lac repressor protein can bind to this site and block transcription by T7 RNA polymerase. The lacI gene has been appropriately placed within the vector plasmid such that this versatile system is most effectively down regulated for expression of the cloned gene. Simultaneous derepression of the two desired promoters is accomplished by the addition of IPTG; T7 polymerase is made, and in turn, the cloned gene is transcribed. Similar plasmids with lac operator controlling both T7 and T3 promoters have also been constructed by Giordano et al.38

Solutions to Potential Problems with Premature Termination and mRNA Instability

Vectors Allowing Antitermination. The fusion of a cloned gene to a strong promoter may not always lead to high-level transcription of the cloned gene. Sometimes the DNA insert may contain an intragenic terminator preventing complete, high-level transcription of the cloned gene. A general solution to such a potential problem is offered by vectors which utilize the λ antitermination system to overcome transcription stop signals.10 All of the pL vectors described in Table I contain the nutL site required for antitermination by the λ N gene product.39 The trans-acting N protein can be supplied by a coresident compatible plasmid, pGE24 (E. Gradzielska and D. Lazinski, University of Connecticut, Farmington) or, in limited quantity, by an N+ λ prophage carried by a number of host strains.10 Additional plasmids which contain a nut site downstream of the lac promoter have been reported.31 Also, it is possible to clone a nut cassette downstream of any promoter tested to date.31,40 The most significant utility of vectors allowing antitermination is in the simultaneous overexpression of multiple genes. There may be instances where a multisubunit protein is to be overproduced by simultaneous expression of several cloned genes from one plasmid. This would be necessary if the stability of one gene product is dependent on the presence of another in the cell.41

Strategies for Enhancing Message Stability. Message instability could be another reason for not achieving high-level accumulation of mRNA of the target gene fused to a strong promoter. Though our knowledge of the genetic basis of mRNA processing and turnover is very limited at this point,42-46 a few general approaches might be taken to attempt to stabilize the mRNA of a cloned gene.

First, alterations of the 5' leader sequence could dramatically change the overall secondary structure of a message47,48 and, conceivably, the steady-state level of the message. By simple recombinant DNA methods such as trimming or addition of linker sequences,33 the junction between the promoter and the coding region could be altered, and the effects on the level of gene product could be rapidly monitored. In addition, the possible stabilizing effect of ompA-like leader sequence elements49,50 could be tested easily by insertion of the sequence in between the promoter and the coding region.

Second, the possible effect of growth rate on the turnover of the mRNA51 could be easily monitored. A priori, one would not know which growth rate would be favorable for the stability of a given mRNA.

Third, the possible internal cleavage of the mRNA by RNase III52,53 is easily minimized by employing an rnc host. Hosts with mutations in rnb (encoding RNase II) and pnp (encoding polynucleotide phosphorylase) might also be employed, since mRNA turnover appears to involve these exonucleases.44,54 Since transcription from pL, pT7, and plac or ptac can be controlled with a dual plasmid system as described above, a variety of hosts defective in specific RNases could simply be cotransformed with the expression vector and the controller plasmid, and the effect of these RNase mutations on gene expression, if any, can be readily monitored as described above.

Finally, the presence of strong terminators at the end of a cloned gene is expected to greatly increase the message half-life.52,55 The vector pWW12, listed in Table I, contains the terminator cluster T1-T2 derived from a ribosomal RNA operon.56 Similarly, the T7 promoter vector pET3 contains a T7 terminator. Also, a strong terminator cassette, such as the Rho (ρ)-independent terminator tR' present in pWW10,14 may be appropriately inserted into other expression vectors in which the gene may already have been cloned.

34 M. C. Schmidt and M. J. Chamberlin, Biochemistry 23, 197 (1984).
35 R. Knaus and H. Bujard, EMBO J. 7, 2919 (1988).
36 H. Bujard, this series, submitted.
37 B. A. Moffatt and F. W. Studier, Cell 49, 221 (1987).
38 T. J. Giordano, U. Deuschle, H. Bujard, and W. T. McAllister, Gene, in press.
39 J. Salstrom and W. Szybalski, J. Mol. Biol. 124, 195 (1978).
40 B. deCrombrugghe, M. Mudrij, R. DiLauro, and M. Gottesman, Cell 18, 1145 (1979).
41 H. A. Nash, C. A. Robertson, E. Flamm, R. A. Weisberg, and H. I. Miller, J. Bacteriol. 169, 4124 (1987).
42 M. Deutscher, Cell 40, 731 (1985).
43 D. Kennell, in "Maximizing Gene Expression" (W. Reznikoff and L. Gold, eds.), p. 101. Butterworth, Boston, Massachusetts, 1986.
44 W. P. Donovan and S. R. Kushner, Proc. Natl. Acad. Sci. U.S.A. 83, 120 (1986).
45 V. J. Cannistraro, M. N. Subbarao, and D. Kennell, J. Mol. Biol. 192, 257 (1986).
46 G. Brawerman, Cell 48, 5 (1987).

47 C. Yanofsky and R. Kolter, Annu. Rev. Genet. 16, 113 (1982).
48 H. Masukata and J. Tomizawa, Cell 44, 125 (1986).
49 J. G. Belasco, G. Nilsson, A. von Gabain, and S. N. Cohen, Cell 46, 245 (1986).
50 M. Baga, M. Goransson, S. Normark, and B. E. Uhlin, Cell 52, 197 (1988).
51 G. Nilsson, J. G. Belasco, S. N. Cohen, and A. von Gabain, Nature (London) 312, 75 (1984).
52 G. Guarneros, C. Montanez, T. Hernandez, and D. Court, Proc. Natl. Acad. Sci. U.S.A. 79, 238 (1982).
53 U. Schmeissner, K. McKenney, M. Rosenberg, and D. Court, J. Mol. Biol. 176, 39 (1984).
54 C. M. Arraiano, S. Yancey, and S. Kushner, J. Bacteriol. 170, 4625 (1988).
55 H. C. Wong and S. Chang, Proc. Natl. Acad. Sci. U.S.A. 83, 3233 (1986).
56 E. A. Morgan, J. Bacteriol. 168, 1 (1986).
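The connection drawn in this section between message half-life and the steady-state level of a message can be illustrated with a simple kinetic sketch. It assumes constant synthesis and first-order decay, a textbook simplification that is not taken from the chapter; the function name and the numbers are hypothetical.

```python
import math

def steady_state_mrna(synthesis_rate: float, half_life: float) -> float:
    """Steady-state transcript level for constant synthesis and first-order decay.

    synthesis_rate: transcripts made per minute (assumed constant)
    half_life: mRNA half-life in minutes
    """
    decay_constant = math.log(2) / half_life
    return synthesis_rate / decay_constant

# Illustrative values only: the same promoter strength with a 2 min or 4 min half-life.
short_lived = steady_state_mrna(synthesis_rate=10.0, half_life=2.0)
long_lived = steady_state_mrna(synthesis_rate=10.0, half_life=4.0)
print(short_lived, long_lived, long_lived / short_lived)
```

Under this model the steady-state level scales linearly with half-life, so stabilizing a message twofold is equivalent to doubling the strength of the promoter.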


Optimizing Translation of Cloned Gene

Sequence Determinants for Translation Initiation

The initiation of mRNA translation in E. coli (see Refs. 57 and 58 for reviews) involves a specific ribosome-binding site just upstream of the translation start codon. The start codon is usually AUG but, on occasion, GUG, UUG, and AUU are used. The ribosome-binding site, or the Shine-Dalgarno sequence, is complementary to a sequence within the 3' end of 16S rRNA, ACCUCCU. This complementarity plays a crucial role in bringing the 30S ribosome to the initiator codon. From available sequence data and genetic evidence, reviewed by Stormo,57 the following consensus on the optimum sequence requirements has been reached.

1. Efficient initiation will require an optimal Shine-Dalgarno sequence (at least four nucleotides of the sequence AGGAGGU), positioned about seven to nine nucleotides upstream of the initiator codon.

2. The Shine-Dalgarno sequence should not be sequestered in secondary structures, nor should it overlap with a sequence recognized by an RNA-binding protein.58

3. There is a preferred environment in which the initiator codon should be positioned. First, it should not overlap with another initiator codon. Second, the spacer region between the Shine-Dalgarno sequence and the initiator should be mostly A and U, especially an A at -3. Third, the AUG codon should be followed by GCAU or AAAA. Finally, the presence of the sequence UUAA in the fourth or fifth codon would be helpful.

Obviously, many cloned bacterial genes may not have an optimal translation initiation region. With oligonucleotide mutagenesis techniques,33,59 it is possible to replace the native initiation region of a gene with a synthetic one which meets all the requirements described above.60 More significantly, the cDNA clones of eukaryotic genes will not have the Shine-Dalgarno sequence. It is for this reason that such DNAs and open reading frames (ORFs) in general are cloned into vectors like λgt11, which provide a translation start region.61

57 G. Stormo, in "Maximizing Gene Expression" (W. Reznikoff and L. Gold, eds.), p. 195. Butterworth, Boston, Massachusetts, 1986.
58 L. Gold, Annu. Rev. Biochem. 57, 199 (1988).
59 S. L. Berger and A. R. Kimmel, eds., this series, Vol. 152.
60 J. M. Chemak and H. O. Smith, Nucleic Acids Res. 17, 1933 (1990).
61 R. A. Young and R. W. Davis, Proc. Natl. Acad. Sci. U.S.A. 80, 1194 (1983).
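Rules 1 and 3 above lend themselves to a quick computational check of a candidate initiation region. The sketch below is a hypothetical helper, not a published algorithm: it looks for the longest stretch of AGGAGGU (at least four nucleotides) upstream of the start codon, and it assumes that the spacing is counted as the number of nucleotides between the last matched Shine-Dalgarno base and the first base of the start codon. Secondary structure (rule 2) is not examined.

```python
SD_CONSENSUS = "AGGAGGU"
START_CODONS = ("AUG", "GUG", "UUG", "AUU")

def best_sd_match(upstream: str, min_len: int = 4):
    """Longest piece of the consensus found upstream; ties go to the rightmost hit.

    Returns (matched_sequence, index_just_past_match) or None.
    """
    for length in range(len(SD_CONSENSUS), min_len - 1, -1):
        hits = []
        for i in range(len(SD_CONSENSUS) - length + 1):
            core = SD_CONSENSUS[i:i + length]
            pos = upstream.rfind(core)
            if pos != -1:
                hits.append((pos + length, core))
        if hits:
            end, core = max(hits)
            return core, end
    return None

def check_initiation_region(seq: str, start: int) -> dict:
    """Report simple initiation features for the start codon at index `start`."""
    mrna = seq.upper().replace("T", "U")
    hit = best_sd_match(mrna[:start])
    spacer = None if hit is None else start - hit[1]
    return {
        "start_codon_ok": mrna[start:start + 3] in START_CODONS,
        "sd_match": None if hit is None else hit[0],
        "spacer": spacer,
        "spacing_ok": spacer is not None and 7 <= spacer <= 9,
        "a_at_minus_3": start >= 3 and mrna[start - 3] == "A",
    }

# The gene 10 translation start region shown for the pET3 vectors (Fig. 2).
report = check_initiation_region("AAGGAGATATACATATGGCTAGC", start=14)
print(report)
```

For this leader the helper finds a five-nucleotide match (AGGAG) with an eight-nucleotide spacer under the counting convention assumed here; other conventions for measuring the spacing would shift the number by a nucleotide or two.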

[Figure 2: sequences at the translation start of the three vectors, and a map of the vector.

(a) AAGGAGATATACAT ATG GCT AGC ATG ACT GGT GGA CAG CAA ATG GGT CGC GGA TC
(b) AAGGAGATATACAT ATG GCT AGC ATG ACT GGT GGA CAG CAA ATG GGT CGG GAT C
(c) AAGGAGATATACAT ATG GCT AGC ATG ACT GGT GGA CAG CAA ATG GGT CGG ATC

Map labels: T7 promoter, translation start, T7 terminator, ori; restriction sites XbaI, BglII, NdeI, and BamHI; position label -37.]

FIG. 2. The cloning of open reading frames with vectors pET3a, b, and c. Arrows indicate sites of cleavage by restriction enzymes NdeI and BamHI. Both of these sites are unique to the vector. Cloning at the BamHI site will produce a fusion protein with 11 extraneous amino acids. The NdeI site should be used for the production of proteins with native amino termini, except that there will be a Met residue at the N-terminus of a cloned eukaryotic protein. The terminator present in the vector will prevent the overexpression of β-lactamase encoded by bla. The numbers above restriction enzyme sites correspond to the position relative to the start site of transcription by T7 polymerase.
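The purpose of the three pET3 variants in Fig. 2 is that the BamHI cleavage position falls in a different reading frame in each. A minimal sketch of that bookkeeping is given below. The sequences are those shown in the figure as far as they are legible in this copy, so the assignment of each tail to vectors a, b, and c, like the function name, should be treated as an assumption.

```python
# Coding sequence shared by the three vectors, starting at the gene 10 ATG.
LEADER = "ATGGCTAGCATGACTGGTGGACAGCAAATG"

# Sequence following the leader, up to the GATC core of the BamHI site.
TAILS = {
    "pET3a": "GGTCGCGGATC",
    "pET3b": "GGTCGGGATC",
    "pET3c": "GGTCGGATC",
}

def bamhi_frame(tail: str) -> int:
    """Position of the GATC cohesive end, modulo 3, counted from the A of ATG."""
    seq = LEADER + tail
    return seq.rindex("GATC") % 3

frames = {name: bamhi_frame(tail) for name, tail in TAILS.items()}
print(frames)
```

Because the three values are all different, an insert with a BamHI-compatible end can be placed in frame with the gene 10 start codon by choosing the matching vector, without altering the insert itself.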

Translation Fusion Vectors

Several of the transcription vectors described in Table I are suitable for use as translation vectors as well. Of these, the plasmids pUC18 and pUC19 are the most suitable because they allow cloning of ORFs at a site close to the amino terminus of lacZ, and therefore add only a few extraneous amino acids to the target protein. Derivatives of pKC30 containing the initiator region of the λ cII gene have been described, which allow translation fusion of ORFs downstream of the pL promoter.62 Recently, a series of T7 vectors has been engineered to simplify translation fusion (Fig. 2). These, in addition to the controllable T7 promoter, contain a strong translation start signal originating from T7 gene 10. There are unique cloning sites at or near the initiator codon so that it is possible to engineer fusions having no extraneous amino acids at the amino terminus. Moreover, this system includes sets of plasmids (for example, pET3a, b, and c; see Fig. 2) that have unique restriction sites in each of three reading frames to which fusions to ORFs can be engineered by simple in-frame insertion of appropriate DNA fragments.

62 A. R. Shatzman and M. Rosenberg, this series, Vol. 152, p. 661.


Potential Problem with Biased Codon Usage

The frequencies with which each of the 61 codons is utilized vary from one organism to another, and also from one gene to another within the same organism (see Ref. 63 for a review). Some codons for a given amino acid are used much more frequently than others. 64 There seems to be a correlation between the relative abundance of a tRNA isoacceptor and the frequency with which its cognate codon is utilized in E. coli. 65 There also seems to be a correlation between the relative abundance of a protein and its pattern of codon usage. 65-69 Highly expressed genes have a biased codon usage, i.e., they contain mostly the "preferred" codons. In contrast, poorly expressed genes do not show this bias, and they often contain many "rare" codons. Despite these strong correlations, it is not clear whether poor expression of a cloned gene in E. coli can be attributed to its frequent use of rare codons. 70 However, if there is a remarkably high occurrence of rare codons within a cloned gene, a synthetic version of the gene having preferred codons should probably be engineered in order to achieve maximum expression with strong promoters such as pL or pT7.

Enhancing the Stability of Protein Products

There is a remarkable difference in the relative stabilities of proteins in the cell. Some proteins are very short lived, and usually they perform physiological functions required only for a short while. Some of the proteases responsible for protein degradation in E. coli have been described, though not all genes encoding these proteases have been identified yet. 71,72 Escherichia coli mutants defective in a number of proteases have been isolated. 72-78 These mutant hosts might help minimize proteolysis of some short-lived proteins. However, since the number of these mutants isolated to date is limited, and since some of these proteases might be very specific for a particular substrate, 78 it is unlikely that they will serve to stabilize every short-lived protein.

63 H. A. deBoer and R. A. Kastelein, in "Maximizing Gene Expression" (W. Reznikoff and L. Gold, eds.), p. 225. Butterworth, Boston, Massachusetts, 1986.
64 R. Grantham, C. Gautier, M. Gouy, M. Jacobzone, and R. Mercier, Nucleic Acids Res. 9, 243 (1981).
65 T. Ikemura, J. Mol. Biol. 146, 1 (1981).
66 T. Ikemura, J. Mol. Biol. 151, 389 (1981).
67 M. Gouy and C. Gautier, Nucleic Acids Res. 10, 7055 (1982).
68 H. Grosjean and W. Fiers, Gene 18, 199 (1982).
69 J. L. Bennetzen and B. D. Hall, J. Biol. Chem. 257, 3026 (1982).
70 M. Robinson, R. Lilley, S. Little, J. S. Emtage, G. Yarranton, P. Stephens, A. Millican, M. Eaton, and G. Humphreys, Nucleic Acids Res. 12, 6663 (1984).
71 A. Goldberg and S. Goff, in "Maximizing Gene Expression" (W. Reznikoff and L. Gold, eds.), p. 187. Butterworth, Boston, Massachusetts, 1986.
72 S. Gottesman, this series, submitted.
73 S. Gottesman and D. Zipser, J. Bacteriol. 133, 844 (1978).
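As a rough aid to the codon usage question raised under "Potential Problem with Biased Codon Usage", the following Python sketch counts how often a cloned open reading frame uses codons from a user-supplied "rare" set. It is only an illustration under stated assumptions: the codon set shown is a commonly quoted shortlist for E. coli and is not taken from this chapter, the function names are hypothetical, and no threshold for "remarkably high" is implied.

```python
# Illustrative shortlist of codons often described as rare in E. coli.
# This set is an assumption for the sketch, not a list from the chapter;
# substitute a set derived from a codon usage table of your choice.
RARE_CODONS = frozenset({"AGG", "AGA", "CGA", "CGG", "ATA", "CTA", "CCC", "GGA"})


def split_codons(orf):
    """Split an in-frame ORF (DNA or RNA letters) into a list of DNA codons."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def rare_codon_positions(orf, rare=RARE_CODONS):
    """Return the 0-based codon indices at which a rare codon occurs."""
    return [i for i, codon in enumerate(split_codons(orf)) if codon in rare]


def rare_codon_fraction(orf, rare=RARE_CODONS):
    """Return the fraction of codons in the ORF that belong to the rare set."""
    codons = split_codons(orf)
    if not codons:
        return 0.0
    return len(rare_codon_positions(orf, rare)) / len(codons)


if __name__ == "__main__":
    # Hypothetical six-codon ORF: ATG AGG AAA AGA CTG TAA
    example = "ATGAGGAAAAGACTGTAA"
    print(rare_codon_positions(example))
    print(rare_codon_fraction(example))
```

A high fraction, or a cluster of positions near the amino terminus, would only flag a gene as a candidate for resynthesis with preferred codons; as noted above, the link between rare codons and poor expression is not firmly established.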

Use of Protease-Defective Hosts

lon and clp Mutants. The lon gene encodes one of the major proteases involved in ATP-dependent degradation of short-lived proteins, nonsense fragments, and missense mutant polypeptides, including temperature-sensitive (ts) proteins. 73 Both deletion and insertion mutations in lon have been isolated. 72 Strains carrying these mutations have been found to greatly increase the yields of many cloned gene products. There is residual energy-dependent proteolysis in lon mutant cells, and mutations in clp, which encodes a second ATP-dependent protease, may allow further stabilization of the cloned gene product. 72,74

htpR Mutant. Proteolysis can be a major problem at high temperature. A number of proteases, in addition to Lon, are thought to be induced in response to heat shock. 75 Induction of these proteases could be prevented by a mutation in htpR, the gene encoding a sigma (σ) factor, sigma-32, required for expression of heat shock genes. 76 The htpR sigma factor apparently performs an essential cellular function. Thus, the viability of a strain bearing a nonsense (amber) allele of htpR is dependent on a coresident amber suppressor. When a temperature-sensitive suppressor allele (supts) is used, the level of suppression of htpRam is sufficiently low even at low growth temperature. Hence, the level of htpR-controlled proteases is greatly reduced in the mutant grown at low temperature. Strains carrying both lon and htpR mutations are available. 72 The double-mutant strains appear to further minimize the degradation of foreign proteins, and should be generally applied for overexpression.

degP Mutant. Sometimes the intracellular location of a protein can affect its stability. Talmadge and Gilbert 79 have shown that a variety of insulin antigens are at least 10 times more stable when they are localized within the periplasmic compartment. It is not clear whether the differential stability of a protein in the periplasmic and cytoplasmic environments is due to a difference in the conformation that the protein might attain in the two environments. However, a group of eight soluble protease activities is known to be differentially distributed in the two subcellular compartments of E. coli. 80 Thus, it is likely that a protein would be exposed to a different set of proteases when it is present in the cytoplasm than when it is secreted into the periplasm. Strauch and Beckwith 77 have recently described a mutant, degP, which is defective in a periplasmic protease. The protease encoded by degP cleaves and releases the alkaline phosphatase portion of fusion proteins. If this protease is involved in degrading periplasmic proteins, the degP mutant might be useful in stabilizing exported proteins in general.

74 Y. Katayama, S. Gottesman, J. Pumphrey, S. Rudikoff, W. P. Clark, and M. R. Maurizi, J. Biol. Chem. 263, 15228 (1988).
75 T. Baker, A. Grossman, and C. Gross, Proc. Natl. Acad. Sci. U.S.A. 81, 6779 (1984).
76 F. C. Neidhardt and R. A. VanBogelen, in "Escherichia coli and Salmonella typhimurium" (F. C. Neidhardt et al., eds.), p. 1334. Am. Soc. Microbiol., Washington, D.C., 1987.
77 K. Strauch and J. Beckwith, Proc. Natl. Acad. Sci. U.S.A. 85, 1576 (1988).
78 M. A. Hoyt, D. M. Knight, A. Das, H. I. Miller, and H. Echols, Cell 31, 565 (1982).
79 K. Talmadge and W. Gilbert, Proc. Natl. Acad. Sci. U.S.A. 79, 1830 (1982).

Method for Testing Protease-Defective Mutants

To test whether any of the protease-defective mutants described above helps stabilize a cloned gene product, one does not need to construct new host strains. The dual plasmid systems described for each promoter vector can be used to cotransform an existing protease-defective strain, and the potential effect on the stability of the cloned gene product can be readily determined.
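One way to express such an effect on stability as a number is to compare apparent half-lives of the product in the parental and protease-defective hosts. The Python sketch below is a minimal illustration, not part of the method described above. It assumes that the amount of product remaining has been followed over time (for example, as band intensities from a pulse-chase experiment, which is itself an assumption here) and that decay is roughly first order. The function names and all sample values are hypothetical.

```python
import math


def fit_half_life(times_min, intensities):
    """Estimate a half-life (min) by least-squares fit of ln(intensity) vs. time.

    Assumes first-order decay, I(t) = I0 * exp(-k * t), so that the slope of
    ln(I) against t is -k and the half-life is ln(2) / k.
    Returns math.inf when no net decay is detected (k <= 0).
    """
    if len(times_min) != len(intensities) or len(times_min) < 2:
        raise ValueError("need at least two matched (time, intensity) points")
    if any(value <= 0 for value in intensities):
        raise ValueError("intensities must be positive to take logarithms")

    logs = [math.log(value) for value in intensities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))

    k = -(sxy / sxx)  # first-order decay constant, per minute
    if k <= 0:
        return math.inf
    return math.log(2) / k


def stabilization_factor(half_life_parent, half_life_mutant):
    """Ratio of the half-life in the mutant host to that in the parental host."""
    if half_life_parent <= 0:
        raise ValueError("parental half-life must be positive")
    return half_life_mutant / half_life_parent


if __name__ == "__main__":
    # Hypothetical chase time points (min) and band intensities (arbitrary units).
    chase = [0, 10, 20, 30]
    parent = fit_half_life(chase, [100, 50, 25, 12.5])
    mutant = fit_half_life(chase, [100, 84, 71, 59])
    print(parent, mutant, stabilization_factor(parent, mutant))
```

A factor well above 1 in the mutant host would suggest that the missing protease contributes to turnover of the product; a factor near 1 would argue for trying a different protease-defective strain.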

Stabilization by Specific Intracellular Localization or Secretion

Localization Vectors. As mentioned above, a protein that is labile in the cytoplasmic compartment might be comparatively more stable when localized within the periplasm or secreted into the medium. To allow localization of a cloned gene product to extracytoplasmic compartments, "secretion cloning" vectors such as pIN-III-ompA (Fig. 3) have been constructed. 81-83 This vector is most suitable for the expression of proteins, for example, hydrolytic enzymes, that are toxic to the cell when present in the cytoplasm. The plasmid pIN-III-ompA contains a strong, hybrid lpp-lac promoter and the lac operator. Therefore, high-level expression of a cloned, toxic gene can be induced by inactivating lac repressor. The plasmid contains the coding sequence of the ompA signal peptide; a DNA fragment encoding the desired protein is fused in frame immediately after the signal peptide. The resultant hybrid (for example, the one made with staphylococcal nuclease A) is overproduced to as much as 40% of total cell protein, and an accurately processed protein (i.e., one with the native amino terminus) is localized into the periplasm to as much as 50%. 81

80 K. H. S. Swami and A. Goldberg, Nature (London) 292, 652 (1981).
81 M. Takahara, H. Sagai, S. Inouye, and M. Inouye, Bio/Technology 6, 195 (1988).
82 C. A. Lunn, M. Takahara, and M. Inouye, this series, Vol. 125, 138.
83 M. Takahara, D. W. Hibler, P. J. Barr, J. A. Gerlt, and M. Inouye, J. Biol. Chem. 260, 2670 (1985).

[Figure 3: map of pIN-III-ompA showing the hybrid lpp-lac promoter-operator, the ompA signal peptide coding sequence, the cloning sites (EcoRI, HindIII, BamHI), and lacI. After cleavage of the signal peptide, the target protein is localized in the periplasm. Sequences at the signal peptide/cloning site junction:]

A1: GTA GCG CAG GCC GCG AAT TCC AAG CTT GGA TCC
A2: GTA GCG CAG GCC GCT GAA TTC CAA GCT TGG ATC C
A3: GTA GCG CAG GCC GGA ATT CCA AGC TTG GAT CC

FIG. 3. The periplasmic localization of cloned gene product by fusion to the ompA signal sequence. Arrows indicate the fusion sites using EcoRI for cloning. Fusions using HindIII and BamHI can be made as well. A1-A3 represent derivatives of pIN-III-ompA with the cloning sites in three different reading frames.

Secretion of Cloned Gene Product. The overexpression and isolation of proteins might be simplified if they were secreted into the medium. Both gram-positive and gram-negative bacteria are known to secrete


some native proteins into the medium (for example, α-amylase by Bacillus and hemolysin by E. coli). 84 Though there is a wealth of information available on the pathways for protein export or localization in E. coli, the genetic mechanism of protein secretion is as yet poorly understood. 84,85 To date, there have been few reports of success in the secretion of cloned, chimeric gene products, 86-88 and the vectors suitable for protein secretion are only beginning to emerge (see [11] in this volume).

Use of lky Mutants for Protein Secretion. One potential general approach to protein secretion might be to introduce into E. coli lky mutants 89 a gene cloned into a periplasmic localization vector. The "leaky" mutants of E. coli and Salmonella typhimurium are known to secrete (or, rather, dump into the medium) periplasmic proteins such as alkaline phosphatase. 89 Thus, the lky mutant might secrete any cloned, chimeric protein whose gene has been fused to a periplasmic localization signal such as the one encoded by the pIN-III-ompA vector.

Stabilization by Fusion to β-Galactosidase

β-Galactosidase Fusion Vectors. A labile protein might be converted to a stable hybrid protein upon fusion to β-galactosidase. 90,91 The plasmid system pMLB1107/8 constructed by Berman and colleagues (Molecular Oncology, Inc., Gaithersburg, MD) is ideal for the fusion of an open reading frame to β-galactosidase. 92 These plasmids contain the wild-type lac promoter-operator region, and a lacZ gene containing the multiple cloning sites of pUC8 and pUC9 at the amino terminus. Using one (or two) of these cloning sites, a DNA fragment encoding an open reading frame may be easily inserted in frame with lacZ. A DNA fragment without the translation start and stop signals will be fused to the lacZ AUG codon, producing a chimeric protein whose amino terminus and carboxy terminus are both derived from lacZ. Alternatively, if the native ribosome-binding site and AUG codon (or a synthetic version) are encoded within the DNA insert, a hybrid β-galactosidase will be produced. Transcription of lac in this plasmid is controlled by lacIq encoded by the plasmid itself.

84 I. B. Holland, N. Mackman, and J.-M. Nicaud, Bio/Technology 4, 427 (1986).
85 J. A. Stader and T. J. Silhavy, this series, submitted.
86 K. Nagahari, S. Kanaya, K. Munakata, Y. Aoyagi, and S. Mizushima, EMBO J. 4, 3589 (1985).
87 N. Mackman, K. Baker, L. Gray, R. Haigh, J.-M. Nicaud, and I. B. Holland, EMBO J. 6, 2835 (1987).
88 M. Better, C. P. Chang, R. R. Robinson, and A. R. Horwitz, Science 240, 1041 (1988).
89 J.-C. Lazzaroni and R. C. Portalier, J. Bacteriol. 145, 1351 (1981).
90 H. Shuman, T. Silhavy, and J. Beckwith, J. Biol. Chem. 255, 168 (1980).
91 J. Germino, J. Gray, H. Charbonneau, T. Vanaman, and D. Bastia, Proc. Natl. Acad. Sci. U.S.A. 80, 6848 (1983).
92 T. J. Silhavy, M. L. Berman, and L. W. Enquist, "Experiments with Gene Fusions." Cold Spring Harbor Lab., Cold Spring Harbor, New York, 1984.

Procollagen-lacZ Fusion Vector. It is conceivable that the function of a protein may be impaired by fusion to β-galactosidase, although in some instances the fusion protein is found to retain the biological activities of both moieties of the hybrid. 93-95 Toward a solution to this problem, Germino and Bastia 96 have developed a unique fusion vector system, namely pJG201 and its subsequent derivatives, which permits site-specific in vitro cleavage to remove the β-galactosidase moiety from the fusion protein. The lacZ gene in these plasmids is already fused to a 60-amino acid peptide derived from chicken pro-α2 collagen (Fig. 4). An open reading frame (without a translation stop signal) placed upstream of the procollagen sequence will lead to the production of a β-galactosidase chimera that can then be digested by collagenase to cleave off the amino terminus, the peptide of interest (Fig. 4). These vectors contain λ pR to drive lacZ transcription, and also encode a temperature-sensitive repressor to control the promoter. The hybrid protein is purified by affinity chromatography and subsequently digested with collagenase to cleave the procollagen linker. 96

To date, a variety of vectors have been developed for the cloning of open reading frames. The procollagen-lacZ fusion vector appears to be most suitable for this purpose because it can generate a protein that is most similar to its native form.

Rapid Purification of β-Galactosidase Fusion Proteins

The fusion of a target protein to β-galactosidase provides a convenient method for purification of the fusion protein by affinity chromatography.
This is the best approach available for the rapid isolation and purification of a protein whose function is unknown and which has never been isolated before. In this approach, the open reading frame encoding the protein is fused in frame to β-galactosidase using one of the fusion vectors described above. To purify the fusion protein, a crude extract of cells producing the protein is first passed through an affinity column able to selectively bind the β-galactosidase moiety of the fusion protein, and the bound protein is then eluted with an agent that dissociates the complex. 91 A commercially available affinity matrix for β-galactosidase is p-aminophenyl-β-D-thiogalactoside covalently attached to succinyldiaminohexyl-Sepharose 4B; 0.1 M sodium borate (pH 10) is used to elute the fusion protein selectively retained on the column.

93 B. Müller-Hill and J. Kania, Nature (London) 249, 561 (1974).
94 L. Guarente, G. Lauer, T. M. Roberts, and M. Ptashne, Cell 20, 543 (1980).
95 T. J. Silhavy and J. R. Beckwith, Microbiol. Rev. 49, 398 (1985).
96 J. Germino and D. Bastia, Proc. Natl. Acad. Sci. U.S.A. 81, 4692 (1984).

[Figure 4: flow diagram. Labels: cI857, pR, target protein, procollagen linker, lacZ; tripartite protein; purify by affinity chromatography; digest with bacterial collagenase; target protein; β-galactosidase.]

FIG. 4. The production of a cleavable, tripartite fusion protein with a procollagen-lacZ vector. Details are described in the text.

Practice and Prospects

The appropriate choice of a vector-host system for the overproduction of a given protein in E. coli will primarily depend on the nature of the cloned gene. For a harmless, stable protein whose mRNA is stable and does not contain many rare codons, any one of a number of vectors listed


in Table I will suffice for overproduction. For economic reasons and convenience, one might attempt to further amplify the yields of the protein by allowing constitutive expression in E. coli. High-level constitutive expression of a nontoxic product is possible with high-copy plasmids having a strong promoter as well as a strong terminator when certain permissive hosts are employed. For highly toxic products, the best choice would be to utilize vectors permitting efficient export of the protein to the extracellular medium. However, much remains to be developed in this area. Therefore, at the present state of the art, a more conventional vector should be used, and more attention should be given to minimizing the basal level of expression of highly toxic proteins from an efficient expression system. Both pL and pT7 vectors should be utilized, and these should be appropriately redesigned (1) to prevent transcription readthrough from minor unregulated promoters, and (2) to allow maximum repression of the promoter.

When a cloned gene is expressed poorly, specifically, to the extent that the protein product is not detected by standard gel electrophoresis and staining techniques, one must be concerned with a number of problems that might cause poor expression. Using appropriate vectors, hosts, and techniques, one has to first determine the rate-limiting step(s), whether it is premature termination, rapid decay of the message, poor translation, or extreme instability of the polypeptide. To maximize expression, one must then attempt to overcome the rate-limiting step(s) by applying one or more of the genetic solutions described above.

Acknowledgments

The author is indebted to Fred Warren for helpful discussions and to Susan Gottesman and Bill Studier for communicating unpublished work.

[10] Overexpression of Proteins in Eukaryotes

By MARGARET K. BRADLEY

Why don't you overexpress your protein? This is a common question asked of any researcher holding a gene sequence, and especially since there are so many systems available, both prokaryotic and eukaryotic. The need for overexpression systems is obvious if one wishes to investigate further proteins that are expressed in their host cells at low levels. However, it is important to establish a clear rationale, because the work

METHODS IN ENZYMOLOGY, VOL. 182

Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
