J. theor. Biol. (1992) 157, 1-7

MINI REVIEW

Applied Molecular Evolution

STUART A. KAUFFMAN

Biochemistry and Biophysics, School of Medicine, University of Pennsylvania, Philadelphia 19104 and The Santa Fe Institute, New Mexico, U.S.A.

(Received and accepted on 18 March 1991)

0022-5193/92/130001 +07 $03.00/0

© 1992 Academic Press Limited

The exciting results published by Scott & Smith (1990), Devlin et al. (1990), and Tuerk & Gold (1990) in Science, by Cwirla et al. (1990) in the Proceedings of the National Academy of Sciences of the U.S.A., and by Ellington & Szostak (1990) in Nature, London, are the next developments in a new era in biotechnology: applied molecular evolution. Cloning techniques developed over the past several years let us generate vast numbers of novel DNA, RNA, peptide, polypeptide, and protein sequences. The implications range from basic science to practical applications. On the basic side, they bear on the distribution of structural, catalytic, and ligand-binding capacities in "sequence space" (Smith, 1970). On the practical side, they include the generation of novel drugs, vaccines, catalysts, ligands, DNA regulatory elements, ribozymes, biosensors, and even self-reproducing biochemical systems. Practical use of these new opportunities relies on methods to select or screen among extremely large numbers of sequences, whether fully stochastic or partially designed and partially stochastic. One application of these technologies lies in the generation of peptides that mimic, stimulate, modulate, or antagonize the effects of some other epitope. Second-rank antibodies bind, and are bound by, first-rank antibodies. Their existence has long suggested that they are internal images of external antigenic epitopes (Jerne, 1974, 1984; Sege & Peterson, 1983) and thus might be used as drugs or vaccines (Bluestone et al., 1981). Second-rank antibodies can serve as vaccines against bacterial pathogens whose protective antigen is a carbohydrate (Sacks et al., 1985). This shows that proteins can mimic the shape of a chemically different family of molecules.

In view of this, candidate mimics of an arbitrary antigen can be found by screening a large library of random peptide epitopes for cross-reacting peptides. The library is encoded in a collection of fully or partially stochastic genes. The screen uses a shape complement of the antigen, such as a monoclonal antibody or a receptor for the antigen. Peptides recovered in this way are candidates to mimic or modulate the action of the original antigen. If that antigen is a hormone, the peptides are candidate drugs. If it is the protective epitope on a virus, the peptides are candidate vaccines. Recovering such mimetic peptides does not even require that the antigen be known or present, only that its shape complement be available. Hence, antibodies from an infected patient could be used to generate vaccines against an unknown pathogenic agent (Ballivet & Kauffman, 1987, 1989, 1991; Scott & Smith, 1990). Scott & Smith, and Cwirla and coworkers, have now screened for and found mimetic cross-reacting peptides. They used a "fusion phage" system in which random DNA sequences coding for hexapeptides were cloned into the gene for the pIII coat protein of the filamentous phage vector fUSE5. Phage that display the desired determinant bind the antibody, and can then be selected from a background of non-binding phage by affinity purification. Scott & Smith's screen examined, in theory, 21% of the 64 × 10⁶ (that is, 20⁶) possible hexapeptides. Using two monoclonal antibodies specific for the hexapeptide DFLEKI, it uncovered 19 fusion proteins with high affinity for the antibodies. None of these is identical to DFLEKI. On average, the sequences differ from DFLEKI at about three positions, and one differs at all six positions. Hence multiple solutions to a given function can exist in sequence space. Using the same vector as Scott & Smith, Cwirla's group screened a library of 3 × 10⁹ sequences coding for random hexapeptides against a monoclonal antibody to the N terminus of beta-endorphin. Fifty-one clones were selected by affinity purification. All differed from previously known ligands for the antibody. All 51 had tyrosine as the N-terminal residue, and 48 had glycine as the second residue; the rest of the sequences varied widely. Devlin et al. screened a library of about 20 × 10⁶ clones, coding for 15 random amino acids inserted into the same vector. They sought novel fusion proteins that bind streptavidin, a protein with no previously known peptide-binding activity. Nine were found. These share a four-amino-acid consensus sequence but otherwise differ widely. Screening epitope libraries for peptides that mimic or bind arbitrary proteins is therefore practical. The medical implications are clear: drugs, vaccines, or treatments tailored to the specific autoimmune disease of an individual patient.
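The library sizes quoted above can be compared with the size of the sequence spaces they sample. The sketch below is an illustration under a simplifying assumption of its own: every clone is an independent draw in which all 20 amino acids are equally likely at each position. It ignores codon bias, stop codons, and transformation efficiency, so it is not meant to reproduce the 21% figure reported for Scott & Smith's library. The function names are hypothetical.

```python
import math


def space_size(length: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of the given length (e.g. 20**6 hexapeptides)."""
    return alphabet ** length


def expected_coverage(clones: int, length: int, alphabet: int = 20) -> float:
    """Expected fraction of distinct sequences present at least once among
    `clones` independent draws, assuming every sequence is equally likely."""
    size = space_size(length, alphabet)
    # P(a given sequence is never drawn) = (1 - 1/size) ** clones;
    # log1p/expm1 keep the result accurate when 1/size is tiny.
    return -math.expm1(clones * math.log1p(-1.0 / size))


if __name__ == "__main__":
    for label, clones, length in [
        ("hexapeptides, 3e9 clones", 3 * 10**9, 6),
        ("15-mers, 2e7 clones", 2 * 10**7, 15),
    ]:
        print(f"{label}: space {space_size(length):.2e}, "
              f"expected coverage {expected_coverage(clones, length):.3g}")
```

Here `space_size(6)` is 64 000 000, which matches the 64 × 10⁶ hexapeptides cited above. A 15-residue insert spans 20¹⁵ ≈ 3.3 × 10¹⁹ sequences. A library of about 2 × 10⁷ clones therefore samples only a vanishing fraction of that space, yet Devlin et al. still recovered binders.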
Applied molecular evolution can be used to produce new protein or RNA enzymes. Perhaps more strikingly, a finite number of candidate enzymes might suffice as a universal enzymatic "tool box" (Kauffman, 1991). Mimetic peptides, or second-rank antibodies, can serve as vaccines against carbohydrate epitopes because the peptides locally have the shape and other chemical features needed to resemble the carbohydrate epitope. Perelson & Oster (1979), in considering the immune system, introduced the abstract idea of a compact multi-dimensional shape space. Three of its axes represent physical dimensions, and the others represent features such as charge, hydrophobicity, and dipole moment. Each epitope is represented by a point in shape space, and similar shapes form a ball in the space. Implicit in this concept is that very different molecules, such as a carbohydrate and a peptide, can locally have the "same" shape. Further, because molecular recognition is not perfect, any antibody molecule covers a "ball" of similar shapes in the space. Perelson & Oster argue that, to be useful, an immune repertoire must cover a substantial fraction of the space. The simplest immune systems appear to have a repertoire of about 10 000 antibodies, and from this the authors estimate the volume of the ball covered by a single antibody. They then conclude that our repertoire of about 10⁸ antibodies saturates shape space, so that almost every shape is covered by at least one antibody. This suggests an analogous "catalytic task space", in which each point represents a catalytic task and each enzyme covers a ball of tasks (Kauffman, 1991). As in
shape space, a ball of points in task space is a set of nearby tasks, and very different reactions may locally represent the same task. Thus, a finite number of enzymes might form a tool box that covers almost all catalytic tasks. Several lines of evidence support this. First, some enzymes are already known to catalyze not only their familiar reaction but also quite unexpected alternative reactions at the same site (Klibanov et al., 1981, 1982; Cambou & Klibanov, 1984). These diverse reactions presumably constitute a similar task. A simple Poisson argument based on the number of such enzymes, and on the approximate total number of enzymes investigated, suggests that about l0 s enzymes might cover task space (Kauffman, 1991). The strongest evidence for a possible universal enzymatic tool box, however, is the recent discovery of catalytic antibodies. These can be obtained by immunizing with a stable chemical analog of the transition state of a reaction (Pollack et al., 1986; Tramontano et al., 1986, 1987; Pollack & Schultz, 1987). A catalytic task corresponds to two requirements. The enzyme must bind the transition state with high affinity, and it must bind the substrate and product states with affinity low enough to release the reaction products (Albery & Knowles, 1976; Jencks, 1987; Kacser & Beeby, 1984; Knowles, 1987). If the immune repertoire of 10⁸ is universal in covering shape space, it is a candidate universal set of crude enzymes. These enzymes might be further improved by mutation and selection, or by derivatization. So, too, might a variety of other kinds of polymers, including RNA. Again, the practical uses are apparent. Among the implications, consider the task of converting a set of building blocks into a desired target compound through a sequence of reactions. It may be easier to obtain a set of catalytic antibodies, or other novel enzymes, that catalyzes such a sequence than to obtain a single antibody that catalyzes a single reaction (Ballivet & Kauffman, 1987, 1989; Kauffman, 1991).
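The text does not spell out the "simple Poisson argument". The sketch below shows the generic form that such coverage estimates take, and Perelson & Oster's shape-space estimate has the same structure. It assumes that each enzyme (or antibody) covers an independently and randomly placed ball occupying a fraction p of the space. Under that assumption, n such balls leave an expected fraction (1 − p)ⁿ ≈ e^(−np) uncovered. The value of p and the function names are hypothetical and do not come from the original.

```python
import math


def covered_fraction(n: int, p: float) -> float:
    """Expected fraction of the space covered by n balls, each covering an
    independent, randomly placed region of relative volume p (0 < p < 1)."""
    # Uncovered fraction is (1 - p) ** n, which is close to exp(-n * p) for small p.
    return -math.expm1(n * math.log1p(-p))


def repertoire_needed(p: float, uncovered: float) -> int:
    """Smallest n for which the expected uncovered fraction (1 - p) ** n
    is at most `uncovered`."""
    return math.ceil(math.log(uncovered) / math.log1p(-p))


if __name__ == "__main__":
    p = 1e-5  # hypothetical relative volume of one ball; not a value from the text
    for n in (10**4, 10**5, 10**6, 10**8):
        print(n, covered_fraction(n, p))
    print("needed for 99% coverage:", repertoire_needed(p, 0.01))
```

With the hypothetical p = 10⁻⁵, about 4.6 × 10⁵ randomly placed balls give 99% expected coverage. A repertoire of 10⁸ leaves an expected uncovered fraction of about e^(−1000), which is effectively zero. This is the sense in which a finite repertoire can "saturate" a space.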
This is because multiple, partially independent, partially overlapping reaction pathways often lead from the building blocks to the target compound, and any single catalyzed pathway to the target will suffice. Exposing the building blocks simultaneously to a sufficiently large number of candidate enzymes is expected to catalyze at least one such pathway, plus many side reactions to other compounds. If the target compound can be detected, the initial mixture of candidate enzymes can be winnowed down to the subset that yields the target. That subset contains the enzymes catalyzing a pathway to the target. The emergence of connected catalyzed pathways is related to phase transitions in random graphs, where "edges" are randomly assigned to connect "nodes". When the ratio of edges to nodes passes a critical value (one half), a giant connected component forms, and it comes to link most of the nodes as the ratio grows (Erdős & Rényi, 1959, 1960). Similarly, when a sufficient fraction of the reactions among organic molecules are catalyzed, connected catalyzed transformations arise (Kauffman, 1986, 1991; Ballivet & Kauffman, 1987, 1989). Similar principles may apply to the origin of life and of an interconnected metabolism (Kauffman, 1986, 1991).

Early publications on applied molecular evolution include those by Smith (1985), Childs et al. (1985), Kauffman (1986), Horwitz & Loeb (1986), and Ballivet & Kauffman (1987, 1989). Since then, a number of workers have carried out the evolution of mimetic epitopes, protein catalysts, binding proteins, DNA regulatory sequences, and ribozymes. Kaiser et al. (1987) discovered that a surprising fraction of random genomic fragments code for sequences with signal-peptide function. Ptashne's group (Ma & Ptashne, 1987) has examined the capacity of peptides coded by random
genomic DNA to bind DNA or, when fused to a truncated Gal4 protein, to activate transcription of other genes in yeast. A number of workers have been screening libraries of fully or partially stochastic genes and their products. We are exploring random libraries obtained by ligating six restriction linkers. Together, these linkers code for all amino acids in either orientation and in all reading frames, with no stop codons (Ballivet & Kauffman, 1987, 1989; Kauffman, 1991). Mandecki has also advocated the use of random libraries to seek proteins of interest (Mandecki, 1990). Knowles and his coworkers are using definably random replacement sequences to find second-site suppressors that restore the activity of sluggish enzyme mutants (Hermes et al., 1990). Loeb and coworkers (Horwitz et al., 1988) have sought enzymatic activity in the gene encoding beta-lactamase, which cleaves the beta-lactam ring of penicillin. They replaced the codons for five amino acids around the active site and recovered seven new active-site mutants. Most of these carried multiple mutations relative to the wild type. In addition, Loeb and his coworkers (Horwitz & Loeb, 1986; Horwitz et al., 1989) and Struhl and coworkers (Oliphant & Struhl, 1988) have selected DNA regulatory sequences from random libraries. As summarized recently by North (1990) and Abelson (1990), RNA sequences, as well as proteins, are proper targets for applied evolution. Eigen has been carrying out molecular evolution of RNA sequences (Eigen, 1985). Tuerk & Gold's recent paper shows that RNA sequences binding a protein target can be recovered by repeated cycles of affinity selection and in vitro amplification (Tuerk & Gold, 1990). Joyce (1989) has shown that a ribozyme can be selected for novel activities. Finally, Ellington & Szostak screened about 10¹³ RNA sequences by affinity selection and recovered about 100-1000 that bind dyes, roughly one in 10¹⁰ (Ellington & Szostak, 1990).
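The random-graph threshold invoked above for connected catalyzed pathways (Erdős & Rényi, 1959, 1960) can be seen in a short simulation. The same threshold also bears on the collective autocatalysis discussed next. The sketch below is an illustration only, not Kauffman's reaction-graph model. It adds edges between uniformly random pairs of nodes, tracks components with a union-find structure, and reports the fraction of nodes in the largest component. As a simplification, self-loops and repeated pairs are allowed. The function name is hypothetical.

```python
import random


def largest_component_fraction(n_nodes: int, n_edges: int, seed: int = 0) -> float:
    """Add n_edges edges between uniformly random node pairs (self-loops and
    repeats allowed, a small simplification) and return the size of the
    largest connected component divided by n_nodes."""
    rng = random.Random(seed)
    parent = list(range(n_nodes))  # union-find forest
    size = [1] * n_nodes           # component sizes, valid at roots

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n_edges):
        a = find(rng.randrange(n_nodes))
        b = find(rng.randrange(n_nodes))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a          # union by size
            size[a] += size[b]
    return max(size[find(x)] for x in range(n_nodes)) / n_nodes


if __name__ == "__main__":
    n = 10_000
    for ratio in (0.25, 0.4, 0.5, 0.6, 1.0, 1.5):
        print(ratio, largest_component_fraction(n, int(ratio * n)))
```

Below an edge-to-node ratio of one half, the largest component stays a tiny fraction of the graph. Above it, a single component rapidly absorbs most nodes; at a ratio of 1.0 it holds roughly four-fifths of them.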
Sufficiently complex mixtures of peptide and/or RNA polymers, sustained by a source of building blocks and energy, may become collectively autocatalytic (Kauffman, 1986, 1991; Farmer et al., 1986). Stochastic RNA sequences can bind dyes, and hence, presumably, the transition states of reactions. Ribozymes can ligate single-stranded RNA oligomers on a template (Been et al., 1987). Taken together, these findings suggest that the day may not be very distant when a mixture of such polymers achieves self-reproduction. One direction in which applied molecular evolution can be expected to progress further is the sophisticated application of population-genetic models. Such models can guide effective searches for specific catalytic or ligand-binding functions on rugged, multi-peaked molecular "fitness landscapes" in sequence space. Smith (1970) introduced the concept of sequence space for thinking about evolution among proteins, each a one-mutant neighbor of many others. Eigen has extended the concept to RNA and DNA sequence spaces (Eigen, 1985, 1987). A number of workers have begun to consider the statistical features of rugged, multi-peaked molecular-fitness landscapes, and of adaptive evolution over them by mutation, recombination, and selection (Ebeling et al., 1984; Fontana & Schuster, 1987; Kauffman & Levin, 1987; Amitrano et al., 1990). For example, my colleagues and I have developed spin-glass-like (Derrida, 1981) models of "tunably" mountainous molecular-fitness landscapes (Kauffman et al., 1988; Kauffman & Weinberger, 1989). We have found that they can predict statistical features of the rapid adaptive evolution of antibody molecules to increased affinity for the antigen during maturation of the immune response. These features include the expected number of local optima and the number of mutational improvement steps to optima. They also include the rate at which the number of "uphill" directions dwindles to zero as optima are approached, and the fraction of optima accessible from an initial point in sequence space. Finally, they include divergent and convergent molecular evolution as consequences of landscape architecture. Similar landscape models should apply to the adaptive evolution of new enzymatic activities. Optimal applied evolutionary search requires understanding such statistical aspects of molecular fitness landscapes. It also requires understanding the behavior of adapting populations. At a high mutation rate such populations may "melt" progressively over ever larger regions of sequence space, and at a low mutation rate they may become trapped in very small regions (Eigen & Schuster, 1979; Ewens, 1979; Ebeling et al., 1984; Eigen, 1985; Fontana & Schuster, 1987; Kauffman, 1991). One generic feature of adaptive walks towards fitness peaks is that walks via one-mutant variants tend rapidly to become trapped on poor optima. Thus, effective search in protein, RNA, or DNA sequence spaces will probably require mutant spectra with two or more sites mutated simultaneously. This in turn means selecting or screening among millions, billions, or more sequences. The count of variants grows combinatorially with the number of sites mutated. A protein of 100 residues has 1900 one-mutant variants, but about 1.8 × 10⁶ two-mutant and about 1.1 × 10⁹ three-mutant variants. In sum, applied molecular evolution can be expected to become increasingly important. This holds whether one considers screening or selecting on random libraries, altering the enzymatic capacity of known enzymes, improving catalytic antibodies via mutation and selection, or improving designed biopolymers. Applied molecular evolution offers major scientific and practical rewards.
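The trapping of one-mutant walks on poor optima can be explored with a small NK-style landscape in the spirit of Kauffman & Weinberger (1989). The sketch below rests on simplifying assumptions made here, not in the original. It uses a binary alphabet rather than 20 amino acids, and each site interacts with its K cyclically adjacent neighbours. Its walks are steepest-ascent walks rather than walks to a random fitter neighbour. The class and function names are hypothetical. The sketch compares walks restricted to one-mutant neighbours (`radius=1`) with walks that may also jump to two-mutant neighbours (`radius=2`).

```python
import itertools
import random
from typing import Iterator, List, Tuple

Genotype = Tuple[int, ...]


class NKLandscape:
    """Binary NK-style landscape (sketch): the contribution of site i depends on
    site i and the k sites that follow it, wrapping around cyclically."""

    def __init__(self, n: int, k: int, seed: int = 0) -> None:
        if not 0 <= k < n:
            raise ValueError("need 0 <= k < n")
        self.n, self.k = n, k
        rng = random.Random(seed)
        # One uniform [0, 1) contribution per site for each of the 2**(k+1)
        # joint states of that site and its k neighbours.
        self.tables: List[List[float]] = [
            [rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)
        ]

    def fitness(self, g: Genotype) -> float:
        total = 0.0
        for i in range(self.n):
            idx = 0
            for j in range(self.k + 1):  # pack site i and its neighbours into an index
                idx = (idx << 1) | g[(i + j) % self.n]
            total += self.tables[i][idx]
        return total / self.n  # mean contribution


def neighbours(g: Genotype, radius: int) -> Iterator[Genotype]:
    """Yield every genotype that differs from g at 1..radius sites."""
    for r in range(1, radius + 1):
        for sites in itertools.combinations(range(len(g)), r):
            h = list(g)
            for s in sites:
                h[s] ^= 1
            yield tuple(h)


def adaptive_walk(land: NKLandscape, start: Genotype, radius: int) -> Tuple[Genotype, float, int]:
    """Steepest-ascent walk: move to the fittest neighbour within `radius`
    mutations until none is fitter (a local optimum). Returns (optimum, fitness, steps)."""
    current, f = start, land.fitness(start)
    steps = 0
    while True:
        best, best_f = current, f
        for h in neighbours(current, radius):
            fh = land.fitness(h)
            if fh > best_f:
                best, best_f = h, fh
        if best == current:
            return current, f, steps
        current, f, steps = best, best_f, steps + 1


if __name__ == "__main__":
    land = NKLandscape(n=12, k=6, seed=1)
    rng = random.Random(2)
    starts = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(50)]
    for radius in (1, 2):
        finals = [adaptive_walk(land, s, radius)[1] for s in starts]
        print(f"radius {radius}: mean local-optimum fitness {sum(finals) / len(finals):.3f}")
```

With K = 0 the landscape has a single peak, and one-mutant walks always reach it. Raising K makes the landscape more rugged, and the gap between `radius=1` and `radius=2` optima can then be measured directly. The price is that each step of a `radius=2` walk examines N(N+1)/2 variants instead of N.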
The recent successes of Scott & Smith, Devlin et al., Tuerk & Gold, Cwirla et al., Joyce, and others, both those mentioned above and those not mentioned, must raise high hopes for this emerging field.

This work was partially supported by ONR N00014-89-J-1623 and NIH-5-R01-GM-40186.

REFERENCES

ABELSON, J. (1990). Directed evolution of nucleic acids by independent replication and selection. Science 249, 488-489.
ALBERY, W. J. & KNOWLES, J. R. (1976). Evolution of enzyme function and the development of catalytic efficiency. Biochemistry 15(25), 5631-5640.
AMITRANO, C., PELITI, L. & SABER, M. (1990). A spin-glass model of evolution. In: Molecular Evolution on Rugged Landscapes, Santa Fe Institute Studies in the Sciences of Complexity, Proceedings Vol. IX (Perelson, A. & Kauffman, S., eds). Redwood City, CA: Addison-Wesley.
BALLIVET, M. & KAUFFMAN, S. A. (1987). French Patent Number 863386, issued to M. Ballivet and S. A. Kauffman, dated 12/24/87, registered as 2,579,518.
BALLIVET, M. & KAUFFMAN, S. A. (1989). English Patent Number 2183661, issued to M. Ballivet and S. A. Kauffman, dated 6/28/89.
BALLIVET, M. & KAUFFMAN, S. A. (1991). German Patent Number 35 90 766, issued to M. Ballivet and S. A. Kauffman, dated 1/10/91.
BEEN, M. D., BARFOD, J. M., BURKE, J. M., PRICE, J. V., TANNER, N. K., ZAUG, A. J. & CECH, T. R. (1987). Structures involved in Tetrahymena rRNA self-splicing and RNA enzyme activity. Cold Spring Harbor Symp. quant. Biol. LII, 147-158.
BLUESTONE, J. A., SHARROW, S. O., EPSTEIN, S. L., OZATO, K. & SACHS, D. H. (1981). Induction of anti-H2 antibodies in the absence of alloantigen exposure by in vivo administration of anti-idiotype. Nature, Lond. 291, 233.
CAMBOU, B. & KLIBANOV, A. M. (1984). Unusual catalytic properties of usual enzymes. In: Enzyme Engineering 7 (Laskin, A. I., Tsao, G. T. & Wingard, L. B., Jr, eds). Ann. N.Y. Acad. Sci. 434, 219-223.
CHILDS, J., et al. (1985). Ribosome binding site sequences and function. UCLA Symp. molec. Biol. 30, 341-349.
CWIRLA, S. E., PETERS, E. A., BARRETT, R. & DOWER, W. J. (1990). Peptides on phage: a vast library of peptides for identifying ligands. Proc. natn. Acad. Sci. U.S.A. 87, 6378-6382.
DERRIDA, B. (1981). Random-energy model: an exactly solvable model of disordered systems. Phys. Rev. B 24, 2613.
DEVLIN, J. J., PANGANIBAN, L. C. & DEVLIN, P. E. (1990). Random peptide libraries: a source of specific protein binding molecules. Science 249, 404-406.
EBELING, W., ENGEL, A., ESSER, B. & FEISTEL, R. (1984). Diffusion and reaction in random media and models of evolution processes. J. Stat. Phys. 37(3/4), 369-384.
EIGEN, M. (1985). Macromolecular evolution: dynamical ordering in sequence space. In: Emerging Syntheses in Science, Proceedings of the Founding Workshops of the Santa Fe Institute (Pines, D., ed.). Reading, MA: Addison-Wesley.
EIGEN, M. (1987). New concepts for dealing with the evolution of nucleic acids. Cold Spring Harbor Symp. quant. Biol. LII, 307-320.
EIGEN, M. & SCHUSTER, P. (1979). The Hypercycle: A Principle of Natural Self-Organization. New York: Springer-Verlag.
ELLINGTON, A. D. & SZOSTAK, J. W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature, Lond. 346, 818-824.
ERDŐS, P. & RÉNYI, A. (1959). On random graphs I. Publ. Math. Debrecen 6, 290-297.
ERDŐS, P. & RÉNYI, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci. 5, 17-61.
EWENS, W. (1979). Mathematical Population Genetics. New York: Springer-Verlag.
FARMER, J. D., KAUFFMAN, S. A. & PACKARD, N. H. (1986). Autocatalytic replication of polymers. Physica 22D, 50-67.
FONTANA, W. & SCHUSTER, P. (1987). A computer model of evolutionary optimization. Biophys. Chem. 26, 123.
HERMES, J. D., BLACKLOW, S. C. & KNOWLES, J. R. (1990). Searching sequence space by definably random mutagenesis: improving the catalytic potency of an enzyme. Proc. natn. Acad. Sci. U.S.A. 87(2), 696-700.
HORWITZ, M. S. Z. & LOEB, L. A. (1986). Promoters selected from random DNA sequences. Proc. natn. Acad. Sci. U.S.A. 83, 7405-7409.
HORWITZ, M. S. Z. & LOEB, L. A. (1988). DNA sequences of random origin as probes of E. coli promoter architecture. J. biol. Chem. 263, 14724-14731.
HORWITZ, M. S. Z., DUBE, D. K. & LOEB, L. A. (1989). Selection of new biological activities from random nucleotide sequences: evolutionary and practical considerations. Genome 31, 112-117.
JENCKS, W. P. (1987). Economics of enzyme catalysis. Cold Spring Harbor Symp. quant. Biol. LII, 65-74.
JERNE, N. K. (1974). Toward a network theory of the immune system. Ann. Immunol. (Inst. Pasteur) 125(C), 373.
JERNE, N. K. (1984). Idiotypic networks and other preconceived ideas. Immunol. Rev. 79, 5-24.
JOYCE, G. F. (1989). Amplification, mutation and selection of catalytic RNA. Gene 82, 83-87.
KACSER, H. & BEEBY, R. (1984). Evolution of catalytic proteins or on the origin of enzyme species by means of natural selection. J. molec. Evol. 20(1), 38-51.
KAISER, C. A., PREUSS, D., GRISAFI, P. & BOTSTEIN, D. (1987). Many random sequences functionally replace the secretion signal sequence of yeast invertase. Science 235, 312-317.
KAUFFMAN, S. A. (1986). Autocatalytic sets of proteins. J. theor. Biol. 119, 1-24.
KAUFFMAN, S. A. (1991). Origins of Order: Self-Organization and Selection in Evolution. Oxford: Oxford University Press, in press.
KAUFFMAN, S. A. & LEVIN, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. J. theor. Biol. 128, 11-45.
KAUFFMAN, S. A. & WEINBERGER, E. D. (1989). The N-K model of rugged fitness landscapes and its application to maturation of the immune response. J. theor. Biol. 141, 211-245.
KAUFFMAN, S. A., WEINBERGER, E. D. & PERELSON, A. S. (1988). Maturation of the immune response via adaptive walks on affinity landscapes. In: Theoretical Immunology, Part I, Santa Fe Institute Studies in the Sciences of Complexity (Perelson, A. S., ed.). Redwood City, CA: Addison-Wesley.
KLIBANOV, A. M., ALBERTI, B. N. & MARLETTA, M. A. (1982). Stereospecific oxidation of aliphatic alcohols catalyzed by galactose oxidase. Biochem. biophys. Res. Commun. 108(2), 804-808.
KLIBANOV, A. M., BERMAN, Z. & ALBERTI, B. N. (1981). Preparative hydroxylation of aromatic compounds catalyzed by peroxidase. J. Am. Chem. Soc. 103(5), 6263-6264.
KNOWLES, J. R. (1987). Tinkering with enzymes: what are we learning? Science 236, 1252-1258.
MA, J. & PTASHNE, M. (1987). A new class of yeast transcriptional activators. Cell 51, 113-119.
MANDECKI, W. (1990). A method for construction of long randomized open reading frames and polypeptides. Prot. Engin. 3, 221-226.
NORTH, G. (1990). Expanding the RNA repertoire. Nature, Lond. 345, 576-578.
OLIPHANT, A. R. & STRUHL, K. (1988). Defining the consensus sequences of E. coli promoter elements by random selection. Nucl. Acids Res. 16, 7673-7683.
PERELSON, A. S. & OSTER, G. F. (1979). Theoretical studies of clonal selection: minimal antibody repertoire size and reliability of self-non-self discrimination. J. theor. Biol. 81, 645-670.
POLLACK, S. J., JACOBS, J. W. & SCHULTZ, P. G. (1986). Selective chemical catalysis by an antibody. Science 234, 1570-1573.
POLLACK, S. J. & SCHULTZ, P. G. (1987). Antibody catalysis by transition state stabilization. Cold Spring Harbor Symp. quant. Biol. LII, 97-104.
SACKS, D. L., KIRCHHOFF, L. V., HIENY, S. & SHER, A. (1985). Molecular mimicry of a carbohydrate epitope on a major surface glycoprotein of Trypanosoma cruzi by using anti-idiotypic antibodies. J. Immunol. 135(6), 4155-4159.
SCOTT, J. K. & SMITH, G. P. (1990). Searching for peptide ligands with an epitope library. Science 249, 386-390.
SEGE, K. & PETERSON, P. A. (1983). Use of anti-idiotypic antibodies as cell-surface receptor probes. Proc. natn. Acad. Sci. U.S.A. 75, 2443.
SMITH, G. P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315-1317.
SMITH, J. M. (1970). Natural selection and the concept of a protein space. Nature, Lond. 225, 563.
TRAMONTANO, A., JANDA, K. D. & LERNER, R. A. (1986). Catalytic antibodies. Science 234, 1566-1569.
TRAMONTANO, A., JANDA, K., NAPPER, A. D., BENKOVIC, S. J. & LERNER, R. A. (1987). Catalytic antibodies. Cold Spring Harbor Symp. quant. Biol. LII, 91-96.
TUERK, C. & GOLD, L. (1990). Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505-510.
